Datasets¶
Workforce Dynamics¶
Download a sample of our Workforce Dynamics files here.
This dataset contains aggregated workforce statistics. Every row is a distinct level of aggregation and month combination. Generally, the broadest configuration of this dataset is the company and month level. In that case, every row observes a particular company in a given month. If we include country as a level of aggregation, then each row of the dataset would correspond to a company, country, and month combination. The dataset at the company-country-month level can be aggregated to create the company-month dataset.
Let’s look at an example output where the levels of aggregation are company and country, tracked by month. Here count is the outcome of interest: the total headcount for each combination of aggregation level and month, measured at the end of that month:
company | country | month | count |
---|---|---|---|
Company A | U.S. | 2021-01 | 10 |
Company A | U.S. | 2021-02 | 12 |
Company A | U.S. | 2021-03 | 14 |
Company A | Canada | 2021-01 | 10 |
Company A | Canada | 2021-02 | 11 |
Company A | Canada | 2021-03 | 9 |
This enables us to visualize the table as a graph as well, where the month can be represented along the X-axis, and the outcome count can be represented along the Y-axis. Thus, in this case (Company A, U.S.) and (Company A, Canada) can be viewed as entities for which the outcome count is tracked over time (month) on this graph.
Note that it’s easy to compute a broader level of aggregation from a narrower level of aggregation. To reduce our previous example to the company and month level, we can sum across the country column to get:
company | month | count |
---|---|---|
Company A | 2021-01 | 20 (10+10) |
Company A | 2021-02 | 23 (12+11) |
Company A | 2021-03 | 23 (14+9) |
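The roll-up from company-country-month to company-month can be sketched in a few lines of Python. This is a minimal illustration using only the standard library: the `rows` list restates the example table, and the helper name `aggregate_to_company_month` is hypothetical, not part of any Revelio Labs tooling.

```python
from collections import defaultdict

# Example rows restated from the company-country-month table.
rows = [
    {"company": "Company A", "country": "U.S.", "month": "2021-01", "count": 10},
    {"company": "Company A", "country": "U.S.", "month": "2021-02", "count": 12},
    {"company": "Company A", "country": "U.S.", "month": "2021-03", "count": 14},
    {"company": "Company A", "country": "Canada", "month": "2021-01", "count": 10},
    {"company": "Company A", "country": "Canada", "month": "2021-02", "count": 11},
    {"company": "Company A", "country": "Canada", "month": "2021-03", "count": 9},
]


def aggregate_to_company_month(rows):
    """Sum count across countries for each (company, month) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["company"], row["month"])] += row["count"]
    return dict(totals)


company_month = aggregate_to_company_month(rows)
for (company, month), count in sorted(company_month.items()):
    print(company, month, count)
```

Summing works for additive columns such as count. Columns that are already averages would need to be recomputed from their underlying sums rather than added together.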
Levels of Aggregation¶
We can construct the Workforce Dynamics file across different levels of aggregation, including combinations of the following:
Company (categorical): Revelio Labs’ delivery file can provide insights on more than 20 million companies globally. By default, all subsidiaries of the company are included.
Rcid (categorical): Revelio Labs company ID
Region (categorical): Our broadest geographical level of aggregation is region. We classify locations into 15 distinct geographical regions:
Northern America
Central America
Southern America
Northern Europe
Southern Europe
Eastern Europe
Western Europe
Southern Asia
South-Eastern Asia
Eastern Asia
Central and Western Asia
Pacific Islands
Arab States
Northern Africa
Sub-Saharan Africa
Country (categorical): The granularity can also be specified at the country level for 247 distinct countries.
State (categorical): The granularity can be specified at the state level, including international locations.
Metro_area (categorical): Our narrowest level of aggregation for geography is metro area. Employees may be included under a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.
Job Category (categorical): In addition to aggregating by geography, we can also aggregate by occupation or role. Our broadest role classification groups positions into the following 7 job categories:
Admin
Engineer
Finance
Marketing
Operations
Sales
Scientist
The job role taxonomy is developed by our proprietary representation and clustering algorithms. We develop mathematical representations of each job title using the title itself, the text description of the position (from either individuals describing their own experiences or employers on a job posting), individuals’ skills, associates, and previous experience. Our clustering algorithm is in the family of hierarchical/agglomerative clustering algorithms. This means that we begin with every job title occupying its own cluster, then iteratively combine clusters based on a set of criteria. This allows for complete flexibility of the number of clusters. We update this taxonomy periodically to adjust to the changing occupational landscape. Please see our Methodology section for more details on our job taxonomy.
Role_kn (categorical): Aggregated position role with n discrete levels. We can provide roles at several levels of aggregation, including the following: role_k50, role_k150, role_k300, role_k500, role_k1500. For Workforce Dynamics, the most granular role classification we recommend is role_k150.
Seniority (ordinal): Seniority ranges from 1 to 7. 1 is the most junior, and 7 is the most senior (see the Methodology section for more details). Our seniority model predicts seniority based on the title, company, industry, age, previous seniority, and position history.
Gender (categorical): Gender is calculated as a probability based on the likelihood of the first name being male or female.
Ethnicity (categorical): Ethnicity is estimated based on the likelihood of both the first and last name as well as an individual’s location.
Month (categorical): The month and year of the position are provided in “YYYY-MM” format. Each Workforce Dynamics file contains monthly data up to the previous month’s end.
Outcomes¶
We can include the following outcomes as columns in the Workforce Dynamics file:
Count (float): The total number of employees for a specific level of aggregation for each month. Please note that these counts can be decimals (see our FAQ for more details).
Inflow/Outflow (float): The total inflow and outflow counts of employees at each level of aggregation for a given month
External Inflow/Outflow (float): Total inflow and outflow counts of employees at each level of aggregation for a given month, excluding internal movements within a company
Salary (float): Sum of estimated annual salaries of employees at each level of aggregation in a given month, in USD. We predict the salary for each position based on role, seniority, company, and country using a regression-based model. We train this model using over 200 million salaries from job postings and publicly available labor certification applications, and use country-level inflation rates to estimate the change in salary over time. We get an out-of-sample root mean squared error (RMSE) of 14%. The Salary column in Workforce Dynamics is the sum of salaries at a specific level of aggregation; please divide by the Count column to get the average salary of employees in that level.
Total_prestige (float): We can predict the average prestige level of employees at each level of aggregation in a given month. We calculate the prestige score of each position using world university rankings to set prior values for our base model, with information then being redistributed among all positions according to the changing networks created by worker inflows and outflows. The Total_prestige is the numerator of our prestige score.
Prestige_weight (float): Denominator of our prestige score. To calculate average prestige for a certain level of aggregation, please divide Total_prestige by Prestige_weight.
Duration (float): The average tenure of employees in the specified level of aggregation in years.
Please see the FAQ section for more information on outcomes and levels of aggregation in our Workforce Dynamics files.
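Because Salary and Total_prestige are delivered as sums, averages have to be derived by division, as described in the outcome definitions. The sketch below shows that arithmetic on a single row. The row values are made up for illustration, the lowercase column names are an assumption about how the delivered file is labeled, and `safe_ratio` is a hypothetical helper.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator


# Hypothetical row for one level of aggregation and month (values are invented).
row = {
    "count": 12.5,
    "salary": 1_000_000.0,
    "total_prestige": 3.0,
    "prestige_weight": 10.0,
}

# Average salary = summed salary / headcount.
avg_salary = safe_ratio(row["salary"], row["count"])

# Average prestige = Total_prestige / Prestige_weight.
avg_prestige = safe_ratio(row["total_prestige"], row["prestige_weight"])

print(avg_salary, avg_prestige)
```

When rolling several rows up to a broader level, it is generally safer to sum the numerators and denominators first and divide once at the end, rather than averaging the per-row averages.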
Skill Dynamics¶
We can also provide a version of the Workforce Dynamics file with skills as a level of aggregation. The skill categories that can be included are:
Skill_k25
Skill_k50
Skill_k75
More information on these skill categories, and our Skills Taxonomy in general, is available in our Methodology section.
Individual employees (users) in our data are associated with sets of skills. The counts for each skill_k category in the Skill Dynamics file represent the number of distinct employees who have skills in that category, who are included in a specified level of aggregation each month. The inflow and outflow columns represent the number of employees with skills in each category who have entered or exited each level of aggregation each month.
Please note that as employees can have multiple skills, or may not report skills at all, the counts in the Skill Dynamics file may be different than the headcounts in the Workforce Dynamics file. This is especially true when aggregating across skill_k categories, as employees may be counted in more than one skill_k category.
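The difference between skill counts and headcounts can be seen in a toy example. The user IDs and skill category names below are hypothetical, and the sketch ignores any weighting applied in the delivered files; it only shows why summing across skill_k categories can exceed the number of employees.

```python
# Hypothetical mapping from user ID to the skill categories on their profile.
user_skills = {
    "u1": {"Category A", "Category B", "Category C"},
    "u2": {"Category A"},
    "u3": set(),  # this user reports no skills
}

# Count distinct users per skill category.
per_category = {}
for user_id, categories in user_skills.items():
    for category in categories:
        per_category[category] = per_category.get(category, 0) + 1

summed_across_categories = sum(per_category.values())  # counts u1 three times
headcount = len(user_skills)                            # every user once
users_with_any_skill = sum(1 for c in user_skills.values() if c)

print(per_category, summed_across_categories, headcount, users_with_any_skill)
```

Here the sum across categories is 4, while the headcount is 3 and only 2 users report any skill at all.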
Transitions¶
This dataset contains information on transitions into and out of a set of base companies.
The data consists of two files: Inflows and Outflows. Each row provides data on an individual transition, including the previous and new roles, location, seniority, and salary of individuals leaving or joining the company. The base company in the Inflows file is denoted by the ‘new’ prefix, while the base company in the Outflows file is denoted by ‘prev’.
Download a sample of our Transitions files here.
User_id (categorical): Revelio Labs user ID
Prev_rcid (categorical): Revelio Labs company ID of previous company
Prev_position_id (categorical): Previous position ID
Prev_company (categorical): Previous company name
Prev_seniority (ordinal): Previous seniority level with 7 discrete levels
Prev_region (categorical): Previous region
Prev_country (categorical): Previous country
Prev_state (categorical): Previous state
Prev_metro_area (categorical): Previous metropolitan area
Prev_jobtitle (categorical): Previous job title
Prev_job_category (categorical): Aggregated previous position role with 7 discrete levels
Prev_role_k50 (categorical): Aggregated previous position role with 50 discrete levels
Prev_role_k150 (categorical): Aggregated previous position role with 150 discrete levels
Prev_enddate (time): End date of previous position
Prev_salary (float): Estimated annual salary of the previous role (in USD)
New_position_id (categorical): New position ID
New_rcid (categorical): Revelio Labs company ID of new company
New_company (categorical): New company name
New_seniority (ordinal): New seniority level with 7 discrete levels
New_region (categorical): New region
New_country (categorical): New country
New_state (categorical): New state
New_metro_area (categorical): New metropolitan area
New_jobtitle (categorical): New job title
New_job_category (categorical): Aggregated new position role with 7 discrete levels
New_role_k50 (categorical): Aggregated new position role with 50 discrete levels
New_role_k150 (categorical): Aggregated new position role with 150 discrete levels
New_startdate (time): Start date of new position
New_salary (float): Estimated annual salary of the new role (in USD)
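As one illustration of how the paired prev/new columns can be used, the sketch below computes the percentage change in estimated salary for each transition. The records are invented, the lowercase field names are assumed to match the delivered column names, and `salary_change_pct` is a hypothetical helper.

```python
# Hypothetical transition records (values are invented for illustration).
transitions = [
    {"user_id": "u1", "prev_salary": 80000.0, "new_salary": 100000.0},
    {"user_id": "u2", "prev_salary": 60000.0, "new_salary": None},
]


def salary_change_pct(transition):
    """Percentage change from prev_salary to new_salary, or None if not computable."""
    prev = transition.get("prev_salary")
    new = transition.get("new_salary")
    if prev is None or new is None or prev == 0:
        return None
    return (new - prev) / prev * 100.0


changes = {t["user_id"]: salary_change_pct(t) for t in transitions}
print(changes)
```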
Job Postings¶
Revelio Labs provides job postings data in two formats: Job Posting Dynamics (an aggregated time series of monthly job posting statistics), and Individual Job Postings. Our job postings data comes from several sources, including job posting aggregator sites and company websites. We can provide job postings either via COSMOS, our unified job posting dataset which has been standardized and deduplicated across our different postings sources, or separately by source. The coverage of the data is global.
Job Posting Dynamics¶
This dataset contains aggregated job posting statistics. Every row is a distinct level of aggregation and month combination. Generally, the broadest configuration of this dataset is the company and month level. Each row would correspond to a company and month combination. For more information on the levels of aggregation, please refer to the Workforce Dynamics section.
Download a sample of our Job Posting Dynamics data here.
Rcid (categorical): Revelio Labs company ID
Company (categorical): Company name
Country (categorical): Country location of job posting
State (categorical): State location of job posting
Job_category (categorical): Aggregated posting role with 7 discrete levels
Role_k50 (categorical): Aggregated posting role with 50 discrete levels
Role_k150 (categorical): Aggregated posting role with 150 discrete levels
Month (categorical): The month and year provided in “YYYY-MM” format
Active_posting (float): Number of active postings during that month
New_posting (float): Number of new postings during that month
Removed_posting (float): Number of postings removed during that month
Active_salary_avg (float): Average salary for active postings during that month
New_salary_avg (float): Average salary for new postings during that month
Removed_salary_avg (float): Average salary for postings that got removed during that month
Filling_time_avg (float): Average time to fill, in months
Expected_hires (float): The total number of hires expected for active postings in each level of aggregation and month (COSMOS only)
Individual Job Postings¶
Revelio Labs also provides data on individual job postings. These files contain posting-level information on current and historical job postings such as posting date, location, role, and salary.
Download a sample of our COSMOS Individual Job Postings data here.
Job_id (categorical): Posting key
Rcid (categorical): Revelio Labs company ID
Company (categorical): Company name
Rics_k50 (categorical): Industry of employer with 50 discrete categories (Revelio Labs mapped)
Rics_k200 (categorical): Industry of employer with 200 discrete categories (Revelio Labs mapped)
Rics_k400 (categorical): Industry of employer with 400 discrete categories (Revelio Labs mapped)
Title_raw (categorical): Position title (raw from posting)
Title_translated (categorical): Raw position title translated to English
Job_category (categorical): Aggregated position role with 7 discrete levels
Role_k50 (categorical): Aggregated position role with 50 discrete levels
Role_k150 (categorical): Aggregated position role with 150 discrete levels
Role_k1500 (categorical): Aggregated position role with 1500 discrete levels
State, country (categorical): Listed location for posting
Salary (float): Predicted salary for posting
Post_date (categorical): Date at which the job was posted
Remove_date (categorical): Date at which the job was removed. If null, it hasn’t been removed yet.
Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company
Ultimate_parent_company_name (categorical): Name of the parent company
Remote_type (categorical): Type of remote work a job posting offers. If not specified, the job is categorized as “Fully in Office.”
Expected_hires (float): The expected number of hires for each job posting (COSMOS only)
Source_* (boolean): Indicator for whether a job posting was found in each data source (e.g. company websites, LinkedIn, Indeed, etc.) (COSMOS only)
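Since a null Remove_date means a posting has not been removed yet, the set of postings open on a given day can be derived from Post_date and Remove_date. The sketch below assumes the dates have already been parsed into `datetime.date` objects and that a posting counts as active from its post date up to, but not including, its remove date. Both are assumptions made for this example, as are the sample records and the helper `is_active`.

```python
from datetime import date

# Hypothetical postings; remove_date is None when the posting is still live.
postings = [
    {"job_id": "j1", "post_date": date(2023, 1, 10), "remove_date": date(2023, 2, 1)},
    {"job_id": "j2", "post_date": date(2023, 1, 20), "remove_date": None},
    {"job_id": "j3", "post_date": date(2023, 3, 1), "remove_date": None},
]


def is_active(posting, as_of):
    """True if the posting was posted on or before as_of and not yet removed by then."""
    if posting["post_date"] > as_of:
        return False
    removed = posting["remove_date"]
    return removed is None or removed > as_of


as_of = date(2023, 2, 15)
active_ids = [p["job_id"] for p in postings if is_active(p, as_of)]
print(active_ids)
```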
Sentiment¶
Download a sample of our Sentiment data here.
Individual Reviews¶
Revelio Labs provides company review data with the following information. Note that reviewers are not required to fill out every rating field. Also, some ratings (i.e., ‘culture and values’ and ‘diversity and inclusion’) were added more recently.
Rcid (categorical): Revelio Labs company ID
Company (categorical): Company name
Review_id (categorical): Review ID
Title_raw (categorical): Reviewer’s raw position title
Location_raw (categorical): Reviewer’s raw location
Region (categorical): Reviewer’s region
Country (categorical): Reviewer’s country
State (categorical): Reviewer’s state
Metro_area (categorical): Reviewer’s metropolitan area. Reviews may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.
Review_language_id (categorical): Language of the review
Review_date_time (time): Posting date of the review
Review_iscovid19 (boolean): Indicates whether review mentions the Covid-19 pandemic
Reviewer_current_job (boolean): Indicates whether the reviewer is a current or former employee
Reviewer_employment_status (categorical): Reviewer’s employment type (freelance, part time, intern, contract, regular)
Reviewer_job_ending_year (integer): Final year of the reviewer’s employment with the company
Reviewer_length_of_employment (integer): Number of years that the reviewer worked at the company
Rating_overall (integer): Reviewer’s overall rating of the company (integer values from 1 to 5, with 5 being the best)
Rating_career_opportunities (float): Reviewer’s rating of the company’s career opportunities (from 1 to 5, with half-points awarded, and 5 being the best)
Rating_compensation_and_benefits (float): Reviewer’s rating of the company’s compensation and benefits (from 1 to 5, with half-points awarded, and 5 being the best)
Rating_culture_and_values (integer): Reviewer’s rating of the company’s culture and values (integer values from 1 to 5, with 5 being the best)
Rating_diversity_and_inclusion (integer): Reviewer’s rating of the company’s diversity and inclusion (integer values from 1 to 5, with 5 being the best)
Rating_senior_leadership (float): Reviewer’s rating of the company’s senior management (from 1 to 5, with half-points awarded, and 5 being the best)
Rating_work_life_balance (float): Reviewer’s rating of the company’s work-life balance (from 1 to 5, with half-points awarded, and 5 being the best)
Rating_business_outlook (categorical): Reviewer’s rating of the company’s business outlook (positive, negative, neutral)
Rating_ceo (categorical): Reviewer’s approval rating of the company’s CEO (approve, disapprove, no opinion)
Rating_recommend_to_friend (categorical): Indicates whether the reviewer would recommend the company to a friend (positive, negative)
Review_summary (string): Title of review
Review_pros (string): Reviewer’s positive comments about the company
Review_cons (string): Reviewer’s negative comments about the company
Review_count_helpful (integer): Number of users who found the review helpful
Review_count_not_helpful (integer): Number of users who found the review unhelpful
Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company
Ultimate_parent_company_name (categorical): Name of the parent company
Sentiment Scores¶
This dataset contains employee sentiment scores that were generated using our sentiment model. This model uses Natural Language Processing to capture employee sentiment on specific topics such as management and diversity. For each review, we compute a weighted sentiment score based on how relevant a given topic was for the positive or negative portion of the review, assigning a positive (negative) score to topics that had an overall positive (negative) impact on the review. These scores are then aggregated to arrive at a company-wide sentiment score. Each row contains the sentiment scores for a given company.
Rcid (categorical): Revelio Labs company ID
Company (categorical): Company name
Management_sentiment (float): Management sentiment score
Innovative_technology_sentiment (float): Innovative technology sentiment score
Work_life_balance_sentiment (float): Work life balance sentiment score
Mentorship_sentiment (float): Mentorship sentiment score
Career_advancement_sentiment (float): Career advancement sentiment score
Diversity_and_inclusion_sentiment (float): Diversity and inclusion sentiment score
Coworkers_sentiment (float): Coworkers sentiment score
Compensation_sentiment (float): Compensation sentiment score
Culture_sentiment (float): Culture sentiment score
Company_and_division_size_sentiment (float): Company and division size sentiment score
Perks_and_benefits_sentiment (float): Perks and benefits sentiment score
Onboarding_sentiment (float): Onboarding sentiment score
Remote_work_sentiment (float): Remote work sentiment score
Num_reviews (integer): Number of reviews factored into the scores
Sentiment Trends¶
This dataset contains employee review data aggregated to the level of company, region, position, and month. Each row contains aggregated ratings for this level of granularity. All ratings span from 1 to 5, with 5 being the highest rating.
Rcid (categorical): Revelio Labs company ID
Company (categorical): Company name
Region (categorical): Region of the company (e.g., Northern Europe, South-Eastern Asia)
Job_category (categorical): Aggregated position role with 7 discrete levels (also available at other levels of aggregation)
Month (categorical): Month and year of the aggregated reviews, provided in “YYYY-MM” format
Rating_overall (float): Aggregated overall rating
Rating_career_opportunities (float): Aggregated career opportunities rating
Rating_compensation_and_benefits (float): Aggregated employee compensation and benefits rating
Rating_culture_and_values (float): Aggregated company culture and values rating
Rating_diversity_and_inclusion (float): Aggregated company diversity and inclusion rating
Rating_senior_leadership (float): Aggregated senior management rating
Rating_work_life_balance (float): Aggregated work-life balance rating
Rating_ceo (float): Aggregated CEO approval rating
Rating_recommend_to_friend (float): Aggregated “recommend to a friend” rating
Rating_business_outlook (float): Aggregated business outlook rating
Layoff Notices¶
Download a sample of our Layoff Notices data here.
We collect WARN layoff data, which details whenever a firm is planning to lay off a significant portion of its workforce. The WARN Act (Worker Adjustment and Retraining Notification) ensures that mass layoffs and plant closures are registered with states and the Department of Labor in advance to allow for the provision of compliance assistance materials to help workers and employers understand their rights and responsibilities.
We provide the WARN data at the notice level, where each row represents a layoff notice.
Rcid (categorical): Revelio Labs company ID
Company (categorical): Name of company registering layoff (Revelio Labs mapped)
State (categorical): State where layoff is occurring (Revelio Labs mapped)
City (categorical): City where layoff is occurring (Revelio Labs mapped)
Metro_area (categorical): Metro area where layoff is occurring (Revelio Labs mapped). Layoffs may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.
Notice_date (categorical): Date of layoff notice
Layoff_date (categorical): Date as of which layoffs will be effective
Layoff_type (categorical): Type of layoff (permanent, temporary, etc.)
Num_employees (integer): Number of employees to be laid off
Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company
Ultimate_parent_company_name (categorical): Name of the parent company
Individual Level Data¶
Revelio Labs also provides data on individual professional profiles. These files contain user-level information on current and historical positions, educational history, name, and demographic information.
Download a sample of our Individual data here.
Position File¶
This file contains the individual level position data. Each row is a position held by an individual.
Position_id (categorical): Revelio Labs position ID
User_id (categorical): Revelio Labs user ID
Company_raw (categorical): Company name (raw from online profile)
Company_linkedin_url (categorical): URL for employer (from online profile)
Company_cleaned (categorical): Company name (from online profile, cleaned of special characters)
Location_raw (categorical): Location of position (raw from online profile)
Region (categorical): Region of position (e.g., Southern Asia, Western Europe)
Country (categorical): Country of position (imputed from location)
State (categorical): State of position (if missing, we infer it from the user’s current state)
Metro_area (categorical): Metropolitan area of position (if missing, we infer it from the user’s current location). Positions may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.
Startdate (categorical): Position start date if reported, null otherwise
Enddate (categorical): Position end date if reported, null otherwise
Title_raw (categorical): Position title (raw from online profile)
Role_k1500 (categorical): Aggregated position role with 1500 discrete levels (also available at other levels of aggregation)
Job_category (categorical): Aggregated position role with 7 discrete levels
Seniority (ordinal): Seniority level with 7 discrete levels
Salary (float): Modeled annual salary for the position (in USD)
Position_number (integer): Chronological order of a position in a user’s profile
Rcid (categorical): Revelio Labs company ID
Company_name (categorical): Company name (mapped)
Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company
Ultimate_parent_company_name (categorical): Name of the parent company
User File¶
This file contains the individual level user data. Each row is an individual’s public profile.
User_id (categorical): Revelio Labs user ID
Firstname (categorical): First name (parsed from fullname)
Lastname (categorical): Last name (parsed from fullname)
Fullname (categorical): Name reported on online profile
F_prob (float): Probability of user being female
M_prob (float): Probability of user being male
Api_prob (float): Probability of user being Asian/Pacific Islander
Black_prob (float): Probability of user being Black or African American
Hispanic_prob (float): Probability of user being Hispanic or Latino
Multiple_prob (float): Probability of user being two or more races
Native_prob (float): Probability of user being American Indian or Alaskan Native
White_prob (float): Probability of user being Non-Hispanic White
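Because gender and ethnicity are delivered as probabilities rather than hard labels, one way to summarize a group of users is to sum the probabilities into expected counts. The sketch below illustrates that idea with invented values and assumed lowercase column names. It is not a description of how Revelio Labs computes its aggregated files.

```python
# Hypothetical user rows (probabilities are invented for illustration).
users = [
    {"user_id": "u1", "f_prob": 0.9, "m_prob": 0.1},
    {"user_id": "u2", "f_prob": 0.2, "m_prob": 0.8},
    {"user_id": "u3", "f_prob": 0.5, "m_prob": 0.5},
]

# Expected counts are sums of probabilities, so they can be fractional.
expected_female = sum(u["f_prob"] for u in users)
expected_male = sum(u["m_prob"] for u in users)

female_share = expected_female / (expected_female + expected_male)
print(round(expected_female, 2), round(expected_male, 2), round(female_share, 3))
```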
Education File¶
This file contains the individual level education data. Each row is an educational record.
User_id (categorical): Revelio Labs user ID
University_raw (categorical): School name (raw from online profile)
Startdate (categorical): Start date
Enddate (categorical): End date
Degree_raw (categorical): Degree title (raw from online profile)
Field_raw (categorical): Degree field (raw from online profile)
Skill File¶
This file contains the individual level skills data. Revelio Labs uses proprietary algorithms to cluster the skill universe into distinct clusters of skills. The clustering can be as coarse as 25 groups and as fine as over 20,000 groups. The default skill clustering is done at 50 groups.
User_id (categorical): Revelio Labs user ID
Skill (categorical): Single skill from profile (raw from online profile)
Skill_mapped (categorical): Skill from profile (Revelio Labs mapped)
Skill_k75 (categorical): Aggregated skill with 75 discrete levels (also available at other levels of aggregation)
Company Reference¶
This file contains information on companies that are covered by the delivered data and is included with each delivery.
Download a sample of our Company Reference file here.
Rcid (categorical): Revelio Labs company ID
Company (categorical): Company name
Factset_entity_id (categorical): FactSet company ID
Year_founded (categorical): Year in which the company was founded
Ticker (categorical): Ticker of the company
Exchange_name (categorical): The stock exchange that the company is listed on
Sedol (categorical): SEDOL code
Isin (categorical): ISIN code
Cusip (categorical): CUSIP number
Url (categorical): Company’s website URL
Naics_code (categorical): Company’s NAICS industry code
Cik (categorical): CIK number
Lei (categorical): LEI code
Linkedin_url (categorical): Company LinkedIn URL
Child_rcid (categorical): Revelio Labs company ID of largest subsidiary company
Child_company (categorical): Company name of largest subsidiary company
Child_linkedin_url (categorical): Company LinkedIn URL of largest subsidiary company
Ultimate_parent_rcid (categorical): Revelio Labs company ID of ultimate parent company
Ultimate_parent_rcid_name (categorical): Company name of ultimate parent company
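The Ultimate_parent_rcid column makes it possible to group companies under their ultimate parent. The sketch below uses invented records and assumes that a top-level company lists its own Rcid as its Ultimate_parent_rcid. That convention is an assumption made for the example and should be checked against the delivered file.

```python
from collections import defaultdict

# Hypothetical Company Reference rows (names and IDs are invented).
companies = [
    {"rcid": 1, "company": "Parent Co", "ultimate_parent_rcid": 1},
    {"rcid": 2, "company": "Subsidiary One", "ultimate_parent_rcid": 1},
    {"rcid": 3, "company": "Other Co", "ultimate_parent_rcid": 3},
]

# Map each ultimate parent rcid to the rcids that roll up to it.
rcids_by_parent = defaultdict(list)
for row in companies:
    rcids_by_parent[row["ultimate_parent_rcid"]].append(row["rcid"])

rcids_by_parent = dict(rcids_by_parent)
print(rcids_by_parent)
```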