Datasets

Workforce Dynamics

Download a sample of our Workforce Dynamics files here.

This dataset contains aggregated workforce statistics. Every row is a distinct level of aggregation and month combination. Generally, the broadest configuration of this dataset is the company and month level. In that case, every row observes a particular company in a given month. If we include country as a level of aggregation, then each row of the dataset would correspond to a company, country, and month combination. The dataset at the company-country-month level can be aggregated to create the company-month dataset.

Let’s look at an example in which the levels of aggregation are company and country, tracked by month. The outcome of interest is count, the total headcount for each level-of-aggregation and month combination, measured at the end of that month:

company      country    month      count
Company A    U.S.       2021-01    10
Company A    U.S.       2021-02    12
Company A    U.S.       2021-03    14
Company A    Canada     2021-01    10
Company A    Canada     2021-02    11
Company A    Canada     2021-03    9

The table can also be visualized as a graph, with the month along the X-axis and the outcome count along the Y-axis. In this case, (Company A, U.S.) and (Company A, Canada) are the entities for which the outcome count is tracked over time (month).

Note that it’s easy to compute a broader level of aggregation from a narrower level of aggregation. To reduce our previous example to the company and month level, we can sum across the country column to get:

company      month      count
Company A    2021-01    20 (10+10)
Company A    2021-02    23 (12+11)
Company A    2021-03    23 (14+9)
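
The roll-up from company-country-month to company-month can be sketched in a few lines of Python. This is a minimal illustration using only the standard library: the `country_rows` list restates the example rows, and the function name `roll_up` and the lowercase column keys are hypothetical, not part of any Revelio Labs tooling.

```python
from collections import defaultdict

# Example rows restated from the company-country-month table.
country_rows = [
    {"company": "Company A", "country": "U.S.", "month": "2021-01", "count": 10},
    {"company": "Company A", "country": "U.S.", "month": "2021-02", "count": 12},
    {"company": "Company A", "country": "U.S.", "month": "2021-03", "count": 14},
    {"company": "Company A", "country": "Canada", "month": "2021-01", "count": 10},
    {"company": "Company A", "country": "Canada", "month": "2021-02", "count": 11},
    {"company": "Company A", "country": "Canada", "month": "2021-03", "count": 9},
]

def roll_up(rows, keep=("company", "month"), outcome="count"):
    """Sum `outcome` over every column that is not listed in `keep`."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[col] for col in keep)
        totals[key] += row[outcome]
    return dict(totals)

company_month = roll_up(country_rows)
for (company, month), count in sorted(company_month.items()):
    print(company, month, count)
```

The sketch covers only the count outcome from the example; it is not meant as a general recipe for every outcome column.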

Levels of Aggregation

We can construct the Workforce Dynamics file across different levels of aggregation, including combinations of the following:

  • Company (categorical): Revelio Labs’ delivery file can provide insights on more than 20 million companies globally. By default, all subsidiaries of the company are included.

  • Rcid (categorical): Revelio Labs company ID

  • Region (categorical): Our broadest geographical level of aggregation is region. We classify locations into 15 distinct geographical regions:

    • Northern America

    • Central America

    • Southern America

    • Northern Europe

    • Southern Europe

    • Eastern Europe

    • Western Europe

    • Southern Asia

    • South-Eastern Asia

    • Eastern Asia

    • Central and Western Asia

    • Pacific Islands

    • Arab States

    • Northern Africa

    • Sub-Saharan Africa

  • Country (categorical): The granularity can also be specified at the country level, covering 247 distinct countries.

  • State (categorical): The granularity can be specified at the state level, including international locations.

  • Metro_area (categorical): Our most narrow level of aggregation for geography is metro area. Employees may be included under a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.

  • Job Category (categorical): In addition to aggregating by geography, we can also aggregate by occupation or role. Our broadest role classification groups positions into the following 7 job categories:

    • Admin

    • Engineer

    • Finance

    • Marketing

    • Operations

    • Sales

    • Scientist

    The job role taxonomy is developed by our proprietary representation and clustering algorithms. We develop mathematical representations of each job title using the title itself, the text description of the position (from either individuals describing their own experiences or employers on a job posting), individuals’ skills, associates, and previous experience. Our clustering algorithm is in the family of hierarchical/agglomerative clustering algorithms. This means that we begin with every job title occupying its own cluster, then iteratively combine clusters based on a set of criteria. This allows for complete flexibility of the number of clusters. We update this taxonomy periodically to adjust to the changing occupational landscape. Please see our Methodology section for more details on our job taxonomy.

  • Role_kn (categorical): Aggregated position role with n discrete levels. We can provide roles at several levels of aggregation, including the following: role_k50, role_k150, role_k300, role_k500, role_k1500. For Workforce Dynamics, the most granular role classification we recommend is role_k150.

  • Seniority (ordinal): Seniority ranges from 1 to 7. 1 is the most junior, and 7 is the most senior (see the Methodology section for more details). Our seniority model predicts seniority based on the title, company, industry, age, previous seniority, and position history.

  • Gender (categorical): Gender is calculated as a probability based on the likelihood of the first name being male or female.

  • Ethnicity (categorical): Ethnicity is estimated based on the likelihood of both the first and last name as well as an individual’s location.

  • Month (categorical): The month and year of the position are provided in “YYYY-MM” format. Each Workforce Dynamics file contains monthly data up to the previous month’s end.
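
To make the bottom-up clustering idea described under Job Category more concrete, here is a toy sketch of agglomerative clustering. It is not Revelio Labs’ proprietary algorithm: the 2-D points, the job titles, the single-linkage distance, and the stopping rule (a target number of clusters) are all assumptions chosen for illustration.

```python
import math

# Hypothetical 2-D representations of job titles (values invented).
embeddings = {
    "software engineer": (0.0, 0.0),
    "backend developer": (0.1, 0.0),
    "accountant": (5.0, 5.0),
    "financial analyst": (5.0, 5.2),
    "sales rep": (10.0, 0.0),
}

def single_linkage(a, b):
    """Distance between two clusters: the distance of their closest members."""
    return min(math.dist(embeddings[x], embeddings[y]) for x in a for y in b)

def agglomerate(titles, n_clusters):
    # Start with every title occupying its own cluster.
    clusters = [[t] for t in titles]
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

clusters = agglomerate(list(embeddings), n_clusters=3)
print(clusters)
```

Because merging can stop at any point, the same procedure can produce a handful of broad clusters or many narrow ones, which is the flexibility in the number of clusters that the text refers to.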

Outcomes

We can include the following outcomes as columns in the Workforce Dynamics file:

  • Count (float): The total number of employees for a specific level of aggregation for each month. Please note that these counts can be decimals (see our FAQ for more details).

  • Inflow/Outflow (float): The total inflow and outflow counts of employees at each level of aggregation for a given month

  • External Inflow/Outflow (float): Total inflow and outflow counts of employees at each level of aggregation for a given month, excluding internal movements within a company

  • Salary (float): Sum of estimated annual salaries of employees at each level of aggregation in a given month, in USD. We predict the salary for each position based on role, seniority, company, and country using a regression-based model. We train this model using over 200 million salaries from job postings and publicly available labor certification applications, and use country-level inflation rates to estimate the change in salary over time. We get an out-of-sample root mean squared error (RMSE) of 14%. The Salary column in Workforce Dynamics is the sum of salaries at a specific level of aggregation; please divide by the Count column to get the average salary of employees in that level.

  • Total_prestige (float): We can predict the average prestige level of employees at each level of aggregation in a given month. We calculate the prestige score of each position using world university rankings to set prior values for our base model, with information then being redistributed among all positions according to the changing networks created by worker inflows and outflows. The Total_prestige is the numerator of our prestige score.

  • Prestige_weight (float): Denominator of our prestige score. To calculate average prestige for a certain level of aggregation, please divide Total_prestige by Prestige_weight.

  • Duration (float): The average tenure, in years, of employees in the specified level of aggregation.

Please see the FAQ section for more information on outcomes and levels of aggregation in our Workforce Dynamics files.
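
The two divisions described above (Salary by Count, and Total_prestige by Prestige_weight) can be sketched as follows. The row values are invented, the lowercase keys are an assumption about column naming, and `safe_ratio` is a hypothetical helper that guards against an empty cell.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator

# Hypothetical Workforce Dynamics row (values invented for illustration).
row = {
    "count": 12.5,
    "salary": 1_000_000.0,       # sum of salaries in this cell
    "total_prestige": 3.0,       # numerator of the prestige score
    "prestige_weight": 6.0,      # denominator of the prestige score
}

avg_salary = safe_ratio(row["salary"], row["count"])
avg_prestige = safe_ratio(row["total_prestige"], row["prestige_weight"])
print(avg_salary, avg_prestige)
```

When combining several rows into a broader level of aggregation, a reasonable approach under the same assumptions is to sum the numerators and the denominators separately and divide once at the end, rather than averaging the per-row averages.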

Skill Dynamics

We can also provide a version of the Workforce Dynamics file with skills as a level of aggregation. The skill categories that can be included are:

  • Skill_k25

  • Skill_k50

  • Skill_k75

More information on these skill categories, and our Skills Taxonomy in general, is available in our Methodology section.

Individual employees (users) in our data are associated with sets of skills. The counts for each skill_k category in the Skill Dynamics file represent the number of distinct employees who have skills in that category, who are included in a specified level of aggregation each month. The inflow and outflow columns represent the number of employees with skills in each category who have entered or exited each level of aggregation each month.

Please note that as employees can have multiple skills, or may not report skills at all, the counts in the Skill Dynamics file may be different than the headcounts in the Workforce Dynamics file. This is especially true when aggregating across skill_k categories, as employees may be counted in more than one skill_k category.
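
A small sketch of why skill counts and headcounts can diverge. The employees and category names below are invented; the only point is that one person can appear under several skill categories, or under none.

```python
# Hypothetical employees and the skill categories their skills map to.
employee_skills = {
    "u1": {"Programming", "Data Analysis", "Project Management"},
    "u2": {"Programming"},
    "u3": set(),  # reports no skills at all
}

# Count distinct employees per skill category.
skill_counts = {}
for user, categories in employee_skills.items():
    for category in categories:
        skill_counts[category] = skill_counts.get(category, 0) + 1

headcount = len(employee_skills)
sum_over_skills = sum(skill_counts.values())
print(headcount, sum_over_skills, skill_counts)
```

Here the headcount is 3, but summing across categories gives 4: `u1` is counted three times and `u3` is not counted at all.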

Transitions

This dataset contains information on transitions into and out of a set of base companies.

The data consists of two files: Inflows and Outflows. Each row provides data on an individual transition, including the previous and new roles, location, seniority, and salary of individuals leaving or joining the company. The base company in the Inflows file is denoted by the ‘new’ prefix, while the base company in the Outflows file is denoted by ‘prev’.

Download a sample of our Transitions files here.

  • User_id (categorical): Revelio Labs user ID

  • Prev_rcid (categorical): Revelio Labs company ID of previous company

  • Prev_position_id (categorical): Previous position ID

  • Prev_company (categorical): Previous company name

  • Prev_seniority (ordinal): Previous seniority level with 7 discrete levels

  • Prev_region (categorical): Previous region

  • Prev_country (categorical): Previous country

  • Prev_state (categorical): Previous state

  • Prev_metro_area (categorical): Previous metropolitan area

  • Prev_jobtitle (categorical): Previous job title

  • Prev_job_category (categorical): Aggregated previous position role with 7 discrete levels

  • Prev_role_k50 (categorical): Aggregated previous position role with 50 discrete levels

  • Prev_role_k150 (categorical): Aggregated previous position role with 150 discrete levels

  • Prev_enddate (time): End date of previous position

  • Prev_salary (float): Estimated annual salary of the previous role (in USD)

  • New_position_id (categorical): New position ID

  • New_rcid (categorical): Revelio Labs company ID of new company

  • New_company (categorical): New company name

  • New_seniority (ordinal): New seniority level with 7 discrete levels

  • New_region (categorical): New region

  • New_country (categorical): New country

  • New_state (categorical): New state

  • New_metro_area (categorical): New metropolitan area

  • New_jobtitle (categorical): New job title

  • New_job_category (categorical): Aggregated new position role with 7 discrete levels

  • New_role_k50 (categorical): Aggregated new position role with 50 discrete levels

  • New_role_k150 (categorical): Aggregated new position role with 150 discrete levels

  • New_startdate (time): Start date of new position

  • New_salary (float): Estimated annual salary of the new role (in USD)
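
As an illustration of working with the Inflows file, the sketch below counts where joiners came from and computes the estimated salary change for each transition. The rows are invented and the lowercase keys are an assumption about column naming.

```python
from collections import Counter

# Hypothetical Inflows rows for the base company "Company A" (values invented).
inflows = [
    {"user_id": "u1", "prev_company": "Company B", "new_company": "Company A",
     "prev_salary": 80000.0, "new_salary": 95000.0},
    {"user_id": "u2", "prev_company": "Company B", "new_company": "Company A",
     "prev_salary": 60000.0, "new_salary": 66000.0},
    {"user_id": "u3", "prev_company": "Company C", "new_company": "Company A",
     "prev_salary": 100000.0, "new_salary": 90000.0},
]

# Number of joiners by previous employer.
sources = Counter(row["prev_company"] for row in inflows)

# Change in estimated annual salary (USD) for each transition.
salary_change = {
    row["user_id"]: row["new_salary"] - row["prev_salary"] for row in inflows
}
print(sources.most_common(), salary_change)
```

The same pattern applies to the Outflows file with the roles of the ‘prev’ and ‘new’ columns swapped.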

Job Postings

Revelio Labs provides job postings data in two formats: Job Posting Dynamics (an aggregated time series of monthly job posting statistics), and Individual Job Postings. Our job postings data comes from several sources, including job posting aggregator sites and company websites. We can provide job postings either via COSMOS, our unified job posting dataset which has been standardized and deduplicated across our different postings sources, or separately by source. The coverage of the data is global.

Job Posting Dynamics

This dataset contains aggregated job posting statistics. Every row is a distinct level of aggregation and month combination. Generally, the broadest configuration of this dataset is the company and month level. Each row would correspond to a company and month combination. For more information on the levels of aggregation, please refer to the Workforce Dynamics section.

Download a sample of our Job Posting Dynamics data here.

  • Rcid (categorical): Revelio Labs company ID

  • Company (categorical): Company name

  • Country (categorical): Country location of job posting

  • State (categorical): State location of job posting

  • Job_category (categorical): Aggregated posting role with 7 discrete levels

  • Role_k50 (categorical): Aggregated posting role with 50 discrete levels

  • Role_k150 (categorical): Aggregated posting role with 150 discrete levels

  • Month (categorical): The month and year provided in “YYYY-MM” format

  • Active_posting (float): Number of active postings during that month

  • New_posting (float): Number of new postings during that month

  • Removed_posting (float): Number of postings removed during that month

  • Active_salary_avg (float): Average salary for active postings during that month

  • New_salary_avg (float): Average salary for new postings during that month

  • Removed_salary_avg (float): Average salary for postings that got removed during that month

  • Filling_time_avg (float): Average time to fill, in months

  • Expected_hires (float): The total number of hires expected for active postings in each level of aggregation and month (COSMOS only)
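
Unlike counts, the average-salary columns cannot simply be summed when moving to a broader level of aggregation. A minimal sketch of one way to combine them is a posting-weighted average. It assumes that Active_salary_avg is a plain mean over the postings counted in Active_posting, which this document does not state explicitly; the rows and the helper name `weighted_average` are hypothetical.

```python
# Hypothetical company-state-month rows (values invented for illustration).
posting_rows = [
    {"company": "Company A", "state": "New York", "month": "2021-01",
     "active_posting": 30.0, "active_salary_avg": 100000.0},
    {"company": "Company A", "state": "Texas", "month": "2021-01",
     "active_posting": 10.0, "active_salary_avg": 80000.0},
]

def weighted_average(rows, value, weight):
    """Average of `value` across rows, weighted by `weight`."""
    total_weight = sum(r[weight] for r in rows)
    if total_weight == 0:
        return None
    return sum(r[value] * r[weight] for r in rows) / total_weight

company_avg = weighted_average(posting_rows, "active_salary_avg", "active_posting")
naive_avg = sum(r["active_salary_avg"] for r in posting_rows) / len(posting_rows)
print(company_avg, naive_avg)
```

The weighted result (95,000) differs from the unweighted mean of the two rows (90,000) because New York carries three times as many postings as Texas.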

Individual Job Postings

Revelio Labs also provides data on individual job postings. These files contain posting-level information on current and historical job postings such as posting date, location, role, and salary.

Download a sample of our COSMOS Individual Job Postings data here.

  • Job_id (categorical): Posting key

  • Rcid (categorical): Revelio Labs company ID

  • Company (categorical): Company name

  • Rics_k50 (categorical): Industry of employer with 50 discrete categories (Revelio Labs mapped)

  • Rics_k200 (categorical): Industry of employer with 200 discrete categories (Revelio Labs mapped)

  • Rics_k400 (categorical): Industry of employer with 400 discrete categories (Revelio Labs mapped)

  • Title_raw (categorical): Position title (raw from posting)

  • Title_translated (categorical): Raw position title translated to English

  • Job_category (categorical): Aggregated position role with 7 discrete levels

  • Role_k50 (categorical): Aggregated position role with 50 discrete levels

  • Role_k150 (categorical): Aggregated position role with 150 discrete levels

  • Role_k1500 (categorical): Aggregated position role with 1500 discrete levels

  • State, country (categorical): Listed location for posting

  • Salary (float): Predicted salary for posting

  • Post_date (categorical): Date at which the job was posted

  • Remove_date (categorical): Date at which the job was removed. If null, it hasn’t been removed yet.

  • Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company

  • Ultimate_parent_company_name (categorical): Name of the parent company

  • Remote_type (categorical): Type of remote work a job posting offers. If not specified, the job is categorized as “Fully in Office.”

  • Expected_hires (float): The expected number of hires for each job posting (COSMOS only)

  • Source_* (boolean): Indicator for whether a job posting was found in each data source (e.g. company websites, LinkedIn, Indeed, etc.) (COSMOS only)
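
Since a null Remove_date means the posting has not been removed yet, a point-in-time "active postings" filter can be sketched as below. The sketch assumes dates arrive as ISO "YYYY-MM-DD" strings and that a posting no longer counts as active on its removal date; both are assumptions, as are the rows and the function name `is_active`.

```python
from datetime import date

def is_active(posting, on):
    """True if the posting was live on the date `on`."""
    posted = date.fromisoformat(posting["post_date"])
    if posted > on:
        return False  # not posted yet
    removed = posting["remove_date"]
    # A null remove date means the posting is still up.
    return removed is None or date.fromisoformat(removed) > on

# Hypothetical postings (values invented for illustration).
postings = [
    {"job_id": "j1", "post_date": "2021-01-10", "remove_date": "2021-02-01"},
    {"job_id": "j2", "post_date": "2021-01-20", "remove_date": None},
]

active_ids = [p["job_id"] for p in postings if is_active(p, date(2021, 2, 15))]
print(active_ids)
```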

Sentiment

Download a sample of our Sentiment data here.

Individual Reviews

Revelio Labs provides company review data with the following information. Note that not all rating fields are required to be filled out by the reviewer. Also, some ratings (i.e., ‘culture and values’ and ‘diversity and inclusion’) were added more recently.

  • Rcid (categorical): Revelio Labs company ID

  • Company (categorical): Company name

  • Review_id (categorical): Review ID

  • Title_raw (categorical): Reviewer’s raw position title

  • Location_raw (categorical): Reviewer’s raw location

  • Region (categorical): Reviewer’s region

  • Country (categorical): Reviewer’s country

  • State (categorical): Reviewer’s state

  • Metro_area (categorical): Reviewer’s metropolitan area. Reviews may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.

  • Review_language_id (categorical): Language of the review

  • Review_date_time (time): Posting date of the review

  • Review_iscovid19 (boolean): Indicates whether review mentions the Covid-19 pandemic

  • Reviewer_current_job (boolean): Indicates whether the reviewer is a current or former employee

  • Reviewer_employment_status (categorical): Reviewer’s employment type (freelance, part time, intern, contract, regular)

  • Reviewer_job_ending_year (integer): Final year of the reviewer’s employment with the company

  • Reviewer_length_of_employment (integer): Number of years that the reviewer worked at the company

  • Rating_overall (integer): Reviewer’s overall rating of the company (integer values from 1 to 5, with 5 being the best)

  • Rating_career_opportunities (float): Reviewer’s rating of the company’s career opportunities (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_compensation_and_benefits (float): Reviewer’s rating of the company’s compensation and benefits (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_culture_and_values (integer): Reviewer’s rating of the company’s culture and values (integer values from 1 to 5, with 5 being the best)

  • Rating_diversity_and_inclusion (integer): Reviewer’s rating of the company’s diversity and inclusion (integer values from 1 to 5, with 5 being the best)

  • Rating_senior_leadership (float): Reviewer’s rating of the company’s senior management (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_work_life_balance (float): Reviewer’s rating of the company’s work-life balance (from 1 to 5, with half-points awarded, and 5 being the best)

  • Rating_business_outlook (categorical): Reviewer’s rating of the company’s business outlook (positive, negative, neutral)

  • Rating_ceo (categorical): Reviewer’s approval rating of the company’s CEO (approve, disapprove, no opinion)

  • Rating_recommend_to_friend (categorical): Indicates whether the reviewer would recommend the company to a friend (positive, negative)

  • Review_summary (string): Title of review

  • Review_pros (string): Reviewer’s positive comments about the company

  • Review_cons (string): Reviewer’s negative comments about the company

  • Review_count_helpful (integer): Number of users who found the review helpful

  • Review_count_not_helpful (integer): Number of users who found the review unhelpful

  • Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company

  • Ultimate_parent_company_name (categorical): Name of the parent company
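
Because reviewers can leave rating fields blank, any summary statistic should skip missing values rather than treat them as zero. A minimal sketch, assuming blank ratings appear as `None` or as absent keys (the reviews and the helper name `mean_rating` are hypothetical):

```python
def mean_rating(reviews, field):
    """Mean of `field` over the reviews that actually filled it in."""
    values = [r[field] for r in reviews if r.get(field) is not None]
    return sum(values) / len(values) if values else None

# Hypothetical reviews (values invented for illustration).
reviews = [
    {"review_id": "r1", "rating_overall": 4, "rating_culture_and_values": None},
    {"review_id": "r2", "rating_overall": 2, "rating_culture_and_values": 5},
    {"review_id": "r3", "rating_overall": 3},  # older review without the newer field
]

overall = mean_rating(reviews, "rating_overall")
culture = mean_rating(reviews, "rating_culture_and_values")
print(overall, culture)
```

Note that the two means rest on different numbers of reviews (three versus one), which matters when comparing ratings that were introduced at different times.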

Sentiment Scores

This dataset contains employee sentiment scores that were generated using our sentiment model. This model uses Natural Language Processing to capture employee sentiment on specific topics such as management and diversity. For each review, we compute a weighted sentiment score based on how relevant a given topic was for the positive or negative portion of the review, assigning a positive (negative) score to topics that had an overall positive (negative) impact on the review. These scores are then aggregated to arrive at a company-wide sentiment score. Each row contains the sentiment scores for a given company.

  • Rcid (categorical): Revelio Labs company ID

  • Company (categorical): Company name

  • Management_sentiment (float): Management sentiment score

  • Innovative_technology_sentiment (float): Innovative technology sentiment score

  • Work_life_balance_sentiment (float): Work life balance sentiment score

  • Mentorship_sentiment (float): Mentorship sentiment score

  • Career_advancement_sentiment (float): Career advancement sentiment score

  • Diversity_and_inclusion_sentiment (float): Diversity and inclusion sentiment score

  • Coworkers_sentiment (float): Coworkers sentiment score

  • Compensation_sentiment (float): Compensation sentiment score

  • Culture_sentiment (float): Culture sentiment score

  • Company_and_division_size_sentiment (float): Company and division size sentiment score

  • Perks_and_benefits_sentiment (float): Perks and benefits sentiment score

  • Onboarding_sentiment (float): Onboarding sentiment score

  • Remote_work_sentiment (float): Remote work sentiment score

  • Num_reviews (integer): Number of reviews factored into the scores

Layoff Notices

Download a sample of our Layoff Notices data here.

We collect WARN layoff data, which records when a firm plans to lay off a significant portion of its workforce. The Worker Adjustment and Retraining Notification (WARN) Act ensures that mass layoffs and plant closures are registered with states and the Department of Labor in advance to allow for the provision of compliance assistance materials to help workers and employers understand their rights and responsibilities.

We provide the WARN data at the notice level, where each row represents a layoff notice.

  • Rcid (categorical): Revelio Labs company ID

  • Company (categorical): Name of company registering layoff (Revelio Labs mapped)

  • State (categorical): State where layoff is occurring (Revelio Labs mapped)

  • City (categorical): City where layoff is occurring (Revelio Labs mapped)

  • Metro_area (categorical): Metro area where layoff is occurring (Revelio Labs mapped). Layoffs may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.

  • Notice_date (categorical): Date of layoff notice

  • Layoff_date (categorical): Date as of which layoffs will be effective

  • Layoff_type (categorical): Type of layoff (permanent, temporary, etc.)

  • Num_employees (integer): Number of employees to be laid off

  • Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company

  • Ultimate_parent_company_name (categorical): Name of the parent company
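
Two simple notice-level calculations are sketched below: the number of days between the notice and the effective layoff date, and the total employees affected by state. The rows are invented, and the sketch assumes both date columns are ISO "YYYY-MM-DD" strings, which this document does not specify.

```python
from collections import defaultdict
from datetime import date

# Hypothetical layoff notices (values invented for illustration).
notices = [
    {"company": "Company A", "state": "California",
     "notice_date": "2021-01-04", "layoff_date": "2021-03-05", "num_employees": 120},
    {"company": "Company B", "state": "California",
     "notice_date": "2021-02-01", "layoff_date": "2021-03-03", "num_employees": 80},
    {"company": "Company C", "state": "Ohio",
     "notice_date": "2021-01-15", "layoff_date": "2021-03-16", "num_employees": 50},
]

# Days between the notice and the effective layoff date, per notice.
lead_days = [
    (date.fromisoformat(n["layoff_date"]) - date.fromisoformat(n["notice_date"])).days
    for n in notices
]

# Total employees to be laid off, by state.
by_state = defaultdict(int)
for n in notices:
    by_state[n["state"]] += n["num_employees"]

print(lead_days, dict(by_state))
```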

Individual Level Data

Revelio Labs also provides data on individual professional profiles. These files contain user-level information on current and historical positions, educational history, name, and demographic information.

Download a sample of our Individual data here.

Position File

This file contains the individual level position data. Each row is a position held by an individual.

  • Position_id (categorical): Revelio Labs position ID

  • User_id (categorical): Revelio Labs user ID

  • Company_raw (categorical): Company name (raw from online profile)

  • Company_linkedin_url (categorical): URL for employer (from online profile)

  • Company_cleaned (categorical): Company name (from online profile, cleaned of special characters)

  • Location_raw (categorical): Location of position (raw from online profile)

  • Region (categorical): Region of position (Ex. Southern Asia, Western Europe)

  • Country (categorical): Country of position (imputed from location)

  • State (categorical): State of position (if missing, we infer it from the user’s current state)

  • Metro_area (categorical): Metropolitan area of position (if missing, we infer it from the user’s current location). Positions may be assigned to a country or state’s “non-metropolitan area” if we do not have enough information to assign them to a specific metro area in that geography.

  • Startdate (categorical): Position start date if reported, null otherwise

  • Enddate (categorical): Position end date if reported, null otherwise

  • Title_raw (categorical): Position title (raw from online profile)

  • Role_k1500 (categorical): Aggregated position role with 1500 discrete levels (also available at other levels of aggregation)

  • Job_category (categorical): Aggregated position role with 7 discrete levels

  • Seniority (ordinal): Seniority level with 7 discrete levels

  • Salary (float): Modeled annual salary for the position (in USD)

  • Position_number (integer): Chronological order of a position in a user’s profile

  • Rcid (categorical): Revelio Labs company ID

  • Company_name (categorical): Company name (mapped)

  • Ultimate_parent_rcid (categorical): Revelio Labs company ID for the parent company

  • Ultimate_parent_company_name (categorical): Name of the parent company
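
As a sketch of how Position_number can be used, the code below rebuilds each user’s sequence of employers from unordered position rows. It assumes that a lower Position_number means an earlier position, which this document does not state; the rows are invented.

```python
from collections import defaultdict

# Hypothetical Position File rows, deliberately out of order (values invented).
positions = [
    {"user_id": "u1", "position_number": 2, "company_name": "Company B"},
    {"user_id": "u1", "position_number": 1, "company_name": "Company A"},
    {"user_id": "u2", "position_number": 1, "company_name": "Company C"},
]

# Group the positions by user.
by_user = defaultdict(list)
for p in positions:
    by_user[p["user_id"]].append(p)

# Order each user's positions and keep the employer names.
career_paths = {
    user: [p["company_name"] for p in sorted(rows, key=lambda p: p["position_number"])]
    for user, rows in by_user.items()
}
print(career_paths)
```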

User File

This file contains the individual level user data. Each row is an individual’s public profile.

  • User_id (categorical): Revelio Labs user ID

  • Firstname (categorical): First name (parsed from fullname)

  • Lastname (categorical): Last name (parsed from fullname)

  • Fullname (categorical): Name reported on online profile

  • F_prob (float): Probability of user being female

  • M_prob (float): Probability of user being male

  • Api_prob (float): Probability of user being Asian/Pacific Islander

  • Black_prob (float): Probability of user being Black or African American

  • Hispanic_prob (float): Probability of user being Hispanic or Latino

  • Multiple_prob (float): Probability of user being two or more races

  • Native_prob (float): Probability of user being American Indian or Alaskan Native

  • White_prob (float): Probability of user being Non-Hispanic White
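
One common way to work with probabilistic attributes like these is to sum the probabilities across a group, which gives an expected count rather than forcing each user into a single category. The sketch below illustrates that idea with invented users; it is not a statement of how Revelio Labs computes its aggregated files.

```python
# Hypothetical User File rows (values invented for illustration).
users = [
    {"user_id": "u1", "f_prob": 0.75, "m_prob": 0.25},
    {"user_id": "u2", "f_prob": 0.25, "m_prob": 0.75},
    {"user_id": "u3", "f_prob": 0.125, "m_prob": 0.875},
]

# Expected number of female and male users in this group.
expected_female = sum(u["f_prob"] for u in users)
expected_male = sum(u["m_prob"] for u in users)
print(expected_female, expected_male)
```

The two expected counts add up to the number of users (3), but neither is a whole number.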

Education File

This file contains the individual level education data. Each row is an educational record.

  • User_id (categorical): Revelio Labs user ID

  • University_raw (categorical): School name (raw from online profile)

  • Startdate (categorical): Start date

  • Enddate (categorical): End date

  • Degree_raw (categorical): Degree title (raw from online profile)

  • Field_raw (categorical): Degree field (raw from online profile)

Skill File

This file contains the individual level skills data. Revelio Labs uses proprietary algorithms to cluster the skill universe into distinct clusters of skills. The clustering can be as coarse as 25 groups and as fine as over 20,000 groups. The default skill clustering is done at 50 groups.

  • User_id (categorical): Revelio Labs user ID

  • Skill (categorical): Single skill from profile (raw from online profile)

  • Skill_mapped (categorical): Skill from profile (Revelio Labs mapped)

  • Skill_k75 (categorical): Aggregated skill with 75 discrete levels (also available at other levels of aggregation)

Company Reference

This file contains information on companies that are covered by the delivered data and is included with each delivery.

Download a sample of our Company Reference file here.

  • Rcid (categorical): Revelio Labs company ID

  • Company (categorical): Company name

  • Factset_entity_id (categorical): FactSet company ID

  • Year_founded (categorical): Year in which the company was founded

  • Ticker (categorical): Ticker of the company

  • Exchange_name (categorical): The stock exchange that the company is listed on

  • Sedol (categorical): SEDOL code

  • Isin (categorical): ISIN code

  • Cusip (categorical): CUSIP number

  • Url (categorical): Company’s website URL

  • Naics_code (categorical): Company’s NAICS industry code

  • Cik (categorical): CIK number

  • Lei (categorical): LEI code

  • Linkedin_url (categorical): Company LinkedIn URL

  • Child_rcid (categorical): Revelio Labs company ID of largest subsidiary company

  • Child_company (categorical): Company name of largest subsidiary company

  • Child_linkedin_url (categorical): Company LinkedIn URL of largest subsidiary company

  • Ultimate_parent_rcid (categorical): Revelio Labs company ID of ultimate parent company

  • Ultimate_parent_rcid_name (categorical): Company name of ultimate parent company
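
A minimal sketch of using the Company Reference file as a lookup: resolving each Rcid to its ultimate parent and listing the companies that share a parent. The rows, the integer IDs, and the lowercase keys are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical Company Reference rows (values invented for illustration).
company_ref = [
    {"rcid": 1, "company": "Company A",
     "ultimate_parent_rcid": 1, "ultimate_parent_rcid_name": "Company A"},
    {"rcid": 2, "company": "Company A Canada",
     "ultimate_parent_rcid": 1, "ultimate_parent_rcid_name": "Company A"},
    {"rcid": 3, "company": "Company B",
     "ultimate_parent_rcid": 3, "ultimate_parent_rcid_name": "Company B"},
]

# Lookup from a company's rcid to its ultimate parent's name.
parent_name = {row["rcid"]: row["ultimate_parent_rcid_name"] for row in company_ref}

# All rcids that roll up to each ultimate parent.
children = defaultdict(list)
for row in company_ref:
    children[row["ultimate_parent_rcid"]].append(row["rcid"])

print(parent_name[2], dict(children))
```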