Annex C. Methodological notes

Comparability of results across countries is ensured by applying the same cleaning procedures and estimation techniques to each country's raw data. The statistical package produced by the OECD applies the following data cleaning rules to all country source data:

  • Firms in NACE Rev. 2¹ sections (1-digit level) A, B, O, T and U are dropped.

  • All negative values of variables that cannot plausibly be negative (employment, turnover, export revenues, labour costs, fixed costs, operating costs, material costs, depreciation, interest costs, research and development (R&D) expenditures, intangible fixed assets, tangible fixed assets, fixed assets, current assets, total assets, loans) are considered missing.

  • Observations with implausible values of the year variable are dropped (e.g. observations that refer to the future).

  • A firm's birth year is set to missing if it is later than the last year covered by the dataset.

  • Accounting indicators based on ratios (e.g. value added over turnover, intangible assets over tangible assets) are winsorized at the 1% level: values below the 1st percentile and above the 99th percentile of the distribution are replaced with the values of the 2nd and 98th percentiles, respectively.
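The cleaning rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OECD package itself: the record layout (one dict per firm-year, with `None` for missing values), the field names, the abbreviated list of non-negative variables and the `first_year`/`last_year` bounds used to flag implausible years are all hypothetical. The winsorization follows the description in the text (cut at the 1st/99th percentiles, replace with the 2nd/98th percentile values) using linear-interpolation percentiles; setting `repl` equal to `cut` gives conventional winsorization.

```python
# Minimal sketch of the cleaning rules; field names and bounds are hypothetical.

# NACE Rev. 2 sections dropped: A agriculture, forestry and fishing; B mining and
# quarrying; O public administration and defence; T households as employers;
# U extraterritorial organisations.
DROPPED_SECTIONS = {"A", "B", "O", "T", "U"}
# Abbreviated, hypothetical subset of the variables that cannot be negative.
NON_NEGATIVE_VARS = ("employment", "turnover", "exports", "labour_costs", "total_assets")


def percentile(sorted_vals, p):
    """Percentile p (0-100) of a non-empty sorted list, by linear interpolation."""
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def winsorize(values, cut=(1, 99), repl=(2, 98)):
    """Replace values below the cut[0]-th / above the cut[1]-th percentile with
    the repl[0]-th / repl[1]-th percentile values; None (missing) is kept."""
    present = sorted(v for v in values if v is not None)
    if not present:
        return list(values)
    lo_cut, hi_cut = percentile(present, cut[0]), percentile(present, cut[1])
    lo_val, hi_val = percentile(present, repl[0]), percentile(present, repl[1])
    out = []
    for v in values:
        if v is not None and v < lo_cut:
            v = lo_val
        elif v is not None and v > hi_cut:
            v = hi_val
        out.append(v)
    return out


def clean(records, first_year, last_year):
    """Apply the row-level rules to a list of firm-year dicts (None = missing)."""
    cleaned = []
    for rec in records:
        if rec["nace_section"] in DROPPED_SECTIONS:
            continue
        if not first_year <= rec["year"] <= last_year:
            continue  # implausible year, e.g. one that refers to the future
        rec = dict(rec)  # work on a copy so the input is not modified
        for var in NON_NEGATIVE_VARS:
            if rec.get(var) is not None and rec[var] < 0:
                rec[var] = None
        if rec.get("birth_year") is not None and rec["birth_year"] > last_year:
            rec["birth_year"] = None
        cleaned.append(rec)
    return cleaned
```

In this sketch `winsorize` would be applied column by column to each ratio indicator after `clean` has removed the dropped observations.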

Firms are classified according to the industry-standard NACE Rev. 2 classification. The regression analyses use the 4-digit level. The sectoral analysis aggregates NACE sections into six groups according to their technology and knowledge intensity (Table A C.1).

The job functions used to approximate innovative activity in firms are classified on the basis of keywords in the job function descriptions. The classification systems used vary across countries. For Finland and Portugal, job functions are classified using the International Standard Classification of Occupations (ISCO-08).

The ISCO-08 classification provides a description of job functions with four different levels of detail for each job: major groups, sub-major groups, minor groups and unit groups. Each level includes a list of tasks typically required for the job.

Jobs in the group “HR job functions” need to mention one or more of the following keywords: “human resource”, “career”, “training” or “staff development”.

These keywords correspond to ISCO occupation unit groups as follows:

  • 1212 Human Resource Managers.

  • 2423 Personnel and Careers Professionals.

  • 2424 Training and Staff Development Professionals.

  • 4416 Personnel Clerks.

Management job functions are identified as the occupation groups that have “management”, “organisation” or “planning” in their title.

Management is represented by the following ISCO occupations:

  • 1213 Policy and Planning Managers.

  • 2421 Management and Organisation Analysts.

  • 2422 Management Policy Specialists.

Digital technology job functions are all jobs that mention “information technology”, “multimedia”, “software”, “programmers”, “database”, “network” or “system” in their title or description.

These generate the following list of ISCO occupations:

  • 1330 Information and Communications Technology Services Managers.

  • 2356 Information Technology Trainers.

  • Sub-major group 25 Information and Communications Technology Professionals:

    • 251 Software and Applications Developers and Analysts (2511 Systems Analysts, 2512 Software Developers, 2513 Web and Multimedia Developers, 2514 Applications Programmers, 2519 Software and Applications Developers and Analysts not elsewhere classified).

    • 252 Database and Network Professionals (2521 Database Designers and Administrators, 2522 Systems Administrators, 2523 Computer Network Professionals, 2529 Database and Network Professionals not elsewhere classified).

  • Minor group 351 Information and Communications Technology Operations and User Support Technicians (3511 Information and Communications Technology Operations Technicians, 3512 Information and Communications Technology User Support Technicians, 3513 Computer Network and Systems Technicians, 3514 Web Technicians).

Research job functions are jobs that mention “research” in the occupation title (excluding 4227 Survey and Market Research Interviewers), as well as jobs that mention “research” among the first tasks in the occupation description.

In the ISCO-08 classification, these are the following occupations:

  • 1223 Research and Development Managers.

  • Sub-major group 21 Science and Engineering Professionals.

  • 2310 University and Higher Education Teachers.

  • 2351 Education Methods Specialists.

  • 2631 Economists.

  • 2632 Sociologists, Anthropologists and Related Professionals.

  • 2633 Philosophers, Historians and Political Scientists.

  • 2634 Psychologists.

Marketing job functions typically have “marketing”, “advertising” or “public relations” in their job title. This includes the following jobs:

  • 1221 Sales and Marketing Managers.

  • 1222 Advertising and Public Relations Managers.

  • 2431 Advertising and Marketing Specialists.

  • 2432 Public Relations Specialists.
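A minimal sketch, assuming occupations are stored as 4-digit ISCO-08 code strings, of how the lists above could be turned into group indicators. Groups defined at a higher level (sub-major groups 21 and 25, minor group 351) are matched by code prefix; the group labels (`"hr"`, `"digital"`, and so on) are hypothetical.

```python
# ISCO-08 codes (or code prefixes for sub-major/minor groups) per job group,
# transcribed from the lists above; the group labels are hypothetical.
ISCO_GROUPS = {
    "hr": ("1212", "2423", "2424", "4416"),
    "management": ("1213", "2421", "2422"),
    "digital": ("1330", "2356", "25", "351"),  # 25 = sub-major group, 351 = minor group
    "research": ("1223", "21", "2310", "2351", "2631", "2632", "2633", "2634"),
    "marketing": ("1221", "1222", "2431", "2432"),
}


def job_groups(isco_code):
    """Return the set of innovative job groups a 4-digit ISCO-08 code belongs to."""
    # str.startswith accepts a tuple and is True if any prefix matches.
    return {group for group, prefixes in ISCO_GROUPS.items()
            if isco_code.startswith(prefixes)}


def indicators(isco_code):
    """0/1 indicator per group for one employee, as used in the aggregation."""
    groups = job_groups(isco_code)
    return {group: int(group in groups) for group in ISCO_GROUPS}
```

For example, `job_groups("2512")` returns `{"digital"}` and `job_groups("4227")` returns an empty set, since Survey and Market Research Interviewers are excluded from the research group.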

For each of the five groups of job functions, each employee in the administrative dataset is assigned a value of one if their job belongs to that group and zero otherwise. The employee dataset is then aggregated at the firm level, so that it contains the firm identifier, the year (if applicable) and the number of contracts (or persons) in each of the five innovative job groups. This aggregated employment dataset is then linked to balance-sheet data for the scaling-up analysis.
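The aggregation and linking step could look as follows. This sketch assumes hypothetical field names: each employee record carries `firm_id`, `year` and a 0/1 indicator per group. Summing over rows counts contracts; counting persons would require first deduplicating records by a person identifier. The inner join on (firm, year) is also an assumption, since the text does not state how firms missing from one of the two datasets are handled.

```python
from collections import defaultdict

# Hypothetical labels for the five groups of job functions.
GROUPS = ("hr", "management", "digital", "research", "marketing")


def aggregate_by_firm(employees):
    """Sum the 0/1 group indicators of each employee record by (firm_id, year)."""
    counts = defaultdict(lambda: dict.fromkeys(GROUPS, 0))
    for e in employees:
        row = counts[(e["firm_id"], e["year"])]
        for g in GROUPS:
            row[g] += e[g]
    return dict(counts)


def link_to_balance_sheet(counts, balance_sheet):
    """Inner join on (firm_id, year): keep firm-years present in both datasets."""
    linked = {}
    for key, financials in balance_sheet.items():
        if key in counts:
            linked[key] = {**financials, **counts[key]}
    return linked
```

Because every employee contributes a row (with zeros where no group applies), firms without any innovative job functions still appear in the aggregated data with zero counts.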

For other classification systems, the identified job categories need to be cross-validated to ensure that the analysis considers only relevant employees. Labelling job functions requires a correct and sensible translation of the keywords used in the data. Moreover, a keyword search may select, besides the targeted job functions, other jobs that contain the same keywords but do not belong to the targeted category. For example, “management” can appear in the titles or descriptions of non-managerial job functions.
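For classification systems other than ISCO-08, the keyword search and the removal of false positives could be sketched as below. The management keywords are taken from the text; the exclusion phrases are hypothetical examples of non-managerial titles containing “management”, which in practice would come out of the cross-validation step. Matching is case-insensitive and word-based, which is a design assumption rather than a documented rule.

```python
import re


def matches_group(title, keywords, exclusions=()):
    """True if the title contains a keyword and no excluded phrase (case-insensitive).

    Keywords only need to start at a word boundary, so "human resource" also
    matches "human resources"; exclusions must match as whole phrases.
    """
    text = title.lower()
    for phrase in exclusions:
        if re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text):
            return False
    return any(re.search(r"\b" + re.escape(k.lower()), text) for k in keywords)


MANAGEMENT_KEYWORDS = ("management", "organisation", "planning")
# Hypothetical false positives identified during cross-validation.
MANAGEMENT_EXCLUSIONS = ("waste management", "facility management")
```

With these lists, a title such as “Management and Organisation Analyst” is kept, while “Waste Management Operative” is rejected even though it contains the keyword.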

The educational attainment of employees is categorised according to the European Qualifications Framework (EQF). Employees are grouped by their highest level of education achieved: i) less than high school; ii) high school diploma; iii) undergraduate degree; and iv) graduate university degree or higher. For countries that use a different education classification, the closest possible mapping to these four categories is applied.

Employees are classified by the skill content of their occupations. Following ISCO-08, occupations are classified as high-skilled, medium-skilled and low-skilled (ILO, 2012[1]). Occupation refers to the kind of work performed in a job. The concept of occupation is defined as a “set of jobs whose main tasks and duties are characterised by a high degree of similarity”. Skill is defined as the ability to carry out the tasks and duties of a given job and is a function of the complexity and range of tasks and duties to be performed in an occupation. Skill level is measured operationally by considering one or more of the following:

  • The nature of the work performed in an occupation in relation to the characteristic tasks and duties defined for each ISCO-08 skill level.

  • The level of formal education defined in terms of the International Standard Classification of Education (ISCED) (UNESCO, 1997[2]) required for competent performance of the tasks and duties involved.

  • The amount of informal on-the-job training and/or previous experience in a related occupation required for competent performance of these tasks and duties.

The wage premium and the gender wage gap of high-growth firms relative to non-high-growth firms are based on differences in residual wages. The estimation procedure has three steps:

  1. A Mincer wage regression is estimated in which log hourly wages are regressed on a set of worker characteristics (tenure and tenure², age and age², education, occupation and sector), using data on all workers in all firms. Based on the estimated coefficients, a wage residual is computed for each individual by subtracting the predicted log wage from the actual log wage. This residual estimates the component of an individual worker's wage that is not explained by their observed characteristics. The procedure thereby controls for potential differences in workforce composition with respect to these characteristics, between men and women and between scalers and non-scalers.

  2. Two types of mean residual wages are computed for each firm. The first, computed over all workers in the firm, measures the average wage premium. The second is computed separately for male and female workers in the firm; the firm's gender wage gap is then the difference between the average residual wages of its male and female workers.

  3. Average residual wages and average residual gender wage gaps are then compared between scalers and non-scalers.
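The three steps can be illustrated with a self-contained Python sketch. It makes several assumptions not stated in the text: hypothetical field names, gender coded as `"M"`/`"F"`, categorical controls entered as dummies with the first sorted level as reference, OLS solved via the normal equations, and an unweighted average of firm-level means in step 3. Tenure and age are divided by 10 before squaring for numerical stability, which changes the coefficients but not the residuals.

```python
from collections import defaultdict
from statistics import mean

CATEGORICAL = ("education", "occupation", "sector")


def design_row(w, levels):
    """Constant, tenure, tenure², age, age² and dummies for the categorical controls
    (the first sorted level of each is the reference). Tenure and age are divided
    by 10 for numerical stability; this leaves the residuals unchanged."""
    t, a = w["tenure"] / 10, w["age"] / 10
    row = [1.0, t, t * t, a, a * a]
    for var in CATEGORICAL:
        row += [1.0 if w[var] == lvl else 0.0 for lvl in levels[var][1:]]
    return row


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mincer_residuals(workers):
    """Step 1: OLS of log hourly wages on worker characteristics; actual minus predicted."""
    levels = {v: sorted({w[v] for w in workers}) for v in CATEGORICAL}
    X = [design_row(w, levels) for w in workers]
    y = [w["log_wage"] for w in workers]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]


def firm_stats(workers, residuals):
    """Step 2: mean residual wage per firm and male-minus-female gap (None if a gender is absent)."""
    by_firm = defaultdict(lambda: {"all": [], "M": [], "F": []})
    for w, r in zip(workers, residuals):
        firm = by_firm[w["firm_id"]]
        firm["all"].append(r)
        firm[w["gender"]].append(r)
    return {
        f: {"premium": mean(d["all"]),
            "gender_gap": mean(d["M"]) - mean(d["F"]) if d["M"] and d["F"] else None}
        for f, d in by_firm.items()
    }


def compare(stats, scalers):
    """Step 3: unweighted averages of the firm-level measures for scalers vs non-scalers."""
    out = {}
    for label, is_scaler in (("scalers", True), ("non_scalers", False)):
        group = [s for f, s in stats.items() if (f in scalers) == is_scaler]
        gaps = [s["gender_gap"] for s in group if s["gender_gap"] is not None]
        out[label] = {
            "premium": mean(s["premium"] for s in group) if group else None,
            "gender_gap": mean(gaps) if gaps else None,
        }
    return out
```

A production version would typically use a linear-algebra library and handle hourly-wage construction and sample restrictions upstream; the point here is only the sequence residualise, average within firms, then compare groups.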


[1] ILO (2012), “International Standard Classification of Occupations 2008 (ISCO-08): Structure, group definitions and correspondence tables”, (accessed on 9 June 2021).

[2] UNESCO (1997), International Standard Classification of Education: ISCED 1997.


1. NACE (Nomenclature of Economic Activities) is the European statistical classification of economic activities.

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2021

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at