2. How are science, technology and innovation going digital? The statistical evidence

Fernando Galindo-Rueda

Directorate for Science, Technology and Innovation, OECD

Chapter 2 examines the digitalisation of science and innovation drawing on statistical measurement and analysis by the OECD’s Working Party of National Experts on Science and Technology Indicators, including material featured in the OECD report Measuring the Digital Transformation. This chapter maps the ICT specialisation of research and the growth of scientific production and government funding of research related to artificial intelligence. It examines the multidimensional nature of the digital transformation of science. This chapter also shows how innovation in firms can be linked to the adoption of digital technologies and business practices. It concludes by summarising possible next steps for OECD’s own measurement agenda.


The statistical data for Israel are supplied by and under the responsibility of the relevant Israeli authorities. The use of such data by the OECD is without prejudice to the status of the Golan Heights, East Jerusalem and Israeli settlements in the West Bank under the terms of international law.

Introduction

Hardly a day goes by without the traditional or social media highlighting how digitally driven scientific or technological breakthroughs might transform daily life. Much computer speech and image recognition has attained human-like levels of performance, while self-driving cars are gradually improving their safety record. Media attention to such breakthroughs is provoking a deeper reflection among policy makers concerned with science, technology and innovation (STI). How is the nature of science and innovation itself changing? How, if at all, should this change be managed?

The public’s exposure to accumulating anecdotal evidence on the digital transformation of science and innovation builds views of change that are shaped by how close people are to specific developments. But how widespread are those specific developments? Which practices have fully broken into the mainstream? Which practices remain the preserve of relatively small communities at the leading edge? Do different facets of digitalisation complement or offset each other? Is debate excessively focused on practices that are no longer at the forefront, and are incipient signals about the direction of change being missed?

Addressing these questions requires a comprehensive view of how science and innovation are “going digital”. The digital revolution is based on the growing possibilities to create and use data, information and knowledge, and ultimately to support decision making, science and innovation policy. As such, it requires data and measurement that help map the ongoing transformations, their causes and their effects.

This chapter reports on some key features and trends in the digitalisation of science and innovation. To that end, it draws principally on statistical measurement and analysis under the aegis of the OECD’s Working Party of National Experts on Science and Technology Indicators (NESTI), including contributions featured in the report titled Measuring the Digital Transformation: A Roadmap for the Future (OECD, 2019a), a publication that provides a broad statistical and measurement-oriented view of digitalisation and accompanies the OECD report Going Digital: Shaping Policies, Improving Lives. Both in and outside the OECD, work to measure digitalisation is also a basis for collective choices about the data that policy makers wish to have and act upon (see Chapter 7). This chapter provides a number of reflections on measurement gaps and what can be done, and is being done, to address them.

Given the breadth of digitalisation’s influence, and the available evidence, some perspective is needed. Historically, the development of science and that of technology have been intertwined. Innovation in measurement tools provided a means to improve scientific understanding of nature, and this knowledge also turned out to be essential for innovation. Each wave of widespread technological development has raised the question of what makes it truly distinctive and how it might affect science and innovation (Furman, 2016). For the current wave of digitalisation, several core questions emerge about the distinctiveness of new digital technology. What does it enable that was previously impossible or prohibitively expensive? In addition, how will the key features of digital technology, e.g. various externalities, contribute to further developments that could lead to its more intensive use?

Chapter 2 examines how the science system contributes to developing capabilities that can support the digital transition and how the former is impacted by changes in the possibilities and costs associated with digital economic activity (Goldfarb and Tucker, 2017). In science, as in several other fields, the greater information availability brought about by the digital revolution does not necessarily result in greater information quality. Not surprisingly, then, considerable effort in science and innovation aims to deploy digital technologies to help make information useful for meaningful and reliable quality assurance, classification and prediction.

As a result, this chapter also dedicates space to discussing trends and features of research activity related to automating human-like cognitive functions through artificial intelligence (AI). AI is considered to be both a general-purpose technology – i.e. it has a wide domain of applications – and a new method of research and invention (Agrawal, Gans and Goldfarb, 2018; Cockburn, Henderson and Stern, 2018; Klinger, Mateos-Garcia and Stathoulopoulos, 2018). Other developments, such as those related to developing computer-enabled tamper-proof mechanisms for trust and assurance, are not covered here for reasons of space and limited statistical evidence, but can be just as important.

Science going digital

Scientific research on digital technologies

Advances in scientific knowledge are key to developing new digital technologies. Over the last decade, the People’s Republic of China (hereafter “China”) almost trebled its contribution to computer science journals. In so doing, it overtook the United States in the production of scientific documents in this field. However, China’s share of documents that are in the world’s top-cited (top 10%, normalised by type of document and field) is still close to 7%, well below the United States at 17% (Figure 2.1).

Figure 2.1. Top 10% most cited documents in computer science, by country, 2016
Percentage of domestic documents (fractional counts) in the top 10% citation-ranked documents

Notes: Computer science publications consist of citeable documents (articles, conference proceedings and reviews) featured in journals specialising in this field. “Top-cited publications” are the 10% most cited papers normalised by scientific field and type of document. Instead of counting a publication repeatedly if two or more countries contribute to it, fractional counting distributes such publication across contributors so that all publications have the same equal weight.

Source: OECD (2019a), Measuring the Digital Transformation: A Roadmap for the Future,


China’s share of highly cited papers has nonetheless more than doubled since 2006. This makes it the second largest producer of highly cited computer science publications worldwide. In some countries, such as Italy, Israel, Luxembourg and Poland, scientific research in the field of computer science has a much higher citation rate than overall scientific production in those countries. Nearly 20% of computer science publications by Switzerland-based authors feature among the world’s top 10% cited scientific documents. This figure reaches 25% for Luxembourg, although with a much smaller level of scientific production.
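The fractional counting described in the notes to Figure 2.1 can be illustrated with a minimal sketch. It assumes that each publication is split equally across its distinct contributing countries; the source does not spell out the exact weighting scheme (for example, whether weights follow authors or institutions), and the toy data and function name are hypothetical.

```python
from collections import defaultdict


def fractional_counts(papers):
    """Return per-country fractional publication counts.

    `papers` is a list of publications, each given as a list of
    contributing country codes. Assumption: a publication with k distinct
    countries adds 1/k to each, so every publication has a total weight of 1.
    """
    totals = defaultdict(float)
    for countries in papers:
        unique = set(countries)
        if not unique:
            continue  # skip records with no country information
        weight = 1.0 / len(unique)
        for country in unique:
            totals[country] += weight
    return dict(totals)


# Hypothetical toy data: four publications, two of them co-authored
papers = [["CHN"], ["USA", "CHN"], ["USA", "CHE", "ITA"], ["USA"]]
counts = fractional_counts(papers)
```

Under this scheme the country totals add up to the number of publications, which is the property the figure notes describe: no publication is counted more than once, however many countries contribute to it.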

Scientific research and artificial intelligence

Scientific production

AI research has aimed for decades to allow machines to perform human-like cognitive functions. Breakthroughs in computational power, the availability of data and algorithms have raised the capabilities of AI. In some narrow fields, its performance increasingly resembles that of humans. Such advances have allowed computers to distinguish between objects in images and videos, and interpret text through natural language processing, with growing levels of accuracy (OECD, 2017). The 2017 edition of the OECD Science, Technology and Industry Scoreboard provided initial evidence on the rapid growth in scientific documents that refer to machine learning – the general method underpinning current advances in data-driven AI – between 2003 and 2016. Interest in AI has triggered several measurement efforts, as documented in Box 2.1.

Box 2.1. Measurement of AI in research, technology and innovation

The OECD supports governments through policy analysis, dialogue and engagement, and identification of best practices. Significant effort is put into mapping the economic and social impacts of AI technologies and applications and their policy implications. This includes improving the measurement of AI and its impacts, as well as shedding light on important policy issues. These issues include labour market developments and skills for the digital age, privacy, accountability of AI-powered decisions, and the security and safety questions that AI generates (OECD, n.d. a).

Recent OECD analysis has looked at areas as diverse as scientific publications, conference proceedings, patenting, open-source software and venture capital investment. One such study used data from Crunchbase, a commercial database on companies around the world, and found that AI start-ups had attracted around 12% of worldwide private equity investments in the first half of 2018, up from 3% in 2011. US-based start-ups account for two-thirds of total investment since 2011 (Breschi, Lassébie and Menon, 2018; OECD, 2018c). China has seen a dramatic upsurge in AI start-up investment since 2016. The share of global AI private equity investment attracted by Chinese companies rose from just 3% in 2015 to 36% in 2017. In addition to AI measurement work reported elsewhere in this chapter, recent measurement of AI at the OECD includes analysis in collaboration with Germany’s Max Planck Institute for Innovation and Competition using data on patents, and analysis of open-source software publishing. Since 2014, AI open-source software recorded in GitHub grew about three times as much as the rest of open-source software. The number of AI IP5 patent families (namely those registered in the five major intellectual property [IP] offices) went up from close to 1 000 in 2001 to 2 500 in 2014 (Yamashita et al., forthcoming).

Several other public and private, national and international organisations have an active interest in measuring AI. Recent examples include reports by Elsevier (2018) on scientific publications and WIPO (2019), principally on patenting. The Electronic Frontier Foundation, which campaigns to protect civil liberties from digital threats, has started to measure and contextualise progress in AI. This not-for-profit organisation is working to assemble an open-source, online repository of data points on AI progress and performance (Simonite, 2017), benchmarking AI-enabled machine performance compared to humans. The AI Index, backed by the One Hundred Year Study on Artificial Intelligence, was established at Stanford in 2015 to examine the effects of AI on society. This initiative prioritises measurement and uses multiple sources (Shoham et al., 2018), including company reports and executive management surveys (such as Bughin et al., 2017 and McKinsey, 2018).

Text mining of keywords in scientific publications shows that Computer science is the most prevalent domain in AI-related science. It accounts for slightly more than one-third of all AI-related documents published between 1996 and 2016 (Figure 2.2). More than a quarter of all AI-related scientific publications and conference proceedings have appeared in Engineering journals and close to 10% in Mathematics journals. About 25% of the science involving AI (either drawing on AI or contributing to its general advancement) is found in a wide array of other scientific disciplines. These include Physics and astronomy, Medicine and Materials science, among others, demonstrating the growing pervasiveness of AI-related scientific research.

Figure 2.2. Scientific fields contributing to or making use of AI, 2006-16
Journal classification of AI-related scientific documents, as percentage of all AI-related documents

Notes: AI = artificial intelligence. See Box 2.2 for further information.

Source: OECD (2019a), Measuring the Digital Transformation: A Roadmap for the Future,


Box 2.2. How is AI-relatedness measured in scientific publications and how can it be interpreted?

AI-related documents are identified on the basis of Scopus-indexed articles, reviews and conference proceedings using a list of keywords to search on the abstracts, titles and author-provided keywords of scientific documents. The AI keywords are selected on the basis of high co-occurrence with terms frequently used in journals classified as AI-focused (a subcategory of Computer Sciences) by Elsevier, the publisher and provider of bibliographic information and related services.

In the OECD analysis, which focuses on documents published between 1996 and 2016, only those documents with two or more selected keywords were classed as AI documents in order to reduce the risk of including non-AI-related documents. Relatedness in this context encompasses instances in which the document presents findings related to existing or new AI procedures. It also includes instances in which the document reports findings based on the application of AI procedures.

The ability to distinguish systematically between enabling and outcome dimensions of AI in the corpus of document titles, abstracts and keywords relies on the consistent recording both of research methods used and findings. As found in the AI literature, automated classification procedures can be substantially enhanced through richer data sources. This implies that analysis could be improved through access to the entire body of documents subject to analysis.
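The two-keyword rule described in this box can be sketched as follows. This is an illustration only, not the OECD procedure: the keyword list is hypothetical (the actual list is not reproduced here), the matching is a simple case-insensitive phrase search, and "two or more selected keywords" is read as two or more distinct keywords.

```python
import re

# Hypothetical keyword list, for illustration only; the list actually used
# in the OECD analysis is not given in this chapter.
AI_KEYWORDS = [
    "machine learning",
    "neural network",
    "support vector machine",
    "natural language processing",
]


def count_ai_keywords(text, keywords=AI_KEYWORDS):
    """Count how many distinct keywords occur in `text` (case-insensitive).

    A trailing "s" is tolerated so that simple plurals also match.
    """
    lowered = text.lower()
    return sum(
        1
        for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"s?\b", lowered)
    )


def is_ai_related(title, abstract, author_keywords, min_hits=2):
    """Classify a document as AI-related if it contains at least `min_hits`
    distinct keywords across its title, abstract and author keywords."""
    text = " ".join([title, abstract, " ".join(author_keywords)])
    return count_ai_keywords(text) >= min_hits


# Hypothetical documents
doc_a = is_ai_related(
    "Deep neural networks for image classification",
    "We apply machine learning to a large image collection.",
    [],
)
doc_b = is_ai_related(
    "A survey of crop yields",
    "One model in this survey uses machine learning.",
    ["agriculture"],
)
```

In this toy case the first document passes the threshold and the second, which mentions a single keyword in passing, does not. This mirrors the stated purpose of the rule: reducing the risk of including documents that are not AI-related.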

Figure 2.3. Trends in scientific publishing related to AI, 2006-16
Index of publication counts

Notes: AI = artificial intelligence. See Box 2.2 for further information.

Source: OECD (2019a), Measuring the Digital Transformation: A Roadmap for the Future,


Figure 2.4. Top-cited scientific publications related to AI, 2016 and 2006
Economies with the largest number of AI-related documents among the 10% most cited publications

Notes: AI = artificial intelligence. Economies’ shares in global AI top-cited publications are based on fractional counts. See Box 2.2 for further information.

Source: OECD (2019a), Measuring the Digital Transformation: A Roadmap for the Future,


Scientific publishing related to AI (Box 2.2) has experienced a remarkable expansion over the past decade. From 2006 to 2016, the annual volume of AI-related publications grew by 150%, compared to 50% for the overall body of indexed scientific publications (Figure 2.3). China is now the largest producer of AI-related science, in terms of publications, and is fast improving the quality of its scientific production in this area. Back in 2006, China was already the largest producer of AI-related scientific publications, and grew its global share to 27% by 2016. In turn, the global publication shares accounted for by the EU28 and the United States declined over the same period, to 19% and 12% respectively. Also of note has been the rapid growth of AI-related publishing in India, which in 2016 contributed 11% of the world total. In other areas, however, different AI-related scientific publications have different levels of what is termed “citation impact”. Since it can be misleading to count all publications equally, further analysis has been carried out by focusing on AI-related publications attaining the highest citation rates (the top 10% most cited documents globally) within their respective journal disciplinary domains.
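The growth rates quoted above map onto the index of publication counts used in Figure 2.3 in a simple way. The sketch below uses hypothetical publication counts chosen only to reproduce growth of 150% and 50%, and assumes the index is set to 100 in the base year; the base year used in the figure is not stated in the text.

```python
def to_index(series, base_year):
    """Rescale a {year: count} series so that the base year equals 100."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


def pct_growth(series, start, end):
    """Percentage growth of the series between two years."""
    return 100.0 * (series[end] - series[start]) / series[start]


# Hypothetical counts, not actual publication data
ai_related = {2006: 40, 2016: 100}
all_fields = {2006: 200, 2016: 300}

ai_index = to_index(ai_related, base_year=2006)
all_index = to_index(all_fields, base_year=2006)
```

With a 2006 base of 100, growth of 150% corresponds to an index value of 250 in 2016, and growth of 50% to an index value of 150.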

As shown in Figure 2.4, the EU28 and the United States are still responsible for the largest shares of highly cited AI-related publications (i.e. those featuring among the world’s top 10% most cited publications). However, from 2006 to 2016 their shares declined from 29% to 25% for the EU28, and from 31% to 21% for the United States. China, India, Iran and Malaysia all more than doubled their share of the world’s top-cited AI publications over the past decade.

Public funding of scientific research on AI

Given the transformative potential of AI, it is worth examining the scale and nature of government and business investment. There has been a plethora of policy announcements across countries that are difficult to compare. A 2016 White House report indicated that the United States invested USD 1.1 billion in “AI research and development [R&D]” in 2015, rising to USD 1.2 billion in 2016 (NSTC, 2016). The European Commission estimates it has dedicated close to 13% of its R&D budget to information and communication technology (ICT) since 2014 (EC, 2018). The United Kingdom’s Engineering and Physical Sciences Research Council has allocated more than GBP 400 million (USD 527 million) for research related to data science and AI through different mechanisms (BEIS and DCMS, 2018). In December 2017, Korea’s Ministry of Science and ICT announced plans to dedicate in 2018 the equivalent of USD 1.5 billion to AI and related areas in support for the “fourth industrial revolution” (EDaily, 2017). Japan’s Prime Minister Abe established a Strategic Council for AI Technologies in 2016 to ensure a co-ordinated approach across ministries and agencies for AI research, including new AI labs and complementary R&D centres.

Because AI does not fit neatly into pre-established taxonomies of R&D funding, detailed information sources at the micro level are needed to produce reliable and relevant statistical information. Available data systems and statistics lack systematic granular information about what publicly funded researchers work on, as opposed to what they publish. This makes them ill equipped to address subject-specific questions. Data on government-funded projects (often allocated on a competitive basis) provide a useful but partial view of the funding landscape that is most accurate when project-based funding dominates over other resource allocation mechanisms for scientific research funding. No international data infrastructure brings together research funding agencies’ databases on the basis of an explicit agreement that renders them comparable. A number of commercial providers offer related information services based on data collected from publicly available sources or bilateral data-sharing agreements. The OECD is seeking to address this information gap by assessing the feasibility of a shared data resource for analysis through the Fundstat pilot project. The OECD has also begun new work to map research funding trends using case studies for demonstration purposes and focusing on AI given its high policy relevance.

To date, the case studies have focused on two major US agencies, the National Institutes of Health (NIH), one of the world’s main funders of biomedical research, and the National Science Foundation (NSF), which covers several areas including civilian computer science research.1 The analysis uses funding data from 2001 to 2017 from the NIH RePORTER database (over 1.2 million granted applications) and the NSF Award Search System 2018 (over 200 000 granted applications). Over less than two decades, the share and volume of AI-related funding has grown significantly for both agencies. AI-related funding in 2017 (Figure 2.5) represented close to USD 820 million for NIH (i.e. 3.6% of total NIH health R&D funding) and USD 388 million for NSF (7.3% of NSF R&D funding).
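The relationship between the AI-related amounts and the shares quoted above can be checked with simple arithmetic. The sketch below treats the AI share of an agency's R&D funding as AI-related funding divided by total funding, and backs out the totals implied by the rounded figures in the text. The function names are hypothetical, and the implied totals are rough approximations rather than reported values.

```python
def ai_intensity(ai_funding, total_funding):
    """AI-related funding as a percentage of total R&D funding."""
    return 100.0 * ai_funding / total_funding


def implied_total(ai_funding, intensity_pct):
    """Total funding implied by an AI amount and its percentage share."""
    return ai_funding / (intensity_pct / 100.0)


# Figures quoted in the text for 2017, in USD million (rounded in the source)
nih_total = implied_total(820, 3.6)
nsf_total = implied_total(388, 7.3)
```

On these rounded inputs, the implied totals are roughly USD 22.8 billion for NIH and USD 5.3 billion for NSF. Because the text gives the AI amounts only approximately ("close to"), the results indicate orders of magnitude only.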

Figure 2.5. Estimated NIH and NSF funding for AI-related R&D, 2001-17

Notes: AI = artificial intelligence; NIH = National Institutes of Health; NSF = National Science Foundation. This is an experimental indicator.

Source: OECD calculations based on NIH RePORTER (database) and NSF Award Search (database) (accessed 1 December 2018).


Figure 2.6. Estimated share of AI-related R&D funding within NIH institutes
“AI intensity” for selected institutes with the largest estimated amounts of AI funding

Notes: AI = artificial intelligence; R&D = research and development; NIH = National Institutes of Health. This is an experimental indicator. For clarity of presentation, and with the exception of the National Library of Medicine, responsible NIH institutes’ names are presented by referring solely to their missions/subjects.

Source: OECD calculations based on NIH RePORTER (database) (accessed 1 December 2018).


Analysis of NIH-AI funding data shows which institutes appear to make more intensive use of AI, as implied in the awards granted (Figure 2.6). The National Library of Medicine (NLM) (secondary axis) accounts for the largest share of AI-related research within NIH (about one-third of the total). It also has the highest internal AI intensity at close to 80%, followed by the National Human Genome Research Institute at 5%. In total funding terms, NLM is followed by the National Cancer Institute, which has an AI intensity of less than 1%.

Figure 2.7 shows the incidence of AI-related R&D within the NSF directorates with responsibility for managing the funding for different disciplinary domains. AI intensity in 2018 is more than 35% in the case of Computer and information sciences (displayed on the secondary axis), up from less than 10% in 2001. This is followed by Engineering (general) at 11%, up from nearly 2% in 2012.

The use of funding data through text analysis is a promising avenue for understanding developments in AI research. Funding data help develop a timelier and more finely grained picture that connects funding agencies, their missions and traditional disciplinary areas. This can be an important complement to measurement on AI in related domains. The challenge is to work towards securing comprehensive data sources with high-quality text descriptions about the nature of R&D projects across several countries. More than a big data challenge, this is a co-ordination challenge that policy makers can help address, particularly in light of the OECD Recommendation of the Council on Artificial Intelligence (OECD, 2019b). The Recommendation explicitly states that governments

“should consider long-term public investment, and encourage private investment, in research and development, including interdisciplinary efforts, to spur innovation in trustworthy AI […]”.

Monitoring this recommendation requires concerted action.

Figure 2.7. Estimated share of AI-related R&D funding within NSF disciplines
“AI intensity” for selected disciplinary directorates with the largest amounts of AI funding

Notes: AI = artificial intelligence; R&D = research and development; NSF = National Science Foundation. This is an experimental indicator.

Source: OECD calculations based on NSF Award Search (database) (accessed 1 December 2018).


The science system and its contribution to the development of digital skills

Any overview of how science and innovation are digitalising must examine how the system helps develop, and ultimately uses, the skills and competences critical to digitalisation within science itself and across society. Figure 2.8 presents the distribution of new tertiary graduates in the natural sciences, engineering and ICT fields for 2016. It shows that Estonia, Finland, India and Ireland have the largest shares of graduates in designated ICT fields.

Figure 2.8. Tertiary graduates in natural sciences, engineering and ICT fields, 2016
As a percentage of all tertiary graduates

Notes: ICT = information and communication technology. Data on ICT graduates for Japan are included in other fields. The Netherlands excludes doctoral graduates. Data for China not included because of reporting differences. Natural sciences and engineering account for about 25% of higher education institution graduates (60% for new doctorates).

Source: OECD (2018a), Education at a Glance: OECD Indicators,


Data from the OECD’s publication Education at a Glance show differences in numbers of graduates in ICT subjects at different levels of attainment (Table 2.1). For example, European countries graduate many doctoral students relative to those with lower levels of attainment. Conversely, in Korea, the United States and India relatively few individuals graduate at doctorate level given the numbers of graduates at the bachelor’s level. This may be due to differences in the opportunity cost of staying on for postgraduate study.

Higher education institutions (HEIs) can also prepare individuals to make use of advanced ICT skills in domains other than the computer sciences. Initiatives like the Open Syllabus Project can provide a basis for analysing the content of instruction in HEIs across different subjects. They can also provide insight into trends in the teaching of digitally based methods (OSP, n.d.). Researchers have used data from this project to compare offered tuition with skills demand and developments in scientific research. For example, Börner et al. (2018) compare features of academic syllabi, scientific publications and job advertisements. They show that the distribution of skills taught in the classroom is three to four times closer (in terms of content similarity) to skills described in research articles than to the skills specified in job advertisements. Skills related to specific software and computational tools (often referred to as data science related [Box 2.3]) are found in the three types of documents. However, they tend to be highly specialised (not present in many courses, for example). Conversely, general research, problem-solving and management skills are central to both courses and job ads. Skills related to computational tools appear to show mutual predictability across scientific publications and job requirements, as if course offerings both anticipate and react to employers’ needs. This highlights a form of close interdependency between science and industry in this particular area, one not captured by standard indicators of science-industry knowledge flows.

Table 2.1. Graduates in ICT at different levels of attainment, selected countries, 2016

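Country | Bachelor’s level | Master’s level | Doctorate level
[country name missing] | 9 370 | 9 827 | [missing]
[country name missing] | 15 931 | 8 380 | 1 021
[country name missing] | 7 837 | 1 018 | [missing]
United Kingdom | 15 275 | 6 733 | 1 136
United States | 69 436 | 41 002 | 1 951
[country name missing] | 338 062 | 211 693 | [missing]
Russian Federation | 31 087 | 29 251 | 1 860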

Notes: ICT = information and communication technology. Data on China are not available because of lack of comparable data under new ISCED-Fields classification.

Source: OECD (2018a), Education at a Glance: OECD Indicators,

Box 2.3. Data science and data scientists

The US NIH define “data science” as “the interdisciplinary field of inquiry in which quantitative and analytical approaches, processes, and systems are developed and used to extract knowledge and insights from increasingly large and/or complex sets of data” (NIH, 2018). Google’s chief economist, Hal Varian, foresaw this trend when he argued in 2009 that the “sexy job in the next 10 years” would be “statistician” (Varian, 2019). This prediction has in a sense come true for those who are known as data scientists (OECD, 2018b).

The term “data scientist” is now widely used in business and management contexts not conventionally associated with scientific research. It refers to individuals with formal training at the junction of computer and decision sciences, modelling, statistics and applied mathematics. The particular combination of knowledge and skills, however, goes beyond those used in traditional business analytics posts. It allows data scientists to harness and interpret vast and growing amounts of data and information. Ultimately, this connects them to organisational decision making.

Are universities training a sufficient number of individuals who can do advanced research on digital tools and systems? Evidence from the 2017 OECD collection of data on the Careers of Doctorate Holders (CDH-light) shows that ICT doctorates account for a relatively small share of the doctorate population, typically with lower shares than at master’s or lower levels of tertiary attainment (Figure 2.9). Available figures indicate that at both doctorate and master’s levels, the share of ICT graduates is much higher among men than women.

While the history of computer science has seen periods when, like in the 1960s, women made up the majority of computer programmers, doctoral education among women was rare. It was only in 1965 that the first doctorate in computer science was awarded to a woman – Mary Keller – in the United States. In 2005, the proportion of women among entrants to doctoral programmes in the United States was just below 20%, a value close to the OECD average (OECD, 2018a). In most countries, the share of female entrants to doctoral programmes is below 30%, which is less than for engineering programmes. These figures are similar to entry shares at bachelor’s or equivalent levels.

Figure 2.9. Individuals holding master's (ISCED7) and doctorate (ISCED8) level degrees in ICT, 2016
As a percentage of graduates in all fields, by sex and attainment level

Note: ICT = information and communication technology.

Source: OECD calculations based on OECD (n.d. b), Careers of Doctorate Holders database,


Figure 2.10. The distribution of ICT doctorates across industries
As a percentage of all doctorates with a degree in ICT or any field

Notes: ICT = information and communication technology. Estimates based on data for Belgium, Brazil, Canada, Finland, Germany, the Netherlands, Switzerland and the United Kingdom.

Source: OECD calculations based on OECD (n.d. b), Careers of Doctorate Holders database,


In contrast with the low gender diversity of doctorate-educated individuals in the ICT area, these graduates are among the most likely doctorates to have been born abroad among the economies covered in the CDH2017 data collection. This indicates that the supply of skills in this area is potentially more exposed to policies that tighten up residential visa or nationality requirements. There are also significant reallocations within digitally oriented fields. In the United States, over the decade from 2007 to 2017, the number of doctoral recipients from domestic universities increased by 20% in Computer science but decreased by 3% in Electrical, electronics, and communications engineering. This compares to overall growth of 27% for all engineering fields and 13% for all fields of science (National Science Foundation, 2018).

These individuals with high research competences in ICT-related subjects are found principally in the ICT industry, followed by professional services (which includes R&D specialist firms) and higher education (Figure 2.10). Holders of ICT doctorates are also more oriented to work in the business sector than the average doctoral graduate. CDH data also show that doctorate holders in the field of ICT are significantly more mobile across jobs than their counterparts in other fields. For example, in the United States, 30% of ICT doctorates have changed jobs in the last year, compared to a 15% average across fields.

Scientific research enabled by digital technology

As discussed in Chapters 1 and 3, digitalisation is changing the way research is conducted and disseminated. To examine the emerging patterns of digitalisation in science, the International Survey of Scientific Authors (ISSA) (Box 2.4) asked a global sample of scientists a number of questions. These included such questions as whether digital tools make scientists more productive; to what extent they rely on big data analytics, or share data and source codes developed through their research; and to what degree they rely on a digital identity and presence to communicate their research. Preliminary survey results reveal contrasting patterns of digitalisation across fields.

The use of advanced digital tools, including big data, is a defining feature of the computer sciences, followed by multidisciplinary research, mathematics, earth and materials sciences and engineering (Figure 2.12). The life sciences (with the exception of pharmaceuticals) and the physical sciences (other than engineering) report the largest relative efforts to make data and/or code usable by others. There are smaller systematic differences in the reported use of productivity tools, which happen to have much higher general adoption rates. Scholars in the engineering domains report using productivity tools relatively less frequently. Interestingly, the fields making less use of advanced digital and data/code dissemination tools – namely those in the social sciences, arts and humanities – are more likely to engage in activities that enhance their digital presence and external communication (e.g. use of social media).

Box 2.4. The OECD International Survey of Scientific Authors

During the last quarter of 2018, the OECD contacted a large, randomly selected group of authors of scholarly documents. The group was asked to respond to an online survey aimed at identifying patterns, drivers and effects of digitalisation in scientific research. This OECD ISSA obtained rich information from nearly 12 000 scholars worldwide about their use of a broad range of digital tools and related practices, in addition to other key demographic and career information.

Answers to 36 questions were analysed to identify four major “latent” factors. These represent how likely scientists are to i) make use of productivity tools to carry out regular tasks such as retrieving information and collaborating with colleagues; ii) make data and code outputs available to others; iii) use or develop unconventional data and computational methods; and iv) maintain a digital identity expanding their communication with peers and the public in general. Analysis of a variable closely correlated with the third factor shows the digitalisation of science is not limited to scientific fields that specialise in computer science or information technology engineering. More detailed results and analysis from this study will be available on the project website.

Figure 2.11. Use and development of big data across scientific domains, 2018

Notes: This is an experimental indicator. “Other life sciences” include: Biochemistry, Genetics, Molecular biology, Immunology and microbiology. “Big data” refers to authors who answer that their teams use or develop “data with size, complexity and heterogeneity features that can only be handled with unconventional tools and approaches, e.g. Hadoop”. Estimates are weighted and take into account the sample design as well as non-response.

Source: OECD calculations based on OECD (n.d. c), OECD International Survey of Scientific Authors 2018.
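The notes to Figure 2.11 state that estimates are weighted to reflect the sample design and non-response. The sketch below illustrates one common way of building such weights. It assumes a design weight equal to the inverse of the selection probability, scaled by the inverse of the response rate in the respondent's stratum. The actual ISSA weighting procedure is not described in this chapter, and all names and figures in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    uses_big_data: bool           # answer to the "big data" question
    selection_prob: float         # hypothetical probability of being sampled
    stratum_response_rate: float  # hypothetical share of sampled authors who replied


def final_weight(r: Respondent) -> float:
    """Design weight times a simple non-response adjustment (an assumption)."""
    return (1.0 / r.selection_prob) * (1.0 / r.stratum_response_rate)


def weighted_share(respondents: list[Respondent]) -> float:
    """Weighted percentage of respondents reporting big data use."""
    total = sum(final_weight(r) for r in respondents)
    users = sum(final_weight(r) for r in respondents if r.uses_big_data)
    return 100.0 * users / total


# Hypothetical respondents: two of four report big data use (50% unweighted).
sample = [
    Respondent(True, 0.10, 0.50),
    Respondent(False, 0.10, 0.50),
    Respondent(True, 0.05, 0.25),
    Respondent(False, 0.20, 0.50),
]
share = weighted_share(sample)
print(f"Weighted share using big data: {share:.1f}%")
```

In this invented sample, the weighted share is well above the unweighted 50% because the respondent from the under-sampled, low-response stratum counts for more authors in the population.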


Differences in digitalisation patterns are also marked by personal and sectoral employment characteristics. Younger scientists are more likely to engage in all four dimensions of digital behaviour. This confirms digitalisation patterns found in ICT use surveys addressed to individuals in the general population. Female scientists are less likely than their male counterparts to use and develop advanced digital tools. However, they are more likely to engage in enhancing their digital presence, identity and communication, even after accounting for differences in field and country.

Scientific authors who work in the business sector are also more likely than those in other sectors to make use of advanced digital tools linked to big data, but less likely to engage in data/code dissemination activities or in online presence and communication. By contrast, authors in the higher education sector are more prone to use digital productivity tools (indeed most of those presented in the survey are related to academic tasks), as well as online presence and communication.

Research paradigms and digitalisation

Since digital tools can transform how scientific research is conducted, ISSA survey respondents were allowed to describe their scientific research work with respect to the use of theory, simulations, empirical non-experimental and experimental activity, and combinations among these. Scientific research practices correlate with digital practices in complex ways. Researchers engaged in computational and modelling work (37% of the sample) are the most likely to use advanced digital tools. However, they are also less likely to engage in online presence and communication activities. Together with researchers involved in experimental work (49%), they are also the most likely to engage in data and code dissemination practices, for example through platforms such as GitHub.

Figure 2.12. Patterns of digitalisation in science across fields, 2018
Average standardised factor scores for four different facets of digitalisation, by field

Notes: This is an experimental indicator. This figure presents average scores for four latent factors representing different facets of digitalisation for each scientific field. The factor analysis is based on responses by scientists to 36 questions relating to digital or digitally enabled practices. These are combined in four synthetic indicators that have been normalised to have overall zero average and identical variance.

How to read this figure: computer science’s highest score for the factor representing use of advanced digital tools (grey line) represents high relative intensity on this facet. Conversely, a low relative intensity is seen on the digital facet representing online presence and communication (dotted line) for scientists in this area.

Source: OECD calculations based on OECD (n.d. c), OECD International Survey of Scientific Authors 2018.
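The normalisation described in the notes to Figure 2.12 can be sketched as follows. The example takes raw scores on one latent factor, rescales them to zero mean and unit variance across all respondents, and then averages the standardised scores by field. It is a minimal, unweighted illustration: the raw scores and field labels are hypothetical, and the factor analysis that would produce the raw scores is not shown.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise(scores: list[float]) -> list[float]:
    """Rescale scores to zero mean and unit variance across all respondents."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / sigma for s in scores]


def average_by_field(fields: list[str], z_scores: list[float]) -> dict[str, float]:
    """Average standardised score for each scientific field."""
    grouped = defaultdict(list)
    for field, z in zip(fields, z_scores):
        grouped[field].append(z)
    return {field: mean(values) for field, values in grouped.items()}


# Hypothetical raw scores on the "advanced digital tools" factor.
fields = ["Computer science", "Computer science",
          "Arts and humanities", "Arts and humanities"]
raw = [2.0, 4.0, -1.0, 1.0]

z = standardise(raw)
field_means = average_by_field(fields, z)
for field, value in field_means.items():
    print(f"{field}: {value:+.2f} standard deviations from the overall mean")
```

Because every factor is put on the same scale, a field average above zero signals higher-than-average intensity on that facet, and the four facets can be compared on a single chart.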


Those reporting work on gathering information (37%) are surprisingly not among those most likely to disseminate data and code. This suggests considerable scope for digitalisation of their data diffusion activity. Among this group, the use of digital productivity tools is nonetheless high. Those involved in theoretical work (46%) tend to make limited use of most digital practices. Empirical, non-experimental work (45%) is most common in the social sciences. Among those undertaking it, the incidence of digital practices is relatively constrained in terms of data/code dissemination (creating a challenge for replicability) and advanced digital tools.

Open science and digitalisation

One important avenue of enquiry relates to the scope for digitalisation to address some perceived structural problems in how research is collectively organised. As Chapter 3 discusses, digitalisation offers a variety of opportunities for open science practices. For example, digitalisation can help reduce transaction costs; promote data reuse; increase rigour and reproducibility; and decrease redundant research. It can also better involve patients, consumers and others; facilitate researcher transparency in sharing information on processes and results; and improve connections between a larger variety of actors to produce more innovative approaches and solutions (Gold, 2016). Open science encompasses multiple dimensions, including unhindered access to scientific articles, access to data from public research, and collaborative research enabled by ICT tools and complementary incentives. Broadening access to scientific publications, data and code is at the heart of open science so that potential benefits are spread as widely as possible (OECD, 2015b). Interest is growing in monitoring the use of such practices (Gold et al., 2018).

Open access to documents

Access to scientific research articles plays an important role in the diffusion of scientific knowledge. Digital technology facilitates the sharing of scientific knowledge to promote its use for further research and innovation. Open access (OA) indicators reported in OECD (2017) reveal that 60% to 80% of content published in 2016 was, one year later, only available to readers via subscription or payment of a fee (Figure 2.13). Journal-based OA (usually termed “gold” OA) is particularly noticeable in Brazil, as well as in many other Latin American economies. Repository-based OA (also known as “green” OA) is especially important for authors based in the United Kingdom. About 5% of authors appear to be paying a fee to make their papers publicly available in traditional subscription journals (also known as “gold hybrid” OA).2

Figure 2.13. Open access of scientific documents, 2017
As a percentage of a random sample of 100 000 documents published in 2016, by country of affiliation

Source: OECD (2017), “Open access of scientific documents, 2017: As a percentage of a random sample of 100 000 documents published in 2016”.


Assessing the extent to which OA publications receive more citations than non-OA publications helps policy makers evaluate the social costs and benefits of alternative mechanisms for funding scientific publication. This has led to efforts to measure the “open access citation advantage”. Bibliometric analysis confirms previous findings of a mixed picture (OECD, 2015b; Boselli and Galindo-Rueda, 2016), as not all forms of OA appear to confer a citation advantage. OA is in general associated with higher citation rates among documents covered by major indices. However, this does not apply to documents published in OA journals, which on average tend to be more recent and present lower historical citation rates. Repository-based (green) OA systematically confers a citation advantage. In most cases, higher citation rates are found for “gold hybrid” documents. These are articles published in subscription journals whose authors pay publishers a fee to enable free online access on the part of potential readers. The ISSA1 study showed that researchers had a positive willingness to pay to disseminate their results conditional on their paper being accepted. The results from the ISSA1 and ISSA2 studies, which confirm that authors of documents in gold OA journals tend to report significantly lower earnings, point to strong and self-reinforcing prestige effects that are dissociated from dissemination objectives in the digital era (Fyfe et al., 2017). Evidence points to OA increasingly becoming the norm. Moreover, incumbent high-prestige journals look likely to take advantage of their current citation advantage. This leads to the fundamental question: what type of OA model will prevail in the longer run for signalling quality?

Open access to data and code

Measuring and understanding access to data and code are also important for mapping open science practices. The ISSA2 study has gone beyond probing the access status of publications. It also considers the status of other research outputs, in particular the code and data reported by authors to have been developed as part of the published research. The study shows that less than half of respondents in all science fields deliver data or code to a journal or publisher as support to their publication. The use of repositories for data archiving and dissemination seems to be most common among respondents in the life sciences. Informal data or code sharing among peers seems to be the main way researchers in all fields make data available to others.

The publication of research data or code does not imply that other researchers can easily use and reuse them. Use might be impaired if the access costs are prohibitive or access implies other challenges. For this reason, the ISSA2 survey asks about charging policy. It also asks about attributes that are part of the open science principles of findability, accessibility, interoperability and reusability.

The practice of adopting standard mechanisms for requesting and securing data access seems to be uncommon in all disciplines. Less than 30% of respondents indicated using such mechanisms when sharing their data or code. Likewise, a low percentage of respondents (about 10%) applied a data usage licence to their data. Reusability of data seems to be ensured mainly through the development and provision of detailed and comprehensive metadata, especially in the physical sciences and engineering. Compliance with standards that facilitate data combination with other sources is more common in health and life sciences, whereas it seems to be less diffused in the physical sciences and engineering.

In all fields, authors tend to report several barriers to access to scientific outputs. These include formal sharing requirements set by publishers, funders or the respondent’s organisation; IP protection systems; and resources necessary for dissemination. Career objectives and peer expectations were pre-eminently reported as drivers of enhanced access. Privacy and ethical considerations tend to limit access to scientific outputs in health sciences. Dissemination costs in terms of time and money are deemed strong barriers. However, respondents do not consider capabilities for managing disclosure and sharing as important either way.

Digitalisation and the broader impacts of science

Another key policy question is the extent to which scientists who engage in non-academic activities exhibit different patterns of digital competence. Data from the 2018 ISSA suggest that scientists who have applied or registered for IP protection; done consultancy work; started new companies or served as executives; or engaged in various societal outreach activities, such as supporting the work of museums and charities, also tend to exhibit higher levels of competence in advanced digital tools (Figure 2.14).

Those scientists who have started companies or served as executives – about 20% of the sample – had the biggest advantage in advanced data competences; the gap is close to one-half of the standard deviation for this latent factor. The gap is also particularly large for persons engaging in IP application or registration (reported in about one-fifth of cases), and significant too for those undertaking consultancy work and societal engagement.

Figure 2.14. Digital activity of scientific authors by engagement in external activities, 2018
Difference in digital intensity scores between authors active and non-active in external activities

Notes: How to read this chart. Scientists who have founded companies or served as executives have an expected latent competence in advanced digital tools that is 0.45 standard deviations larger than those who have not. In contrast, their expected competence in digital productivity tools is much closer to that of others, with a difference of less than 0.1 standard deviations. See notes for Figure 2.12.

Source: OECD calculations based on OECD (n.d. c), OECD International Survey of Scientific Authors 2018.


All this points to the high demand for these skills in the economy and society. The digital advantage in terms of individuals’ online presence and communication is particularly marked for those engaged in societal outreach activities (also political work, not reported in the chart) and consultancy work. There is no significant difference in this digital factor for those active in IP and those who are not.

Looking ahead: Scientists’ perspectives on digitalisation and its impacts

How do scientists themselves view the digital transformation of scientific research and its impacts? Evidence from the 2018 ISSA study suggests that scientists are on average positive across several dimensions (Figure 2.15). Many respondents feel that digitalisation has positive potential to promote collaboration, particularly across borders, and improve the efficiency of science. While remaining positive, scientists appear less optimistic regarding the potential impact of digitalisation on the system of incentives and rewards. Specifically, they have concerns about being rated on the basis of their digital “footprint”, such as their publications and citations, as well as downloads of their work. They also have reservations about whether digitalisation can bring scientific communities and scientists together with the public (inclusiveness). Finally, they sometimes question the role of the private sector in providing digital solutions to assist their work. Younger authors are generally more positive than their older peers, except with respect to the impacts of digitalisation on the incentive system, which may reflect concerns about their future careers.

Figure 2.15. Scientists’ views on the digitalisation of science and its potential impacts, 2018
Average sentiment towards “positive” digitalisation scenarios, as percentage deviation from mid-viewpoint

Notes: This is an experimental indicator. Survey respondents were asked to rate opposing scenarios on different dimensions from (1 = fully agree with a negative view) to (10 = fully agree with a positive view). For interpretability, weighted average scores on each dimension and the general summary view (weighted average across dimensions) are presented as percentage deviations from the midpoint. This means, for example, that with respect to the subject of “Science across borders”, respondents are on average 50% oriented towards the positive outcome, relative to the neutral perspective. Weighted average scores take into account the sample design and non-response.

Source: OECD calculations based on OECD (n.d. c), OECD International Survey of Scientific Authors 2018.
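The transformation described in the notes to Figure 2.15 can be illustrated with a short sketch. The notes do not give the exact formula, so the example assumes that the deviation from the midpoint of the 1 to 10 scale (5.5) is expressed as a share of the maximum possible deviation (4.5). Under that assumption, +100% means full agreement with the positive view and -100% full agreement with the negative view. Other normalisations, such as dividing by the midpoint itself, are possible. The ratings and weights are hypothetical.

```python
SCALE_MIN = 1
SCALE_MAX = 10
MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2  # 5.5, the neutral viewpoint


def weighted_mean(scores: list[float], weights: list[float]) -> float:
    """Weighted average rating across respondents."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def pct_deviation_from_midpoint(avg_score: float) -> float:
    """Assumed normalisation: deviation as a share of the half-range of the scale."""
    half_range = (SCALE_MAX - SCALE_MIN) / 2  # 4.5
    return 100.0 * (avg_score - MIDPOINT) / half_range


# Hypothetical ratings on one dimension, with equal survey weights.
scores = [10, 8, 7, 6]
weights = [1.0, 1.0, 1.0, 1.0]

avg = weighted_mean(scores, weights)
deviation = pct_deviation_from_midpoint(avg)
print(f"Average rating {avg:.2f} -> {deviation:.0f}% towards the positive view")
```

With these invented ratings, an average of 7.75 sits halfway between the neutral midpoint and full agreement with the positive scenario, which corresponds to the “50% oriented towards the positive outcome” reading given in the notes.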


Across countries, the average sentiment towards the impacts of digitalisation (Figure 2.16) seems consistent overall with results from broader population surveys on attitudes towards science and technology (OECD, 2015e). Scientists in emerging and transition economies appear to be more positive on average towards the impacts of digitalisation on science. The position of scientists in the most R&D-intensive European economies is more reserved, while still positive in the main. These results do not imply that scientists are by and large dismissive of the potential pitfalls of digitalisation. A minority, but still a significant number, of respondents tended to agree with “negative” statements about the impacts of digitalisation on science. They were concerned, for example, about the promotion of hypothesis-free research in computationally intensive data-driven science. For these respondents, digitalisation could also accentuate divides in research between those with advanced digital competences and those without. It could also encourage a celebrity culture in science, premature diffusion of findings and individual exposure to pressure groups. Digitalisation could also lead to use of readily available but inappropriate indicators for monitoring and incentivising research. Finally, they agreed with the statement that digitalisation could concentrate workflows and data in the hands of a few companies providing digital tools.

Figure 2.16. Scientists’ views on the digitalisation of science, by country, 2018
Average sentiment towards a “positive” digitalisation scenario, as percentage deviation from the mid-range of possible views

Notes: This is an experimental indicator. Cross-country comparisons should be interpreted with caution as the population of corresponding scientific authors is not uniformly representative of their scientific community. Economies with fewer than 75 survey responses have been removed. Average scores are weighted and take into account the sample design and non-response. See notes for Figure 2.15.

Source: OECD calculations based on OECD (n.d. c), OECD International Survey of Scientific Authors 2018.


Technology and innovation going digital

Development of digital technologies

R&D in ICT industries and ICT-driven R&D

As an activity defined by the pursuit of new knowledge, R&D is important in driving advances in digital technologies. Businesses are the main source of R&D. Information industries are particularly strong contributors in countries with high business R&D intensity, accounting for just over half of all business R&D in some cases (Figure 2.17). Information industries also represent over 40% of business R&D in Estonia, Finland, Ireland, Turkey and the United States, confirming the knowledge-intensive nature of these industries.

Estimates of business R&D by industry fail to gauge perfectly the extent to which R&D contributes to digitalisation. This is important in the case of software because many firms invest in it for internal use and as a basis for providing other goods and services. By omitting software R&D (Box 2.5) in other sectors, the value of R&D in the software and information industries underestimates the total R&D aiming to generate new software. For instance, while software publishers in the United States account for 10% of all R&D performed and funded by companies, three times as much money was actually dedicated by US firms to R&D aimed at software products or software embedded in other projects or products.

Figure 2.17. Business R&D expenditure, total and in information industries, 2016
As a percentage of GDP

Notes: R&D = research and development; GDP = gross domestic product. “Information industries” are defined according to ISIC Rev.4 and cover ICT manufacturing under “Computer, electronic and optical products” (Division 26), and information services under “Publishing, audiovisual and broadcasting activities” (Divisions 58 to 60), “Telecommunications” (Division 61) and “IT and other information services” (Divisions 62 to 63).

Source: OECD (2019a), Measuring the Digital Transformation: A Roadmap for the Future.


Figure 2.18. R&D intensity of ICT and other industries, 2016
As a percentage of gross value added in each industry, log scale

Note: R&D = research and development; ICT = information and communication technology.

Source: OECD calculations based on OECD ANBERD, STAN and National Accounts databases (accessed December 2018).


Box 2.5. Software and R&D: A measurement challenge

Software development and R&D are closely intertwined (OECD, 2015a; OECD, 2015c; OECD/Eurostat, 2018). For example, the software industry is among the most R&D-intensive across most countries (Figure 2.18). Following revision of international guidelines in 1993, national accounts (NAs) economic statistics were comprehensively updated, as purchases of software and the own-account production of software were recognised as capital formation (i.e. “real” or “fixed” investment). Subsequent updates in NA systems and practices in many countries expanded this treatment to include firms’ own development of software originals used for reproduction.

The latest (2008) update of international guidelines introduced the classification of R&D as fixed investment. In so doing, it adopted the OECD definition of R&D and its measurement guidelines as the basis for primary data collection. Consequently, national accountants had to deal with the natural overlap between the development of own-account software originals and R&D activity. Own-account software originals were already included as investment in the NA measures of own-account software. Therefore, they were excluded in most cases from the new R&D measures to avoid double-counting in the NA aggregates. This treatment introduced a misalignment between the NA measures and the primary source data underlying the estimates of investment in R&D, produced by organisations that participate in the OECD’s NESTI. This misalignment could increase over time and potentially confuse users if software-generating R&D accounts for an increasing share of R&D. Some countries, such as the United States, are resolving this apparent inconsistency by reclassifying the own-account production of software originals that meets the R&D definition as R&D. It is unclear how other countries will resolve this challenge.

The growing importance of software development as an economic activity also presents a test-case for the measurement of R&D. The criteria provided in the 2015 edition of the Frascati Manual (OECD, 2015c) are a case in point. They allow organisations reporting and collecting data for statistical and other administrative purposes (such as the provision of R&D tax incentives) to discriminate between genuine R&D and non-R&D activities. R&D in software includes software development or improvement that expands scientific or technological knowledge, as well as the development of new theories and algorithms in computer science. In contrast, R&D activity in software excludes software development that fails to meet such requirements, e.g. work to support or adapt existing systems, add minor functionality to existing application programs, etc.

Use of digital technology in business and the link between digitalisation and innovation

The way in which innovation responds to and influences digitalisation can be mediated by R&D and invention, but it would be wrong to identify them as the same concept. The Oslo Manual definition of an innovation (OECD/Eurostat, 2018) refers to a new or improved product or process (or combination of both). It must differ significantly from a unit’s previous products and processes and be available to potential users or brought into use by the unit. Innovation requires that implementation take place: it must transcend the space of ideas and inventions. At a minimum, the innovation has to be new to the organisation. Thus, this is a broad concept that encompasses diffusion processes that involve a significant change from the viewpoint of who is adopting them. In this regard, various digitalisation processes across the economy are effectively innovations for those who implement them. Data from business innovation surveys show the information services industry generally exhibits the largest rates of reported innovation (e.g. 75% in the case of France). This may partly reflect higher rates of obsolescence, which call for more frequent innovation.

Digitally based innovations can be found in any sector. They comprise product or process innovations that incorporate ICT (the product itself can be a digital good or service). They also comprise innovations that rely to a significant degree on the use of ICT for their development or implementation. A wide range of business process innovations entail fundamental changes in the organisation’s ICT function and its interaction with other business functions and the products delivered.

The latest edition of the Oslo Manual aims at ensuring that guidance fully reflects changes induced by digitalisation. For example, it recognises data development activities, along with software, as a potential innovation activity. Data accumulation by companies can entail significant direct or indirect costs. For example, a firm may give away for free, or at a discounted price, goods or services that generate information valuable for advertising products. The manual proposes to focus on developing measures of “digital competence”. This multifaceted construct seeks to reflect a firm’s ability to deal with digitalisation in a broad sense. Potential indicators, still to be harmonised in surveys, relate to:

  • levels of digital integration within and across business functions

  • access to and ability to use data analytics to design, develop, commercialise and improve products, including the ability to secure data about the (potential) users of the firm’s products and how they interact with the products (Rindfleisch et al., 2017)

  • access to networks and use of appropriate solutions and architectures

  • capacity to manage privacy and cybersecurity risks

  • adoption of appropriate business models for digital environments and platforms.

In addition to these internal capabilities, the manual recommends capturing, among the various external factors influencing innovation, information on the extent to which a firm uses digital platforms or is exposed to competition from them. Consumer and societal perspectives such as trust are also relevant to digitalisation. This measurement agenda requires close co-ordination with surveys on ICT use in firms. The latter are the responsibility of the OECD’s Committee on Digital Economy Policy and its Working Party on the Measurement and Analysis of the Digital Economy.

The OECD is highlighting country experiences in collecting data to motivate the collection and analysis of information at the junction between ICT adoption and innovation. It is also showcasing the data’s potential relevance for international comparative analysis. One example is a recent study of patterns of advanced technology and business practices (ATBPs) among Canadian firms. This was conducted within the scope of Statistics Canada (STC)’s 2014 Survey of Advanced Technology. Joint OECD-STC analysis (Verger et al., forthcoming) has helped map ATBP portfolios via factor analysis. This has revealed seven main categories of ATBP specialisation: logistics software technologies; management practices and tools; automated production process technologies; geomatics and geospatial technologies; bio-and-environmental technologies; software and infrastructure as a service; and additive and micro manufacturing technologies. The data indicate a strong complementarity between management practices and production and adoption of logistics technologies.

As shown in Figure 2.19, Verger et al. (forthcoming) have found the rate of use of ATBPs to be generally positively correlated with the size of firms. This is especially so in the area of automated production process technologies, where scale appears to be important. However, software and infrastructure as a service (including cloud computing) is a noticeable exception; unlike technologies such as robotics, it is similarly diffused in small and medium-sized enterprises (SMEs) and large firms. This latter finding underlines one of the distinctive features of the digital economy: the attractiveness of such technologies for SMEs and their potential role in enabling scaling up.

Characterising industries by ATBP use patterns complements the standard classification systems for industries. Such systems are mainly informed by the type of goods and services delivered, rather than the processes used to produce them. The correlation between R&D intensity and technology is high in manufacturing industries and low in services. Most non-manufacturing sectors have low R&D intensity, even though many are technology-intensive. These findings confirm the limitations of using R&D measures for building technology taxonomies of industries that include services.

Figure 2.19. Advanced technology usage in Canada: Large firms vs. SMEs
Relative odds of using advanced technology for large firms vs. SMEs

Notes: SMEs = small and medium-sized enterprises. How to read this chart: large firms are nearly 12 times more likely to use robots with sensing or vision systems than SMEs.

Source: Verger et al. (forthcoming), “Exploring patterns of advanced technology and business practice use and its link with innovation: An empirical case study based on Statistics Canada’s Survey of Advanced Technologies”.
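
Figure 2.19 reports relative odds, which differ from a simple ratio of adoption shares unless adoption is rare in both groups. The following Python sketch illustrates the distinction. The adoption shares are hypothetical values chosen for illustration and are not taken from the Statistics Canada survey; the function names are likewise assumptions.

```python
def odds(share: float) -> float:
    """Convert a share of firms using a technology into odds of use."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must lie strictly between 0 and 1")
    return share / (1.0 - share)


def odds_ratio(share_large: float, share_sme: float) -> float:
    """Relative odds of use for large firms versus SMEs."""
    return odds(share_large) / odds(share_sme)


def share_ratio(share_large: float, share_sme: float) -> float:
    """Plain ratio of adoption shares, for comparison with the odds ratio."""
    return share_large / share_sme


# Hypothetical shares (assumption): 20% of large firms and 2% of SMEs use a technology.
share_large, share_sme = 0.20, 0.02
print(round(odds_ratio(share_large, share_sme), 2))   # 12.25
print(round(share_ratio(share_large, share_sme), 2))  # 10.0
```

In this illustration the two measures are close because adoption among SMEs is rare. They diverge as adoption becomes more common, which matters when reading relative odds as "times more likely".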


Lastly, the OECD-STC quantitative case study found that innovation is highly correlated with the use of certain business practices and advanced technologies (Figure 2.20). Regression results suggest that using advanced technologies doubles the odds of reporting innovations. The odds of innovating are trebled for users of selected business practices. The results also indicate complementarity between technology and management in explaining innovation. A positive relationship is also found between the development of technologies and innovation, especially for products, pointing to the advantages of being a lead adopter.

Figure 2.20. The link between innovation and the adoption of technology and business practices, Canada, 2014
Estimated log odds ratios of reporting an innovation between technology and/or practice users and non-users

Note: Estimates control for technology development activity, country of ultimate ownership control, and business size and industry.

Source: Verger et al. (forthcoming), “Exploring patterns of advanced technology and business practice use and its link with innovation: An empirical case study based on Statistics Canada’s Survey of Advanced Technologies”.
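
Figure 2.20 reports log odds ratios, whereas the text speaks of odds being doubled or trebled. The Python sketch below shows how the two scales relate and how an odds ratio translates into a change in probability. The baseline innovation rate of 40% is a hypothetical value chosen for illustration, not an estimate from the study, and the function names are assumptions.

```python
import math


def odds_ratio_from_log(log_odds_ratio: float) -> float:
    """Exponentiate a log odds ratio to recover the odds ratio."""
    return math.exp(log_odds_ratio)


def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Doubling and trebling the odds correspond to these log odds ratios.
print(round(math.log(2), 3))  # 0.693
print(round(math.log(3), 3))  # 1.099

# Hypothetical baseline (assumption): 40% of non-users report an innovation.
print(round(apply_odds_ratio(0.40, 2.0), 3))  # 0.571
print(round(apply_odds_ratio(0.40, 3.0), 3))  # 0.667
```

Under this assumed baseline, doubling the odds raises the probability of reporting an innovation from 40% to about 57%, not to 80%.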


This analysis suggests that aspects of Statistics Canada’s Survey of Advanced Technology (SAT) can be adopted more widely. With relevant adaptations, they could help assess the combined role of innovation, technology and management in business performance. A key challenge is to build consensus on which technologies and practices should be the focus of innovation surveys. Another challenge is how to implement approaches that allow data to be compared across countries and industries and over time (given rapid technological change and obsolescence). At present, there is strong demand for specific analysis of the role of AI in business innovation strategies and activities.

Conclusion

Digitalisation is everywhere in STI, but with varying depth and perspective

Ministers from OECD countries and partners at the OECD Ministerial Meeting in Daejeon (Korea) in 2015 recognised that the rapid evolution of digital technologies is revolutionising STI (OECD, 2015d). These technologies, it was noted, are changing the way in which scientists work, collaborate and publish. They are also increasing reliance on access to scientific data and publications, and opening new avenues for public engagement and participation in science and innovation. At the same time, they are facilitating the development of research co-operation between businesses and the public sector, and contributing to the transformation of innovation.

At the time, the OECD was asked to monitor this transformation. It was also invited to convene the international community working on STI data and indicators to develop new thinking and solutions for generating empirical evidence to guide policy. The 2016 OECD Blue Sky Forum identified the digitalisation of STI both as a priority object of measurement and as a fundamental enabler of future statistical and analytical work (OECD, 2018b). This principle guided the OECD’s work on the digitalisation of science and innovation in 2017-18. This chapter summarises the main results of this work. It presents selected and new evidence arising from recent work on measuring digitalisation in science, its potential drivers and impacts. The indicators presented also raise further questions.

The evidence presented has put a focus on the potential synergies and trade-offs faced by those in decision-making roles in the science and innovation system:

  • The geography of scientific activity in computer science and AI, measured by publications, has rapidly shifted. Formerly emerging economies like China have increased the quantity and quality of their publications, as implied by their citation impact.

  • Research on AI is increasingly embedded in government agencies’ funding of R&D across different missions and disciplinary areas. The example based on two major US funding agencies should soon be extended to other agencies and countries. However, this requires a concerted effort to maintain high-quality project information and to make it available for research policy purposes.

  • Research careers in the area of computer science in the OECD area open a broad range of opportunities within and outside academia, but fail to attract a significant share of women. Research careers in this area are more inclusive of individuals born or raised abroad, pointing to the importance of policies that influence the mobility of talent and consider changes in demand for skills.

  • Digital activity in science is highly pervasive, but there is considerable room for different disciplines to more fully exploit the potential of digitalisation. This is particularly true in the use of advanced tools that can transform the established research paradigms. Furthermore, high digital intensity is associated with many of the third mission activities that policy makers wish to encourage, such as creation of start-ups and societal engagement. There is some evidence of a generational and gender gap in the adoption of the most demanding digital practices.

  • By and large, scientists appear to be optimistic about the possibilities brought to the practice of science by digitalisation, especially the youngest. However, many among the latter harbour more reservations about implications for their own careers.

  • The adoption of advanced digital technologies appears to be highly correlated with the adoption of complementary business practices; this is closely associated with higher reported innovation. There is also evidence that firm size is a strong determinant of advanced technology adoption. However, among a representative sample of Canadian firms and after accounting for other characteristics, smaller firms are almost as likely to use cloud computing technologies as larger firms. It remains to be seen if this finding can be replicated in other contexts.

  • This chapter calls for enhanced measures of organisational competences linked to digital technologies that influence firms’ ability to innovate in the current landscape, in which platforms play a major role. The ongoing revision of innovation surveys to adapt to the guidance of the latest edition of the Oslo Manual is a significant opportunity for countries to reconsider how best to generate insights in this area.

More targeted measurement is required to address specific policy questions. These include how digitalisation can fundamentally expand the range of hypotheses generated and the speed at which competing research hypotheses can be tested. This could help address concerns about declining research productivity, public trust in science, lack of diversity and community engagement. It could also inform policy making so as to avoid a potential misalignment between career incentives and socially beneficial research.

Questions about the role of digitalisation also provide a much-needed stimulus to measure key dimensions of science and innovation once considered too complicated, or even unnecessary, to measure. Understanding how science adopts technologies and organisational practices can ultimately help explain how it can influence the direction of technical change and innovation more broadly.

Digitalisation is a “game-changer” for STI measurement and analysis

Digitalisation represents a major force for change in the generation and use of STI data and statistics. STI systems have become remarkably data-rich: information on innovation inputs and outputs that was only recorded in highly scattered, paper-based sources is now much easier to retrieve, process and analyse (OECD, 2018b). When researchers and administrators use digital tools, they leave traces that can be used to develop new databases and to feed indicators and analysis. The digitalisation of the patent application and scientific publication processes has already provided rich and widely used data resources for statistical analysis. Digitalisation is rapidly extending to other types of administrative and corporate data, e.g. transactions (billing and payroll data); website content and use metadata; and generic and specialised social media, in which STI actors interact with their peers and society. Data practitioners have viewed these new “big data” as “uncomfortable data”, i.e. datasets that are too large to be handled by conventional tools and techniques. But even these uncomfortable data are now more tractable.

The increasingly fuzzy boundary between qualitative and quantitative data is a striking example of how big data is becoming easier to manage. Many information-gathering methodologies (e.g. user testing or interviews) were traditionally considered purely qualitative. However, they can now be conducted on a large scale and their results quantified. Text, images, sound and video can all be “read” by machines. Natural language processing tools automate the processing of text data from thousands of survey responses or social media posts into quantifiable data. These techniques can help alleviate some of the common challenges facing STI statistics, such as survey fatigue and unfit-for-purpose classification systems applied differently by human coders. In turn, they can generate adaptable indicators.
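
As a simple illustration of how free-text answers can be turned into countable data, the Python sketch below codes survey responses against a small keyword list. This is plain keyword matching, a far cruder technique than the natural language processing tools referred to above. The categories, keywords and example responses are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical codebook (assumption): category name mapped to indicative keywords.
CODEBOOK = {
    "data_sharing": {"repository", "dataset", "sharing"},
    "ai_tools": {"machine", "learning", "ai"},
}


def tokenize(text: str) -> set[str]:
    """Lower-case a response and return its distinct alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def code_responses(responses: list[str]) -> Counter:
    """Count how many responses mention at least one keyword per category."""
    counts = Counter()
    for response in responses:
        tokens = tokenize(response)
        for category, keywords in CODEBOOK.items():
            if tokens & keywords:
                counts[category] += 1
    return counts


# Hypothetical free-text survey answers.
responses = [
    "We deposit every dataset in a public repository.",
    "Machine learning now screens our images.",
    "No change to how we work.",
]
counts = code_responses(responses)
print(counts["data_sharing"], counts["ai_tools"])  # 1 1
```

Because the same codebook is applied to every response, the coding is consistent across respondents, and the codebook can be revised and re-run as categories evolve.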

Effective application of these new methods relies ultimately on fit-for-purpose, high-quality systems. These systems need to collect qualitative information consistently and avoid potential manipulation by parties with an interest in the use of the data. Administrative database managers become important gatekeepers of data quality, but information providers still need adequate incentives. Big data implies risks in exploiting datasets with possible defects and biases not recognised by the researchers. It also implies difficulties in evaluating big data techniques and analysis, especially using conventional criteria (such as falsifiability). And it implies complexities in explaining these techniques – and their value as evidence for policy evaluation – to decision makers and the public. In this new environment, work is moving progressively from fixed scales of analysis (such as the nation) towards variable categories, and dealing with vast new databases. This requires a different way of searching for patterns, trends, correlations and narratives.

The changing landscape for surveys has provoked much debate about their future. Some question whether the shift to big data is the precursor to the demise of surveys. Others, paraphrasing Mark Twain, argue that reports of the death of surveys are greatly exaggerated. The manner in which surveys are carried out has indeed changed, as online surveys have largely displaced more expensive non-digital methods. Surveys can therefore be targeted towards areas where other data sources are less effective (Callegaro and Yang, 2018; Jarmin, 2019).

Electronic tools (including do-it-yourself platforms) have “democratised” the process of surveying, making it easier than ever before. This has resulted in an explosion of surveys both in general and in the area of STI studies. However, these surveys often fail to meet basic statistical quality requirements, including for safeguarding privacy and confidentiality. The rapid growth of surveys also represents a growing source of fatigue for respondents; it results in lower expected response rates to non-compulsory (and compulsory, but difficult-to-enforce) surveys and may undermine trust. STI policy makers should co-ordinate, and apply standards to, their sponsored surveys.

New data sources for STI, such as administrative records, commercial databases and the Internet, have considerable transformational power stemming from their multidimensionality, and the possibilities for interconnecting the different types of subjects and objects covered. The strengths of these data sources are hard to reproduce in surveys. Traditionally, surveys were conceived to identify key actors and the presence of pre-defined types of interactions rather than to trace those linkages.

Digital solutions applied to survey tools that build trust and credibility can help bring out the potential of new data sources. Digital technologies are viewed as key components of the move towards “rich data”. They are crucial to validating and augmenting the quality of big data sources (Callegaro and Yang, 2018). Rather than competing with alternative sources, surveys look set to focus increasingly on the crucial information that cannot be obtained otherwise. Recent experience shows that trust and credibility will be the most crucial factors determining the success of survey efforts in the digital era.

The experience of the OECD ISSA study confirms the importance, when conducting surveys in the digital age, of ensuring mutual trust between data collector and respondent. The ISSA survey ultimately explores how to develop working knowledge of emerging topics of high policy relevance. This can help provide a potential basis for distributed data collection within countries. It can also create a mechanism for ongoing dialogue between the OECD and the global science community.

STI policy makers need to support the creation and adoption of standards to protect the integrity of data they wish to use to inform their policies, regardless of the source. Furthermore, risk management will become an integral part of science and innovation policy in the digital era. Policy makers will need to consider how to make digitally driven systems, including those based on AI, more trustworthy. As a result, measurement will need to increasingly map risks and uncertainty, and analyse how these impact digitalisation practices and policies. This will be an important component of decisions assessing the merits of different science and innovation policy options in the digital era.

Digital innovation, and AI in particular, are at the top of national and international policy agendas, as reflected in the adoption by the OECD Council of a Recommendation on AI (OECD, 2019b). The Recommendation explicitly states that governments “should consider long-term public investment, and encourage private investment, in research and development, including interdisciplinary efforts, to spur innovation in trustworthy AI […]”. These policy priorities, with their often explicit demand for measurement and new evidence, will guide future OECD statistical and analytical work in this area.


Agrawal, A., J. Gans and A. Goldfarb (2018), Prediction Machines: The Simple Economics of Artificial Intelligence, Harvard Business Review Press, Cambridge, Massachusetts.

BEIS and DCMS (2018), “Artificial intelligence sector deal: A sector deal between government and the artificial intelligence (AI) sector”, Policy Paper, Government of the United Kingdom.

Börner, K. et al. (2018), “Skill discrepancies between research, education, and jobs reveal the critical need to supply soft skills for the data economy”, Proceedings of the National Academy of Sciences, December, Vol. 115/50, United States National Academy of Sciences, Washington, DC, pp. 12630-12637.

Breschi, S., J. Lassébie and C. Menon (2018), “A portrait of innovative start-ups across countries”, OECD Science, Technology and Industry Working Papers, 2018/02, OECD Publishing, Paris.

Boselli, B. and F. Galindo-Rueda (2016), “Drivers and implications of scientific open access publishing: findings from a pilot OECD International Survey of Scientific Authors”, OECD Science, Technology and Industry Policy Papers, No. 33, OECD Publishing, Paris.

Bughin, J. et al. (2017), “Artificial intelligence: The next digital frontier?”, Discussion Paper, June, McKinsey Global Institute.

Callegaro, M. and Y. Yang (2018), “The role of surveys in the era of ‘big data’”, in D. Vannette and J. Krosnick (eds.), The Palgrave Handbook of Survey Research, Palgrave Macmillan, Cham, Switzerland.

Cockburn, I., R. Henderson and S. Stern (2018), “The impact of artificial intelligence on innovation”, in A. Agrawal, J. Gans and A. Goldfarb (eds.), The Economics of Artificial Intelligence: An Agenda, University of Chicago Press.

EC (2018), Artificial Intelligence for Europe; Communication from the Commission to the European Parliament, the European Council, the Council, the European Economic and Social Committee and the Committee of the Regions; COM (2018) 237 final, European Commission, Brussels.

EDaily (2017), “Big Data, 5G, autonomous car, etc.”.

Elsevier (2018), Artificial Intelligence: How Knowledge is Created, Transferred, and Used, Elsevier, Amsterdam.

Furman, J. (2016), “Is this time different? The opportunities and challenges of artificial intelligence”, presentation at AI now: The social and economic implications of artificial intelligence technologies in the near term, New York University, 7 July.

Fyfe, A. et al. (2017), “Untangling academic publishing: A history of the relationship between commercial interests, academic prestige and the circulation of research”, Briefing Paper, May, University of St. Andrews, United Kingdom.

Gold, E.R. et al. (2018), “An open toolkit for tracking open science partnership implementation and impact” [version 1; not peer-reviewed], Gates Open Research 2018, 2/54.

Gold, E.R. (2016), “Accelerating translational research through open science: The neuro experiment”, PLOS Biology, Vol. 14/12, e2001259.

Goldfarb, A. and C. Tucker (2017), “Digital Economics”, NBER Working Paper No. 23684, National Bureau of Economic Research, Cambridge, Massachusetts.

Jarmin, R.S. (2019), “Evolving measurement for an evolving economy: Thoughts on 21st century US economic statistics”, Journal of Economic Perspectives, Vol. 33/1, American Economic Association, Nashville, pp. 165-184.

Klinger, J., J.C. Mateos-Garcia and K. Stathoulopoulos (2018), “Deep learning, deep change? Mapping the development of the artificial intelligence general purpose technology”, 17 August, SSRN.

McKinsey (2018), “AI adoption advances, but foundational barriers remain”, Survey, McKinsey Institute.

National Science Foundation (2018), Doctorate Recipients from U.S. Universities: 2017, Special Report NSF 19-301, National Center for Science and Engineering Statistics, Alexandria, VA.

NIH (2018), NIH Strategic Plan for Data Science, National Institutes of Health, Bethesda, United States.

NSTC (2016), Preparing for the Future of Artificial Intelligence, Executive Office of the President National Science and Technology Council Committee on Technology, Washington, DC.

OECD (n.d. a), “Going Digital”, webpage (accessed 1 June 2019).

OECD (n.d. b), Careers of Doctorate Holders database, 2017 CDH-light data collection.

OECD (n.d. c), OECD International Survey of Scientific Authors 2018.

OECD (2019a), Measuring the Digital Transformation: A Roadmap for the Future, OECD Publishing, Paris.

OECD (2019b), Recommendation of the Council on Artificial Intelligence, OECD, Paris.

OECD (2018a), Education at a Glance: OECD Indicators, OECD Publishing, Paris.

OECD (2018b), “Blue Sky perspectives towards the next generation of data and indicators on science and innovation”, in OECD Science, Technology and Innovation Outlook 2018: Adapting to Technological and Societal Disruption, OECD Publishing, Paris.

OECD (2018c), “Private equity investment in artificial intelligence”, OECD Going Digital Policy Note, OECD, Paris.

OECD (2017), “Open access of scientific documents, 2017: As a percentage of a random sample of 100 000 documents published in 2016”, in Science, Technology and Industry Scoreboard 2017: Knowledge economies and the digital transformation, OECD Publishing, Paris.

OECD (2015a), Data-Driven Innovation: Big Data for Growth and Well-Being, OECD Publishing, Paris.

OECD (2015b), “Making open science a reality”, OECD Science, Technology and Industry Policy Papers, No. 25, OECD Publishing, Paris.

OECD (2015c), Frascati Manual 2015: Guidelines for Collecting and Reporting Data on Research and Experimental Development, The Measurement of Scientific, Technological and Innovation Activities, OECD Publishing, Paris.

OECD (2015d), “Daejeon Declaration on Science, Technology, and Innovation Policies for the Global and Digital Age”, webpage (accessed 1 June 2019).

OECD (2015e), “Public perceptions of science and technology”, in OECD Science, Technology and Industry Scoreboard 2015: Innovation for growth and society, OECD Publishing, Paris.

OECD/Eurostat (2018), Oslo Manual 2018: Guidelines for Collecting, Reporting and Using Data on Innovation, 4th Edition, The Measurement of Scientific, Technological and Innovation Activities, OECD Publishing, Paris/Eurostat, Luxembourg.

OSP (n.d.), Open Syllabus Project website (accessed 1 June 2019).

Piwowar et al. (2018), “The state of OA: A large-scale analysis of the prevalence and impact of open access articles”, PeerJ, 6:e4375.

Rindfleisch, A. et al. (2017), “The digital revolution, 3D printing and innovation as data”, Journal of Product Innovation Management, Vol. 34/5, Wiley Online Library, pp. 681-690.

Shoham, Y. et al. (2018), “The AI Index 2018 Annual Report”, AI Index Steering Committee, Human-Centered AI Initiative, December, Stanford University, Stanford, United States.

Simonite, T. (2017), “Do we need a speedometer for artificial intelligence?”, WIRED, 30 August.

Varian, H. (2019), “Artificial intelligence, economics, and industrial organization”, in A.K. Agrawal et al. (eds.), The Economics of Artificial Intelligence: An Agenda, National Bureau of Economic Research, Cambridge, United States.

Verger, F. et al. (forthcoming), “Exploring patterns of advanced technology and business practice use and its link with innovation: An empirical case study based on Statistics Canada’s Survey of Advanced Technologies”, OECD Science, Technology and Industry Working Papers, OECD Publishing, Paris.

WIPO (2019), WIPO Technology Trends 2019: Artificial Intelligence, World Intellectual Property Organization, Geneva.

Yamashita et al. (forthcoming), “Identifying and measuring developments in artificial intelligence”, OECD Science, Technology and Industry Working Papers, OECD Publishing, Paris.


← 1. In the application area of defence, the Defense Advanced Research Projects Agency is responsible for much of the research funding related to computer science in the United States. Project-level information is not readily available in this case. While AI is not separately identifiable, this agency’s unclassified budget estimates for 2019 contain 21 references to AI research. The 2017 funding for R&D, testing and evaluation for the “Defense Research Sciences Program” alone includes USD 145 million for Mathematics and Computer Sciences and USD 46 million for Cyber Sciences.

← 2. The OECD analysis of OA was based on a random sample of 100 000 documents drawn from articles, reviews and conference proceedings published in 2016, listed in the Scopus database and having digital object identifiers (DOIs). Assessment of the OA status of the documents was conducted in June 2017 using the R-language based “wrapper” routine for the oaDOI application program interface. It was produced by ImpactStory, an open-source website that works to help researchers explore and share the online impact of their research. The API returns information on the ability to secure legal copies of the relevant document for free and the different mechanisms available: Gold OA journal; Gold hybrid; Green OA. When the DOI cannot be resolved to any source of access information, the result is marked as “No information – status not available”. This category is particularly high for China at more than 15%. When the DOI resolves and the return indicates there are no legal open versions available, the document is marked as “Closed”. This includes documents under embargo. The oaDOI application and related “unpaywall” browser extension have since been further refined and developed. They now identify an additional category of publications, namely those that are free to read on the publisher’s page, but without a clearly identifiable licence (labelled as “bronze”). Most of these documents went unnoticed in previous versions of the oaDOI application and were treated as closed. Piwowar et al. (2018) suggest the percentage of publications in this category is around 15% for the most recent publications with valid DOIs. This brings the percentage of functionally open documents closer in line with evidence in the ISSA2 study, which suggests about 65% of research documents published in 2017 can be freely accessed online one year later. 
This experience points to the usefulness for policy research purposes of APIs with metadata about scientific research, but it is also a stern reminder of the sensitivity of results to the methods used.
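
The classification rules described in note 2 can be sketched as a small decision function. The Python example below is an illustration only: the record fields (`has_free_legal_copy`, `host`, `has_open_licence`, `journal_is_fully_oa`) are hypothetical names and do not reproduce the oaDOI response format, and the order in which the checks are applied is an assumption.

```python
from typing import Optional


def classify_oa(record: Optional[dict]) -> str:
    """Assign an open access (OA) status label to one document.

    `record` is None when the DOI could not be resolved to any source
    of access information. All field names are hypothetical.
    """
    if record is None:
        return "No information - status not available"
    if not record.get("has_free_legal_copy", False):
        # Includes documents under embargo.
        return "Closed"
    if record.get("host") == "repository":
        return "Green OA"
    # Remaining cases are free to read on the publisher's page.
    if not record.get("has_open_licence", False):
        return "Bronze"
    if record.get("journal_is_fully_oa", False):
        return "Gold OA journal"
    return "Gold hybrid"


# Hypothetical records for illustration.
examples = [
    None,
    {"has_free_legal_copy": False},
    {"has_free_legal_copy": True, "host": "repository"},
    {"has_free_legal_copy": True, "host": "publisher"},
    {"has_free_legal_copy": True, "host": "publisher",
     "has_open_licence": True, "journal_is_fully_oa": True},
]
for example in examples:
    print(classify_oa(example))
```

Dropping the bronze check and returning "Closed" for those documents approximates the earlier behaviour described in the note, which shows how a single rule change can shift the measured share of open documents.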

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2020

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at