5.2. Science and digitalisation

Advances in scientific knowledge are key to developing new digital technologies. Over the last decade, China almost trebled its contribution to computer science journals, overtaking the United States in the production of scientific documents in this field. However, the share of China’s documents that rank among the world’s top-cited (top 10%, normalised by type of document and field) is still close to 7%, less than the world average and well below the 17% recorded for the United States. The share of computer science publications from China that are highly cited has nonetheless more than doubled since 2006, making China the second-largest producer of top-cited publications in this field worldwide. In some countries, such as Italy, Israel, Luxembourg and Poland, computer science research has a much higher relative citation rate than the country’s overall scientific production. Nearly 20% of computer science publications by Switzerland-based authors feature among the world’s top 10% most-cited scientific documents. This figure reaches 25% for Luxembourg, although from a much smaller volume of scientific production.

Scientific activity makes intensive use of digital tools and generates digital assets in the form of new data and software. A 2018 OECD pilot survey, the International Survey of Scientific Authors (ISSA), focuses on measuring the digitalisation of science. Preliminary findings show that, on average, 60% or more of scientific publications generate new data or new software code. Countries with higher levels of R&D intensity are, on average, also more likely to report high shares of scientific production that generate new computer code, either alone or in combination with new data. More than 45% of survey respondents resident in Korea reported developing new code, mostly in combination with data, compared to 20% in Mexico. Data generation is more widespread and more evenly distributed across countries. In computer science and decision sciences, more than 50% of respondents generate code, closely followed by physics and astronomy. Code generation is least common in the arts and humanities, and in chemistry, at less than 10% of respondents.

Scientific research represents an important foundation for technological advancement and innovation. By identifying non-patent literature, in particular scientific articles, cited in patent documents, it is possible to gain insights into linkages between scientific progress and new inventions. Digital technologies build mostly on digital-related science, with electrical or information engineering articles cited in 37% of digital patents and computer and information sciences articles cited in 20%. However, digital technologies can be applied in a wide range of fields, so patented digital technologies also draw on scientific production from a broad variety of other areas, especially the physical sciences (12%) and various medical domains, as well as art, languages and other fields.

Did You Know?

The United States accounted for around 70% more top-cited scientific publications on computer science than China in 2016. This gap has shrunk from nearly 500% in 2006.
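The two percentages above are relative gaps, so they translate directly into ratios of top-cited output: "70% more" corresponds to a ratio of about 1.7, and "nearly 500% more" to a ratio of about 6. The sketch below only illustrates that arithmetic; the function names are hypothetical and no underlying publication counts are implied.

```python
def relative_gap(leader: float, follower: float) -> float:
    """Percentage by which `leader` exceeds `follower`."""
    return 100.0 * (leader - follower) / follower


def ratio_from_gap(gap_percent: float) -> float:
    """Leader-to-follower ratio implied by a relative gap in percent."""
    return 1.0 + gap_percent / 100.0


if __name__ == "__main__":
    # Gaps quoted in the text: around 70% in 2016, nearly 500% in 2006.
    for year, gap in ((2006, 500.0), (2016, 70.0)):
        print(year, "implied ratio:", ratio_from_gap(gap))
```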


Computer science publications consist of citeable documents (articles, conference proceedings and reviews) featured in journals specialising in this field. “Top-cited publications” are the 10% most-cited papers normalised by scientific field and type of document (OECD and SCImago Research Group, 2016).
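As a rough illustration of how such an indicator could be computed, the sketch below flags the top 10% of documents within each class (field and document type), breaks citation ties with a journal rank score, and attributes documents to countries using fractional counts, as described in the figure notes of this section. The record layout, the function name and the rounding-up of the cut-off are assumptions made for the example, not the OECD or SCImago implementation.

```python
from collections import defaultdict


def top_cited_shares(docs, top_percent=10):
    """Share of each country's fractionally counted documents in the top class.

    Each doc is a dict with hypothetical keys: id, field, doc_type,
    citations, journal_rank, countries ({country: author share summing to 1}).
    """
    # Group documents so citations are compared within field and document type.
    classes = defaultdict(list)
    for doc in docs:
        classes[(doc["field"], doc["doc_type"])].append(doc)

    top_ids = set()
    for members in classes.values():
        # Rank by citations; ties are broken by journal rank (higher first).
        ranked = sorted(
            members,
            key=lambda d: (d["citations"], d["journal_rank"]),
            reverse=True,
        )
        # Assumption: the cut-off is rounded up (integer ceiling).
        cutoff = (len(ranked) * top_percent + 99) // 100
        top_ids.update(d["id"] for d in ranked[:cutoff])

    total = defaultdict(float)
    top = defaultdict(float)
    for doc in docs:
        for country, share in doc["countries"].items():
            total[country] += share  # fractional count of all documents
            if doc["id"] in top_ids:
                top[country] += share  # fractional count of top-cited ones
    return {country: top[country] / total[country] for country in total}
```

A country's result is the ratio of its fractional top-cited count to its fractional total, which is the quantity the percentages quoted in this section refer to.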

Research data include numerical scores, textual records, images and sounds that can be used as primary sources for scientific research. Code includes custom-developed software and code, laboratory notebooks and other computer-enabled documents describing every step of the research work and protocols followed.

Digital (ICT) patent families are identified using the list of IPC codes in Inaba and Squicciarini (2017).


Identifying the digital-related content of research outputs is a major challenge. Bibliographic indices provide a readily available source of data for illustrative purposes, though with interpretability and coverage limitations. Using publishers’ journal classifications would understate the digital intensity of science, because digital research is pervasive across fields. Alternatives are scanning publications for content or directly contacting authors. The OECD ISSA 2018 survey takes the latter approach in order to gather insights on the use of digital tools and the contribution of science to the digitalisation process (see page 5.6). It should be noted, however, that not all so-called “data scientists” publish in scholarly journals, which form the basis for identifying and contacting authors.

Published patent documents contain references to prior art on which inventions rely, including previous patents and non-patent literature (NPL). Analysing the link between patents and scientific literature cited in patent documents helps to uncover the links between science and innovation. The Max Planck Digital Library has developed robust methods to link NPL with scientific reference data (see Knaus and Palzenberger, 2018). This analysis is based on data elaborated by the Max Planck Institute for Innovation and Competition using information provided in the Clarivate Web of Science (see Poege et al., 2018).

Top 10% most-cited documents in computer science by country, 2016
As a percentage of documents in the top 10% ranked documents, by field, fractional counts

Source: OECD calculations based on Scopus Custom Data, Elsevier, Version 1.2018; and 2018 Scimago Journal Rank from the Scopus journal title list (accessed March 2018), January 2019. See 1. StatLink contains more data.

1. “Top-cited publications” are the 10% most-cited papers normalised by scientific field and type of document (articles, reviews and conference proceedings). The Scimago Journal Rank indicator is used to rank documents with identical numbers of citations within each class. This measure is a proxy indicator of research excellence. Estimates are based on fractional counts of documents by authors affiliated to institutions in each economy. Documents published in multi-disciplinary/generic journals are allocated on a fractional basis to the ASJC codes of citing and cited papers.

The field Computer Science comprises the following sub-fields: Artificial Intelligence, Computational Theory and Mathematics, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Computer Science Applications, Computer Vision and Pattern Recognition, Hardware and Architecture, Human-Computer Interaction, Information Systems, Signal Processing, and Software.

 StatLink https://doi.org/10.1787/888933930231

Scientific production resulting in new data or code, by country of residence, 2017
As a percentage of responses to the ISSA 2018 survey

Source: OECD, International Survey of Scientific Authors (ISSA) 2018, preliminary results, http://oe.cd/issa, December 2018. See 1. StatLink contains more data.

1. This is an experimental indicator. It is not necessarily representative of the researcher population in each country. Only countries with at least 75 responses have been reported.
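A minimal sketch of how the country shares and the 75-response reporting threshold could be applied to respondent-level answers is given below. The input layout (one tuple per respondent) and the function name are hypothetical; the ISSA microdata and any weighting actually used are not reproduced here.

```python
from collections import Counter

MIN_RESPONSES = 75  # reporting threshold stated in the note above


def output_shares_by_country(responses, min_responses=MIN_RESPONSES):
    """Percentage of respondents per country by type of digital output.

    `responses` is an iterable of (country, made_data, made_code) tuples,
    where the two flags are booleans (hypothetical layout).
    """
    n = Counter()
    counts = {"data_only": Counter(), "code_only": Counter(), "both": Counter()}
    for country, made_data, made_code in responses:
        n[country] += 1
        if made_data and made_code:
            counts["both"][country] += 1
        elif made_code:
            counts["code_only"][country] += 1
        elif made_data:
            counts["data_only"][country] += 1

    # Countries below the threshold are dropped rather than reported.
    return {
        country: {kind: 100.0 * c[country] / n[country] for kind, c in counts.items()}
        for country in n
        if n[country] >= min_responses
    }
```

Summing `code_only` and `both` gives the share of respondents reporting new code, which is the figure quoted for individual countries in the text.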

 StatLink https://doi.org/10.1787/888933930250

Scientific knowledge embedded in digital patents, by scientific fields, 2003-06 and 2013-16
Distribution of top 20 fields of scientific articles cited by IP5 patent families in ICT

Source: OECD calculations based on data elaboration courtesy of the Max Planck Institute for Innovation and Competition, and OECD, STI Micro-data Lab: Intellectual Property Database, http://oe.cd/ipstats, December 2018. See 1. StatLink contains more data.

1. Data refer to IP5 patent families in ICT-related technologies that cite scientific publications, by filing date and scientific fields using fractional counts. Patents in ICT are identified using the list of IPC codes in Inaba and Squicciarini (2017). Scientific fields are derived from data elaborated and consolidated by the Max Planck Institute for Innovation and Competition, based on linked non-patent literature citations to scientific article data (see Poege et al., 2018). Scientific fields are aggregated to fields of R&D as provided in the OECD Frascati Manual (2015). Data for 2013-16 are incomplete.
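The fractional counting mentioned in this note can be sketched as follows. The example assumes that each patent family carries a total weight of one, split equally across the scientific articles it cites, with each article mapped to a single field; the note does not spell out the exact weighting, so this is an illustrative assumption, and the input layout and function name are hypothetical.

```python
from collections import defaultdict


def field_distribution(families):
    """Percentage distribution of cited scientific fields.

    `families` maps a patent family id to the list of fields of the
    scientific articles it cites (one entry per cited article).
    """
    weights = defaultdict(float)
    for fields in families.values():
        if not fields:
            continue  # families citing no scientific article are out of scope
        share = 1.0 / len(fields)  # assumption: equal split across citations
        for field in fields:
            weights[field] += share

    total = sum(weights.values())
    if total == 0:
        return {}
    return {field: 100.0 * w / total for field, w in weights.items()}
```

Under this weighting, a family citing many articles does not count for more than a family citing a single one.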

 StatLink https://doi.org/10.1787/888933930269
