Quantifying the “cognitive extent” of science and how it has changed over time and across countries

S. Milojević
Indiana University
United States

This essay presents and discusses recent trends in the growth of science in terms of the extent of knowledge in scientific literature rather than from the standpoint of publication volumes. It applies an existing method for evaluating the cognitive content of scientific publications to the entire Web of Science database for the period 1900-2020. It compares the growth dynamics of science based on productivity measures with those based on a concept of “cognitive territory”, finding stagnation in the latter since the mid-2000s. Growth dynamics are also examined for individual fields of research, showing that physics, astronomy and biology are expanding, whereas medicine is stagnating or even contracting. Cognitive extent is compared for different countries. While the People’s Republic of China (hereafter “China”) was the biggest producer of scientific publications in 2019, its papers covered a smaller cognitive extent than many individual West European countries and Japan.

Is science in decline? This seemingly simple question has so far eluded a definitive answer because it depends on how science is measured. The development of science indicators and metrics dates back to the 1962 publication of the so-called Frascati Manual, now in its sixth edition (2015), by the Organisation for European Economic Co-operation (later OECD).

These pioneering efforts enabled the assessment of science and better-informed science policies. However, they also emphasised economic input-output measures stripped of context, rather than the content of science and the circumstances of its creation. Thus, the common measures of scientific progress mostly focus on volume, capturing either the number of papers or the number of researchers.

The focus on the number of journal articles assumes that individual papers represent “a unit” of knowledge, and that all papers contribute equally to the advancement of science. To address the issue of an unequal contribution of individual papers (and/or authors), the focus on volume was complemented by impact measures, most commonly defined in terms of the number of citations received. However, neither volume nor impact is a good measure of the breadth or extent of knowledge produced. To assess the extent of knowledge, the focus needs to shift to the content of publications as exhibited in their text.

The use of text in the quantitative study of science has a rich history, especially in mapping the structure of scientific fields. Milojević (2015) quantifies the cognitive extent of scientific fields by using information contained in the titles of journal articles. The method is statistical and uses natural language processing to extract phrases from titles in English. The phrases are combinations of words that describe specific concepts, such as the methods of study, scientific instruments, and objects of study and their properties. For example, “scanning tunnelling microscopy” would be identified as a phrase, as would “high-temperature superconducting”. General words, such as “study” or “observation”, would not be identified as a phrase.

Cognitive extent might be easiest to envision as a measure, at any point in time, of the intellectual territory covered by science or its different subunits (e.g. broad research areas, specialisations, journals, countries). For the measure to be unbiased by the volume of output, and to facilitate comparisons across articles with titles of unequal lengths, the cognitive extent is calculated using samples that always contain the same number of title phrases (10 000, corresponding to about 3 000 articles).

Once the phrases in the titles are identified, the number of unique phrases is counted. A smaller number of unique phrases among 10 000 title phrases (e.g. 5 000) would indicate a lot of repetition in the given volume of literature. One could say this body of literature covers a smaller cognitive territory. On the other hand, a large number of unique phrases (e.g. 8 000) means the body of literature examined covers a wider range of concepts and a larger cognitive territory.
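The counting step can be illustrated with a minimal Python sketch. It assumes the title phrases have already been extracted (the extraction step is not reproduced here) and, as a simplification, samples phrases directly rather than articles. The function name `cognitive_extent`, the `batch_size` and `seed` parameters and the toy phrases are hypothetical, chosen only for illustration.

```python
import random


def cognitive_extent(phrases, batch_size=10_000, seed=0):
    """Count unique phrases in a fixed-size random batch of title phrases.

    `phrases` holds one entry per phrase occurrence, so repeats are expected.
    Drawing a fixed number of phrases keeps the result comparable across
    bodies of literature of different sizes.
    """
    if len(phrases) < batch_size:
        raise ValueError("need at least batch_size phrases")
    rng = random.Random(seed)
    batch = rng.sample(phrases, batch_size)  # sample without replacement
    return len(set(batch))


# Toy illustration with a batch of 10 phrases instead of 10 000.
repetitive = ["gene expression"] * 6 + ["protein folding"] * 4
diverse = [f"concept {i}" for i in range(10)]

print(cognitive_extent(repetitive, batch_size=10))  # 2 unique phrases
print(cognitive_extent(diverse, batch_size=10))     # 10 unique phrases
```

In the toy data, the repetitive body of literature yields 2 unique phrases out of 10, while the diverse one yields 10 out of 10, mirroring the contrast between a smaller and a larger cognitive territory.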

Depending on the object of study (the entirety of scientific output, a specific field of science, etc.), the batch of 10 000 title phrases comes from one of two sources. First, it could come from around 3 000 randomly selected articles from any field, as long as the articles are all published in the same period (typically the same year). Second, it might come only from articles in some predefined field, again as long as they are all published in the same period.
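The two ways of assembling a batch can be sketched as follows. This is an illustrative sketch only: the record layout (dictionaries with `year`, `field` and `phrases` keys), the function name `draw_batch` and the choice to truncate the batch to exactly `batch_size` phrases are assumptions, not details given in the essay.

```python
import random


def draw_batch(articles, year, field=None, batch_size=10_000, seed=0):
    """Collect `batch_size` title phrases from randomly ordered articles.

    `articles` is a list of dicts with hypothetical keys "year", "field"
    and "phrases" (phrases already extracted from the article title).
    If `field` is None, articles from any field are eligible.
    """
    eligible = [
        a for a in articles
        if a["year"] == year and (field is None or a["field"] == field)
    ]
    rng = random.Random(seed)
    rng.shuffle(eligible)  # random selection of articles

    batch = []
    for article in eligible:
        batch.extend(article["phrases"])
        if len(batch) >= batch_size:
            return batch[:batch_size]  # assumption: cut to exactly batch_size
    raise ValueError("not enough title phrases for this year/field")


# Toy records; real batches would hold 10 000 phrases from about 3 000 articles.
articles = [
    {"year": 2019, "field": "physics", "phrases": ["quantum dot", "spin wave"]},
    {"year": 2019, "field": "medicine", "phrases": ["clinical trial"]},
    {"year": 2018, "field": "physics", "phrases": ["dark matter"]},
]

all_fields = draw_batch(articles, year=2019, batch_size=3)
physics_only = draw_batch(articles, year=2019, field="physics", batch_size=2)
print(len(set(all_fields)), len(set(physics_only)))
```

Both calls restrict the batch to a single publication year; only the second also restricts it to a predefined field.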

The cognitive extent measure at any time is static (not cumulative) and therefore not influenced by previous states – each measurement is independent. Although many of the specific phrases change from one batch to another, the resulting degree of diversity is remarkably stable from one year to the next. It is true that not all phrases are equally useful or relevant. However, this is not a significant limitation of the method, because the measure is applied in a relative sense: comparing one country to another, or one period to another. Each body of literature to which the measure is applied will contain a distribution of phrases across the spectrum of relevance, so less relevant phrases are present on both sides of any comparison.

Milojević (2015) applied the above measure to follow developments in three research areas (physics, astronomy and biomedicine). That study found that, between 1900 (1945 for biomedicine) and 2010, the number of papers grew exponentially in all three fields, whereas their cognitive extent grew only linearly. The measure was also applied to literature produced by teams of differing sizes, and an inverse relationship was found between cognitive extent and team size. This finding suggests that small teams play a particularly important role in expanding the cognitive territory.

For the purposes of the OECD work on artificial intelligence and the productivity of science, this method was applied to the entire Web of Science database covering 1900 to 2020. The analysis encompasses the literature for science as a whole, as well as individual research fields. It also examines the cognitive extent of research produced in different countries. Driving these analyses was the question: “Is science (still) expanding, and if so how fast?”

Several studies have shown the number of published scientific papers is doubling every 9 to 15 years (Larsen and von Ins, 2010; Bornmann and Mutz, 2015). Using more recent data, Figure 1 shows the growth of scientific output as indexed in the Web of Science from 1900 to 2020. Interestingly, although scientific output grew approximately exponentially, the rate of exponential growth varied. The fastest growth (with a doubling time of nine years) was in the immediate aftermath of the Second World War until the mid-1970s. Since then, scientific output has doubled every 15 years.
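To make the doubling times easier to compare, the short Python sketch below converts a constant doubling time into the implied annual growth rate. It relies only on the standard relationship for exponential growth (output multiplies by 2 every T years); the function names are hypothetical and the figures are derived arithmetic, not values reported in the essay.

```python
def annual_growth_rate(doubling_time_years):
    """Annual growth rate implied by a constant doubling time."""
    return 2 ** (1 / doubling_time_years) - 1


def growth_factor(doubling_time_years, span_years):
    """Factor by which output multiplies over `span_years`."""
    return 2 ** (span_years / doubling_time_years)


for t in (9, 15):
    print(f"doubling every {t} years: {annual_growth_rate(t):.1%} per year")
# doubling every 9 years: 8.0% per year
# doubling every 15 years: 4.7% per year
```

Under these assumptions, a doubling time of nine years corresponds to growth of about 8% per year, and a doubling time of 15 years to just under 5% per year.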

While the volume of scientific output has grown exponentially, Figure 2 shows that the cognitive territory of science has only grown linearly (Milojević, 2015; Fortunato et al., 2018). Recall that cognitive extent, the vertical axis on the figure, is expressed as the number of unique phrases among 10 000 article title phrases. Figure 2 is based on the same articles used for Figure 1 but paints a different picture. The fastest expansion of cognitive extent occurred immediately after the Second World War (1945-51) and then after the Soviet Union launched Sputnik (1958-65). In addition, the continuing increase in literature production (Figure 1) is accompanied by a slowing, and in some years stagnation, in the cognitive extent of science. This starts in 2004 and continues until the time series ends in 2020.

It is well known that different scientific fields develop at a different pace. The work described here shows that not all fields have stagnated. Physics, astronomy and biology are expanding in cognitive extent, whereas mathematics, social sciences, computer science and psychology show slower expansion. Earth sciences, chemistry, agriculture and engineering appear to be stagnating. Meanwhile, medicine has even experienced a contraction since around 2009. In general, basic sciences are expanding, while applied sciences are not.

After the launch of Sputnik in 1957, strategic science policies and increased investment in science assumed unprecedented importance in many national policies. It was what Johnson (1972) described as a kind of science Olympics. Figure 3 shows, for 2019, a ranked list of countries with respect to both scientific output and cognitive extent. China has become the major producer of scientific literature globally, overtaking the United States. However, China’s scientific output still covers a smaller cognitive territory than that of countries with a longer tradition of modern science, such as France, Germany and the United States. (Reminder: to calculate the cognitive extent for every country, this essay used batches of publications of the same size – 10 000 phrases or roughly 3 000 articles.)

Figure 4 shows the cognitive extent of broad fields of science in different countries, in this instance, the United States, the first 15 members of the European Union and China. It focuses on the EU-15 rather than the current EU members to consider countries that have traditionally contributed most to overall scientific output. The figure suggests these countries follow different strategies in terms of how broadly they cover different research areas. In terms of cognitive extent, China is approaching the United States and the EU-15 countries in physics and engineering, followed by computer science and medicine. However, China lags significantly in psychology, agriculture and social sciences. It is beyond the scope of this essay to speculate on why this may be the case. However, as one possible factor, the literature in these fields is primarily published in national languages, and therefore not included in this study.

Cognitive extent is an interesting and important measure to add to other measures of science. As suggested in Milojević et al. (2017), “… rather than thinking of individual publications as accreting into an ever-greater understanding of the natural world, it may be better to think of science as building a ladder to the sky. Some publications add new rungs to the ladder, while others primarily increase the width of the ladder.”

Following this insight, cognitive extent can be viewed as a measure of the productivity of science not based on a count of publications (however weighted) but rather as an indicator of the pace at which new rungs are added to the ladder. Increasing the width of the ladder may also be important to ascend, but the two still need to be distinguished.

When combined with other measures, cognitive extent can shed new light on the dynamics of science as a whole, as well as its individual fields. For example, we might not expect rapid advances in fields that cover a large territory or extent but have a small number of researchers working in this domain. Conversely, fields with a relatively small territory or extent and many researchers would be more likely to have a clearly defined research frontier. This helps them to form consensus and resolve open questions quickly.

Some results here indicate that science as a whole may be stagnating. This is occurring, to some extent, in terms of the expansion of frontiers of knowledge rather than in terms of sheer output. Some areas may be in decline. The decline may indicate areas with increased concentration on existing problems using current approaches rather than the introduction of novelty. Such concentration might lead to faster short-term solutions for existing problems, due to intensified effort. However, it might also make future progress slower, if new ideas that might serve to seed progress appear more slowly. More research that includes both qualitative and quantitative approaches will be needed to give a definitive answer to these questions.


Bornmann, L. and R. Mutz (2015), “Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references”, Journal of the Association for Information Science and Technology, Vol. 66/11, pp. 2215-2222, https://doi.org/10.1002/asi.23329.

Fortunato, S. et al. (2018), “Science of science”, Science, Vol. 359/6379, p. eaao0185, https://doi.org/10.1126/science.aao0185.

Johnson, H.G. (1972), “Some economic aspects of science”, Minerva, Vol. 10/1, pp. 10-18, www.jstor.org/stable/41822128.

Larsen, P.O. and M. von Ins (2010), “The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index”, Scientometrics, Vol. 84/3, pp. 575-603, https://doi.org/10.1007/s11192-010-0202-z.

Milojević, S. (2015), “Quantifying the cognitive extent of science”, Journal of Informetrics, Vol. 9/4, pp. 962-973, https://doi.org/10.1016/j.joi.2015.10.005.

Milojević, S. et al. (2017), “Team composition and the pace of science: An ecological perspective”, presented at the LEI-BRICK Workshop, Organization, Economics and Policy of Scientific Research, Turin.

© OECD 2023

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at https://www.oecd.org/termsandconditions.