Chapter 5. Artificial intelligence and machine learning in science

Ross D. King
Stephen Roberts

Finding solutions to many of the world’s major challenges requires increasing scientific knowledge. Artificial intelligence (AI) has the potential to increase the productivity of science, at a time when some evidence suggests that research productivity may be falling. This chapter first outlines the three key technological developments driving the recent rise in AI: vastly improved computer hardware, vastly increased availability of data and vastly improved AI software. It then describes the promises of AI in science, illustrating its current uses across a range of scientific disciplines. Later sections raise the question of explainability of AI and the implications for science, highlighting gaps in education and training programmes that slow down the rollout of AI in science. The chapter finishes by envisioning a future in which increasingly intelligent AI systems, working with human scientists, help address society’s most pressing problems, while expanding scientific knowledge.



The world faces many global challenges, from climate change to bacterial resistance to antibiotics. Solutions to many – if not all – of these challenges require greater scientific knowledge. Until quite recently, the role of artificial intelligence (AI) in science received little attention. In the words of Glymour (2004), “despite a lack of public fanfare, there is mounting evidence that we are in the midst of... a revolution – premised on the automation of scientific discovery”. Today, AI is regularly the subject of published reports in the most prestigious scientific journals, such as Science and Nature.

Nevertheless, the scientific community has a poor general understanding of AI. As with many new technologies, opinions polarise towards extremes, from “AI will revolutionise everything” to “AI will have no real impact”. The truth, of course, is somewhere in the middle; what is unclear is how close it is to either of the poles. Answering this question is made more complex by the complicated history of AI (Boden, 2006): since its inception in the 1950s, AI has gone through several cycles of enthusiasm and disillusionment.

What differentiates the current situation from previous AI “hype-cycles” is that the underlying computer technology has improved, there are vastly more data, AI is better understood, and – perhaps most importantly as a point of historical difference – the amount of corporate money being invested has increased, and large profits are being made from using AI. Some of the largest companies in the world (e.g. Google, Amazon, Facebook, Tencent, Baidu and Alibaba) have focused their businesses on AI. Taken together, these developments mean that AI will very likely have a huge and growing impact on the world.

As described in Box 5.1, AI has the potential to increase the productivity of science, at a time when evidence suggests research productivity may be falling and new ideas are harder to find (Bloom et al., 2017; Jones, 2005). The use of AI in science could also enable novel forms of discovery, enhance reproducibility and even have philosophical implications for the scientific process. Three key technological developments are driving the recent rise of AI: vastly improved computer hardware, vastly increased data availability and vastly improved AI software. Several additional factors are also enabling AI in science: AI is well funded, at least in the commercial sector; scientific data are increasingly abundant; high-performance computing is improving; and scientists now have access to open-source AI code. Multiple examples show AI being used across the entire span of scientific enquiry. Furthermore, AI is being applied to all phases of the scientific process, including optimising experimental design.

Box 5.1. What is AI?

AI is the discipline of creating algorithms (computer software) that can learn and reason about tasks that would be considered “intelligent” if performed by a human or animal. “Narrow” AI is the development of solutions to specific tasks that require intelligence, e.g. beating the world’s chess or Go champion, driving a car or making a medical diagnosis. “Full” – or general – AI is the development of a system with intelligence equal to or greater than that of an adult human. It is generally believed that full AI is decades away; hence, this chapter focuses on narrow AI. As AI algorithms focus on the generic ability to learn, rather than solve any particular problem, they are very widely applicable.

At least one current obstacle to achieving the full potential of AI in science is economic. Computational resources, which are essential to leading-edge research in AI, can be extremely expensive. The largest computing resources – and the longest employee lists of excellent AI researchers – are frequently found not in universities or the public sector, but in the private sector. Private-sector work mainly focuses on generating profits, rather than solving outstanding scientific questions. A key policy issue concerns education and training in AI and machine learning (ML). Too few students are trained to understand the fundamental role of logic in AI; most data analysis taught to non-specialists in universities is still based on the classical statistics developed in the early 20th century.

This chapter outlines the technologies driving the recent rise in AI. It describes the promises of AI in science, illustrating its current uses across a range of scientific disciplines. Later sections raise the question of explainability of AI and the implications for science, highlighting gaps in education and training programmes that slow down the rollout of AI in science. The chapter finishes with a vision of AI and the future of science.

Technological drivers are behind the recent rise of AI

Three technological drivers are behind the recent rise of AI:

  • Faster computers: the modern computer age has been shaped by the exponential increase in computer speeds, in line with “Moore’s Law”. This means that the supercomputing power needed to beat the world champion (Garry Kasparov) at chess for the first time in 1996 can now fit in a standard mobile phone. To keep up with demand for ever-greater computing power, manufacturers have created a wealth of innovations over the past decades, from multithreading multicore central processing units to large-scale graphics processing units. AI partly owes its recent achievements to the pace of computing advances, allowing AI algorithms to explore complex solutions to large-scale problems. Indeed, some of the most publicised achievements of modern AI, such as playing the game of Go better than any human expert, would not have been possible without vast high-speed computing resources.

  • The scale of data: with the advent of cheaper sensors, telemetry equipment, ultra-fast computing and cheap data storage at scale, science has undergone a paradigm shift. In a collection of essays published as The Fourth Paradigm, Hey et al. (2009) argue that experimental science has undergone a fundamental change. The era of direct experimentation is gone, replaced by the era of data collection. Rather than perform science directly, experiments are designed to record and archive data at an unprecedented scale. Science, namely the evidence-based audit trail of the reasoning of discovery, then takes place within the data. In this sense, much of traditional science has become data science. For most of human history, scientists have observed the universe and the natural world, postulating laws or principles to help generalise the complexity of observations into simpler concepts. Deriving such generalisations from data is akin to finding a hidden structure that is highly explanatory and as such, amenable to intelligent automation.

  • Improved AI software: significant advances in AI software have taken place in recent years, especially in ML, and more particularly the branch of ML known as deep learning (DL) (Box 5.2).

Box 5.2. ML and (deep) neural networks: What are they?

ML normally refers to the branch of AI focused on developing systems that learn from data. Rather than being explicitly told how to solve a problem, ML algorithms can create solutions by learning from examples (referred to as “training” the ML algorithm).

Often, the terms ML and AI are used interchangeably, and their meaning has certainly changed over the last two decades. From a more recent perspective, ML has grown to encompass data-driven approaches, including traditional computational statistics models, e.g. polynomial regression and logistic classification. In modern parlance, the term AI is used to describe “deeper” models, which have the ability to learn (almost) arbitrarily complex mappings from input to outcome. Such models include deep neural networks and Gaussian processes. Strictly speaking, AI is an extension of ML, augmenting models that learn from example with approaches such as expert systems, logical and statistical inference methods, and planning.
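As a minimal illustration of what "learning from examples" means, the sketch below fits a straight line (polynomial regression of degree one, one of the traditional models named above) to a handful of points by ordinary least squares. The data values and the function name `fit_line` are invented for illustration and do not come from any study cited in this chapter.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training examples, generated from y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]

# "Training": the parameters are estimated from the examples, not hand-coded.
slope, intercept = fit_line(train_x, train_y)

# The fitted model can then predict the outcome for an unseen input.
prediction = slope * 10.0 + intercept
```

The algorithm is never told the rule that generated the data; it recovers the rule's parameters from the examples alone, which is the sense of "training" used in this box.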

(Deep) neural networks

DL and deep neural networks are a type of ML. Recently, DL has transformed the way in which algorithms achieve (or exceed) human-level performance in areas such as game playing and computer vision. DL owes its success to the easy availability of vast amounts of data and vastly more powerful computers, as well as new algorithmic insights. In common with other “non-parametric” methods (such as Bayesian non-parametric models), DL does not specify the functional form of solutions. Instead, it has enough flexible complexity to learn arbitrary mappings, from input to outcome, from many training examples.

Research on neural networks began in the 1950s and made significant progress in the 1980s and 1990s. Deep models have added complexity, with several “hidden layers” of non-linear functions cascading between input and output. Although deep neural networks were first investigated in the 1990s, it took well over a decade before high-performance computing allowed training over large datasets in realistic time periods. It is only more recently that we have seen the truly impressive ability of DL to solve certain classes of problem.
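The role of a hidden layer of non-linear functions can be shown with a very small network. The sketch below is illustrative only: the weights are set by hand rather than learned, the network has a single hidden layer of two units, and all names are hypothetical. It computes the exclusive-or (XOR) of two binary inputs, a mapping that no single linear layer can represent.

```python
def relu(values):
    """Non-linear activation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hand-chosen (not learned) parameters, assumed for illustration.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def forward(x):
    """Input -> hidden layer (non-linear) -> output layer (linear)."""
    hidden = relu(dense(x, HIDDEN_W, HIDDEN_B))
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]


xor_table = {
    (a, b): forward([float(a), float(b)]) for a in (0, 1) for b in (0, 1)
}
```

In a real deep network there are many such layers, and the weights are found by training on examples rather than written down by hand.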

Why AI in science matters

AI systems are now capable of superhuman reasoning. They can accurately remember vast numbers of facts, execute flawless logical reasoning and near-optimal probabilistic reasoning, learn more rationally than humans from small amounts of data and learn from large amounts of data no human could deal with. These abilities give AI the potential to transform science by augmenting human scientific reasoning (Kitano, 2016). ML and AI have the potential to contribute to science in several key ways: finding unusual and interesting patterns in vast datasets; discovering scientific principles, invariance and laws from data; augmenting human science; and combining with robotic systems to yield “robot scientists”. The following paragraphs describe key contributions in more detail.

AI might enable novel types of discovery

One motivation for investing in AI for science is that AI systems “think differently”. Human scientists – at least all modern ones – are educated and trained in basically the same way; this is likely to impose unrecognised cognitive biases in how they approach scientific problems. AI systems have strengths and weaknesses that are very different from those of human scientists. The expectation is that combining both ways of thinking will provide synergies. Indeed, the evidence from human-software symbiosis has shown that the fusion of automated and human exploration of complex systems can yield efficient and effective solution discovery (Kasparov, 2017).

AI in science may become essential in a context where the volume of scientific papers is vast and growing, and scientists may have reached “peak reading”

AI systems and human scientists have complementary reading skills. Human scientists can understand papers in detail (although such understanding is limited by the ambiguities inherent in natural languages), but can only read and remember a limited number of papers. By contrast, AI systems can extract information from millions of scientific papers, but the amount of detail that can be abstracted is severely limited (Manning and Schütze, 1999).

Applying AI in science has philosophical implications, e.g. in terms of better understanding the scientific process

Automating science also has major philosophical implications. If an AI-based mechanism can be built that is judged to have discovered some novel scientific knowledge, then this will shed light on the nature of science (King et al., 2018). To quote Richard Feynman “What I cannot create, I do not understand” (written on his blackboard at the time of his death). Building robot scientists, for example, entails the need to make concrete engineering decisions related to several important problems in the philosophy of science. For instance, is it more effective to reason only with observed quantities, or to also involve unobserved theoretical concepts? This engineering-based approach to understanding science – shedding light on the discovery process by attempting to replicate it through machine processes – is analogous to the AI approach to understanding the human mind through the creation of artefacts (such as machine learning systems using artificial neural networks) that can be empirically shown to possess some of its attributes. Making machines that physically implement different philosophies of science enables empirical comparison of these philosophies. Currently, philosophers of science are generally limited to historical analysis.

AI can combine with robotic systems to execute closed-loop scientific research

Figure 5.1. Hypothesis-driven closed-loop learning
How iterative cycles of hypothesis-driven experimentation allow for the autonomous generation of new scientific knowledge

The convergence of AI and robotics has many potential benefits for science. It is possible to physically implement a laboratory-automation system that exploits techniques from the AI field to execute cycles of scientific experimentation. The execution of cycles of scientific research is a general approach applicable in many fields of science. Fully automating science has several potential advantages:

  • Faster scientific discovery. Automated systems can generate and test thousands of hypotheses in parallel, utilising experiments that test multiple hypotheses. Human beings’ cognitive limitations mean they can only consider a few hypotheses at a time (King et al., 2004; King et al., 2009).

  • Cheaper experimentation. AI systems can select experiments utilising greater economic rationality (Williams et al., 2015). AI allows very efficient exploration and exploitation of unknown experimental landscapes, and is leading the development of novel drugs (Griffiths and Hernandez-Lobato, 2017; Segler et al., 2018), materials (Frazier and Wang, 2015; Butler et al., 2018) and devices (Kim et al., 2017).

  • Easier training. Including initial education, a human scientist requires over 20 years and huge resources to be fully trained. Humans can only absorb knowledge slowly through teaching and experience. Robots, by contrast, can directly absorb knowledge from each other.

  • Increased and more productive work. Robots can work longer and harder than humans, and do not require rest or holidays.

  • Improved knowledge/data sharing and scientific reproducibility. One of the most important current issues in biology – and other scientific fields – is reproducibility. A 2016 edition of Nature observed that: “There is growing alarm about results that cannot be reproduced. Explanations include increased levels of scrutiny, complexity of experiments and statistics, and pressures on researchers” (Alexander et al., 2018). Robots have the superhuman ability to record experimental actions and results. These results, along with the associated metadata and employed procedures, are automatically recorded in full and in accordance with accepted standards, at no additional cost. By contrast, recording data, metadata and procedures adds up to 15% to the total costs of experimentation by humans. Moreover, despite the widespread recording of experimental data, it is still uncommon to fully document the procedures used, the errors made and all the metadata.

Laboratory automation is now essential to most areas of science and technology, but is expensive and difficult to use. The high expense stems from the low number of units sold and the market’s immaturity. Consequently, laboratory automation is currently used most economically in large central sites, and companies and universities are increasingly concentrating their laboratory automation. The most advanced example of this trend is cloud automation, where a very large amount of equipment is gathered in a single site, where biologists send their samples and use an application programming interface to design their experiments.

Human-AI interaction

Little research has been done on working scientists’ attitude to AI, or the sociological and anthropological issues involved in human scientists and AI systems working together in the future. Compared to humans, AI systems possess a mixture of super- and sub-human abilities. Computers and laboratory robots have traditionally been used to automate low-level repetitive tasks, because they have the super-human capacity to work near flawlessly on extremely repetitive tasks for days at a time. In comparison, humans perform badly at repetitive tasks, especially during extended periods. However, AI systems are sub-human in their adaptability and understanding, and human scientists are still unequalled in conditions that require flexibility and dealing with unexpected situations; they are especially endowed with intuitive functions that might otherwise have been considered low level (King et al., 2018). Given AI systems’ mixture of super- and sub-human abilities, investigating how human scientists co-operate with their AI counterparts can be informative. These relationships occur at many levels, from the most profound (deciding on what to investigate, structuring a problem for computational analysis, interpreting unusual experimental results, etc.) to the most mundane (cleaning, replacing consumables, etc.). The growing use of AI systems in science is also expected to profoundly change some sociological aspects of science, such as knowledge transmission, crediting systems for scientific discoveries and perhaps even the peer-review system.1 Most of the current methods for establishing scientific authority (peer-review, conference plenaries, etc.) are inherently social and designed for human scientists. If AI systems become common in science, such established knowledge-making institutions might have to change to ensure continued academic credibility (King, 2018).

AI across scientific domains

In many scientific disciplines, the ability to record data cheaply, efficiently and rapidly allows the experiments themselves to become sophisticated data-acquisition exercises. Science – the construction of deep understanding from observations of the surrounding world – can then be performed within the data. For many years, this has meant that teams of scientists, augmented by computers, have been able to extract meaning from data, building an intimate bridge between science and data science. More recently, the sheer size, dimensionality and rate of production of scientific data have become so vast that reliance on automation and intelligent systems has become prevalent. Algorithms can scour data at scales beyond human capacity, finding interesting new phenomena and contributing to the discovery process. Box 5.3 shows examples of AI applications in several research fields.

Box 5.3. Applications of AI in different scientific fields

AI is increasingly being applied across the span of science, as shown in the examples below.

Physical sciences

Recent work at the forefront of large-scale intelligent data analysis has had massive impact in the physical sciences, particularly in the particle and astrophysics communities, in which event discovery within the data is essential. Such approaches lie, for example, at the core of the detection of pulsars (van Heerden et al., 2016), exoplanets (Rajpaul et al., 2015) and gravitational waves (George and Huerta, 2018), and of event discovery in particle physics (Alexander et al., 2018). ML (typically Bayesian) approaches have been widely adopted, not only for purposes of detection, but also to ascertain and remove underlying (and unknown) systematic corruptions and artefacts from large physical-science datasets (Aigrain et al., 2017). They have also been applied to more mainstream regression and classification problems, such as the photometric redshift estimation required by the European Space Agency’s Euclid mission2 (Almosallam et al., 2015). Furthermore, a significant body of literature considers whether techniques such as deep neural networks can be as valuable to the physical sciences as they have proven in such areas as speech and language understanding. Although complex DL systems play less of a role at present, they will most likely play a growing part in extracting insight from data in the coming years.

These illustrations highlight a deep connection between the physical sciences and the field known today as data science, which draws heavily on statistics, mathematics and computer science. A symbiotic relationship exists between data science and the physical sciences, with each field offering both theoretical developments and practical applications that can benefit the other, typically evolving through an interactive feedback loop. With the forthcoming emergence of larger and more complex datasets in the physical sciences, this symbiotic relationship is set to grow considerably in the near future.

Chemistry

One of the most prominent applications of AI to chemistry is the planning of organic synthesis pathways. Significant progress has recently been made in this field, both by using the traditional approach of encoding expert chemist knowledge into rules (Klucznik et al., 2018) and by using ML (Segler et al., 2018).

Another active application is drug design (Schneider, 2017). A key step in drug design is learning about quantitative-structure activity relationships (QSARs). The standard QSAR learning problem is: “given a target (usually a protein) and a set of chemical compounds (small molecules) with associated bioactivities (e.g. inhibition of the target), learn a predictive mapping from molecular representation to activity”. Almost every type of ML method has been applied to QSAR learning (although no single method has been found superior).
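The QSAR learning problem stated above can be sketched in a few lines. The example below is a deliberately simple illustration, not a recommended method: it assumes each compound is represented by a set of structural-feature identifiers (a stand-in for a molecular fingerprint), compares compounds with the Tanimoto similarity commonly used for such representations, and predicts activity as a similarity-weighted average over the nearest training compounds. All compounds, feature identifiers and activity values are invented.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets: |A and B| / |A or B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_activity(query, training, k=2):
    """Similarity-weighted mean activity of the k most similar compounds."""
    scored = sorted(
        ((tanimoto(query, features), activity) for features, activity in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_similarity = sum(sim for sim, _ in scored)
    if total_similarity == 0:
        # Nothing similar in the training set: fall back to a plain mean.
        return sum(act for _, act in scored) / len(scored)
    return sum(sim * act for sim, act in scored) / total_similarity


# Hypothetical training set: (feature set, measured inhibition of the target).
training_set = [
    (frozenset({1, 2, 3}), 0.9),
    (frozenset({1, 2, 4}), 0.8),
    (frozenset({7, 8, 9}), 0.1),
    (frozenset({7, 8}), 0.2),
]

# An untested compound sharing features with the two active compounds.
predicted = predict_activity(frozenset({1, 2, 3, 4}), training_set)
```

Real QSAR studies use far richer molecular representations and a wide range of learning methods; the point here is only the shape of the problem, a mapping learned from (representation, activity) pairs and applied to an untested compound.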

AI is increasingly being integrated with laboratory robotics in drug design to fully automate cycles of research. In 2018, the United Kingdom announced a new facility at the Rosalind Franklin Institute, aiming to transform the UK pharmaceutical industry by pioneering fully automated molecular discovery to produce new drugs up to ten times faster. Similar initiatives are under way in industry, for example at AstraZeneca’s new facility in Cambridge, England.

Medicine

Probably the most famous AI company in the world is the London-based DeepMind, thanks to its development of AlphaGo, which now beats the best humans at the game of Go, and AlphaGo Zero. DeepMind is actively seeking to deploy its ML technology (DL, reinforcement learning) to medical problems for the UK National Health Service, mostly focusing on image analysis. However, privacy concerns have arisen over the use of health-related data by DeepMind, which is part of the Google suite of companies (Wakefield, 2017).

Related to DeepMind’s image processing is the impressive DL method of diagnosing skin cancer using mobile-phone photos (Esteva et al., 2017). Despite the demonstrated success of applying AI to diagnoses, based on image analysis, such applications barely scratch the surface of the potential of AI in cancer diagnosis and treatment.

Many examples of vast-scale algorithmic science projects exist in the physical sciences. The Square Kilometre Array, a radio telescope network currently under construction in Australia and South Africa, will generate more data than the entire global Internet traffic per day when it comes online. Indeed, the project is already streaming data at almost one terabyte per second. The Large Hadron Collider at CERN, the European Organization for Nuclear Research, discovered the elusive Higgs boson in data streams produced at a rate of gigabytes per second. Meteorologists and seismologists routinely work with global sensor networks that are heterogeneous with regard to their spatial distribution, as well as the type, quantity and quality of data produced. In such settings, problems are not confined to the volumes of data now produced. The signal-to-noise ratio also matters: signals may only provide biased estimates of desired quantities; furthermore, incomplete data complicate or hinder the extraction of automated meaning from data. Data rate alone is hence not the core problem. Data cleaning and curation are of equal importance.

Using AI to select experiments

Addressing the issue of which data and algorithm to employ leads to the issue of intelligently selecting experiments, both to acquire new data and to shed new light on old data. Both these processes can be – and often are – automated. The concept of optimal experimental design may be old, but modern equivalents bring smart statistical models to enable each data run and algorithm choice to maximise the informativeness gained. Moreover, this optimisation process can consider the costs associated with data recording and computation, enabling efficient and optimal experimentation within a given budget.
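One way to make "maximising the informativeness gained" within a budget concrete is to compute the expected information gain of each candidate experiment and divide it by the experiment's cost. The sketch below is an illustration under stated assumptions: four equally likely hypotheses, three hypothetical experiments with binary outcomes, and invented costs. The function names are not taken from any particular library.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def expected_information_gain(prior, likelihood_positive):
    """Expected entropy reduction over hypotheses from one binary experiment.

    likelihood_positive[i] is P(positive outcome | hypothesis i).
    """
    gain = entropy(prior)
    likelihood_negative = [1.0 - l for l in likelihood_positive]
    for likelihood in (likelihood_positive, likelihood_negative):
        p_outcome = sum(p * l for p, l in zip(prior, likelihood))
        if p_outcome == 0:
            continue  # this outcome cannot occur
        posterior = [p * l / p_outcome for p, l in zip(prior, likelihood)]
        gain -= p_outcome * entropy(posterior)
    return gain


prior = [0.25, 0.25, 0.25, 0.25]

# Hypothetical experiments: (P(positive | each hypothesis), cost).
experiments = {
    "A": ([1.0, 1.0, 0.0, 0.0], 4.0),  # splits the hypotheses in half
    "B": ([1.0, 0.0, 0.0, 0.0], 1.0),  # singles out one hypothesis
    "C": ([0.5, 0.5, 0.5, 0.5], 1.0),  # outcome is unrelated to the hypotheses
}

gains = {
    name: expected_information_gain(prior, lik)
    for name, (lik, _) in experiments.items()
}
best_by_gain = max(gains, key=gains.get)
best_per_cost = max(gains, key=lambda name: gains[name] / experiments[name][1])
```

In this toy setting the most informative experiment (A, one full bit) is not the best value for money: once cost is taken into account, the cheaper experiment B is preferred, and experiment C is never worth running because its outcome cannot distinguish the hypotheses.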

In standard ML, the learning algorithm is given all the examples at the start. Active learning is the branch of ML where the learning algorithm is designed to select examples from which to learn; this is a more efficient form of learning. There exists a close analogy between active learning and the process scientists use to select experiments. Active learning proceeds by using existing knowledge to propose where most knowledge will be obtained from a future measurement; the measurement is then taken at this location. Scientific experimental design follows a similar process, with future experiments selected to plug gaps in existing knowledge or test existing theories. Experimental results then help form a better understanding, and so the process repeats. Indeed, scientists do not typically wait patiently and form theories from what they observe; rather, they actively conduct experiments to test hypotheses. Work in active learning (King et al., 2004; Williams et al., 2015) offers an efficient method for balancing the cost of experimentation with the rewards of discovery.
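The efficiency gain from letting the learner choose its own examples can be seen in a textbook case: locating an unknown threshold. In the sketch below, which rests on simplifying assumptions, there are 1 000 candidate experiments ordered along one dimension, each experiment returns a noise-free positive or negative result, and results switch from negative to positive at a hidden threshold. The function `run_experiment` and its threshold are hypothetical stand-ins for a costly laboratory measurement.

```python
def run_experiment(x, hidden_threshold=637):
    """Stand-in for a costly experiment: positive at or above the threshold."""
    return x >= hidden_threshold


def active_learn_threshold(pool_size, oracle):
    """Find the smallest positive candidate by always querying the most
    uncertain one, i.e. the midpoint between the largest known negative
    and the smallest known positive."""
    largest_negative = -1          # nothing tested yet
    smallest_positive = pool_size  # sentinel: no positive found yet
    queries = 0
    while smallest_positive - largest_negative > 1:
        candidate = (largest_negative + smallest_positive) // 2
        queries += 1
        if oracle(candidate):
            smallest_positive = candidate
        else:
            largest_negative = candidate
    return smallest_positive, queries


threshold, experiments_used = active_learn_threshold(1000, run_experiment)
```

The active learner pins down the threshold with about ten experiments, whereas labelling every candidate in advance would take 1 000. Real experimental spaces are noisy and high-dimensional, so the saving is rarely this clean, but the principle of measuring where existing knowledge is least certain is the same.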

Active learning is a special case of a more generic methodology, Bayesian optimisation and optimal experimental design (Lindley, 1956), which provides an elegant framework for optimally balancing exploration and exploitation in the presence of uncertainty. Bayesian optimisation is at the core of modern approaches. The incorporation of probability theory into experimental design allows algorithms not just to decide where knowledge might be maximised, but also to reduce the uncertainty associated with regions of “experiment space” that are sparsely populated with results. This enables Bayesian experimental approaches not just to “exploit” areas of valuable results, but also to explore hitherto un-investigated experiments.
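The balance between exploration and exploitation can be illustrated with Thompson sampling, a simple Bayesian strategy that is related to, but much simpler than, the Bayesian optimisation methods discussed above (which typically use Gaussian-process models over continuous experiment spaces). The sketch assumes three hypothetical experimental conditions with unknown success rates; the rates, the number of rounds and the function name are invented for illustration.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Choose among conditions with unknown success rates.

    Each condition keeps a Beta posterior over its success rate. In every
    round one value is drawn from each posterior and the condition with the
    highest draw is tested: uncertain conditions are sometimes tried
    (exploration), promising ones are tried most often (exploitation).
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    trials = [0] * len(true_rates)
    for _ in range(rounds):
        # Beta(successes + 1, failures + 1) is the posterior under a uniform prior.
        draws = [
            rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
        ]
        chosen = draws.index(max(draws))
        trials[chosen] += 1
        # Simulated experiment outcome (the true rate is hidden from the learner).
        if rng.random() < true_rates[chosen]:
            successes[chosen] += 1
        else:
            failures[chosen] += 1
    return trials


# Hypothetical success rates of three experimental conditions.
trials_per_condition = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
```

As evidence accumulates, the posterior for each condition narrows, and the share of trials spent on weaker conditions shrinks without ever being fixed at zero in advance.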

Explainability: What does it imply in the context of science?

Inscrutability in ML decision-making is commonly cited in discussions of AI as a source of possible concern. The Defense Advanced Research Projects Agency, in the United States, is funding 13 different research groups, working on a range of approaches to make AI more explainable. However, a problem of inscrutability exists in some areas of science – particularly mathematics – independently of the role of machines. Andrew Wiles’ proof of Fermat’s last theorem ran to over 100 pages and took many mathematicians many years to verify. Will this problem of inscrutability become more salient in science as AI becomes more widespread?

One of the core goals of science is to increase knowledge of the natural world through the performance of experiments. This knowledge should be expressed in formal logical languages. Formal languages promote semantic clarity, which in turn supports the free exchange of scientific knowledge and simplifies scientific reasoning. The use of AI systems allows formalising in logic all aspects of a scientific investigation.

AI can, in fact, be used to help formalise scientific argumentation involving many research units (segments of experimental research) and research steps. Making experimental structures explicit renders scientific research more comprehensible, reproducible and reusable.

A major motivation for formalising experimental knowledge is that it can be reused more easily to answer other scientific questions. Many modern AI and ML models can be used to infer the importance of observations, measurements and data features. This insight is often more valuable to scientists than the outcome variables from the models. Techniques such as local interpretable model-agnostic explanations (LIME), for example, offer a good way of explaining the predictions of ML classifiers. LIME can examine “what matters” in the data, by selectively perturbing input data and seeing how the predictions change. Even with the use of DL techniques, if a scientist needs complete audit trails then excellent approaches exist, for example based upon boosted decision trees (a method using multiple decision trees that are additive, rather than averaged).
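The idea of perturbing inputs and watching how predictions change can be sketched without any specialised library. The example below is in the spirit of LIME but considerably simpler: it does not fit a local surrogate model, and only measures how much the prediction for one instance moves when each feature is jittered in turn. The "black box" is a hypothetical stand-in for a trained classifier, and all names and values are assumptions made for illustration.

```python
import math
import random


def black_box(x):
    """Hypothetical trained classifier returning P(class = 1).

    It depends strongly on x[0], weakly on x[1] and not at all on x[2].
    """
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 0.5 * x[1])))


def perturbation_importance(model, instance, n_samples=500, scale=0.1, seed=0):
    """Mean absolute change in the prediction when one feature is jittered."""
    rng = random.Random(seed)
    baseline = model(instance)
    importance = []
    for i in range(len(instance)):
        total_change = 0.0
        for _ in range(n_samples):
            perturbed = list(instance)
            perturbed[i] += rng.gauss(0.0, scale)  # small Gaussian perturbation
            total_change += abs(model(perturbed) - baseline)
        importance.append(total_change / n_samples)
    return importance


scores = perturbation_importance(black_box, [0.2, 0.1, 0.5])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

The resulting scores identify which inputs matter for this particular prediction without opening the model, which is often the information a scientist wants from it.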

A key policy concern: Gaps in education and training

A key policy issue concerns education and training. Modifications of the education system often take place at a much slower pace than many other societal changes. Many subjects still taught to children seem more appropriate to the 19th century than the 21st. Three main traditional subjects underlie an understanding of AI: logic, data analysis (statistics) and computer science. Despite being fundamental to reasoning and having a 2 400-year history, logic is currently not taught in schools in most countries, and is hardly taught at all in universities, outside of specialised courses in computer science and philosophy. This means that few students are trained to understand the fundamental role of logic in AI.3

The analysis of data is as fundamental a subject as logic, but is also little taught in schools. Most data analysis currently taught to non-specialists in universities is still based on the classical statistics developed in the early 20th century. It deals with such topics as hypothesis testing, confidence intervals and simple optimisation methods – the forms of data analysis also most often reported in scientific papers. However, this type of data analysis presents philosophical and technical problems (Jaynes, 2003).

An even greater problem is that data analysis is taught in a way that resembles cooking more than science: if the data are in a form that looks like X, then a t-test should be applied at a 5% one-tail significance level; if the data are in form Y, then an F-test should be applied at a 1% two-tail significance level, etc. Unfortunately, such courses convey little understanding of fundamental concepts, meaning that few students understand the fundamentals of data analysis needed for ML. Students should learn about Bayesian statistics and computationally intensive methods based on resampling to better understand the reliability of conclusions.
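Resampling methods are easy to demonstrate, which is part of their teaching value. The sketch below estimates a 95% interval for a mean with the percentile bootstrap: the observed sample is resampled with replacement many times, and the spread of the resampled means indicates how reliable the original estimate is. The measurements are invented, and the function name and defaults are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_mean_interval(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values from the data, with replacement.
    resampled_means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical repeated measurements of one quantity.
measurements = [4.1, 5.3, 4.8, 6.0, 5.1, 4.4, 5.7, 5.0, 4.9, 5.5]

sample_mean = statistics.fmean(measurements)
interval_low, interval_high = bootstrap_mean_interval(measurements)
```

Unlike a recipe that prescribes a named test for a given data layout, this procedure makes the source of uncertainty visible: the interval is simply what happens to the estimate when the data are allowed to vary.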

Computer science education has not kept pace with the importance of AI to society. Computer science has also been conflated with “information technology skills” (Royal Society, 2017). Another problem is that in Western countries (as opposed to many developing countries), female students are far outnumbered by male students. It would be very worrisome if this imbalance were to transfer to the applications of AI in science (Chapter 7).

A general skill shortage also exists in AI. This creates a need for master’s conversion courses to transform graduates in other disciplines into scientists qualified to work at the AI/science interface, as well as more PhD positions at that interface. The independent report “Growing the AI Industry in the UK” (Hall and Pesenti, 2017) articulated how the UK Government and industry can work together to build skills and infrastructure, and implement a long-term strategy for AI, and recommended funding to reach these goals.

A vision of AI and the future of science

Despite the impressive performance of AI in many areas, the need still exists to transfer methods that perform well in constrained, well-structured problem spaces (such as game playing, image analysis, text and language modelling) to noisy, corrupted and partially observed scientific problem domains. The problems DL approaches encounter with small (and noisy) datasets compound this issue. Creating a realistic approach that works across all data scales, from data-sparse environments to data-rich environments, requires yet more innovation (Box 5.4). Probabilistic models do offer such capacities, although Bayesian DL is still in its infancy.
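The way a probabilistic model adapts to the amount of data can be seen in the simplest possible case. The sketch below is not Bayesian DL; it assumes a Beta-Binomial model with a uniform prior, and the function name `beta_posterior` and the counts are chosen purely for illustration. With a handful of observations the posterior stays wide; with thousands it becomes narrow, and the same formula applies at both scales.

```python
import math


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a success rate under a Beta prior."""
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


# Data-sparse setting: 4 trials. Data-rich setting: 4 000 trials at the same rate.
sparse_mean, sparse_sd = beta_posterior(3, 1)
rich_mean, rich_sd = beta_posterior(3000, 1000)
print(f"sparse: {sparse_mean:.3f} +/- {sparse_sd:.3f}")
print(f"rich:   {rich_mean:.3f} +/- {rich_sd:.3f}")
```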

Box 5.4. In my view: Moderating expectations: What deep learning can and cannot do yet

Gary Marcus, New York University

Deep learning currently dominates AI research and its applications, and has generated considerable excitement – perhaps somewhat more than is actually warranted. Although deep learning has made considerable progress in areas such as speech recognition and game playing, and contributed to the use of AI in science, as described in this chapter, it is far from a universal solvent, and by itself is unlikely to yield general intelligence.

To understand its scope and limits, it helps to understand what deep learning does; fundamentally, as it is most often used, it approximates complex relationships by learning to classify input examples into output examples, through a form of successive approximation that uses large quantities of training data. It then tries to extend the classifications it has learned to other sets of input “test” data pertaining to the same problem domains. However, unlike human reasoning, deep learning lacks a mechanism for learning abstractions through explicit, verbal definition. Current systems driven purely by deep learning face a number of limitations:

  • Since deep learning requires large sets of training data, it works less well in problem areas where data are limited.

  • Deep learning techniques can fail if test data differ significantly from training data, as often happens outside a controlled environment. Recent experiments show that deep learning performs poorly when confronted with scenarios that differ in minor ways from those on which the system was trained.

  • Deep learning techniques do not perform well when dealing with data with complex hierarchical structures. Deep learning learns correlations between sets of features that are themselves “flat” or non-hierarchical, as in a simple, unstructured list, but much human and linguistic knowledge is more structured.

  • Current deep learning techniques cannot accurately draw open-ended inferences based on real-world knowledge. When applied to reading, for example, deep learning works well when the answer to a given question is explicitly contained within a text. It works less well in tasks requiring inference beyond what is explicit in a text.

  • The lack of transparency of deep learning makes this technology a potential liability when applied to support decisions in areas such as medical diagnosis in which human users like to understand how a given system made a given decision. The millions or even billions of parameters used by deep learning to solve a problem do not easily allow its results to be reverse-engineered.

  • Thus far, existing deep learning approaches have struggled to integrate prior knowledge, such as the laws of physics. Yet dealing with problems that have less to do with categorisation and more to do with scientific reasoning will require such integration.

  • Relatively little work within the deep learning tradition has attempted to distinguish causation from correlation.
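The second limitation in the list above, failure when test data differ from training data, can be illustrated with a deliberately simple stand-in. The sketch below is an editorial illustration rather than part of the author's argument, and it does not use deep learning: a nearest-neighbour model (an assumption made for brevity) is fitted on inputs between 0 and 1 and then evaluated both inside and outside that range. The target function and the ranges are hypothetical.

```python
import random


def target(x):
    """Hypothetical ground-truth relationship the model is meant to learn."""
    return x * x


def fit_nearest_neighbour(xs, ys):
    """Return a predictor that answers with the label of the closest training input."""
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def mean_abs_error(predict, xs):
    return sum(abs(predict(x) - target(x)) for x in xs) / len(xs)


rng = random.Random(0)
train_x = [rng.uniform(0.0, 1.0) for _ in range(200)]
model = fit_nearest_neighbour(train_x, [target(x) for x in train_x])

in_range = [rng.uniform(0.0, 1.0) for _ in range(200)]  # same range as training
shifted = [rng.uniform(2.0, 3.0) for _ in range(200)]   # outside the training range
err_in = mean_abs_error(model, in_range)
err_shifted = mean_abs_error(model, shifted)
print(f"error inside training range:  {err_in:.3f}")
print(f"error outside training range: {err_shifted:.3f}")
```

The model is accurate where it has seen data and badly wrong where it has not, because nothing in it encodes the underlying relationship.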

Deep learning should not be abandoned, but general intelligence will require complementary tools – possibly of an entirely different nature that is closer to classical symbolic artificial intelligence – to supplement current techniques.

Although they offer impressive performance, many AI approaches provide little transparency about how they function. Many application domains require the reasoning behind decisions to be auditable. For practical systems, for example where AI makes decisions about people, such an audit trail is essential. Furthermore, few AI algorithms can offer formal guarantees regarding their performance. In safety-critical environments, the ability to provide such bounds and verify failure modes when faced with unusual data is a prerequisite. Some research in this area is already under way, although it is not yet commonplace.

It is to be hoped that collaboration between human scientists and AI systems will produce better science than either can perform alone. For example, human/computer teams still play better chess than either does alone. Understanding how best to synergise the strengths and weaknesses of human scientists and AI systems requires a better understanding of the issues (not just technical, but also economic, sociological and anthropological) involved in human/machine collaboration.

Arguably, advances in technology and the understanding of science will drive the development of ever-smarter AI systems for science. Hiroaki Kitano, President and CEO of Sony Computer Science Laboratories, has called for a new Grand Challenge for AI: to develop an AI system that can make major scientific discoveries in biomedical sciences worthy of a Nobel Prize (Kitano, 2016). This may sound fantastical, but the physics Nobel laureate Frank Wilczek (2006) is on record as saying that in 100 years’ time, the best physicist will be a machine. If this vision of the future comes to pass, it will transform not only technology, but also humans’ understanding of the universe (Box 5.5).

Box 5.5. AI and the laws of nature

How can algorithms infer an understanding of the laws of nature? AI algorithms learn solutions from examples. A critical part of these solutions consists in forming a function that generalises, i.e. performs well when presented with data that did not form part of the training examples. This generalisation requirement forces AI algorithms to “discover” the systematic trends and properties of a problem that are common across all the examples. This ability to find underlying commonality in complex data also allows models to find simple representations, rules and patterns in scientific data. The “laws” of science are such representations. An example is the BACON system (named after Francis Bacon), which “discovered” Kepler’s third law of planetary motion (Langley et al., 1987).
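Kepler’s third law states that the square of a planet’s orbital period is proportional to the cube of its semi-major axis, i.e. T = c·a^1.5. The sketch below shows how such a regularity can be recovered from a small table of observations. It is not the BACON procedure described by Langley et al.; it assumes a much simpler technique, a least-squares fit of a power law in log-log space, and the helper name `fit_power_law` is hypothetical. The orbital values are rounded standard figures in astronomical units and years.

```python
import math

# (semi-major axis in AU, orbital period in years), rounded standard values.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(xs, ys):
    """Fit y = c * x**k by ordinary least squares on (log x, log y)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y)) / sum(
        (a - mean_x) ** 2 for a in log_x
    )
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)


axes = [a for a, _ in planets.values()]
periods = [t for _, t in planets.values()]
exponent, constant = fit_power_law(axes, periods)
print(f"T = {constant:.3f} * a^{exponent:.3f}")
```

The fitted exponent comes out very close to 1.5, which is the compressed “law” hidden in the six rows of data.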


The laws of science are compressed, elegant representations offering insight into the functioning of the universe. They are ultimately developed through logical (mathematical) formulation and empirical observation. Both avenues have seen revolutions in the application of ML and AI in recent years. AI systems can formulate axiomatic extensions to existing laws. The wealth of data available from experiments allows science to take place in the data. Science is rapidly approaching the point where AI systems can infer such things as conservation laws and laws of motion based on data only, and can propose experiments to gather maximal knowledge from new data. Coupled with these developments, the ability of AI to reason logically and operate at scales well beyond the human scale creates a recipe for a genuine automated scientist.


Aigrain, S. et al. (2017), “Robust, open-source removal of systematics in Kepler data”, Monthly Notices of the Royal Astronomical Society, Vol. 471/1, pp. 759-769, Oxford Academic Press, Oxford.

Alexander, R. et al. (2018), “Machine learning at the energy and intensity frontiers of particle physics”, Nature, Vol. 560, pp. 41-48, Springer Nature.

Almosallam, I. et al. (2015), “A Sparse Gaussian Process Framework for Photometric Redshift Estimation”, Monthly Notices of the Royal Astronomical Society, Vol. 455/3, pp. 2387-2401, Oxford Academic Press.

Bloom, N. et al. (2017), “Are Ideas Getting Harder to Find?”, NBER Working Paper, No. 23782, September 2017, National Bureau of Economic Research, Cambridge, MA.

Boden, M.A. (2006), Mind as Machine: A History of Cognitive Science, Oxford University Press, Oxford.

Butler, K.T. et al. (2018), “Machine learning for molecular and materials science”, Nature, Vol. 559, pp. 547-555, Springer Nature.

Esteva, A. et al. (2017), “Dermatologist-level classification of skin cancer with deep neural networks”, Nature, Vol. 542, pp. 115-118, Springer Nature.

Frazier P.I. and J. Wang (2016), “Bayesian Optimization for Materials Design”, in Lookman T., F. Alexander and K. Rajan (eds.), Information Science for Materials Discovery and Design, Springer Series in Materials Science, Vol. 225, Springer Nature Switzerland.

George, D. and E.A. Huerta (2018), “Deep Learning for real-time gravitational wave detection and parameter estimation: Results with Advanced LIGO data”, Physics Letters B, Vol. 778, pp. 64-70.

Glymour, C. (2004), “The Automation of Discovery”, Daedalus, Vol. 133/1, pp. 66-77, MIT Press Journals, Cambridge, MA.

Griffiths, R-R. and J.M. Hernández-Lobato (2017), “Constrained Bayesian Optimization for Automatic Chemical Design”, Cornell University, Ithaca, NY.

Hall, W. and J. Pesenti (2017), “Growing the AI Industry in the UK”, independent report, Government of the United Kingdom, London.

Hey, T., S. Tansley and K. Tolle (2009), The Fourth Paradigm: Data Intensive Scientific Discovery, Microsoft Research, Redmond, WA.

Jaynes, E.T. (2003), Probability Theory: The Logic of Science, Cambridge University Press, Cambridge.

Jones, B.F. (2005), “The Burden of Knowledge and the ‘Death of the Renaissance Man’: Is Innovation Getting Harder?”, NBER Working Paper, No. 11360, Cambridge, MA.

Kasparov, G. (2017), Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins, John Murray, London.

Kim, M. et al. (2017), “Human-in-the-loop Bayesian optimization of wearable device parameters”, PLoS ONE, Vol. 12/9, PLOS, San Francisco.

King, R.D. (2018), “Tackling AI Impact on Drug Patenting”, Nature, Vol. 560, correspondence, 16 August, p. 307, Springer Nature.

King, R.D. et al. (2018), “Automating science: philosophical and social dimensions”, IEEE Technology and Society Magazine, Vol. 37/1, pp. 40-46, IEEE Society on Social Implications of Technology, New York.

King, R.D. et al. (2009), “The Automation of Science”, Science, Vol. 324/5923, pp. 85-89, American Association for the Advancement of Science, Washington, DC.

King, R.D. et al. (2004), “Functional genomic hypothesis generation and experimentation by a robot scientist”, Nature, Vol. 427, pp. 247-252, Springer Nature.

Kitano, H. (2016), “Artificial Intelligence to Win the Nobel Prize and Beyond: Creating the Engine for Scientific Discovery”, AI Magazine, Vol. 37/1, Spring 2016, pp. 39-49, Association for the Advancement of Artificial Intelligence, Palo Alto, CA.

Klucznik, T. et al. (2018), “Efficient Syntheses of Diverse, Medicinally Relevant Targets Planned by Computer and Executed in the Laboratory”, Chem, Vol. 4/3, pp. 522-532, Elsevier, Amsterdam.

Langley, P. et al. (1987), Scientific Discovery: Computational Explorations of the Creative Process, MIT Press, Cambridge, MA.

Lindley, D.V. (1956), “On a measure of information provided by an experiment”, Annals of Mathematical Statistics, Vol. 27/4, pp. 986-1005, Institute of Mathematical Statistics, Beachwood, OH.

Manning, C. and H. Schütze (1999), Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA.

Marcus, G. (2018), “Deep Learning: A Critical Appraisal”, preprint, submitted 2 January 2018, accessed 13 October 2018.

Rajpaul, V. et al. (2015), “A Gaussian process framework for modelling stellar activity signals in radial velocity data”, Monthly Notices of the Royal Astronomical Society, Vol. 452/3, pp. 2269-2291, Oxford Academic Press, Oxford.

The Royal Society (2017), “After the Reboot: Computing Education in UK Schools”, The Royal Society, London.

Russell, S. and P. Norvig (2016), Artificial Intelligence: A Modern Approach, Global Edition, Pearson Education Limited, Harlow.

Schneider, G. (2017), “Automating drug discovery”, Nature Reviews Drug Discovery, Vol. 17, pp. 97-113, Springer Nature.

Segler, M.H.S., M. Preuss and M.P. Waller (2018), “Planning chemical syntheses with deep neural networks and symbolic AI”, Nature, Springer Nature.

Van Heerden, E., A. Karastergiou and S. Roberts (2016), “A Framework for Assessing the Performance of Pulsar Search Pipelines”, Monthly Notices of the Royal Astronomical Society, Vol. 467/2, pp. 1661-1677, Oxford University Press, Oxford.

Wakefield, J. (2017), “Google DeepMind's NHS deal under scrutiny”, BBC News, webpage, 17 March.

Wilczek, F. (2006), Fantastic Realities: 49 Mind Journeys and a Trip to Stockholm, World Scientific Publishing, Singapore.

Williams, K. et al. (2015), “Cheaper Faster Drug Development Validated by the Repositioning of Drugs Against Neglected Tropical Diseases”, Journal of the Royal Society Interface, Vol. 12/104, The Royal Society Publishing, London.


← 1. One of the co-authors of this chapter, Ross King, has himself had the experience of wishing to give a robot scientist – Adam – credit as a co-author of a scientific paper, but encountered legal problems, as the lead author needed to sign a declaration stating that all the authors had agreed to the submission. A counter-argument is that not giving machines credit constitutes plagiarism.

← 2.

← 3. The central role of logic is set out in leading AI textbooks, such as Russell and Norvig (2016).
