How can artificial intelligence help scientists? A (non-exhaustive) overview

A. Ghosh
Lawrence Berkeley National Laboratory
United States

The diversity and ingenuity of the ways in which artificial intelligence (AI) is helping scientists can surprise even domain experts. AI has already left a mark on every stage of the scientific process: from hypothesis generation and mathematical proof building to experiment design and monitoring, data collection, simulation and rapid inference, among others. Intriguing emerging examples include AI helping to find new scientific insight in old scientific literature, simulate different teaching methodologies for education, write clearer scientific papers and even assist with research on AI itself. This essay discusses the role that AI can play in science, with an eye on potential impacts in the near future. It touches upon some key challenges to overcome for AI to be more widely adopted in science, such as causal inference and the treatment of uncertainties.

Scientists are a strange breed of professionals, one actually encouraged by the prospect of AI “taking away their jobs”. The search for knowledge never ends: for every question that AI helps answer, scientists grow curious about many more. Once a discovery is made, one may seek a more fundamental understanding of why the finding is what it is. One may also want to know how to use this newfound knowledge to help humanity.

Researchers are intrigued by connections between seemingly unrelated disciplines of science. As science has become extremely specialised over the past century, research on how to use AI in science has formed a natural oasis for knowledge sharing and cross-disciplinary work. AI tools developed to create super-resolution images of celebrities, for example, actually find applications in materials science. Meanwhile, innovative applications in automated drug discovery have found parallels in AI for theoretical physics.

The transfer of technology between fields has never been quicker. This is because seemingly unrelated problems in different domains appear to have a unifying theme through the lens of AI applications (e.g. clustering of data, anomaly detection, visualisation and experiment design, regardless of scientific domain, have common characteristics). To date, AI has had a wide range of applications in different stages of the scientific process (Figure 1).

The most typical uses of AI in science over the past decade have involved supervised learning, where a model is “trained” (optimised with an automatic algorithm) on data already annotated with the right answers. The data may have been painstakingly annotated by humans or come pre-annotated from simulations. AI might classify objects into some predefined set of categories like identifying Higgs bosons from the vast amount of particle collision data collected at the Large Hadron Collider (LHC). It could also regress some property of an object to, for example, predict the energy of a particle recorded in a detector from its image.

Once it learns the patterns from annotated data, AI can make predictions about new data where the correct answers are not already known. For instance, having learnt about what different household waste products look like from annotated images, AI could then assign the correct trash bin (recyclable vs. non-recyclable) for a new product that is not human annotated.
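To make this concrete, the following minimal sketch implements the supervised workflow with a one-nearest-neighbour classifier; the feature values and labels are invented purely for illustration and stand in for measured properties of household waste items.

```python
# Minimal supervised-learning sketch: a one-nearest-neighbour classifier.
# Features and labels are invented for illustration (e.g. two measured
# properties of a waste item, annotated recyclable or not).

def distance_sq(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train_x, train_y, query):
    """Assign the label of the closest annotated training example."""
    nearest = min(range(len(train_x)),
                  key=lambda i: distance_sq(train_x[i], query))
    return train_y[nearest]

# "Training" data: examples already annotated with the right answers.
train_x = [(1.0, 1.2), (0.9, 1.0), (3.1, 2.9), (3.0, 3.2)]
train_y = ["recyclable", "recyclable", "non-recyclable", "non-recyclable"]

# Prediction for a new, unannotated example.
label = predict(train_x, train_y, (1.1, 0.9))
```

Real systems replace the nearest-neighbour rule with a trained statistical model, but the division of labour is the same: annotated data in, predictions for unannotated data out.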

In “anomaly detection”, AI aims to identify novel objects that look different from what the AI model is used to seeing. For example, it is difficult to have an exhaustive, annotated list of brain scan images spanning all the possible categories of abnormality. However, anomaly detection models need only see examples of healthy brains in their training to subsequently flag abnormalities in images of new patients. Such models do not require annotated training data.
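A minimal sketch of this idea, assuming a single invented summary feature per scan: the model is fitted on "normal" examples only and flags anything that deviates strongly from them.

```python
import statistics

# Anomaly-detection sketch: the model sees only "normal" examples during
# training and flags new samples that deviate strongly from them.
# The single summary feature per scan is invented for illustration.

def fit(normal_values):
    """Learn what 'normal' looks like from unannotated healthy examples."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, mean, std, threshold=3.0):
    """Flag values more than `threshold` standard deviations from normal."""
    return abs(value - mean) > threshold * std

healthy = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2]
mean, std = fit(healthy)
flagged = is_anomalous(14.0, mean, std)   # far outside the healthy range
```

Note that no annotated abnormal examples are needed at any point, which is exactly what makes the approach attractive when abnormalities cannot be exhaustively catalogued.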

As science demands interpretability, an opaque AI model that gives the right answers without any further explanation has limited use. For instance, anomaly detection models can highlight regions in medical images that are a cause for concern. This, in turn, points medical practitioners to regions for further investigation.

In fundamental physics, there is value in finding the simplest description of a phenomenon, often in the form of a concise, easy-to-understand formula. On the other hand, the power of deep learning comes from the ability to build enormous statistical models, often comprising millions of parameters. These are inherently difficult to interpret (see also the essay in this book by Hugh Cartwright on interpretability).

In certain cases, physicists have found a way to use the power of deep learning, while retaining interpretability. In one instance, they do this with the help of graph neural networks. These can be designed so that individual components of the model describe specific physical attributes, such as the interaction between two celestial bodies.

Once the network has performed the more challenging task of learning these relationships directly from data, symbolic regression can be used to distil the information learnt by the network into an easy-to-interpret formula. This is a less powerful technique than deep learning, but it can automatically find simple formulas to describe data. Symbolic regression has been used recently to describe the concentration of dark matter from the mass distribution of nearby cosmic structures (Cranmer et al., 2020) with the help of an easy-to-understand formula.

In the mathematical sciences, the need for interpretability might be greater still. Mathematicians would like to be able to say: “AI, please write the entire proof of this theorem, and remember to show every step of your work!” How would that work? High school students of calculus are well aware of how useful a single hint can be to arrive at a solution. Mastering all the integration tactics in the syllabus is not enough. There are too many tactics to try for a given problem.

The key to acing a calculus exam is to develop an intuition for what tactic might work in what kind of situation. This intuition develops through practice, or, in the case of AI, through training. Researchers have developed AI that can hint at the tactic most likely to work in each situation. This approach has been used to automate the formalisation of mathematical proofs. The AI suggests a tactic, a classical theorem prover implements it and together they get the job done.

An exciting form of AI in this field is reinforcement learning. This has gained much publicity recently for mastering the rules of chess, the game of Go and popular computer games, and then beating the best human players. Reinforcement learning is excellent at learning a long sequence of actions to reach a desired goal.

In knot theory, for example, many open questions revolve around whether two knots can be considered equivalent, and whether one might be transformed into the other using a specific sequence of actions. If “yes”, reinforcement learning can often find the exact path from the first knot to the second, providing a clear proof of equivalence (Gukov et al., 2020). While these sorts of fully verifiable solutions are interesting, they are usually restricted to the mathematical sciences.
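The flavour of such a proof-by-action-sequence can be sketched with a toy example. Here an integer state and two invented moves stand in for knots and their transformations, and a breadth-first search stands in for the learnt policy; the returned sequence of moves is itself the verifiable certificate of equivalence.

```python
from collections import deque

# Toy sketch of finding a verifiable sequence of moves between two states.
# Real work (e.g. Gukov et al., 2020) uses reinforcement learning over knot
# transformations; here an integer state and invented moves stand in for
# knots, and exhaustive search stands in for the learnt policy.

moves = {
    "double": lambda s: s * 2,
    "inc":    lambda s: s + 1,
}

def find_move_sequence(start, goal, max_depth=10):
    """Return a list of move names transforming start into goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path          # the path itself is the proof certificate
        if len(path) >= max_depth:
            continue
        for name, move in moves.items():
            nxt = move(state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None

proof = find_move_sequence(3, 8)   # e.g. 3 -> 4 -> 8
```

Because the output is an explicit sequence of moves, anyone can replay it step by step to check the result, which is what makes such solutions fully verifiable.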

When it comes to trusting the products of science, whether measurements based on recorded data or complex simulations that make simplifying assumptions, scientists care a lot about uncertainties. For a while, many scientists refrained from using AI because it was difficult to quantify the uncertainty in the results. Recently, however, the tide has turned as scientists have found that AI can help more accurately quantify uncertainties.

AI can keep track of multiple uncertainties that accumulate through long scientific pipelines, while traditional methods could only keep track of certain summary information about the uncertainties. AI can even help reduce such uncertainties, allowing scientists to make more confident measurements. For instance, particle physicists have developed uncertainty-aware networks. These AI models are explicitly shown potential biases in data measurements when they are being trained. In this way, the model can automatically find the best way of handling every potential bias (Ghosh, Nachman and Whiteson, 2021).

The same technique has allowed astrophysicists to track uncertainties from high-dimensional telescope images (raw images at very high resolution, which usually require summarisation to apply traditional statistical techniques). The uncertainties can be tracked all the way to the final step of statistical inference, for instance, to deduce the nature of matter inside neutron stars from x-ray telescope images (Farrell et al., 2022).

This process makes it possible to have a comprehensive final measurement without leaving out vital information at intermediate steps. Such end-to-end models have grown in popularity. The quantification of a model’s own uncertainty has an added benefit: it can then be used to acquire data more efficiently. Consider medicine, where a vast amount of data is often available but only a small fraction of it is labelled (because labelling data requires a lot of human labour). AI can help figure out which samples are the most important for humans to annotate.

Active learning models can iteratively ask humans to annotate data points in such a way as to reduce their overall uncertainty about the data. For instance, having annotations for one image may allow the AI to learn a general pattern among similar images. In this case, asking for annotations for the first image is valuable, but subsequent similar images do not need annotation for the model to make accurate predictions about them.

An AI system that is more uncertain about some type of data indicates that there is presently less recorded knowledge about such data. Investing human time to label these uncertain data, then, will add more to recorded knowledge than spending the same resources labelling data for which the AI’s uncertainties are already small. In one instance of drug discovery, a similar approach helped cut down the number of required experiments from the 20% of possible experiments needed for a traditional algorithm to 2.5% (Kangas, Naik and Murphy, 2014).
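A minimal uncertainty-sampling loop illustrates the idea. Here uncertainty is crudely approximated by the distance to the nearest already-labelled point, and the one-dimensional data and "oracle" annotator are invented for illustration:

```python
# Active-learning sketch: each round, request a human label for the point
# the model is most uncertain about. Uncertainty is approximated here by
# distance to the nearest labelled point; the data are invented.

unlabelled = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9]
labelled = {}                      # point -> label supplied by a human

def oracle(x):
    """Stand-in for a human annotator."""
    return "low" if x < 5.5 else "high"

def uncertainty(x):
    """Distance to the nearest labelled point (infinite if none yet)."""
    if not labelled:
        return float("inf")
    return min(abs(x - p) for p in labelled)

# Each round, annotate the single most uncertain point.
for _ in range(3):
    query = max((x for x in unlabelled if x not in labelled),
                key=uncertainty)
    labelled[query] = oracle(query)
```

After three rounds the annotation budget has been spent on well-separated points; points sitting right next to an already-labelled example are never queried, mirroring the essay's point that similar images need no further annotation.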

The scientific process has many stages – from hypothesis generation, experiment design, monitoring and simulation all the way to publication. Until now, this essay has only discussed the use of AI in providing final results. However, AI is expected to contribute to every stage of science.

In drug discovery, for example, when there are too many possible chemical combinations to try, AI can narrow them down to the most promising options. In theoretical physics, if the researcher has a hunch that two kinds of mathematical tool might have some underlying equivalence, AI can help determine a correlation. This, in turn, encourages the mathematician to invest time to discover a rigorous mathematical connection. Another key component in science is simulation, and deep learning has had an enormous impact here.

Unstructured data (e.g. satellite images, global weather data) have traditionally been a challenge because dedicated algorithms need to be developed to handle them. Deep learning has been sensationally effective in handling such data to solve unusual tasks. It has made its way into popular culture through a variety of applications. One model, for example, uses a person’s image and shows how that person may look in 30 years. As another example, GitHub Copilot writes entire blocks of code for a software developer based only on a description in plain English of what the code needs to do.

Models that can create new data in this manner are called “generative”. In science, such generative networks are used to simulate physical systems. Sometimes they can improve over the state-of-the-art traditional simulation algorithms in terms of accuracy. More often they are useful because they consume orders of magnitude fewer computing resources. In this way, they relieve scientists from the burden of creating specialised simulation algorithms for each physical process.

Generative AI models with similar structure can learn to simulate the evolution of the universe, certain biological processes and so on, making them a general-purpose tool. In another use case, generative models can remove noise or unwanted objects from data, for example, to un-blend images of galaxies.

A particularly exciting feature of such models is their ability to provide “super-resolution” data, that is, data with higher resolution than in the original recorded data. In materials science, for example, super-resolution models can correctly enhance cheaper, low-resolution electron microscopic images into high-resolution images that would otherwise have been more expensive to capture (Qian et al., 2020). The trick is to have the system view small areas in high resolution and compare those to the same areas in low resolution, and learn the differences. The system can then convert all areas in the low-resolution image – the entire field of view – into a high-resolution image. In the biological sciences, aside from saving money, this approach can also protect some of the objects of research. For instance, flashing high intensity light can help to image cellular structures but can also damage the specimen. Super-resolution techniques can circumvent this problem.

A curious reader may wonder at the power of these AI simulation models. Given some initial conditions, can one always train some model to simulate some system far into the future? Classical mechanics shows this should become increasingly difficult for chaotic systems. While AI does not magically circumvent this fundamental limitation, it can improve on previous best practice. This is what makes the use of AI in climate simulation (or simulations of any other chaotic system) fascinating. Naive applications of generative AI models may not succeed in accurately predicting weather patterns over a long period. However, Pathak et al. (2020) have shown that a hybrid simulation engine that combines the power of AI with fundamental physics computations can indeed predict such patterns.

In principle, the physics equations can be computed using a traditional algorithm (an algorithm handcrafted by scientists rather than learnt automatically by AI) to make accurate predictions. However, it is prohibitively expensive in terms of the computing resources required to run it at high resolution. The low-resolution version of the algorithm is cheaper but inaccurate. However, by enhancing the predictions of the cheaper algorithm with AI, these researchers achieved accurate simulations of weather over a long period. This combination is much cheaper to run than the high-resolution traditional algorithm. The key to this technique is to recursively make predictions across small time periods using the cheap solver, enhance the prediction with the AI model and then repeat for the next time step.

There is a general trend in incorporating domain knowledge into AI systems to help push the boundaries of what was once thought possible. In the above climate example, knowledge about the specialised scientific field was expressed in the physics equations used for the handcrafted traditional algorithm. The world will see many more innovations in weather and climate modelling with AI over the coming years, especially given the growing impact of climate change (such as increasingly erratic and extreme weather patterns).

AI can also be used for data compression, finding ways to summarise the same information using fewer attributes. Consider a dataset full of 256x256 pixel images of circles. Instead of storing the value of every pixel, one could store only the location of the circle and its radius, and still retain all the relevant information. This can make data storage and transmission memory efficient.

Recently, AI has also been used to compress multi-dimensional data into two dimensions (data summarised in two attributes) so it can be visualised on a screen or paper. The compressed representation of the data can itself reveal underlying but otherwise hard-to-detect patterns in the data. For instance, the compressed representation might show that certain data points clump into distinct clusters, which usually indicates some unifying characteristics in each cluster. If scientists can identify a characteristic that unites a cluster, they may also notice new data points in the cluster for which that characteristic has not yet been discovered. This could help scientists find items – from chemicals to materials to mathematical groups – with a desired characteristic. This line of study has, for example, generated growing interest in theoretical particle physics as a possible route to new theories that could describe the universe.

Compression also helps by producing resource-efficient algorithms. AI can be self-optimised in a way to find smaller models that can more easily be deployed on fast hardware for speed-critical applications such as at the LHC.

The advent of deep learning has also benefited science in indirect ways. It spurred development of software that automatically performs differential calculus, known as automatic differentiation (AD) software. It also contributed to the need for advanced parallel processing hardware like Graphics Processor Units (GPUs) and more efficient data storage technology. These developments have allowed scientists to replace older optimisation algorithms with AD, optimise complicated traditional algorithms and leverage the power of parallel programming. Increasingly ambitious efforts are also emerging to use the new optimisation algorithms for elaborate experiment design. Relying on open-source AD software reduces the burden on scientists to maintain their own software or to upgrade them to run on modern hardware such as GPUs.
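The core of AD can be sketched in a few lines using forward-mode dual numbers: every value carries its derivative along with it, and the product and chain rules are applied automatically as the program runs. This is a toy illustration of the principle, not how production AD libraries are implemented.

```python
# Forward-mode automatic differentiation with dual numbers: each value
# carries its derivative, so exact derivatives of whole programs emerge
# without symbolic algebra or finite differences.

class Dual:
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule, applied automatically at every multiplication.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def f(x):
    """An ordinary program: f(x) = x*x + 3x, written with Dual numbers."""
    return x * x + Dual(3.0) * x

x = Dual(2.0, 1.0)        # seed the derivative dx/dx = 1
y = f(x)                  # y carries both f(2) and f'(2)
```

Deep-learning frameworks industrialise this idea (usually in reverse mode, which is more efficient for functions of many parameters), and scientists now reuse the same machinery to optimise traditional algorithms and experiment designs.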

Beyond the main stages of research, AI is also more broadly useful to science. In terms of communication, some AI models have been developed to summarise research papers (see also the essays in this book by Dunietz, and by Byun and Stuhlmüller), and a few popular Twitter bots regularly tweet these automated summaries. Certain AI models highlight aspects of a draft research paper that make it easier or harder to comprehend (Huang, 2018). For example, the model favours articles that contain conceptual diagrams early on, presumably to help guide the reader.

Recently, an AI-based method has been proposed to present experimental measurements in physics to theoretical physicists more effectively (Arratia et al., 2022). Using data from large experiments at CERN effectively often requires a team of physicists familiar with the detector, and combining results from multiple large experiments requires a specialised team that includes physicists from each experiment. This is not feasible each time a new theory needs to be tested against data. Consequently, large experimental physics collaborations try to present their results in a form that theorists can easily re-use. Traditionally, this has meant summarising the results at the expense of leaving out details. The newly proposed AI method allows theorists unfamiliar with the experiments to access the detailed measurements more easily, and to explore, combine and re-use measurements from multiple large experimental collaborations, such as ATLAS, CMS and LHCb (at CERN), Belle II (in Japan) and even cosmological observations. This would enhance the impact of each measurement by making the information more accessible to the larger scientific community.

These are examples of AI helping to better disseminate scientific results, even to subject-matter experts. In the future, AI-powered virtual or augmented reality is expected to help visualise and explore scientific concepts, from the structure of DNA to particle collisions at the LHC.

Although AI today is mostly talked about in the context of digitised data, the use of AI-enhanced laboratory robots is growing (see the essay in this book by King, Peter and Courtney). Laboratory robotics can help automate precise repetitive tasks such as handling test tubes and cell cultures, among others, and avoid human exposure to harmful chemicals or radiation. Moreover, as other chapters in this book show, increasingly intelligent laboratory robots will have growing roles in experiment design and analysis. Curiosity, the Mars rover, is endeared to many. Future space and ocean exploration will see AI-powered robots in a multitude of applications.

The discussion in this essay has so far been optimistic. However, it would be an oversight to skip over the weaknesses of AI-powered research tools and the potential dangers of indiscriminate adoption.

Data-driven AI models sometimes malfunction in different ways than do traditional algorithms. Using deep learning, a robot trained to work with red, blue and green bottles in a laboratory, for example, may not generalise correctly to black bottles. Validating the behaviour of the AI model under different circumstances therefore needs to be rigorous. There is ongoing work on developing AI that can be fully validated and where the maximum risk of failure can be quantified. However, significant innovation is needed before such models become useful for real-world tasks.

Deep-learning models pick up subtle patterns in training data, including any biases in simulations. This is similar to how a model trained on some types of historical human data can learn social biases (such as sexism and discrimination against minorities). An often-discussed solution to this problem is to force a model’s predictions to be de-correlated from protected features (e.g. race, gender, age). This means the AI would on average have a similar response regardless of an individual’s race, gender and age. However, such attempts at de-correlation can actually lead to further unintended harm, especially when it is not easy to list all the sources of potential bias.

It is sometimes easier to demonstrate the unintended consequences of such bias mitigation techniques not on human data but on well-understood and fully controlled scientific data. For instance, in the context of particle physics, Ghosh and Nachman (2022) show that decorrelation techniques sometimes hide biases instead of getting rid of them. In some cases, the true bias is difficult or impossible to measure so physicists use proxy metrics to estimate it. For instance, when the exact mathematical computation of some theory cannot be done, they may use the best known approximation technique. To estimate the potential bias of this technique, they also compute approximations using a series of alternate techniques and treat the difference in the results as an estimate of the uncertainty.

In this physics study performed by Ghosh and Nachman (2022), the true bias in the model was found to be even greater after applying a debiasing solution that minimises the proxy metrics for bias. This led to vastly underestimated uncertainties on the final measurement. It is therefore advisable to consider the possibility of such unintended consequences before attempting to de-correlate away biases in AI models.

More generally, it should be considered carefully whether using the same metric to evaluate the performance of a model makes sense if it was already used for optimisation of that model. Further, sometimes a policy solution may be needed rather than a technological solution. For example, in theory, AI models could try to predict which students are more likely to succeed in science, technology, engineering and mathematics research careers. However, data are likely to be plagued by existing biases in society. A more effective solution to improving success may lie in better policies in terms of access to resource material, mentorship and creating inclusive work environments.

AI models simply learn correlations in data, not the causal relationships involved. Causal models are needed to disentangle correlation from causation. For example, if a study indicates that levels of vitamin D in a population correlate with depression, does that mean one caused the other, or are they both simply symptoms of an (as yet unknown) underlying problem?

An interesting line of research in cognitive science focuses on human-AI interactions, which illustrates one way that AI can help shed light on causation. Researchers realised they can generate situations using AI that are difficult to create in real life, and then study their impact in the real world. For example, children in the United Kingdom interacted with an AI-driven virtual teacher who spoke first in a working-class British accent and then in the different accent of the real teacher. This allowed the researchers to study the impact of a teacher’s accent on learning in children from diverse backgrounds. The ability to study these alternate situations is useful in establishing causation.

There is also growing interest in interfacing probabilistic programming (algorithms that account for the probabilistic nature of certain processes in science) with scientific simulators (such as particle physics simulators) to infer causation. These programs can run through a number of scenarios that might explain some observed data. The intersection of AI and causal inference is a nascent field, which has recently become a hot topic of research. Progress in this field will help accelerate progress in science.

The trend has been to develop large AI models that require enormous computing resources to train. This can create problems for research groups with smaller budgets, particularly compared to large AI companies. Such models also leave a large carbon footprint.

Innovation will be required to improve the resource efficiency of AI models. Besides this, governments may have to invest in computing resources that can be shared among research groups nationally. In the United States, a task force has already been set up to look into the feasibility of a National AI Research Resource (NAIRR, 2022).

The ways in which AI is accelerating science are growing rapidly. In certain cases, giant leaps in science made possible by AI have attracted public attention. The “AlphaFold” model (a deep-learning solution) made headlines, for example, by demonstrating an extraordinary ability to predict 3D-protein structures from their amino-acid sequence. Nonetheless, the potential impact of AI on science is a long way from being realised.

In this current “AI overhang”, many innovations have potential, but there has not been enough time to explore them all. The last decade has seen a flurry of proof-of-concept innovations, but in the next one it will become common to incorporate AI into large scientific workflows. In some cases, such as at the LHC, automated workflows have already been established (Simko et al., 2021). The future may see scientific workflows optimised end-to-end – from data collection to final statistical analysis – using AI. The entire scientific process in certain cases – from hypothesis generation to the communication of scientific results – may also be fully automated.

Innovations in AI for science are often easy to transfer across different scientific domains. This has led to unifying approaches that cut across scientific disciplines. In simulation-based inference, scientific inference relies on the use of precise simulators to optimise some measurement. Differentiable programming optimises scientific workflows with software that performs automatic differential calculus. These and other unifying approaches, like anomaly detection and generative models, have re-energised demand for interdisciplinary experts.

Typical machine-learning models are difficult to interpret, but remain useful for tasks such as hypothesis generation, experiment monitoring and precision measurements. More interpretable models are useful for mathematical proof building. Generative models assist with tasks such as simulations, removing unwanted features from data and providing super-resolution data. Uncertainty-aware and uncertainty-quantifying models are extremely useful in providing trustworthy, reliable results. Such models can also help with efficient data acquisition by prioritising data acquisition in regions of uncertainty.

There are dangers in the indiscriminate use of AI because such models break down in unexpected ways. Therefore, it is important for AI experts to be vocal proponents of AI adoption and also caution against the unintended consequences of ill-informed applications. Future innovations may enhance interpretability and allow developers to place more algorithmic restrictions on a model to avoid catastrophic failures. These could occur, for example, if an AI system that controls scientific machinery were to behave erratically when it encounters a situation it has never experienced in training. The specific needs of science have fuelled interesting AI innovations, some of which have already found uses outside of science. As with other technologies developed for science, it is reasonable to expect an increasing number of innovations in AI for science to eventually benefit humanity in broader ways.

The future will likely bring growing use of AI-powered robots in laboratories and other sectors, such as space and the oceans, where scientific data are gathered. Innovations in developing causal inference models will provide huge benefits for the medical and social sciences. By accelerating science, innovations in AI are expected to help find solutions to global challenges such as clean energy generation and storage, improved climate models and treatments for disease.


Arratia, M. et al. (2022), “Publishing unbinned differential cross section results”, Journal of Instrumentation, Vol. 17.

Cranmer, M. et al. (2020), “Discovering symbolic models from deep learning with inductive biases”, arXiv, arXiv:2006.11287 [cs.LG].

Farrell, D. et al. (2022), “Deducing neutron star equation of state parameters directly from telescope spectra with uncertainty-aware machine learning”, arXiv, arXiv:2209.02817 [astro-ph.HE].

Ghosh, A. and B. Nachman (2022), “A cautionary tale of decorrelating theory uncertainties”, The European Physical Journal C, Vol. 82/46.

Ghosh, A., B. Nachman and D. Whiteson (2021), “Uncertainty-aware machine learning for high energy physics”, Physical Review D, Vol. 104/056206.

Gukov, S. et al. (2020), “Learning to unknot”, arXiv, arXiv:2010.16263 [math.GT].

Huang, J.-B. (2018), “Deep paper gestalt”, arXiv, arXiv:1812.08775 [cs.CV].

Kangas, J.D., A.W. Naik and R.F. Murphy (2014), “Efficient discovery of responses of proteins to compounds using active learning”, BMC Bioinformatics, Vol. 15/143.

NAIRR (2022), “National AI Research Resource (NAIRR) Task Force”, webpage (accessed 23 November 2022).

Pathak, J. et al. (2020), “Using machine learning to augment coarse-grid computational fluid dynamics simulations”, arXiv, arXiv:2010.00072 [physics.comp-ph].

Qian, Y. et al. (2020), “Effective super-resolution methods for paired electron microscopic images”, IEEE Transactions on Image Processing, Vol. 29, pp. 7317-7330.

Simko, T. et al. (2021), “Scalable declarative HEP analysis workflows for containerised compute clouds”, Frontiers in Big Data, 7 May.


© OECD 2023
