AI in drug discovery

K.Z. Szalay

Artificial intelligence (AI) promises to de-risk the discovery process for new drugs. This essay explores how the pharmaceuticals industry has adopted a new business model to decrease risk in the early parts of drug discovery. It looks at how AI could speed up drug discovery through cost and time savings, and the role of “explainable AI” to bridge the gap between the pharma and software industries. Finally, as early discovery shifts from academia and large pharmaceutical companies to smaller start-ups and biotech spin-offs, it looks at the need for dedicated infrastructure.

AI will change drug discovery. The main challenge of bringing a new drug to market is that a lot of time and money are required before the drug’s efficacy is revealed by testing on patients. As AI is integrated into ever-more steps in drug discovery, its main impact is in selecting experiments with the best chance of success, thereby de-risking the discovery process. Even a modest increase in efficiency can result in major savings by the time a drug gets to market.

The ability of AI systems to enhance drug discovery depends on the step in question. Explainable AI could have a major impact on the steps in drug discovery where AI is not yet widely used. However, AI approaches that are explainable by design are not yet good enough. Technical advances in explainability are still needed for “black-box” forms of AI, whatever their field of application.

Adoption of AI in the pharmaceuticals industry has been cautious. AI solutions have a reputation for providing quick but unreliable predictions; to the degree this is the case, there is a tension with the safety focus of the pharma industry. Indeed, bringing AI directly to the bedside has not worked well (Herper, 2017). However, as long as AI stays within the confines of the R&D process, experiments can confirm its predictions before patients are involved.

Meticulous experiments to ensure patient safety will always be needed. However, the potential impact of AI is not to eliminate the need for clinical trials. Rather, it could create a situation in which new drugs fail less often when they eventually do get to clinical trials.

Starting in the late 1990s, productivity in the drugs industry saw a major decline (see the essay by Jack Scannell in this book). This decline continued well into the 2010s.

Fortunately, new technologies – particularly CRISPR and better prediction of drug safety – are helping avert a crash in drug discovery. The cost of new drug approvals had largely stabilised by the end of the last decade (Figure 1).1 However, getting new drugs to market remains risky; failure rates are well above 60%, even for drugs that reach clinical trials (Wong, 2019).

Major pharmaceutical companies found a new business model to decrease risk in the early parts of drug discovery: in-licensing interesting compounds from smaller biotech companies. Pharmaceutical companies paid a premium for these compounds in exchange for small companies bearing the risks faced in the early phases of discovery. The large companies did what they do best: capital-intensive clinical trials and commercialisation. This trend sped up during the last decade.

Large companies have complex processes in place, and change is hard for both individuals and organisations. Consequently, it has been in the agile small biotechnology companies where an explosion in the use of AI technologies has happened. AI solutions might already add significant value to drug discovery. However, many applications have not yet reached a level of maturity where they could be adopted inside the complex, optimised (and thus necessarily more rigid) processes of large pharmaceutical companies.

How much of an impact on drug discovery should be expected from AI? Most experimentation is done in laboratories – measuring cell cultures or drug binding using specialised assays2 (often called wet-lab work, in contrast to the “dry-lab” work of experimenting on a computer). Even high-throughput wet-lab assays are relatively slow and expensive. Human expertise is therefore always required to pick the experiments that make sense to run, based on state-of-the-art science. As no such pre-selection is necessary to run AI predictions, machine-learning systems can come up with novel ideas no sane human drug hunter would expect to work. While the value of such novelty generation is hard to quantify, an AI-generated list of experiments can then be run in the lab; this route should be shorter, and possibly more successful, than one without an AI system’s guidance.

Besides finding novelty, the other major impact expected from AI is a decrease in the cost and time associated with failures at each stage of drug discovery. These reductions are quantifiable. Just decreasing the failure rate by 20% (e.g. from 30% to 24%) in each step of the discovery process would mean halving the total cost of any single project.
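The arithmetic behind this claim can be sketched directly. The nine-stage pipeline below is an illustrative assumption, not a figure from the essay; the point is that small per-stage improvements compound across sequential stages:

```python
# Illustrative sketch: if a programme must pass several sequential stages,
# the expected number of attempts per eventual success -- and hence the
# expected cost -- scales as the inverse product of per-stage success rates.

def expected_cost_multiplier(failure_rate: float, n_stages: int) -> float:
    """Expected attempts per drug that survives all stages."""
    success = 1.0 - failure_rate
    return 1.0 / success ** n_stages

# Hypothetical pipeline of 9 stages, 30% failure at each stage.
baseline = expected_cost_multiplier(0.30, 9)
# Cutting each stage's failure rate by 20% (30% -> 24%).
improved = expected_cost_multiplier(0.24, 9)

print(round(baseline / improved, 2))  # about 2.1: costs roughly halve
```

The exact ratio depends on the number of stages assumed, but the compounding effect is the same: a modest per-stage gain multiplies into a large end-to-end saving.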

The data also show that, generally, decreasing the failure rate is the best option. However, in the earliest phases of drug discovery, saving costs through fewer experiments is more important than decreasing failure rates. Figure 2 shows the estimated cost savings from state-of-the-art AI guidance in all steps of drug discovery. It assumes that AI tool developers focus on applications that deliver the most impact. These savings would total slightly over a billion US dollars per new drug (Bender and Cortés-Ciriano, 2021).

Decreasing the cost of R&D could lower the price of novel drugs, which are often prohibitively expensive for patients, and/or place major burdens on public budgets. It would also become feasible to start developing drugs for smaller patient populations, making drug discovery possible for rare diseases, as well as common diseases that need personalised approaches. For example, DeSantis, Kramer and Jemal (2017) highlighted that 20% of all cancers are rare and thus not covered by drugs targeted towards the most common cancers. Shrinking the size of the minimum patient population required to develop a new drug will be arguably the largest benefit of AI-based drug discovery over time.

One AI application that recently made headlines helps researchers better understand the structure and dynamics of potential drug targets. DeepMind’s AlphaFold2 is an AI system that predicts the 3D structure of proteins. This is an immensely important problem in biology because, in the world of proteins, form defines function. Indeed, most drug discovery processes start with finding the right protein to target for a given disease. Good experimental data exist on the shape of a small subset of proteins in the human proteome (the entirety of proteins that make up humans). However, science has no idea what the working, folded form of most human proteins looks like.

With extensive training on experimental data, and a smart AI architecture,3 AlphaFold2 developed predictions that approximated experimental results in the CASP14 challenge (CASP14, n.d.) for a set of previously unpublished protein structures (Jumper et al., 2021). This paves the way to predicting with confidence the structure of previously unknown proteins.

Just a few months after the publication of the system, databases containing the AlphaFold-predicted protein structures for all human proteins were made available to scientists (Varadi et al., 2022). This advance greatly helps researchers in the drug industry find molecules to target a protein of interest, as it is now possible to have a good understanding of what the target protein actually looks like.

The next step is to find which protein to target for a given disease. There is no single best way to find good drug targets in the lab. Different experimental assays have different strengths and weaknesses, which is also the case with AI methods in drug discovery. Different machine-learning systems – depending on the data they are trained on – will excel in addressing different analytic problems. For this reason, AI companies working on target discovery are proliferating, each developing its own discovery platform. These companies use diverse data gleaned from cell microscopy, electronic medical records, genetic databases and scientific literature, among others, creating their own drug target pipelines. While some of these methods may make their way into drug discovery, no drug based on a target discovered by AI has yet received approval from the US Food and Drug Administration (FDA). The first such drugs have only just entered human clinical trials (Jayatunga, 2022). This makes target discovery an exciting, fast-moving but still nascent area of AI in drug discovery.

Once a protein target is established, all eyes turn to the chemists to find the right molecule to inhibit the protein of interest – and preferably not much else. While established wet-lab methods can screen hundreds of thousands to millions of small molecules in just a few days, finding good hits (molecules that selectively bind to a given protein target) is still a daunting task. This is because the number of drug-like chemicals could be in the order of 10^60 – far more than the number of atoms on Earth (Bohacek, McMartin and Guida, 1996).

Indeed, finding good hits was one of the earliest frontiers for machine learning in drug discovery. Virtual screening methods involve computational discovery of molecules that might bind to a specific target of interest. “Molecular docking”, for example, tries to find matching surfaces of the candidate molecule and the target protein.

Virtual screening can search a space much larger than is possible with wet-lab screening: from around 10^9 molecules in commercially available virtual screening platforms to 10^15 or more in proprietary pharma libraries. This represents 3 to 9 orders of magnitude more than the roughly 10^6 molecules accessible to wet-lab screening (Hoffmann and Marcus, 2019).

Virtual screening tools have become increasingly sophisticated in the past two decades (Goodsell et al., 2021), with deep-learning methods recently joining the field (Wallach, Dzamba and Heifets, 2015). While virtual screening might be the most established subfield in which AI helps in drug discovery, the field is still far from able to exploit most of the drug-like chemical space. In general, AI techniques work well in predicting how new combinations of already measured building blocks behave. However, they cannot predict the behaviour of new building blocks (in this case, the behaviour of novel chemical structures).

In drug discovery, a long iterative chemistry process follows after the first promising molecules are found. This process aims to generate a more refined molecule (called a lead) – one with better selectivity, absorption and distribution properties – so as to arrive at a molecule that can eventually be administered in vivo.4 AI can assist the lead optimisation process as well, but the next, qualitatively different stage is for scientists to start planning experiments in animals. The first two key research tasks are to ensure the compound is not toxic and that it is efficacious (improves the disease status).

Assessing the toxicity and metabolic properties of drugs has also been a mainstay of computational drug discovery. In recent decades, models have improved greatly thanks to large-scale public data generation efforts (Kleinstreuer et al., 2014), as well as general progress in AI. While surprises still happen, most pharma companies have already integrated some metabolism and toxicity modelling solutions into their main pipelines.

The final step, and unfortunately the hardest, is predicting in vivo efficacy before administering the molecule in animals, and thereafter humans. The aim is to tell, with some reasonable degree of accuracy, which patients will respond well enough to a drug using “biomarkers” of efficacy. Traditional biomarkers are measures from blood tests or microscopic findings from a biopsy, but molecular genetic tests are being used more frequently, illustrating how medicine is increasingly personalised. However, good biomarkers are hard to find.

Any veteran drug hunter will readily say that getting a good biomarker for a drug is hard, even with support from AI, even if biomarkers are crucially important for success in the clinic (Wong, 2019). Alas, finding reliable biomarkers is not a problem well suited to most AI methods. Each patient is unique, with slightly different biochemistry. In addition, each patient can be dosed only once. If they return to the clinic, whether the drug has worked or not, their tumour composition has likely changed. This essentially renders them – for training purposes – a different patient. Both considerations make it extremely hard to generate the data with which to train an AI system to find strong biomarkers without having patient data in advance for that specific drug.

One way drug discovery teams are circumventing the problem of finding good biomarkers is through drug repurposing. This involves taking a drug (either an approved drug or one that has failed but was shown to be safe) and using its trials’ data to train the AI. The goal is to identify a new biomarker that the original research team missed or deprioritised. In practice, however, AI-based repurposing approaches have not been wildly successful. How much AI would eventually be able to contribute here remains to be seen.

One difficulty of introducing new AI methods to drug discovery is a deep cultural divide. AI comes from the software world where a practice of “move fast and break things” is doable and mostly works well. On the other hand, as evident from the above discussion, safety is deeply embedded in the culture of drug discovery. Bringing any new drug to market is already extremely risky. Consequently, novel, unproven drug ideas understandably gain relatively little traction when companies need to commit years and hundreds of millions of US dollars to prove their efficacy.5

Moving towards “explainable AI” is one way to bridge the gap between the dynamics of software development and the safety needs of the drug industry. Explainable AI is a concept introduced in response to the realisation that the best-performing AI systems (such as neural networks) yield results that are generally not explainable.

To help understand explainable AI, visual recognition is a useful analogy. Visual recognition is much more complex, opaque and less conscious than it seems. People, for example, cannot define what exactly triggers the internal, instantaneous visual recognition of a cat. They might rationalise they are seeing pointy ears, prominent whiskers and so on. However, other animals meeting those criteria can easily be found. This is exactly how a non-explainable AI system (like an artificial neural network) works.

In other machine-learning architectures, such as decision trees,6 the decision process and the learnt rules are clearly understandable, even for an untrained human observer. Alas, these interpretable architectures are widely believed to offer worse prediction performance (Gunning and Aha, 2019). Explainable AI systems are not, in theory, necessarily worse than non-explainable black-box ones. However, the leading models – deep-learning systems – are not explainable. Thus, choosing an explainable AI architecture for any problem is not straightforward.
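To make the contrast concrete, an interpretable model can be written out as rules a scientist can read and challenge. The toy decision tree below is invented for illustration – the features and thresholds are assumptions, not real triage criteria – but it shows why such architectures are explainable by design:

```python
# A hand-written decision tree for triaging candidate molecules.
# Features and thresholds are invented for illustration only; the point
# is that every decision is a human-readable rule, unlike the millions
# of opaque weights inside a deep neural network.

def triage(has_toxic_motif: bool, cell_assay_ic50_nm: float,
           solubility_mg_ml: float) -> str:
    if has_toxic_motif:                # root node: structural alert
        return "reject"
    if cell_assay_ic50_nm > 1000:      # internal node: weak potency
        return "deprioritise"
    if solubility_mg_ml < 0.01:        # internal node: too insoluble
        return "deprioritise"
    return "advance"                   # leaf: all tests passed

print(triage(False, 150.0, 0.5))  # advance
```

Each root-to-leaf path is a classification rule that can be audited, which is exactly the property black-box models lack.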

Two arguments underscore the importance of explainability in drug discovery.

First, scientists in drug discovery already make use of statistical rules of thumb like Lipinski’s Rule of Five (a simple set of four rules for what a candidate drug molecule with good bioavailability should look like). The community accepts those rules are not 100% accurate. However, every individual rule makes scientific sense (the molecule should not be too big or too charged, for example).
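Such rules of thumb are straightforward to encode. The sketch below uses Lipinski’s published thresholds; the descriptor values for the example molecule (aspirin) are approximate, and in practice they would come from a cheminformatics toolkit rather than be typed in by hand:

```python
# Lipinski's Rule of Five: a candidate oral drug is flagged when it
# breaks more than one of these four rules. Thresholds are the published
# ones; the example descriptor values are approximate.

def lipinski_violations(mol_weight: float, logp: float,
                        h_donors: int, h_acceptors: int) -> int:
    rules = [
        mol_weight <= 500,   # molecular weight at most 500 daltons
        logp <= 5,           # octanol-water partition coefficient at most 5
        h_donors <= 5,       # at most 5 hydrogen-bond donors
        h_acceptors <= 10,   # at most 10 hydrogen-bond acceptors
    ]
    return sum(not ok for ok in rules)

# Aspirin (approximate descriptors): passes all four rules.
print(lipinski_violations(180.2, 1.2, 1, 4))  # 0 violations
```

Every rule is individually inspectable, which is precisely why the community tolerates the filter’s known inaccuracy.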

Second, discovery of a drug is not finished until the molecule gets regulatory approval. Unintended side effects or undesired metabolic properties of the candidate molecule often surface, requiring constant tweaking of either the molecule structure or the target patient population. That is not possible unless the discovery team has a good understanding of why the molecule works the way it does, where it binds and the biological mechanism that the drug should inhibit.

It is possible to extract some of the “why” from black-box AI models by using additional external interpretation algorithms. However, a simple ranked list of, for example, predicted indications (e.g. 1. colon carcinoma, 2. non-small-cell lung cancer) is not good enough for drug discovery. The AI team always needs to provide an explanation, one way or another, to ensure the smooth adoption of its results.

Explainability is also important because it enables detection of “minority bias” in the data. As most of the genomic data in published databases come from Caucasians, learning algorithms often have a hard time picking out disease patterns specific to other ethnic groups. For instance, there are slight but significant differences in the way African Americans metabolise certain drugs, requiring a different dosing schedule.

Recognising a pressing need, the FDA is already working to develop a good regulatory framework for AI in medicine (FDA, 2021). Specifically mentioned in the action plan are a) transparency to users; b) recognition and minimisation of minority bias in data; and c) honouring what is known as “good machine-learning practice”. Choosing an AI that is interpretable by design would most likely result in a shorter time to compliance than it would if using a black-box model.

As previously noted, early discovery is shifting from academia and large pharmaceutical companies to smaller start-ups and biotech spin-offs. One reason behind that shift might be the infrastructure needs of modern AI. Much-publicised breakthrough models like AlphaFold2 and GPT-3 contain billions of parameters and are trained for weeks on hundreds of specialised processors (Jumper et al., 2021). In such huge AI systems, every training run costs tens of thousands of US dollars, and training sessions must run continuously to keep improving the models. This puts a large financial burden on smaller academic groups.

Another challenge for large modern AI set-ups is moving all the pieces of data and code together at such scales. AI companies have dedicated teams of engineers building the necessary scaffolding (data-processing pipelines, orchestration of compute resources, database partitioning, etc.) so that every piece of code and data is in the right place at the right time on all the dozens of machines training the AI. This requires expertise and human resources that only make sense to gather if AI is a main focus of the business; for previously biology-intensive firms, it would represent a major paradigm shift.

To address these challenges, academic groups would need a stronger AI backbone, such as the national research resource envisioned by the National Artificial Intelligence Research Resource (NAIRR) Task Force in the United States (NAIRR Task Force, 2022). Similar consortia, such as the European Open Science Cloud (n.d.) and ELIXIR (n.d.), have been established in the European Union to further collaboration in the field. However, they are mostly focused on sharing data and tools rather than on solving the problem of scaling AI in academia.

AI in drug discovery is not a new phenomenon. Machine learning has been an integral part of generating small molecule targets for decades. Recent and ongoing improvements in AI have allowed it to enter other parts of the drug discovery process, to cut costs and improve efficiency. Besides molecular docking and toxicity prediction, which were already staples of state-of-the-art drug discovery workflows, small biotech companies are piloting many new ways of using AI. This is accelerating the shift in the business model of big pharma. Rather than doing all the research in-house, these firms buy trial-ready compounds from external parties. While some of the needed steps are still unsolved, successful adoption of AI in the entire drug discovery pipeline could dramatically decrease drug development costs. This would enable the industry to make drugs for patient populations previously considered far too small to justify the expense.


Bender, A. and I. Cortés-Ciriano (2021), “Artificial intelligence in drug discovery: What is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet”, Drug Discovery Today, Vol. 26/2, pp. 511-524.

Bohacek, R.S., C. McMartin and W.C. Guida (1996), “The art and practice of structure-based drug design: A molecular modeling perspective”, Medicinal Research Reviews, Vol. 16/1, pp. 3-50.

CASP14 (n.d.), 14th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction website (accessed 12 January 2023).

CBO (2021), Research and Development in the Pharmaceutical Industry, 8 April, Congressional Budget Office, Washington, DC.

DeSantis, C.E., J.L. Kramer and A. Jemal (2017), “The burden of rare cancers in the United States”, CA: A Cancer Journal for Clinicians, Vol. 67/4, pp. 261-272.

EC (n.d.), “European Open Science Cloud”, webpage (accessed 12 January 2023).

ELIXIR (n.d.), ELIXIR website (accessed 12 January 2023).

FDA (2021), Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan, US Food and Drug Administration, Washington, DC.

Goodsell, D.S. et al. (2021), “The AutoDock suite at 30”, Protein Science, Vol. 30/1, pp. 31-43.

Gunning, D. and D.W. Aha (2019), “DARPA’s explainable artificial intelligence (XAI) program”, AI Magazine, Vol. 40/2, pp. 44-58.

Herper, M. (2017), “MD Anderson benches IBM Watson in setback for artificial intelligence in medicine”, 19 February, Forbes.

Hoffmann, T. and G. Marcus (2019), “The next level in chemical space navigation: Going far beyond enumerable compound libraries”, Drug Discovery Today, Vol. 24/5, pp. 1148-1156.

Jayatunga, K.P. et al. (2022), “AI in small-molecule drug discovery: A coming wave?”, Nature Reviews Drug Discovery, Vol. 21/3, pp. 175-176.

Jumper, J. et al. (2021), “Highly accurate protein structure prediction with AlphaFold”, Nature, Vol. 596, pp. 583-589.

Kleinstreuer, N.C. et al. (2014), “Phenotypic screening of the ToxCast chemical library to classify toxic and therapeutic mechanisms”, Nature Biotechnology, Vol. 32, pp. 583-591.

Morgan, P. et al. (2018), “Impact of a five-dimensional framework on R&D productivity at AstraZeneca”, Nature Reviews Drug Discovery, Vol. 17, pp. 167-181.

NAIRR Task Force (2022), Envisioning a National Artificial Intelligence Research Resource (NAIRR): Preliminary Findings and Recommendations, National Artificial Intelligence Research Resource Task Force, Washington, DC.

Varadi, M. et al. (2022), “AlphaFold protein structure database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models”, Nucleic Acids Research, Vol. 50/D1, pp. D439-D444.

Wallach, I., M. Dzamba and A. Heifets (2015), “AtomNet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery”, arXiv, arXiv:1510.02855.

Wikipedia (n.d.), “Flowchart”, webpage (accessed 10 September 2022).

Wong, C.H. et al. (2019), “Estimation of clinical trial success rates and related parameters”, Biostatistics, Vol. 20/2, pp. 273-286.


1. Methodological advancements like AstraZeneca’s 5R framework (Morgan et al., 2018), along with changes the FDA made to the approval process, also played a role in increasing the chances of approval.

2. An assay is a test of a substance to determine its quality or ingredients.

3. An AI architecture is the set of trainable transformations used inside the AI that maps the input of the AI system (features of the drug and the target protein, for example) to the output (the predicted binding strength).

4. The exact process is not as simple and linear as described here: there are in fact many iterations between the in vitro biology and chemistry and work on animal models before a drug is ready to enter clinical trials.

5. The datasets used for training in drug discovery also differ significantly from most well-known AI problems like computer vision or natural language processing. In image processing, for example, labels are unconditional – a cat is a cat no matter the orientation or lighting. In drug discovery, most data are conditional – the presence of protein X is a good marker for the efficacy of drug Y, but only in some specific forms of breast cancer, for example. This makes data re-use much harder in drug discovery.

6. A decision tree is a flowchart-like AI architecture in which each internal node represents a “test” on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes) (Wikipedia, n.d.). The paths from root to leaf represent classification rules.


© OECD 2023
