Interpretability: Should – and can – we understand the reasoning of machine-learning systems?

H.M. Cartwright
Oxford University
United Kingdom

Few artificial intelligence (AI) applications can explain to a non-expert what they have learnt or the reasoning behind their decisions. Explanations are fundamental to understanding, but not every explanation persuades. If a doctor describes a skin lesion as possibly cancerous, her patient is likely to accept the diagnosis without asking to see the doctor’s medical qualifications. The same patient, though, might view with suspicion a mechanic’s estimate of several thousand US dollars for simple car repairs. If the “explainer” is non-human, an acceptable explanation may be particularly hard to obtain. This essay touches upon some challenges in the development of “explainable AI”, focusing on applications in science and medicine.

For some types of problems, AIs already surpass human abilities. In these cases, it is tempting to think that explanations are unnecessary. However, AIs can also make perplexing or faulty decisions.

Retrosynthesis is the process, carried out by chemists or by computer, of deconstructing a target molecule of interest, such as a drug or a catalyst, into a number of simpler molecules. Each of these is either readily available or can be synthesised from still simpler chemicals. In this way, a viable route for the manufacture of the target can be found. This is a critical task in the development of commercially valuable materials.

Synthesis planning involves a combinatorial explosion of paths to examine. Yet the identification of suitable synthetic routes still relies largely on human experience, intuition and guesswork. The value of a proposed route depends on a wide variety of factors: availability of suitable reagents and solvents; stability of reactants and intermediates in storage; availability and cost of suitable synthetic equipment; the ability to suppress unwanted competing reactions; toxicity of reagents and intermediates; the need to limit power consumption during synthesis; and many more.
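A rough count, using deliberately simplified and hypothetical numbers, shows why the search is so large. Suppose that each intermediate can be disconnected in b plausible ways and that a route is a linear sequence of d steps. The number of candidate routes N then grows exponentially with route length:

```latex
N = b^{d}, \qquad \text{e.g. } b = 50,\; d = 10 \;\Rightarrow\; N = 50^{10} \approx 9.8 \times 10^{16}
```

Real planning trees also branch wherever a step needs more than one precursor, so this linear-route estimate understates the search space rather than overstating it.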

If an AI makes a surprising choice of synthetic route, further investigation is warranted. For example, it might choose a path in which each low-temperature reaction is followed by one at high temperature, increasing energy use. In this case, interrogating the AI to understand its reasoning would be helpful. Such an interrogation is difficult, however, and its value may be hard to judge if the proposed route includes unusual synthetic steps: because such steps are rarely studied, the data describing them are less reliable.

Even when an AI’s deductions are correct, more information can be valuable. On average, for example, children who spend the most time at school have poorer eyesight than their less industrious peers. Does prolonged studying cause eye damage? Or do myopic children spend more time in school? An AI might link school attendance to myopia, but correlation is neither explanation nor causation. Uncovering and explaining why something happens is more difficult than recognising that it does happen, especially for AIs.

This link between cause and effect is fundamental to understanding. However, what if that link is so complex that scientists find it impossible to understand?

Science is becoming more difficult. Most relatively straightforward scientific areas are heavily studied; what remains are more challenging topics. The equation in Figure 1 – from string theory – illustrates how complex these (often theoretical) areas can appear to be. Parts of mathematics, physics and quantum mechanics are accessible to only a small number of practitioners. As science continues to evolve, some topics may become so intellectually demanding that no one can understand them.

Once such an extreme challenge to comprehension is reached, AIs could help science to progress. To that end, they could scan scientific databases, looking for previously undiscovered relations that could be cast as new laws. Such laws might be of considerable value in science. However, what would happen if the AIs could not explain them, or even provide a mathematical representation of them? Scientists would be unable to verify the AI’s conclusions independently. More seriously still, the human development of science would be inhibited as the field began to fill with laws that no one could understand.

Explanations are therefore essential. Tools such as decision trees or reverse engineering offer some insight into AI logic. However, most scale poorly with software complexity and are of value only to experts. This essay focuses on the needs of the non-expert, for whom an explanation should be complete in itself, pitched at a length and level of complexity that leaves no need for further detail. This requirement may be demanding. The power of AI derives from its ability to work in high-dimensional spaces. Translating what has been learnt in such a space into human-digestible form may yield dense lines of reasoning, even if individual parts of the argument are clear.
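One common way a decision tree offers insight is as a “surrogate”: a simple, readable model is trained to mimic the answers of the opaque one, and its fidelity (how often the two agree) indicates how far the readable version can be trusted. The sketch below is a minimal illustration under stated assumptions: `black_box` is a hypothetical two-input function standing in for a trained network, and the surrogate is the smallest possible tree, a single split (a “decision stump”). Practical tools use deeper trees, which is where the scaling problem described above appears.

```python
import math
import random


def black_box(x1: float, x2: float) -> int:
    """Hypothetical opaque model standing in for a trained network."""
    # A nonlinear score, thresholded to give a yes (1) / no (0) decision.
    score = math.tanh(3.0 * x1) + 0.2 * math.sin(5.0 * x2)
    return 1 if score > 0.5 else 0


def fit_stump(points, labels):
    """Search for the single rule 'if x[f] <= t then a else b' that best mimics labels.

    Returns (fidelity, feature index, threshold, label if <=, label if >),
    where fidelity is the fraction of points on which the rule matches labels.
    """
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            # Each side of the split predicts its majority label.
            left_label = 1 if 2 * sum(left) > len(left) else 0
            right_label = 1 if 2 * sum(right) > len(right) else 0
            correct = sum(y == left_label for y in left) + sum(y == right_label for y in right)
            fidelity = correct / len(labels)
            if best is None or fidelity > best[0]:
                best = (fidelity, f, t, left_label, right_label)
    return best


# Query the black box on random inputs, then fit the readable surrogate to its answers.
random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(300)]
labels = [black_box(x1, x2) for x1, x2 in points]
fidelity, feature, threshold, low, high = fit_stump(points, labels)
print(f"if x{feature + 1} <= {threshold:.2f} then {low} else {high} (fidelity {fidelity:.0%})")
```

The one-line rule captures the dominant input, x1, and the fidelity figure shows how often it matches the black box; the small influence of x2 is simply invisible in it. This is a miniature version of the information lost when a high-dimensional model is summarised for a human reader.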

Explanations must also be accurate. Though this is an obvious requirement, its implications are far-reaching. It is not sufficient for an AI to generate reliable decisions; the explanatory model that provides a commentary on those decisions must be equally reliable, as the retrosynthesis example above suggests.

Interpretability is the capacity of a black-box predictor to provide lucid explanations of its decisions. Before considering some of the challenges involved, a fundamental question arises: is comprehensive explanation even possible?

The power and complexity of an AI are closely linked. Yet, even if the workings of the underlying code are open to inspection, AI reasoning may be opaque. Dyson (2019) has argued that “… [a]ny system simple enough to be understandable will not be complicated enough to behave intelligently, while any system complicated enough to behave intelligently will be too complicated to understand.”

A pessimistic view perhaps, but interpreting AI logic is unlikely to be straightforward. In both the human brain and artificial neural networks, the representation of knowledge is distributed. However, their different structures make any direct translation between them impossible (at least currently). Humans communicate using symbolic language, and so require an AI to provide information in a compatible format. Even if an AI could do this, that ability on its own might be insufficient. The AI would need not only to provide meaningful explanations but also to engage in logical argument; this is a lot to ask.

If, in principle, the reasoning of an AI can be described, what makes extracting explanations from it tricky? Could it not simply be sliced open to reveal its rules and deductions? Unfortunately, matters are not that simple. At the heart of an AI lie not rules, but anything from tens of thousands to billions of numbers, whose interpretation is difficult even for programmers.

These numbers encode the knowledge accumulated during training. An initial challenge is to ensure that the training data are of suitable quality and that their context is sound. Observable (directly measurable) training data may be biased or influenced by factors of which humans are unaware. Access to appropriate, high-fidelity training data is therefore essential.

Raw tabular data are readily entered into AIs, but metadata – background information that can place the raw data in context – may not always be available. Humans may know that relations exist between parts of the data; for instance, that average waist size can be related to country of residence. However, the AI may not have the benefit of this prior knowledge. It must discover such relationships for itself, which implies that it will need access to larger datasets.

AIs may be made more robust by combining deep neural networks with rule-based methods. Such a combination may be valuable in safety-critical applications (e.g. when people work alongside AI-controlled robots on a car production line). However, the inclusion of rule-based components in an AI system does not of itself lead to an improved ability to explain.

Effective AI systems typically comprise tens of thousands of lines of code, constructed by teams of programmers, none of whom knows how the entire system works. It has been reported, perhaps apocryphally, that older versions of the Windows operating system contained large quantities of apparently redundant code that software engineers dared not remove because its purpose was unknown. AI software, though more compact than a complete operating system, may be developed over several years. During that time, the behaviour of the software and any embedded explanation systems may drift from the original specifications, possibly without the knowledge of programmers.

In some fields, AIs may continue to learn while they work, even after development of the software itself is complete. Learning about causality, for example, requires agents that can interact with their environment and, in doing so, create their own data. In chemistry, AI-assisted robots can be used to assess and optimise new synthetic routes, generating a pool of knowledge about practicality, safety and yields that grows as the system runs new chemical reactions. As knowledge accumulates, the AI’s understanding may grow, but its ability to explain this progressively more detailed world may diminish.

One might expect that a typical question posed to an AI would be “Why did you conclude this?” By contrast, an “exception analysis” seeks to understand why mistakes occur. The question then becomes: “Why did you get this wrong?”

An understanding of why a decision is faulty can uncover limitations of a trained AI model. However, exception analyses are rare in science for two reasons. First, a human must recognise that the AI has taken a wrong turn. Second, a sufficiently competent explanatory system must be available to uncover the origin of the failure.

Web-based image recognition applications have impressive capabilities. However, it is challenging for an AI to construct an image that illustrates an explanation. Attempts to do so often yield hybrid images in which portions of multiple training images seem blended into a single, confused new image.

Preparing reliable synthetic images as part of an explanation is an ongoing challenge. Temporal (time-varying) image streams, such as electroencephalography (EEG) scans, present further difficulties. Beyond providing a textual explanation, an AI attempting to interpret and explain a series of EEG images may need to illustrate the text with unambiguous images constructed from visual data that vary with time, patient and measurement conditions.

AIs operating in different areas may use software based on similar algorithms, provided that issues such as protection of intellectual property do not create barriers. With suitable retraining (usually from scratch), an AI may be repurposed into a substantially different field. However, porting an explanation system may be problematic. An explanation mechanism designed for a board game may be poorly suited to explain how a protein folds, no matter how much it is modified. Porting AIs between applications may also generate reliability issues, with potential consequences for features such as the ability to predict software failure modes.

The ethics of AI decision making (not just the ethics of AI use) is an area of increasing interest. This interest is concentrated in areas in which AI might be used to make decisions that affect people directly, particularly in medicine.

Triage is the process of assessing patients who enter the emergency ward of a hospital to determine what treatment they require and how urgently. Medical staff must make crucial decisions, up to and including “Can the life of this patient be saved?” The number of patients moving through a large emergency unit is substantial. Over time, a significant database of previous triage decisions builds up, providing a resource that could be used to train an AI to make its own decisions. Such a database will include some data that relate to the patients’ medical condition, and other data that are ethical in nature. If a large group of seriously ill patients arrives together, it may overwhelm the capacity of the unit. This could lead to delayed treatment for some patients; staff must then make judgements that contain ethical elements: “Should this patient be saved?”

An AI trained on triage data might well develop the ability to make both medical decisions about the treatment of incoming patients and ethical decisions if necessary. The dividing line between the two may be blurred. Therefore, unambiguous and sympathetic justification of AI decisions would be crucial to convince medical staff, relatives and patients that decisions were appropriate.

An AI making such ethical decisions would have been trained on data that include numerous examples drawn from the operation of a real emergency department. Consequently, one might expect its decisions simply to replicate those of a human in comparable circumstances. However, AI tools may find unexpected solutions that humans have not spotted. Humans might, for example, draw a clear line between medical and ethical decisions. For an AI, the patient’s symptoms, treatment, prognosis and eventual outcome are all just data points. The AI’s assessment, based on the entirety of what it has learnt, need not accord with human ethics.

Transparency is particularly important for ethical decisions. Privacy issues may cloud seemingly straightforward decisions. For example, should confidential data from a drug trial be released to the public before the trial is complete? If a drug trial has a small number of patients, should those patients receiving the placebo continue taking it to give the trial more statistical strength, even when data show the drug taken by other participants is effective?

Limited transparency may facilitate deception. An AI can generate a decision tree of “if-then” rules whose reasoning suggests that a selection process is gender-neutral, even when the underlying model is not. A seemingly fair explanation can thus be used to disguise a discriminatory algorithm, a practice known as “fairwashing”.

In fairwashing, an unfair procedure (in the selection of candidates for promotion, perhaps) is presented in a manner that suggests it is fair. The AI is used knowingly to hide evidence that critical factors, such as gender and ethnicity, influenced the decision. An opportunity to interrogate the AI would provide some degree of protection against the practice of fairwashing.
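A deliberately simplified, hypothetical example shows how this can work, and how interrogation helps. The promotion model, the applicant data and every threshold below are invented for illustration; they do not describe any real system or any specific published fairwashing method. The “interrogation” used here is a simple counterfactual test that changes only an applicant’s gender and checks whether the decision changes.

```python
import random

random.seed(1)

# Hypothetical applicants: (years of experience, performance score 0-10, gender).
applicants = [
    (random.randint(0, 20), random.uniform(0.0, 10.0), random.choice(["F", "M"]))
    for _ in range(1000)
]


def opaque_model(years: int, score: float, gender: str) -> bool:
    """Hypothetical promotion model that quietly sets a higher bar for women."""
    bar = 6.0 if gender == "M" else 7.5  # the hidden, unfair dependence on gender
    return years >= 5 and score >= bar


def published_rules(years: int, score: float, gender: str) -> bool:
    """The 'explanation' offered to users: if-then rules that never consult gender."""
    if years < 5:
        return False
    return score >= 6.75


def agreement(model_a, model_b, data) -> float:
    """Fraction of cases on which two decision procedures give the same answer."""
    return sum(model_a(*row) == model_b(*row) for row in data) / len(data)


def gender_flip_rate(model, data) -> float:
    """Interrogation: how often does changing only gender change the decision?"""
    swap = {"F": "M", "M": "F"}
    return sum(model(y, s, g) != model(y, s, swap[g]) for y, s, g in data) / len(data)


fidelity = agreement(opaque_model, published_rules, applicants)
rules_flips = gender_flip_rate(published_rules, applicants)
model_flips = gender_flip_rate(opaque_model, applicants)
print(f"published rules match the model on {fidelity:.0%} of cases")
print(f"gender changes the published rules' decision in {rules_flips:.0%} of cases")
print(f"gender changes the model's actual decision in {model_flips:.0%} of cases")
```

Because the published rules agree with the model in the great majority of cases, they look like a faithful explanation, and they plainly never consult gender. Only querying the model itself, rather than its explanation, exposes the hidden dependence.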

Use of AI by non-experts may also give rise to undesirable side effects. Millions of people self-diagnose using online medical chatbots, some of which employ AI. Users may feel as if, or even believe, they are chatting with another human. If the advice is faulty or misunderstood, where does the blame lie?

Without any way to assess the expertise of the software, AI medical recommendations must be taken on trust. Yet the AI may be provided by a commercial concern, which may influence its recommendations (e.g. which drugs to take). The ready availability of impersonal diagnosis over the Internet can reduce visits to human doctors. This could lead to poorer outcomes, since interactions with a computer may be less illuminating than those with a human doctor.

AI is at its best when users have confidence that its decisions are reasonable and justifiable. Incomplete explanations may create suspicion that its operation is being deliberately or unintentionally obfuscated. They might also suggest the benefits of using AI accrue to the commercial or government entity rather than the user.

In science, incomplete explanations may be unavoidable when release of data is restricted by commercial interests, such as a patent application. However, science flourishes through the rapid and comprehensive dissemination of information. Therefore, deliberate or inadvertent limitation of explanations from AI-based systems can inhibit scientific progress.

The General Data Protection Regulation (GDPR) has applied in the European Union since May 2018. To guard against discriminatory or unfair practices, it requires that people be given meaningful information about the “logic involved” in automated decision making. While the aim of this “right to an explanation” is admirable, it has not yet been fully achieved. Indeed, the “logic involved” could be interpreted as referring only to the AI’s algorithmic processes. Explanations of how the software operates might interest a programmer. However, they would be of little value to most people, who want to understand how the AI reached its decision.

Explanatory systems must be developed and enhanced to match the power of the systems within which they are contained. However, even as they are being developed, the decision-making side of AI will keep evolving. As a report from the UK House of Lords (2018) puts it: “... if we could only make use of those mechanisms that we understand, we would reduce the benefits of artificial intelligence enormously.”

The implication is clear: the government will not pause development of AI decision making to allow explanatory systems to catch up. Nevertheless, developers should not argue that what an AI system can do is so much more important than how it does it that work on explanatory systems can be put to one side.

Finally, as AI becomes more powerful, the ability to predict the future ramifications of its decisions will become more important. This is complicated by the fact that the use of AI in one field may affect activities in another, apparently unrelated, field. To anticipate such effects, the limitations and capabilities of AI systems must be open to interrogation.

As AI applications become more powerful and widespread, the demand for explanation will grow. It might seem that AI is not so different from other methods of data analysis, just more efficient. However, this underestimates both the power of the methods and the degree to which the reasoning of an AI is hidden compared to conventional data analysis tools. Halting software development until exhaustive explanations are routinely available would be disruptive, and probably impractical. However, AI users must not be encouraged to believe that providing comprehensive explanations is so difficult that AI decisions should be accepted without question.

“Useful AI” (i.e. AI that is commercially valuable) risks developing at a far greater rate than “user-friendly AI” (i.e. AI that can explain itself). Moreover, if explanations of any sort are not expected, software companies may think that development of explanatory systems can be quietly put to one side. If this were to happen, the opportunity to develop them would be lost and powerful – but opaque – AIs could become the norm. Furthermore, current software largely sidesteps ethical and practical challenges in the provision of explanations. Users are often unaware they are interacting with an AI, or that it is processing their data. With no knowledge of AI involvement, the user will not be expecting an explanation.

Governments can undoubtedly play a role in helping to foster research on the explanation problem, but how best to have significant impact is unclear. Any government funding would be dwarfed by the huge amounts already invested in AI in general by the biggest commercial players (Google, Amazon, et al.). Governments might pour money into national agencies like the US Defense Advanced Research Projects Agency (DARPA) or other public organisations. However, it might be hard to bring together a sufficiently large group of talented people to make real progress in such a demanding area. These issues require further explanation.


References

Dyson, G. (2019), “The third law”, in Brockman, J. (ed.), Possible Minds: 25 Ways of Looking at AI, Penguin, New York.

House of Lords (2018), “AI in the UK: Ready, willing and able?”, Select Committee on Artificial Intelligence, Report of Session 2017-19.

Nardelli, M. and F. Di Noto (2020), “On some equations concerning the Casimir effect between World-Branes in Heterotic M-Theory and the Casimir effect in spaces with nontrivial topology. Mathematical connections with some sectors of Number Theory”.

Metadata, Legal and Rights

This document, as well as any data and map included herein, are without prejudice to the status of or sovereignty over any territory, to the delimitation of international frontiers and boundaries and to the name of any territory, city or area. Extracts from publications may be subject to additional disclaimers, which are set out in the complete version of the publication, available at the link provided.

© OECD 2023

The use of this work, whether digital or print, is governed by the Terms and Conditions to be found at