How do you distinguish between interpretability, explainability and transparency of AI systems? And what do these terms have to do with trust?

Photo by Bernard Hermant on Unsplash

AI systems have quickly become ubiquitous in the last few years and are increasingly affecting our everyday lives. Searching the web, following movie recommendations or taking pictures are all augmented by powerful new algorithms. Those algorithms also decide about my creditworthiness, my chances to get health care or even whether I may have to go to prison. While these algorithms increasingly shape the way we interact with the world and how we are treated as citizens, we may want to raise the question of whether we should trust those systems.

In the past, scientific inventions went through several phases before they arrived at trustworthy products. Scientists like Marie Curie or Wilhelm Röntgen, for example, experimented with radioactive elements and radiation in general and made breakthrough discoveries. Early airplane pioneers like the Wright Brothers developed the first flying machines based on work by engineers like Otto Lilienthal, who researched the underlying principles of aerodynamics that helped us understand how to build flying machines. Safe usage in commercial applications (as in X-rays or commercial airplanes) often required decades of experimentation and the introduction of regulations before people trusted the products. AI systems seem to have mostly skipped this important development step, but more and more people are asking questions about AI systems and implicit bias, privacy or safety concerns.

Luckily, there has been a lively discussion over the last few years about how to make AI systems more trustworthy, and the answer is often that they have to be more transparent. What does transparency mean in the context of an AI system, though? Do we want every line of code to be open-sourced? Can non-experts actually understand what algorithms like convolutional neural networks (CNNs) and transformers do? Can machine learning experts explain what the predominant neural network techniques do and how those systems come to a decision?

In this article, I want to explore three terms that help consumers, AI developers and AI system providers build trust in what AI systems do: (1) interpretability, (2) explainability, and (3) transparency. They are all connected via another important concept: trust.


Interpretability

When using a supervised machine learning approach, an AI engineer creates ML models from training data. To improve a model, feature engineering techniques have been developed to select the most predictive features. Since the advent of deep learning, this step has often been skipped because the hidden layers of a network can be seen as shouldering the work of the feature engineering task. The different layers of a convolutional neural network, for example, may identify important features for a vision recognition task by learning how to recognize parts of a picture such as edges, shapes or even segments (as in parts of a human face).

Different layers of a neural network specialize in different features (e.g., edge detection, facial features)

The engineer of the ML model, however, may not inspect those layers and simply run the network without being able to interpret what the network has actually learned. This may lead to unforeseen consequences, as the work by Sameer Singh and his colleagues has shown. They investigated classifiers that seem to provide high accuracy for a given task (e.g., text classification, vision recognition) but that may have learned the decision boundary for the wrong reasons. This can happen if a specific feature is coincidentally highly correlated with the class the classifier has to learn. Such a model would make the correct prediction but wouldn't be able to generalize well.

A text classification system for posts on different popular topics, for example, may use the name of the person who wrote each post as a feature. If there were a very prolific writer who always writes about the same subject, the name may turn out to be a good predictor for that specific class. However, the model would fail as soon as the writer chooses a different topic. Because of this pitfall of using models as black boxes, Singh et al. developed a system called Local Interpretable Model-agnostic Explanations (LIME) that allows an ML model developer to take a closer look at the features used for the classification of a concrete example.

LIME is a model-agnostic tool for interpreting the decisions a classifier makes for a given example. The tool allows developers to inspect why a text, for instance, was labelled a certain way by approximating the black-box model locally. It turns off certain parts of the instance (e.g., graying out segments of a picture or omitting words) and then fits a simple regression model on those so-called perturbed instances. A linear model is more easily interpretable because you can quickly see which variables have a high weight.

Predicting the correct topic for the wrong reasons

The text classification example from the 20 newsgroups data set shows how the word posting has a high impact on classifying this post correctly. However, the token posting appears in the email header and actually has nothing to do with the topic discussed in the post, so the classifier suddenly becomes less trustworthy.
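The perturbation idea behind LIME can be sketched in a few lines. The snippet below is a toy illustration, not the real lime library: the black-box classifier, the example sentence and all names are invented, and the surrogate is a plain least-squares fit rather than LIME's distance-weighted, regularized model.

```python
import numpy as np

def black_box(text):
    # Stand-in for a trained topic classifier (hypothetical): it has
    # latched onto the header token "posting" and returns the
    # probability of the target topic.
    return 0.9 if "posting" in text.split() else 0.2

def lime_sketch(text, predict, n_samples=500, seed=0):
    """Fit a local linear surrogate by perturbing the input text."""
    words = text.split()
    rng = np.random.default_rng(seed)
    # Random binary masks: 1 keeps a word, 0 omits it
    masks = rng.integers(0, 2, size=(n_samples, len(words)))
    y = np.array([
        predict(" ".join(w for w, keep in zip(words, m) if keep))
        for m in masks
    ])
    # Linear model over the masks (plus an intercept column)
    X = np.column_stack([masks, np.ones(n_samples)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(words, coef[: len(words)]))

weights = lime_sketch("posting about the hockey game tonight", black_box)
print(max(weights, key=weights.get))  # the surrogate blames "posting"
```

Because the surrogate is linear, the per-word weights directly show which tokens drove the local decision, which is exactly the kind of description LIME presents to the developer.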

Singh et al. also carried out experiments in which subjects rated whether they would trust the results of a classifier. They created two classifiers for the 20 newsgroups task: one that used all tokens, including misleading ones such as posting, and another one with a cleaned set of tokens that makes sure the classifier generalizes better. Using Mechanical Turk, they asked non-ML experts which classifier they would trust more given the interpretations generated by LIME. They found that subjects were able to distinguish between the classifiers based on the descriptions LIME produced.

LIME creates understandable descriptions if the feature names are descriptive and can be presented in a form that is easy for a user to consume. The authors use the term explanation because picture fragments or words are understandable even for non-experts in machine learning. However, if the features require linguistic expert knowledge (e.g., c-command in a parse tree) or specific domain knowledge, it will be more difficult to present the output of LIME as an explanation to a user of an AI system.

Because the output of LIME is closely tied to the features used by the machine learning algorithm, LIME's descriptions help an ML engineer better interpret the inner workings of the model she is developing (for a fairly comprehensive overview of similar approaches, see Dipanjan Sarkar's article). Users of an AI system may also benefit from these descriptions, but they still lack an explanation of why a certain label was chosen by the model. Neither LIME nor similar systems provide a natural language description justifying the choice of the labels.


Explainability

Admittedly, approaches to explainable AI often overlap with approaches that simply interpret the way ML models work, and approaches such as decision trees are often used as a standard example of explainable AI. My definition of explainability is stricter and may go beyond what current approaches can offer, but I believe explainability needs to be grounded in natural language descriptions, as a human would provide them when presented with an automatically generated summary, a classification result or an answer to a question by an AI system.

Various research projects address the question of how users can be provided with better explanations of an AI system. Under the umbrella of a DARPA-funded program called Explainable Artificial Intelligence (XAI), one can find several initiatives that investigate how better explanations can be produced to answer the questions users may have when they look at the output of an ML algorithm. Those answers will often be mission-critical in military and security applications, but also for systems that supervise transportation, support medical applications or aid legal research.

A recent literature review of approaches to explainability by Gilpin et al. (2018) organizes previous approaches along three categories: (1) processing, (2) representation, and (3) explanation producing. The processing and representation categories comprise proxy methods such as LIME, but also decision trees and analyses of the roles the layers of neurons play in a neural network, similar to what I call interpretability. The third category covers work that uses scripted conversations or attention-based approaches to produce an explanation automatically.

Current approaches to explanation generation often focus on multi-modal systems, as in generating explanations and identifying areas in a picture for activity recognition tasks (ACT-X) and visual question answering tasks (VQA-X). Huk Park et al. (2018), for example, developed methods that generate verbal justifications and highlight the areas in pictures used for vision recognition tasks.

Huk Park et al. (2018) developed a Pointing and Justification model that provides multi-modal explanations

The generated narrative is a first step towards explaining model output to users who are not necessarily ML experts. Users of AI systems should be able to ask questions of the system in order to truly interact with the output. In particular, asking why-questions and counterfactual questions is important for making AI systems more transparent to users.


Transparency

AI system providers are mostly responsible for making an AI system more transparent. Transparency is another way to increase trust in AI systems because it allows the user to “look under the hood”. It does not, however, mean that every system needs to be open-sourced or that the training data has to be made public. Even though there is a strong push in the academic community to do this, companies that want to sell products based on AI systems often cannot.

Nevertheless, companies can be more transparent by identifying which data sources were used for the development of the AI system and by showing confidence scores with the results (e.g., “I’m 97% sure this picture shows an apple”). When they provide services such as image recognition or text classification, they should also identify the boundaries of their capabilities, for example how they perform on data from minority groups or what the expected input format is.
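Surfacing a confidence score can be as simple as turning the model's probability into a user-facing statement. The function below is a minimal sketch; the name and phrasing are invented for illustration, not a real service API.

```python
def describe_prediction(label, probability):
    # Turn a raw model probability into a statement a user can weigh;
    # this is a hypothetical helper, not part of any real service.
    return f"I'm {probability:.0%} sure this picture shows {label}."

print(describe_prediction("an apple", 0.97))
```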

Image taken from Excavating AI by Kate Crawford and Trevor Paglen

Work by Mitchell et al. proposes shipping model cards with the delivery of an AI-based system. A model card documents a model with respect to a list of important questions that users, software developers, policy makers as well as impacted individuals may have. The list includes questions such as the following:

  • Can you provide basic information about the model? (Who developed the model? What algorithms were used? Which version is currently used? When was it trained?)
  • What is the intended use? (What are the primary intended uses and users? What usages are out-of-scope?)
  • What metrics are being used? (What measures of model performance are being reported, and why? If decision thresholds are used, what are they, and why?)
  • What are the ethical implications of the model? (e.g., Does the model use any sensitive data (such as protected classes)? What risks may be present in model usage?)
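The questions above map naturally onto a structured record. The sketch below is one way such a card could be represented in code; the field names loosely follow the questions, and every concrete value is invented for illustration rather than taken from a real model card.

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    # Fields loosely follow Mitchell et al.'s questions; all values
    # filled in below are hypothetical.
    name: str
    version: str
    developers: str
    trained: str
    algorithm: str
    intended_uses: list
    out_of_scope_uses: list
    metrics: dict
    ethical_considerations: str

card = ModelCard(
    name="newsgroup-topic-classifier",
    version="1.2.0",
    developers="hypothetical ACME AI team",
    trained="2019-06",
    algorithm="logistic regression over TF-IDF features",
    intended_uses=["routing forum posts to topic moderators"],
    out_of_scope_uses=["screening job applicants"],
    metrics={"accuracy": 0.91, "decision_threshold": 0.5},
    ethical_considerations="No protected-class attributes in the training data.",
)
```

Keeping the card as structured data rather than free text makes it easy to render for users and to audit programmatically.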

Other industries go to great lengths to follow regulations that are often enforced via operation and safety audits. The airline industry, for example, is heavily regulated in order to ensure that planes are safe to operate. The development of new drugs is guided by several stages of clinical trials. Experiments on humans in psychology require approval from an institutional review board to make sure the participants are not harmed in any way.

Proactively describing the data sets, methods and limits of an AI system is essential for instilling trust in the systems currently being developed by technology companies. More specifically, explanations can help users better understand why a specific decision was made by a machine learning model, and methods that increase the interpretability of models help model developers produce better models.

Trust me!

The development of AI systems has reached a critical moment in history, similar to when alchemy became chemistry or the first flying machines turned into commercial airplanes. Just as the periodic table and the mathematical methods describing aerodynamics drove those fields forward, AI is quickly developing mathematical methods to drive the development of more and more advanced systems. Enforcing interpretability, explainability and transparency in AI systems will increase the overall trust in them, similar to how safety equipment and regulations led to safer chemical products and airplanes.

Why should I trust an AI system? was originally published in Towards Data Science on Medium.