Explainable AI and Science

I published a post a couple of weeks ago based on a paper that I wrote with my father about AI and the philosophy of science. The paper updated a book he had written in the 90s about AI and scientific method; I helped by covering recent advances in AI.

Humanly Comprehensible AI

In the last post I talked about the important problem of induction, but the paper also raised issues for AI. In particular the need to be able to understand what an AI system is doing. To quote the paper about the situation in the 90s:

The machine learning programs analysed in Gillies (1996) all gave as output rules which were humanly comprehensible. For example, Muggleton's GOLEM gave an explicit rule relating to protein folding which is stated in Gillies (1996), p. 53. This was typical of the period, when machine learning was mainly used as a technique for learning the rules of rule-based systems. This situation led to the following analysis of human interaction with the results of machine learning (Gillies (1996), pp. 54–55):

“There are great advantages in generating rules which are humanly comprehensible, because this allows the following kind of human-machine interaction. Background knowledge supplied by the human scientist is coded into a machine learning program. This generates hitherto unknown, but humanly comprehensible, rules which apply to the domain in question. The human scientist can then examine these rules, and perhaps obtain new insights into the field.”

This seems like a strong argument. We should be worried if human scientists can no longer understand the science they are doing. Nothing in the intervening decades has disproved this argument; the trouble is that AI models often aren't very comprehensible.

Neural Networks and Incomprehensible AI

The big development in AI in the last decade has been the rise of Deep Neural Networks. These models have been very successful in many applications, including scientific ones. The trouble is that they can be very hard to interpret.

In some ways a neural network isn't a complex model: each neuron is typically mathematically very simple. The trouble is that a deep model has very many interacting neurons. This is exactly the kind of system that computers are good at, lots and lots of simple calculations, but it is also something that humans are bad at: our working memory can't keep track of what all the neurons are doing.
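To see how simple a single neuron is, here is a minimal sketch: a weighted sum of inputs plus a bias, passed through a non-linearity (the logistic sigmoid is used here, though real networks use various activations). Each one is easy to follow; the incomprehensibility comes only from chaining millions of them together.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of the inputs,
    plus a bias, passed through a simple non-linearity."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # logistic sigmoid

# One neuron in isolation is trivial to understand...
output = neuron([0.5, -1.2], [0.8, 0.4], bias=0.1)

# ...but a deep network composes huge numbers of these, layer after
# layer, and the combined behaviour quickly exceeds what human
# working memory can track.
```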

So we have arrived at the worrying situation I mentioned above: scientists are using deep models that they cannot understand. We need ways of ensuring that our AI/machine learning models are understandable. There is a field of research in this area called Explainable AI or Explainable Machine Learning.

Simplified Models

One approach is to use machine learning algorithms that are guaranteed to produce the sort of simplified models that humans can understand. For example, Wang and Rudin used an algorithm that could produce checklists of the sort that doctors and medical professionals use in their work.
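A checklist model of this kind might look like the following sketch. This is not Wang and Rudin's actual algorithm or data; the features, thresholds, and rule names are invented purely for illustration. The point is that every rule is readable and a clinician could apply the whole model by hand.

```python
# A hypothetical checklist model: each satisfied condition adds one
# point to a risk score. All features and thresholds are invented
# for illustration, not taken from any real clinical checklist.

def risk_checklist(patient):
    """Score a patient against a short, human-readable checklist
    and report exactly which rules fired."""
    rules = [
        ("age over 65",         patient["age"] > 65),
        ("high blood pressure", patient["systolic_bp"] > 140),
        ("history of diabetes", patient["diabetic"]),
    ]
    score = sum(1 for _, satisfied in rules if satisfied)
    reasons = [name for name, satisfied in rules if satisfied]
    return score, reasons

score, reasons = risk_checklist(
    {"age": 70, "systolic_bp": 150, "diabetic": False})
# score == 2, and `reasons` names the two rules that fired,
# so the prediction is fully inspectable.
```

Unlike a neural network, a model like this can be audited line by line, which is exactly the guarantee this approach trades power for.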

This approach does guarantee that we can understand the models, but it risks losing the benefits of using machine learning. The models it creates are simpler, and therefore less powerful, than a deep neural network, and they may not be able to capture the phenomena being studied.

It might be possible to limit scientific use of machine learning to only those models that are interpretable, but this leads to a rather fundamental question: do we have any reason to believe that the scientific systems we are studying will all be explainable using only humanly comprehensible rules? If not, we are likely to need computer models that we do not understand. There doesn't seem to be any reason to believe that all scientific phenomena are humanly comprehensible. The most important benefit of using AI and machine learning is that it allows us to address phenomena that we cannot understand without their help. Simplified models prevent us doing so.

Local explanations

Another approach is to train a complex model, but then to create simplified explanations of its behaviour. These explanations don't have to cover the entire model; they can be local, in the sense that they explain how the model makes particular decisions without having to explain its full complexity. An example is the work of Marco Tulio Ribeiro; here is a video of him talking about his work at a workshop we organised a few years ago.
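The flavour of a local explanation can be sketched crudely as follows. This is not Ribeiro's actual LIME algorithm, just an illustration of the underlying idea: probe a black-box model with small perturbations around one input, and fit a local linear picture of how each feature influences that particular prediction. The black-box function here is an invented stand-in for a trained neural network.

```python
# A crude sketch of a local explanation: approximate an opaque
# model around one input point by measuring how small perturbations
# to each feature change the prediction. (Not Ribeiro's actual
# LIME algorithm, just the same local-surrogate idea.)

def black_box(x):
    # Pretend this is an opaque deep model we cannot read directly.
    return x[0] * x[0] + 3 * x[1]

def local_explanation(model, point, eps=1e-4):
    """Estimate per-feature sensitivity of `model` at `point` via
    central finite differences: a local linear surrogate."""
    weights = []
    for i in range(len(point)):
        up = list(point); up[i] += eps
        down = list(point); down[i] -= eps
        weights.append((model(up) - model(down)) / (2 * eps))
    return weights

# Near the point (2, 5) the opaque model behaves locally like
# 4*x0 + 3*x1, even though we never looked inside it.
weights = local_explanation(black_box, [2.0, 5.0])
```

The global model stays as complex as it needs to be; only the explanation of each individual decision is simplified.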


The benefit of this approach is that you can use very complex models, but still understand how decisions are made. This is what is needed in a lot of social and legal uses of AI, such as policing or insurance decisions, where we need to be able to scrutinise the results to ensure that people aren’t being unfairly discriminated against (see Cathy O’Neil’s excellent Weapons of Math Destruction for more about these problems).

In science we typically want to understand more than individual decisions, but a number of simplified local explanations might combine to explain a model well enough for scientists.

AI and Pop Science

Since writing the original paper I have thought of a nice analogy for this.

One of my favourite genres of book is popular science. For example, I have recently finished Adam Rutherford’s A Brief History of Everyone Who Ever Lived, which talks about how we can use genetics to understand the deep history of humanity.

I gave up studying biology at the age of 14 (a decision I now regret) and so it is the area of science I know least about. It would take me years of study to really understand genetics, but reading a well written pop science book like Rutherford’s helps me have a big picture understanding of the subject which I can really appreciate. It does so by presenting a simplified version of the science and leaving out a lot of the complex details.

I am writing this during the COVID-19 pandemic and we are really seeing the necessity of these simplified explanations of science as politicians and policy makers, who do not have a background in science, need to be able to make life and death decisions relating to complex issues in virology and epidemiology. The public also needs an understanding of these issues so they know what is happening and can adjust their own behaviour.

Explainable AI could provide something similar, though at a much higher level: explanations for scientists who deeply understand the science, but cannot fully comprehend what is happening inside a neural network.

In fact, almost every area of science that has ever existed has involved elements that we cannot consciously understand, happening inside a complex neural network, though in most cases these are real neurons in the brains of the scientists.

For example, science often involves looking at images. These could be x-ray scans, photographs of bacteria in agar plates, views from telescopes or electron microscopes. Trained scientists can recognise important features, just as almost all humans can recognise faces, but that doesn’t mean they know how they do so. It is tacit knowledge, something we know how to do without being able to explain it.

Maybe we just have to accept that some of science will include these tacit black boxes, in both biological and digital neural networks. But we have to make sure they fit together into a general scientific theory whose big picture we understand, perhaps with the help of simplified, explainable models.

You can read the original paper that this post was based on here.