Interpretability is one of the most challenging aspects of the deep learning space. Imagine understanding a neural network with hundreds of thousands of neurons distributed across thousands of hidden layers. The interconnected and complex nature of most deep neural networks makes unsuitable for traditional debugging tools. As a result, data scientists often rely on visualization techniques that help them understand how neural networks make decisions which becomes an constant challenge. To advance this area, OpenAI just unveiled Microscope and the Lucid Library which enable the visualization of neurons within a neural network.

Interpretability is a desirable property in deep neural network solutions until you need to sacrifice other aspects such as accuracy. The friction between the interpretability and accuracy capabilities of deep learning models is the friction between being able to accomplish complex knowledge tasks and understanding how those tasks were accomplished. Knowledge vs. Control, Performance vs. Accountability, Efficiency vs. Simplicity…pick your favorite dilemma and they all can be explained by balancing the tradeoffs between accuracy and interpretability. Many deep learning techniques are complex in nature and, although they result very accurate in many scenarios, they can become incredibly difficult to interpret. All deep learning models have certain degree of interpretability but the specifics of it depends on a few key building blocks.

The Building Blocks of Interpretability

When comes to deep learning models, interpretability is not a single concept but a combination of different principles. In a recent paper, researchers from Google outlined what they considered some of the foundational building blocks of interpretability. The paper presents three fundamental characteristics that make a model interpretable:

Understanding what Hidden Layers Do: The bulk of the knowledge in a deep learning model is formed in the hidden layers. Understanding the functionality of the different hidden layers at a macro level is essential to be able to interpret a deep learning model.

· Understanding How Nodes are Activated: The key to interpretability is not to understand the functionality of individual neurons in a network but rather groups of interconnected neurons that fire together in the same spatial location. Segmenting a network by groups of interconnected neurons will provide a simpler level of abstraction to understand its functionality.

· Understanding How Concepts are Formed: Understanding how deep neural network forms individual concepts that can then be assembled into the final output is another key building block of interpretability.

Borrowing Inspiration from Natural Sciences

Outlining the key building blocks of interpretability was certainly a step on the right direction but is far from being universally adopted. One of the few things that the majority of the deep learning community agrees when comes to interpretability is that we don’t even have the right definition.

In the absence of the solid consensus around interpretability, the answer might rely on diving deeper into our understanding of the decision making process in neural networks. That approached seemed to have worked for many other areas of science. For instance, at a time where there was not fundamental agreement about the structure of organisms, the invention of the microscope enabled the visualization of cells which catalyzed the cellular biology revolution.

Maybe we need a microscope for neural networks.


OpenAI Microscope is a collection of visualizations of common deep neural networks in order to facilitate their interpretability. Microscope makes it easier to analyze the features that form inside these neural networks as well as the connections between its neurons.

Let’s take the famous AlexNet neural network which was the winning entry winning entry in ILSVRC 2012. It solves the problem of image classification where the input is an image of one of 1000 different classes (e.g. cats, dogs etc.) and the output is a vector of 1000 numbers.

Using OpenAI Microscope, we can select a sample dataset and visualize the core architecture of AlexNet alongside the state of the image classification process on each layer.