AI Explainability: Why we need it, how it works, and who’s winning

Startups and incumbents are investing massive amounts of money into AI (approximately $35 billion in 2019, according to IDC). As these AI models become increasingly effective, some businesses are facing a new problem: they can’t use them. Why? Well, the majority of these AI models are considered “black boxes”, meaning that there is no way to determine how the algorithm came to its decision. While that may be okay for companies in industries like gaming or retail, it certainly is not acceptable for companies that operate in heavily regulated industries like financial services or healthcare. Fortunately, numerous explainability solutions are popping up to help businesses interpret their models and make the metaphorical black box a little more transparent. In this post, I’m going to dig into why we need AI explainability, how existing explainability techniques work, why investors are excited about it, and which companies are attacking the problem.

Also, if you like this post and want future articles sent to your inbox, please subscribe to my distribution list! Alright, let’s dive in.

1. Why do we need AI explainability?

I’m going to start by walking through a specific example that demonstrates the explainability pain point and then zoom out to illustrate the full scope of the issue.

AI in Financial Services

One of the most prominent use cases for AI in the financial services industry is to enhance credit underwriting models. For those who are unfamiliar with financial jargon, a credit underwriting model is an algorithm that determines whether or not an applicant should receive a loan. To improve its existing credit model, a bank could build an AI algorithm that learns from prior lending decisions (and many banks already have). The algorithm’s purpose would be to minimize bad loans and optimize profitability.

To build this technology, the bank would have to (1) choose which type of algorithm it wants to use, (2) modify that algorithm for its specific use case, and then (3) feed the algorithm a massive amount of training data. That data would likely include prior applicants’ gender, age, work history, salary, etc. It would also include information on the outcome of each of those lending decisions. With this information, the program could start predicting which applicants are “credit worthy”. At first, the algorithm would probably do a terrible job; however, it would continue to learn from each outcome and eventually reduce the number of bad loans that the bank makes. In doing so, it would save the bank a substantial amount of money, allowing it to write new loans at more attractive rates. This would enable the bank to attract more applicants and significantly scale up its customer base.
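To make that workflow a bit more concrete, here is a minimal sketch of what steps (1) through (3) might look like in Python with scikit-learn. Everything here is illustrative: the applicant features, the synthetic data, and the choice of a gradient-boosted model are stand-ins, not a description of any bank’s actual system.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical loan applications (feature names are illustrative).
rng = np.random.default_rng(42)
n = 5_000
applicants = pd.DataFrame({
    "age": rng.integers(21, 70, n),
    "salary": rng.normal(60_000, 20_000, n).clip(15_000),
    "years_employed": rng.integers(0, 30, n),
})
# Synthetic outcome of each prior lending decision: True = loan repaid, False = defaulted.
repaid = (applicants["salary"] / 100_000
          + applicants["years_employed"] / 40
          + rng.normal(0, 0.2, n)) > 0.6

X_train, X_test, y_train, y_test = train_test_split(applicants, repaid, random_state=0)

# Steps (1)-(3): pick a model family, configure it, and fit it to the historical data.
model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```

A real system would obviously involve far more features, careful validation, and fairness testing, but the basic loop is the same.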

So what’s the problem with that? As I alluded to earlier, the problem is that the bank’s algorithm is most likely a “black box”, meaning that the bank can’t explain why it approved one applicant but declined another. This is a huge issue, particularly if a declined applicant believes they were rejected on the basis of race, gender, or age. If the applicant accused the bank of algorithmic bias, it could seriously damage the bank’s reputation and could even lead to a lawsuit. Unsurprisingly, we’ve already seen this play out in real life. In one high-profile example, Apple’s co-founder, Steve Wozniak, tweeted that the Apple Card gave him a credit limit that was 10x higher than his wife’s. Apple was not able to explain why its algorithm made that decision and was (understandably) raked over the coals.

Zooming Out

While credit underwriting is just one example in one industry, it’s easy to imagine how that same logic would apply to other use cases. For instance, consider AI algorithms that help retail investors construct their investment portfolios, help insurance companies underwrite policies, help doctors make medical diagnoses, help HR departments screen new employees; the list goes on. To understand the breadth of this issue, I’ve put together two graphics (below). The first is a list of prominent use cases for AI in some of the largest regulated industries. The second is an abbreviated market map highlighting several startups and incumbents that are attacking those use cases.

This hopefully makes it clear that AI’s influence has continued to expand into pretty much every industry. While that presents an incredible opportunity for businesses to increase efficiency and overall customer value, the lack of explainability represents a huge roadblock in the path to widespread implementation and adoption.

One last thing. It’s important to note that, while explainability is a necessity for businesses that operate in the regulated industries listed above, it actually has even broader applicability because it can be used to debug AI models and improve trust in them.

2. How does AI Explainability work?

There are two main methodologies for explaining AI models: Integrated Gradients and SHAP. Integrated Gradients is useful for differentiable models like neural networks, logistic regression, and support vector machines. SHAP is useful for non-differentiable models like boosted trees and random forests. Let’s kick it off with Integrated Gradients.

**Just a heads up, this section is a little dense so feel free to skip ahead if you don’t care about the technical explanation!**

Integrated Gradients (“IG”)

Before we get into integrated gradients, I think it might be helpful to quickly refresh on what a gradient is. A gradient is very similar to a slope, but it can be used for functions with multiple dimensions. For a quick example, consider a linear function with one variable: y = 2x + 1. In this function, the slope and the gradient are both equal to the coefficient of the single variable, which is 2. That value is important because it tells us the relationship between the function’s input and output.

For a multi-dimensional function like y = 2x + 13z + 2, the slope is not overly useful because it can only be calculated with respect to one variable at a time. The gradient, on the other hand, can be determined for the full equation: it is the vector of the coefficients of each variable, which in our example is [2, 13]. Just like in the single-variable scenario, the gradient can be used to determine the impact that each variable has on the function’s output. Why is that relevant? Well, if our AI algorithm determines that there is a linear relationship between our inputs and outputs, we could simply run a multiple regression and use the gradient to interpret the importance of each variable.
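If you’d like to verify that numerically, here’s a tiny Python sketch that estimates the gradient of the two-variable example above using finite differences rather than calculus (the function and step size are purely illustrative).

```python
def f(x, z):
    # The two-variable example from above: y = 2x + 13z + 2.
    return 2 * x + 13 * z + 2

def gradient(f, x, z, eps=1e-6):
    # Central-difference estimates of the partial derivatives with respect to x and z.
    df_dx = (f(x + eps, z) - f(x - eps, z)) / (2 * eps)
    df_dz = (f(x, z + eps) - f(x, z - eps)) / (2 * eps)
    return [df_dx, df_dz]

print(gradient(f, 1.0, 1.0))  # ~[2.0, 13.0], i.e. the coefficient vector
```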

Unfortunately, AI models are a bit more complicated than that and rarely come up with nice linear relationships. For example, deep learning models often have numerous layers, and each layer typically has its own logic. To make things even more complicated, many of those layers are “hidden layers”: they sit between the model’s inputs and outputs, and the representations they learn are extremely difficult for humans to interpret.
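For readers who haven’t seen one, here’s roughly what a small deep learning model looks like in code (a hypothetical PyTorch sketch with made-up layer sizes). The two middle blocks are the “hidden layers”: nothing in them maps cleanly to a human-readable decision rule.

```python
import torch.nn as nn

# A hypothetical credit-scoring network; the layer sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(20, 64),   # input features -> hidden layer 1
    nn.ReLU(),
    nn.Linear(64, 32),   # hidden layer 1 -> hidden layer 2
    nn.ReLU(),
    nn.Linear(32, 1),    # hidden layer 2 -> output (approval probability)
    nn.Sigmoid(),
)
print(model)
```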

(Image: generic S-curve, for visualization purposes)

The worst part is that those hidden layers often account for the majority of the deep learning model’s predictive power. Visually, you can picture a deep learning model as an S-curve where the x-axis represents the layer number and the y-axis represents the amount of predictive value generated. What we really want to measure is each variable’s influence at the points where it has the greatest effect on the model’s output. To find that, you can’t simply take the gradient at the input or the output; you need to evaluate the impact of each variable throughout the model’s entire decision-making process. Luckily, IG provides us with a sleek way to do that!

The IG methodology works as follows. First, you need to start with a “baseline” input. That baseline can be any input that is completely absent of signal (i.e. has zero feature importance): a black picture for image recognition models, an all-zero vector for text-based models, etc. Once you give IG a baseline and the actual input you want to explain, the program constructs a straight-line path between the two inputs and splits the path up into a number of even intervals (usually between 50 and 300). From there, IG calculates the gradient at each interval and determines the impact that each feature had at that point in the model’s decision-making process. Last, it averages those gradients (scaled by the difference between the input and the baseline) to determine the overall contribution of each feature to the model’s output and identifies which features had the greatest impact.

A more technical explanation is that IG calculates a Riemann sum of the gradients to approximate the path integral. Hence…wait for it…integrated gradients. For a more detailed explanation and numerous interesting examples, I’d definitely recommend checking out this post by the creators of the integrated gradients methodology!
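If you’d like to see the whole recipe in one place, here’s a minimal from-scratch sketch in Python/NumPy. It is not a production implementation (real libraries compute exact gradients and use smarter numerical integration), and the toy logistic “model”, baseline, and step count are all just placeholders.

```python
import numpy as np

def numerical_grad(f, p, eps=1e-5):
    """Central-difference estimate of the gradient of a scalar-valued model f at point p."""
    g = np.zeros_like(p, dtype=float)
    for i in range(p.size):
        step = np.zeros_like(p, dtype=float)
        step[i] = eps
        g[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return g

def integrated_gradients(f, baseline, x, steps=50):
    # 1. Build a straight-line path from the baseline to the actual input.
    alphas = np.linspace(0.0, 1.0, steps + 1)
    path = baseline + alphas[:, None] * (x - baseline)
    # 2. Estimate the model's gradient at each point along that path.
    grads = np.array([numerical_grad(f, p) for p in path])
    # 3. Average the gradients (a Riemann-style approximation of the path integral)
    #    and scale by the input difference to get each feature's attribution.
    return (x - baseline) * grads.mean(axis=0)

# Toy differentiable "model": a logistic function of a weighted sum of two features.
weights = np.array([2.0, 13.0])
model = lambda p: 1.0 / (1.0 + np.exp(-(p @ weights)))

baseline = np.zeros(2)       # signal-free baseline input
x = np.array([0.05, 0.10])   # the input we want to explain
attributions = integrated_gradients(model, baseline, x)
print(attributions)          # per-feature contributions; they sum to ~f(x) - f(baseline)
```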

SHAP (aka SHapley Additive exPlanations)

Our second methodology is SHAP. Despite the somewhat strange acronym, SHAP is a very powerful explainability technique. At a high level, SHAP can be used to approximate how much each feature of a non-differentiable AI model contributes to the output (relative to a baseline).

In order to make this calculation, the program first breaks out every possible permutation of the model’s variables. As a quick reminder, a permutation is an ordering, i.e. an arrangement where the order matters. For example, for a credit underwriting model that uses “Gender”, “Eye Color”, and “City” as inputs, one permutation would be adding Gender first, then Eye Color, then City. After breaking out the various permutations, SHAP calculates each variable’s marginal contribution to the prediction within each permutation and then takes the average of those contributions across all permutations. Those average values are known as the SHAP values.
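To make that concrete, here’s a brute-force sketch in Python that computes exact SHAP (Shapley) values for a tiny hand-written model by looping over every permutation. This is only feasible for a handful of features (the number of permutations grows factorially); real SHAP implementations use much smarter approximations, and the toy model and feature encoding below are made up purely for illustration.

```python
import itertools
import numpy as np

def exact_shap_values(f, baseline, x):
    """Average each feature's marginal contribution over every permutation of the features."""
    n = len(x)
    contributions = np.zeros(n)
    perms = list(itertools.permutations(range(n)))
    for order in perms:
        current = baseline.copy()        # start from the signal-free baseline
        prev_output = f(current)
        for i in order:                  # reveal features one at a time, in this order
            current[i] = x[i]
            new_output = f(current)
            contributions[i] += new_output - prev_output   # marginal contribution of feature i
            prev_output = new_output
    return contributions / len(perms)

# Toy non-differentiable "model": a couple of hand-written decision rules on
# integer-encoded features, in the (illustrative) order [gender, eye_color, city].
def toy_model(v):
    score = 0.5
    if v[0] == 1:
        score += 0.2
    if v[2] == 3:
        score -= 0.1
    return score

baseline = np.zeros(3)
x = np.array([1.0, 2.0, 3.0])
shap_vals = exact_shap_values(toy_model, baseline, x)
print(shap_vals)                                            # per-feature contributions
print(toy_model(baseline) + shap_vals.sum(), toy_model(x))  # they add up to the output gap
```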

SHAP values are powerful because they (i) enable users to understand how much each variable contributed to an individual prediction and (ii) can be summed to determine how much impact each variable had on the model as a whole. Said another way, SHAP values provide both local and global interpretability.
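In practice you rarely compute these by hand; the open-source shap package does it for you. Here’s a rough sketch of what that might look like with a scikit-learn random forest on synthetic data (the feature values are made up, and the exact return shape of shap_values can vary with the shap version and model type).

```python
import numpy as np
import shap                                    # pip install shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for applicant features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 4))
y = 2 * X[:, 0] + X[:, 2] + rng.normal(scale=0.3, size=1_000)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer is shap's fast path for tree ensembles such as random forests.
explainer = shap.TreeExplainer(model)
shap_vals = explainer.shap_values(X)           # roughly (n_samples, n_features) here

# (i) Local interpretability: per-feature contributions to a single prediction.
print(shap_vals[0])
# (ii) Global interpretability: mean absolute contribution of each feature overall.
print(np.abs(shap_vals).mean(axis=0))
```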

If you want additional detail on the SHAP calculation methodology, I’d recommend checking out this article. Also, if you want to see how SHAP values can be used to explain a real-life example, take a look at this article by Dan Becker at Kaggle. I’ve included two of his graphics below as a teaser. The first shows how SHAP values can be used to interpret individual predictions (local interpretability). The model predicts whether a soccer team has the “man of the match” on their team. The graphic shows the model’s prediction for one game and visualizes how much impact each of the 12 variables had in predicting whether that team had the “man of the match”, relative to a baseline prediction of 0.5.