Explainable AI and interpretability methods in Financial Risk — Part 2 (Approaches to evaluate explainers)

Maharshi Yadav
10 min read · Jan 20, 2021

In Part 1 we looked at the architecture and functioning of four explainers in detail. Now we will dive into approaches for evaluating these explainers, so that we can trust their explanations and gain confidence in the model. The focus will be on tabular data, given the nature of risk data.

Table of contents

  • Explainers recap
    - Local Interpretable Model-agnostic Explanations (LIME)
    - Explainable Boosting Machine (EBM)
    - Shapley additive values (SHAP)
    - Anchors
  • Data format
  • Evaluation metrics
    a. Stability
    b. Separability
    c. Similarity
    d. Simulatability
    - Forward (Pre, Post)
    - Counterfactual (Pre, Post)
    - Proposed Architecture
    - Inclusion of explanation weights in input data
    - Embedding explanation weights
    - Appending explanation weights
    (i) Optimized and enhanced Forward Simulatability
    (ii) Optimized and enhanced Counterfactual Simulatability

Let’s briefly revisit how each explainer works; this will be a good starting point.

Explainers (recap)

Local Interpretable Model-agnostic Explanations (LIME)
LIME interprets the local behavior of a model in a linear way to find which features dominate a prediction. It is model agnostic, meaning it can explain any model locally. The approach perturbs the sample to be explained, generates such perturbed data, checks how the predictions change and weights the perturbed points accordingly. In the final stage it fits a linear model on this weighted data to generate the explanation.
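As a quick illustration, a minimal LIME call on tabular data might look like the sketch below (the synthetic data, model and variable names are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Toy stand-in for a risk dataset and a black-box model.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
feature_names = [f"f{i}" for i in range(X.shape[1])]

explainer = LimeTabularExplainer(
    X, feature_names=feature_names, class_names=["no_default", "default"], mode="classification"
)
# LIME perturbs this row, weights the perturbations by proximity and fits a local linear surrogate.
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(exp.as_list())  # [(feature condition, signed local weight), ...]
```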

Explainable Boosting Machine (EBM)

Decision trees are trained on each feature in turn, collating and minimizing residuals, which reveals how each feature contributes to the prediction. We get signed weights for the features when explaining a sample. Thus, for a given dataset, we obtain signed feature weights that act as global explanations applied to every data point in the dataset.
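A hedged sketch of the same idea with the interpret library's ExplainableBoostingClassifier, again on toy data (the exact structure of the explanation objects can differ between interpret versions):

```python
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# EBM: an additive model where each feature's shape function is learned by boosted shallow trees.
ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

ebm_global = ebm.explain_global()            # per-feature contribution curves / importances
ebm_local = ebm.explain_local(X[:5], y[:5])  # signed per-feature contributions for individual rows
print(ebm_global.data()["names"])            # feature (term) names in the global explanation
```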

Shapley values (SHAP)

Shapley additive explanations consider all possible combinations of feature contributions in a sequential manner. Starting from a base model, the method takes permutations of the N features and captures each feature's conditional contribution, paying attention to the largest jumps, which indicate abrupt changes in a feature's contribution towards the prediction.
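For tree-based models, SHAP values can be obtained roughly as follows (illustrative toy data; the output layout can vary with the model type and shap version):

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles;
# shap.KernelExplainer is the slower, fully model-agnostic alternative.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # signed per-feature contributions, one row per sample
print(shap_values[0])                   # contributions of each feature for the first sample
```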

Anchors

The idea behind Anchors comes from LIME, but here the coverage of the interpretation is extended in a non-linear fashion and the prediction is explained locally within that coverage. Essentially it interprets which features dominate locally and to what extent, in N-dimensional space, these features (and the ranges of values they take) dominate. This provides deeper local explanations that extract more information.
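One way to generate anchor rules for tabular data is the alibi library's AnchorTabular, shown below as an illustrative sketch (the original anchor-exp package exposes a similar interface; all names here are toy placeholders):

```python
from alibi.explainers import AnchorTabular
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
feature_names = [f"f{i}" for i in range(X.shape[1])]

explainer = AnchorTabular(model.predict, feature_names)
explainer.fit(X)  # discretises features to build candidate rules
explanation = explainer.explain(X[0], threshold=0.95)
print(explanation.anchor)     # e.g. ['f3 > 0.42', 'f1 <= -0.10'] -- the rule that "anchors" the prediction
print(explanation.precision, explanation.coverage)
```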

Reminder!!!
All the animations shown in this blog are also available on my YouTube channel: https://www.youtube.com/c/MaharshiYadav

Data format

Explanations provided by these explainers are essentially feature weights indicating how each feature contributes towards the prediction (locally or globally). Thus we have two datasets of the same dimensions (M×N): one is the input data and the other is the (signed) explanation weights. To evaluate these explanations we have to come up with metrics derived from these two datasets. We have performed experiments using the explanation weights of all features.

Datasets: Input data and generation of explanation weights data (watch video)
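As a concrete (and purely illustrative) way to set up these two frames, the sketch below builds an input frame X_df and a matching explanation-weights frame W_df using SHAP values; any of the four explainers could be substituted, as long as the result is one signed weight per feature per row:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Input data: an (M x N) frame. The explanation weights will share exactly the same shape.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]
X_df = pd.DataFrame(X, columns=feature_names)

model = GradientBoostingClassifier(random_state=0).fit(X_df, y)

# SHAP yields one signed weight per feature per row, i.e. an (M x N) matrix aligned with X_df.
W_df = pd.DataFrame(shap.TreeExplainer(model).shap_values(X_df), columns=feature_names)

assert W_df.shape == X_df.shape  # the two frames all evaluation metrics below operate on
```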

Evaluation metrics

The approach is to evaluate explanations on some common platform in order to understand how well the explainer has learnt the black-box model. We have the input data and the explanation-weights data, which contains the list of features (in priority order) dominating locally/globally (stored in Y) and their corresponding (signed) weights (stored in X). For example, if X[0] = [-0.3, 0.1] and Y[0] = [“Debt Ratio”, “FICO score”], then Debt Ratio contributes inversely towards the prediction of risk default to a large extent (that person's debt may be so high that it becomes hard for them to repay consistently), while FICO score, being a measure of trust in that person, contributes positively towards the prediction.

Given these two dataframes, one being the input data and the other the feature-weights data, one thought that comes to mind is to compare the distribution of each input feature with the corresponding feature weights. That is exactly how we are NOT planning things out: we cannot compare features individually across samples, as it contradicts the underlying idea of considering the N-dimensional feature space and diminishes the notion of sample points being close to one another.

When we think of higher-dimensional data and of comparing such dataframes, we need to compare distributions, which makes clustering a viable option. If we cluster these two dataframes we expect similar cluster distributions, since both should capture the essence of the same sample points (even though different explainers give different results). On this basis I came up with three approaches that serve as a basis for comparison.
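Continuing the X_df / W_df sketch from above, clustering both frames could look roughly like this (the value of k and the scaling choice are assumptions, not prescriptions):

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Cluster the input frame and the explanation-weight frame separately;
# the three metrics below compare how rows are distributed across the two clusterings.
k = 5  # in practice chosen with the elbow method
input_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X_df))
expl_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(W_df))
```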

Stability

The behavior of the model w.r.t. the explainer should be consistent for data points that occupy similar positions in N-dimensional space. This proximity can be captured by unsupervised learning on both the input data and the explanations data. For intuition: if we randomly pick two input samples belonging to the same input cluster, we expect them to lie in the same explanations cluster. This signifies that data points close to one another have similar explanations, which gives us more confidence in the explanations.
This means our explanations are stable!

Stability metric animation (watch video)
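A minimal way to turn this idea into a number, assuming the input_labels / expl_labels assignments from the clustering sketch above, is to sample same-input-cluster pairs and count how often they also share an explanation cluster:

```python
import numpy as np

def stability(input_labels, expl_labels, n_pairs=100, rng=np.random.default_rng(0)):
    """Fraction of same-input-cluster pairs that also share an explanation cluster."""
    hits = 0
    for _ in range(n_pairs):
        # pick a random input cluster with at least two members, then a random pair inside it
        c = rng.choice([c for c in np.unique(input_labels) if (input_labels == c).sum() >= 2])
        i, j = rng.choice(np.where(input_labels == c)[0], size=2, replace=False)
        hits += int(expl_labels[i] == expl_labels[j])
    return hits / n_pairs

print("stability:", stability(input_labels, expl_labels))
```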

Separability

The financial world is driven by many factors, and every entity is projected in terms of many features, such as a loan seeker's portfolio or a company's balance sheet. In this versatile environment many different kinds of events happen. When explainers try to explain such events, we expect two entirely different scenarios to be far apart in N-dimensional space, and their explanations should also differ enough that they fall into different explanation clusters.
This ensures separability amongst the explanations!

Separability metric animation (watch video)
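Analogously, a rough Separability score can be computed by sampling pairs from different input clusters and checking that their explanations also end up in different clusters (again reusing the cluster labels from the earlier sketch):

```python
import numpy as np

def separability(input_labels, expl_labels, n_pairs=100, rng=np.random.default_rng(1)):
    """Fraction of different-input-cluster pairs whose explanations also land in different clusters."""
    hits, done = 0, 0
    n = len(input_labels)
    while done < n_pairs:
        i, j = rng.choice(n, size=2, replace=False)
        if input_labels[i] == input_labels[j]:
            continue  # keep only pairs coming from different input clusters
        hits += int(expl_labels[i] != expl_labels[j])
        done += 1
    return hits / n_pairs

print("separability:", separability(input_labels, expl_labels))
```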

Similarity

Usually we follow the elbow method to determine the number of clusters to form when applying k-means to the input dataset. This value usually ends up in single digits, which is fine in generic scenarios. In the financial world, however, there can be many types of events which ideally should be clustered separately if we look at the N-dimensional space at a local level, whereas with our approach we tend to consider events at a global level for clustering.

This deviates from our goal, but we cannot arrive at a better hyperparameter for such datasets, and that is the motivation behind the Similarity metric. The idea is that if we pick two random points from the same cluster, we expect them to be similar. This gives us confidence in the data similarity within the input-data clusters. Ideally all data points belonging to the same cluster would be similar, but given the complexity of the data this is not always the case, so this metric gives us some idea of the data distribution within the clusters.

Similarity metric animation (watch video)

We compare samples using simple cosine similarity. We expect the cosine similarity of points belonging to different clusters to be around 0.

For Similarity, we calculate the mean cosine similarity over 100 randomly picked distinct pairs of data points (stored in a list) belonging to the same cluster.

We also calculate the standard deviation of this list, which tells us about the distribution of these similarities.
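A small sketch of this computation, reusing X_df and input_labels from above (the 100-pair sample size follows the description here; everything else is an illustrative choice):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def similarity(X, input_labels, n_pairs=100, rng=np.random.default_rng(2)):
    """Mean and std of cosine similarity over random same-cluster pairs of input rows."""
    X = np.asarray(X)
    sims = []
    for _ in range(n_pairs):
        c = rng.choice([c for c in np.unique(input_labels) if (input_labels == c).sum() >= 2])
        i, j = rng.choice(np.where(input_labels == c)[0], size=2, replace=False)
        sims.append(cosine_similarity(X[i:i + 1], X[j:j + 1])[0, 0])
    return float(np.mean(sims)), float(np.std(sims))

mean_sim, std_sim = similarity(X_df, input_labels)
print(f"similarity: {mean_sim:.3f} (std {std_sim:.3f})")
```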

Simulatability

While exploring state-of-the-art techniques for evaluating the explanations given by our explainers (to better understand their versatility in explaining different models), I came across a paper with an interesting approach to measuring the effect on human interpretability, and how human interpretation shifts once the subjects are shown explanations. In that work, human subjects were hired and trained to perform certain tasks, and the evaluation was based on their interpretations and the effect the explanations had on those interpretations.

We will briefly discuss the experiments performed in that paper (they were done on text data, since text is easy for humans to interpret), and on that basis we will come up with a new methodology to evaluate explanations without involving human subjects.

The paper evaluates model explanations from several explainers, two of which, LIME and Anchors, are common with ours. First, we define model simulatability as the property of a model that allows a human subject to predict the model's output on a new set of inputs. Since LIME and Anchors work locally in the feature space, it is easier for human test subjects to judge the model output than with EBM and SHAP. We will discuss two types of tests, Forward Simulatability and Counterfactual Simulatability. Our metric captures the effect of explanations on human interpretation:
metric = POST simulatability accuracy - PRE simulatability accuracy

Forward Simulatability

Working of forward simulatability for textual data with human subjects

PRE phase
Human subjects are given data from the validation set, the model prediction and the data labels, and are asked to predict the model output. The PRE accuracy is calculated by comparing the human subjects' predicted outputs with the actual model outputs.

POST phase
Human subjects are given the data, the model output, the data labels and the explanations, and based on these they are asked to predict the model output on the same dataset given to them in the PRE phase. The change in the human subjects' predicted outputs is therefore purely the effect of the explanations.

Counterfactual Simulatability

Working of counterfactual simulatability for textual data with human subjects

The motivation for this test is to observe the change in human evaluation when a data point is perturbed.

PRE phase
Humans are trained on the training data and the model's output on that training data. They are then given data, model output and data labels from the test set, and are asked to predict the model output on perturbed data. Here we follow a specific perturbation: artificial data is generated that acts in the opposite sense of the data initially provided to the subject. This tells us how human subjects predict on data points which artificially act opposite to the data points they were given initially.

POST phase
Humans are given data from the test set, the model output, the data labels and the explanations. They are then asked to predict the model output on the perturbed data.

Proposed Architecture

The main idea is to ‘strategically’ train a neural network, instead of human subjects, on the dataset and on the different explanation weights.

Main Advantages:
1. Removal of dependency on human interpretations
2. Applicable on highly complex black box models
3. Dataset size is never a concern

Inclusion of explanation weights in input data

For the POST metric we need a plan to include the explanation weights in the corresponding dataset. They can either be embedded or attached in some form that enhances the explanation effect and simulates a human's interpretation and the effect of explanations on it.
So we do both!

  • Embedding explanation weights
    This represents a fusion of the data features and the explanation feature weights at the individual level, as both are expected to represent similar scenarios. The neural network trains on this embedded data and captures the effect of the explanations. Since the explanations are just feature weights, we simply multiply them element-wise with the data.
  • Appending explanation weights
    Here we just expect the neural network to learn the explanation effects from a simple append, so the number of features the network learns from is doubled. This replicates a naive human who has the extra explanation data without understanding its significance to the input data.
Append explanations to the input data at the columnar level

Since these are two different approaches, the number of experiments doubles. For simplicity, we will refer to the result of either option as the generated training data in the POST methods.
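In array terms, the two options reduce to an element-wise product and a column-wise concatenation; a minimal sketch, continuing with the X_df / W_df frames from earlier:

```python
import numpy as np

X_arr, W_arr = np.asarray(X_df), np.asarray(W_df)
X_embed = X_arr * W_arr                            # element-wise fusion of data and weights: (M x N)
X_append = np.concatenate([X_arr, W_arr], axis=1)  # naive concatenation: (M x 2N)
```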

Optimized and enhanced Forward Simulatability

  1. Train a neural network on the input data, providing the model output and the labels
  2. Predict the model output on the validation data
  3. Calculate the model accuracy => PRE_forward
PRE_Forward — Network trained on X_input and predictions on X_validation

Generate the training data by including the explanation weights in the input dataset. As discussed, this generated training data replicates the scenario of humans getting to understand the explanations of the data they have predicted before.

Repeat the above steps using the generated training data and calculate the model accuracy => POST_forward

POST_Forward — Network trained on X_train (includes explanations) and predictions on X_validation
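A hedged end-to-end sketch of this procedure, continuing the running example (scikit-learn's MLPClassifier stands in for the simulating network, the embedded variant is used for POST, and the architecture and split are illustrative choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# The simulating networks must share the same architecture, so the only
# difference between PRE and POST is the presence of explanation weights.
def make_net():
    return MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)

y_model = model.predict(X_df)  # the black-box output is the target the network learns to simulate

X_tr, X_val, Xe_tr, Xe_val, y_tr, y_val = train_test_split(
    np.asarray(X_df), X_embed, y_model, test_size=0.3, random_state=0
)

pre_forward = make_net().fit(X_tr, y_tr).score(X_val, y_val)     # raw inputs only
post_forward = make_net().fit(Xe_tr, y_tr).score(Xe_val, y_val)  # explanation-embedded inputs
print("forward explanation effect:", post_forward - pre_forward)
```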

Optimized and enhanced Counterfactual Simulatability

  1. We train a neural network on the input data
  2. Generate perturbed input data
  3. Get the black-box model's output on the perturbed input data
  4. Run the trained neural network on the same perturbed input data
  5. Compare the two to calculate the neural network's accuracy => PRE_counterfactual
PRE_Counterfactual — Network trained on X_input and predictions on perturbed X_input

Repeat the above steps, but this time with the training data that includes the explanations, and calculate the accuracy => POST_counterfactual

POST_Counterfactual: Network trained on X_train (includes explanations) and predictions on perturbed X_input
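Continuing the same sketch, a rough version of the counterfactual variant is shown below; the mean-reflection perturbation is only a stand-in for the specific "opposite-sense" perturbation described above, and the POST evaluation embeds the perturbed rows with explanation weights recomputed for them:

```python
import numpy as np
import shap

X_arr = np.asarray(X_df)
X_pert = 2 * X_arr.mean(axis=0) - X_arr  # reflect each row around the column means (illustrative)
y_pert = model.predict(X_pert)           # black-box output on the perturbed rows (the target)

# PRE: network trained on the raw inputs, scored on how well it matches the
# black-box model's output for the perturbed rows.
pre_cf = make_net().fit(X_arr, y_model).score(X_pert, y_pert)

# POST: network trained on the explanation-embedded inputs; the perturbed rows are
# embedded with explanation weights recomputed for them in the same way.
W_pert = shap.TreeExplainer(model).shap_values(X_pert)
post_cf = make_net().fit(X_embed, y_model).score(X_pert * W_pert, y_pert)

print("counterfactual explanation effect:", post_cf - pre_cf)
```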

With these simulatability measures we get a sense of the effect of explanations on neural-network models (simulating humans) trained on different combined versions of the input data and the explanations data. The neural networks used here must have the same number of hidden layers and the same layer dimensions. This tells us more about the explainer's behavior on the input data as well as on the perturbed input data, giving a deeper understanding of the model.

Conclusion:

We discussed five evaluation metrics on which we can compare the explanations provided by different explainers. Stability, separability, similarity, the forward explanation effect and the counterfactual explanation effect of simulatability provide a platform for selecting an explainer for a particular model based on specific requirements.

These evaluation measures are also helpful in selecting the right model amongst different models trained on the same dataset, the reason being that the ideal model should make predictions by considering the relevant features. As every explainer has its own way of explaining models, these evaluations provide vital information for making further decisions.

