Interpretability vs Explainability

  • Interpretability is the extent to which cause and effect can be observed within a system. Put another way, it is the extent to which you can predict what will happen given a change in input or algorithmic parameters. It’s being able to look at an algorithm and say, “Yep, I can see what’s happening here.”
  • Explainability is the extent to which the internal mechanics of a machine learning or deep learning system can be explained in human terms.

Interpretability is about being able to discern the mechanics without necessarily knowing why. Explainability is being able to quite literally explain what is happening.

Interpretability & Explainability - Techniques and Methods

Algorithmic Generalization

Pay Attention to Feature Importance

Leave One Column Out (LOCO)

  • Substitutes “missing” for a variable and recomputes the model’s prediction. The idea is that if the score changes a lot, the variable that was left out must be important.
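
The LOCO idea above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the scoring function, its weights, the column means, and the rule that a missing value (None) falls back to the column mean are all hypothetical choices made for the example.

```python
# Hypothetical scoring model: a hand-written linear score.
# Assumption: a missing value (None) is replaced by the column mean.
WEIGHTS = {"income": 0.5, "debt": -0.3, "age": 0.01}
MEANS = {"income": 50.0, "debt": 20.0, "age": 40.0}


def predict(row):
    """Score one row, substituting the column mean for missing values."""
    return sum(
        weight * (MEANS[col] if row[col] is None else row[col])
        for col, weight in WEIGHTS.items()
    )


def loco_importance(row):
    """Absolute change in score when each column is marked missing in turn."""
    base = predict(row)
    changes = {}
    for col in row:
        masked = dict(row)   # copy so the original row is untouched
        masked[col] = None   # "leave out" this one column
        changes[col] = abs(predict(masked) - base)
    return changes


applicant = {"income": 80.0, "debt": 10.0, "age": 35.0}
loco_scores = loco_importance(applicant)
most_important = max(loco_scores, key=loco_scores.get)
```

Here the column whose removal moves the score the most is reported as the most important one for this applicant. How "missing" is handled (mean fill, a sentinel value, or retraining without the column) changes the numbers, so that choice should be stated alongside any LOCO result.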

Permutation Impact/Importance (PI)

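  • Substitutes a variable’s value with one drawn at random from that variable’s other observed values (in practice, by shuffling the column) and recomputes the model’s prediction. If the score or error changes a lot, the variable must be important.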
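
A minimal sketch of permutation importance, assuming a toy model and synthetic data invented for the example. The model deliberately ignores its second feature, so shuffling that column should leave the error unchanged, while shuffling the first column should raise it.

```python
import random


def model(row):
    # Hypothetical fitted model: uses feature 0 only and ignores feature 1.
    return 3.0 * row[0]


def mse(rows, targets):
    """Mean squared error of the model over a dataset."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, col, rng):
    """Increase in MSE after shuffling one column across all rows."""
    base = mse(rows, targets)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(permuted, targets) - base


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [[rng.random(), rng.random()] for _ in range(200)]
targets = [3.0 * r[0] for r in rows]  # synthetic targets for illustration

pi_scores = [permutation_importance(rows, targets, c, rng) for c in range(2)]
```

In practice the shuffle is usually repeated several times and the increases averaged, since a single permutation can be noisy.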

Local Interpretable Model-Agnostic Explanations (LIME)

  • Generates synthetic data points in a local neighborhood around a given applicant’s real data values, scores them with the real model, and fits a new linear model to those synthetic values and scores. It then uses this linear approximation of the actual model to explain how the more complex model behaves near that applicant. Essentially, you’re taking a very complex model and pretending it’s simple so you can explain it.
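
The local-surrogate idea behind LIME can be sketched with one feature. This is not the LIME library's API; it is a simplified illustration in which the black-box function, the Gaussian sampling, the kernel width, and all names are assumptions made for the example.

```python
import math
import random


def black_box(x):
    # Stand-in for a complex model (hypothetical): nonlinear in x.
    return x ** 2


def local_linear_explanation(x0, n_samples=500, width=0.2, seed=0):
    """Fit a proximity-weighted line to black_box around the point x0."""
    rng = random.Random(seed)
    # 1. Sample synthetic points near the instance being explained.
    xs = [x0 + rng.gauss(0.0, width) for _ in range(n_samples)]
    # 2. Score the synthetic points with the real (complex) model.
    ys = [black_box(x) for x in xs]
    # 3. Weight each sample by its closeness to x0 (Gaussian kernel).
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]
    # 4. Weighted least squares for a single feature, in closed form.
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_bar - slope * x_bar
    return slope, intercept


lime_slope, lime_intercept = local_linear_explanation(x0=3.0)
```

The fitted slope approximates how the black box responds to small changes around x0 = 3 (close to the true local gradient of 6 for this toy function). The explanation is only valid near that point; a different x0 gives a different line.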

SHapley Additive exPlanations (SHAP)

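  • A game-theoretic approach to explaining the output of any machine learning model. Each feature is assigned a Shapley value: its average marginal contribution to the prediction across all possible coalitions of features.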
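
To make the game-theoretic idea concrete, the sketch below computes exact Shapley values by brute force for a tiny model. It does not use the SHAP library; the model, the input, and the convention that an absent feature takes its baseline value are assumptions made for illustration. Enumerating every coalition is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical model with an interaction between features 0 and 1;
    # feature 2 has no effect on the output.
    return 2.0 * x[0] + x[1] + x[0] * x[1] + 0.0 * x[2]


def coalition_value(x, baseline, members):
    """Model output when only `members` take their real values."""
    mixed = [x[i] if i in members else baseline[i] for i in range(len(x))]
    return model(mixed)


def shapley_values(x, baseline):
    """Exact Shapley value of each feature by enumerating all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(x, baseline, set(subset) | {i})
                without_i = coalition_value(x, baseline, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


x = [1.0, 2.0, 5.0]
baseline = [0.0, 0.0, 0.0]
shap_phi = shapley_values(x, baseline)
```

The contributions sum to model(x) minus model(baseline), which is the additive property in the method's name. The interaction term is split evenly between features 0 and 1, and the unused feature 2 receives zero.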

Deep Learning Important Features (DeepLIFT)

Layer-wise Relevance Propagation

Interpretability & Explainability - Problems

  • Interpretability and explainability add a step to the development process.

Interpretability & Explainability - Resources