Why Should I Trust You?

Image Credit: Interpretability ML Book

Image Credit: LIME Video

Areas of Interpretability

Algorithm Transparency

Scope of interpretation
- Local Interpretation
- Global Interpretation

Model restriction
- Blackbox models
- Restricted models

Local

Intereptable

Model-Agnostic

Explanations

How does LIME work?

Objective Function Formulazition

\text{explanation(x)} = \text{argmin}_{g \in \mathcal{G}} L(f, g, \pi_x) + \Omega(g)

\Omega(g) \to \text{Complexity of model } g

\mathcal{G} \to \text{set of all possible models}

\pi_x \to \text{a neighbourhood of } x

L \to \text{ Loss function for approximation}

Proximity Kernel Definition

Some LIME Samples

Interpretability Via Model Extraction*

* Bastani O, Kim C, Bastani H. Interpretability via model extraction. arXiv preprint arXiv:1706.09773. 2017 Jun 29.

x \in \mathcal{X}

f

y \in \mathcal{Y}

x \in \mathcal{X}

T

T(x) \approx y

Algorithm Properties

Global

Blackbox

T as a Decision Tree

But Decision Tree Overfits!

Solution: Use Active Learning

The more samples used

The more generalized the tree

Model Extraction Algorithm

1. Input Distribution

\text{General Gaussian Mixture Model}

+

X_{train} \text{ dataset used for training } f

EM

\mathcal{P} \text{ distribution}

2. Exact Decision Tree

\text{Define } T^* \text{ such that } \\ \forall x; T^*(x) = f(x)

Gain(f,C_N) = \\ 1 - \sum_{y \sim \mathcal{Y}}{Pr_{x \sim P}[f(x) = y \, | \, C_N]}

3. Estimated DT

\text{Approximate } Gain(f, C_N) \text{ using n } i.i.d \text{ samples}

\text{Simplify } C_N = (s_1 \le x_1 \le t_1) \cdots (s_d \le x_d \le t_d)

pdf_{\mathcal{P} \, | \, C_N}(x) \varpropto \sum_{j=1}^{k}{\phi_j \prod_{i=1}^{d}{\mathcal{P}_{\mathcal{N}(\mu_{ji}, \sigma_{ji})|(s_i \le x \le t_i)}(x_i)}}

3. Estimated DT Cont.

\text{Using } pdf_{\mathcal{P} \, | \, C_N}(x) \\ \text{ we can calculate probability of each component of } \mathcal{P}

x \sim \mathcal{P} \, | \, C_N \\ (j \sim Categorical(\tilde{\phi})) \wedge (x \sim \mathcal{N}(\mu_{ji}, \sigma_{ji}) \, | \, (s_i \le x_i \le t_i))

4. Theoretical Gaurentee

Pr_{x \sim \mathcal{P}}{[T(x) \ne T^*(x)]} \le \epsilon \text{ has probability at least } 1-\delta

\text{As } n \to \infty \text{, } T \to T^* \text{. In other words, for all } \epsilon , \delta:

Comparision with CART

Proposed Algorithm: Adding tree interpreter

Motivation:
- Instant explanations
- Local explanations

Method:
- Feature importance measure
- On Each leaf!

Adding tree interpreter

Result:
- Learning DTExtract as Interpreter model
- Interpreting local instances based on destination leaf in

O(1)

Visualizing Results

Movies dataset: Positive or Negative Comment?

Visualizing Results

Newsgroups dataset: Christian or Atheism?

Implementation

Thanks!

Here's the potato

Why Should I Trust You?

By Amin Mohamadi

Why Should I Trust You?

399

Amin Mohamadi