
Why Should I Trust You?
Image Credit: Interpretability ML Book

Image Credit: LIME Video




Areas of Interpretability
- Algorithm Transparency
- Scope of interpretation
  - Local Interpretation
  - Global Interpretation
- Model restriction
  - Blackbox models
  - Restricted models
Local Interpretable Model-Agnostic Explanations (LIME)
How does LIME work?




Objective Function Formulation
\text{explanation(x)} = \text{argmin}_{g \in \mathcal{G}} L(f, g, \pi_x) + \Omega(g)
\Omega(g) \to \text{Complexity of model } g
\mathcal{G} \to \text{set of candidate interpretable models}
\pi_x \to \text{proximity measure defining the neighbourhood of } x
L \to \text{Loss measuring how poorly } g \text{ approximates } f \text{ in that neighbourhood}

Proximity Kernel Definition
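A minimal sketch of the objective above, assuming the exponential kernel \pi_x(z) = \exp(-D(x, z)^2 / \sigma^2) with Euclidean D, a linear surrogate g, Gaussian perturbations of x, and a small ridge term standing in for \Omega(g). The names `proximity`, `solve` and `lime_explain` are illustrative, not the API of the LIME library.

```python
import math
import random

def proximity(x, z, sigma=1.0):
    """Exponential kernel pi_x(z) = exp(-D(x, z)^2 / sigma^2), Euclidean D (assumed)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / sigma ** 2)

def solve(a, b):
    """Solve the linear system a @ w = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                k = m[r][c] / m[c][c]
                m[r] = [u - k * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def lime_explain(f, x, n_samples=500, sigma=1.0, ridge=1e-3, seed=0):
    """Fit a weighted linear surrogate around x; returns [intercept, w_1, ..., w_d]."""
    rng = random.Random(seed)
    rows, ys, ws = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, sigma) for xi in x]  # perturb the instance
        rows.append([1.0] + z)                        # bias column + features
        ys.append(f(z))                               # query the black box
        ws.append(proximity(x, z, sigma))             # weight by pi_x(z)
    # Normal equations of the weighted squared loss L(f, g, pi_x),
    # with a ridge term as a stand-in for the complexity penalty Omega(g).
    k = len(x) + 1
    a = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(w * r[i] * y for r, y, w in zip(rows, ys, ws)) for i in range(k)]
    return solve(a, b)

# Hypothetical black box: the surrogate weights should track its local slope.
coef = lime_explain(lambda z: 3.0 * z[0] - 2.0 * z[1] + 1.0, [0.5, -1.0])
```

The weights in `coef[1:]` play the role of the explanation: the larger the magnitude, the more that feature drives f near x.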
Some LIME Samples


Interpretability Via Model Extraction*
* Bastani O, Kim C, Bastani H. Interpretability via model extraction. arXiv preprint arXiv:1706.09773. 2017 Jun 29.
x \in \mathcal{X} \xrightarrow{\; f \;} y \in \mathcal{Y}
x \in \mathcal{X} \xrightarrow{\; T \;} T(x) \approx y
Algorithm Properties
Global
Blackbox
T as a Decision Tree
But Decision Trees Overfit!
Solution: Use Active Learning
The more samples used, the better the tree generalizes
Model Extraction Algorithm
1. Input Distribution
\text{General Gaussian Mixture Model} + X_{train} \text{ (dataset used for training } f \text{)} \xrightarrow{\text{EM}} \mathcal{P} \text{ distribution}
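The EM step can be sketched as below. This is a one-dimensional illustration only, assuming quantile-based initialisation and a fixed number of iterations; `fit_gmm_1d` is a hypothetical helper, not the authors' implementation.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def fit_gmm_1d(xs, k=2, n_iter=50):
    """EM for a one-dimensional Gaussian mixture; returns (phi, mu, sigma) lists."""
    srt = sorted(xs)
    # Initialise means at evenly spaced quantiles, with a shared spread.
    mu = [srt[int((j + 0.5) * len(xs) / k)] for j in range(k)]
    spread = (srt[-1] - srt[0]) / (2.0 * k) or 1.0
    sigma = [spread] * k
    phi = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [phi[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            phi[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigma[j] = max(math.sqrt(var), 1e-6)
    return phi, mu, sigma

# Synthetic stand-in for X_train: two well-separated clusters.
rng = random.Random(1)
data = [rng.gauss(-3.0, 0.5) for _ in range(300)] + [rng.gauss(3.0, 0.5) for _ in range(300)]
phi, mu, sigma = fit_gmm_1d(data, k=2)
```

The fitted `(phi, mu, sigma)` define the distribution \mathcal{P} that later steps sample from.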
2. Exact Decision Tree
\text{Define } T^* \text{ such that } \\
\forall x; T^*(x) = f(x)
Gain(f, C_N) = 1 - \sum_{y \in \mathcal{Y}}{Pr_{x \sim \mathcal{P}}[f(x) = y \, | \, C_N]^2}
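A small sketch of how a split could be scored from samples, reading the gain as a Gini-style impurity (one minus the sum of squared class probabilities, which is an assumption about the intended formula). Labels come from the black box f rather than from the training data; `gini_impurity` and `split_gain` are illustrative names.

```python
from collections import Counter

def gini_impurity(labels):
    """1 - sum_y p_y^2, with p_y the empirical share of label y."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(points, f, feature, threshold):
    """Impurity reduction from splitting non-empty `points` on x[feature] <= threshold."""
    labels = [f(x) for x in points]  # query the black box for every sampled point
    left = [y for x, y in zip(points, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(points, labels) if x[feature] > threshold]
    n = len(points)
    weighted = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
    return gini_impurity(labels) - weighted

# Hypothetical black box that switches class at x_0 = 1.5.
points = [[0.0], [1.0], [2.0], [3.0]]
blackbox = lambda x: int(x[0] > 1.5)
best = split_gain(points, blackbox, feature=0, threshold=1.5)
worse = split_gain(points, blackbox, feature=0, threshold=0.5)
```

A threshold that matches the black box's decision boundary removes all impurity, so it scores higher than an off-boundary threshold.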
3. Estimated DT
\text{Approximate } Gain(f, C_N) \text{ using n } i.i.d \text{ samples}
\text{Simplify } C_N = (s_1 \le x_1 \le t_1) \wedge \cdots \wedge (s_d \le x_d \le t_d)
pdf_{\mathcal{P} \, | \, C_N}(x) \varpropto \sum_{j=1}^{k}{\phi_j \prod_{i=1}^{d}{\mathcal{P}_{\mathcal{N}(\mu_{ji}, \sigma_{ji})|(s_i \le x_i \le t_i)}(x_i)}}
3. Estimated DT Cont.
\text{Using } pdf_{\mathcal{P} \, | \, C_N}(x) \\ \text{ we can calculate the probability of each component of } \mathcal{P}
x \sim \mathcal{P} \, | \, C_N \\ (j \sim Categorical(\tilde{\phi})) \wedge (x_i \sim \mathcal{N}(\mu_{ji}, \sigma_{ji}) \, | \, (s_i \le x_i \le t_i) \text{ for each } i)
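The two-stage sampling can be sketched as follows, assuming axis-aligned Gaussians and taking \tilde{\phi}_j proportional to \phi_j times the probability mass component j places inside the box C_N. Rejection sampling is used for the truncated normals purely for brevity; it is a swapped-in technique, not necessarily what the paper uses. All function names are hypothetical.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_weights(phi, mu, sigma, box):
    """phi~_j, proportional to phi_j * prod_i Pr[s_i <= x_i <= t_i] under component j."""
    raw = []
    for j in range(len(phi)):
        mass = phi[j]
        for i, (s, t) in enumerate(box):
            mass *= normal_cdf(t, mu[j][i], sigma[j][i]) - normal_cdf(s, mu[j][i], sigma[j][i])
        raw.append(mass)
    total = sum(raw)
    return [m / total for m in raw]

def sample_in_box(phi, mu, sigma, box, rng):
    """Draw x ~ P | C_N: pick j ~ Categorical(phi~), then each x_i from a truncated normal."""
    weights = truncated_weights(phi, mu, sigma, box)
    j = rng.choices(range(len(phi)), weights=weights)[0]
    x = []
    for i, (s, t) in enumerate(box):
        while True:  # rejection sampling: simple, but slow when the box is thin
            xi = rng.gauss(mu[j][i], sigma[j][i])
            if s <= xi <= t:
                x.append(xi)
                break
    return x

# Two components in 2-D; C_N restricts x_0 >= 0 and -1 <= x_1 <= 1.
phi = [0.5, 0.5]
mu = [[-2.0, 0.0], [2.0, 0.0]]
sigma = [[1.0, 1.0], [1.0, 1.0]]
box = [(0.0, math.inf), (-1.0, 1.0)]
rng = random.Random(0)
weights = truncated_weights(phi, mu, sigma, box)
samples = [sample_in_box(phi, mu, sigma, box, rng) for _ in range(200)]
```

Here almost all of the re-weighted mass falls on the second component, since the first one places little probability on x_0 >= 0.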
4. Theoretical Guarantee
\text{As } n \to \infty \text{, } T \to T^* \text{. In other words, for all } \epsilon , \delta > 0 \text{ and sufficiently large } n:
Pr_{x \sim \mathcal{P}}{[T(x) \ne T^*(x)]} \le \epsilon \text{ holds with probability at least } 1-\delta
Comparison with CART

Proposed Algorithm: Adding tree interpreter
- Motivation:
  - Instant explanations
  - Local explanations
- Method:
  - Feature importance measure
  - On each leaf!
Adding tree interpreter
- Result:
  - Learning DTExtract as Interpreter model
  - Interpreting local instances based on destination leaf in O(1)
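One way the per-leaf importance idea could look, as a sketch under stated assumptions: the tree is a nested dict (a hypothetical layout), each node stores its mean prediction in `value`, and a feature's contribution is the sum of value changes at the splits on that feature along the path. Contributions are precomputed per leaf, so explaining an instance is a single table lookup once its destination leaf is known.

```python
def build_leaf_table(node, contrib=None, table=None):
    """Precompute, for every leaf, how much each feature shifted the prediction on its path."""
    contrib = contrib or {}
    table = {} if table is None else table
    if "feature" not in node:  # leaf: store the accumulated contributions
        node["leaf_id"] = len(table)
        table[node["leaf_id"]] = dict(contrib)
        return table
    for child in (node["left"], node["right"]):
        step = dict(contrib)
        delta = child["value"] - node["value"]  # change in mean prediction at this split
        step[node["feature"]] = step.get(node["feature"], 0.0) + delta
        build_leaf_table(child, step, table)
    return table

def find_leaf(node, x):
    """Route x down the tree to its destination leaf."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

def explain(tree, table, x):
    """Return (prediction, per-feature contributions) for instance x."""
    leaf = find_leaf(tree, x)
    return leaf["value"], table[leaf["leaf_id"]]  # constant-time lookup per leaf

# Hypothetical extracted tree over two features.
tree = {
    "feature": 0, "threshold": 0.5, "value": 0.5,
    "left": {"value": 0.2},
    "right": {
        "feature": 1, "threshold": 1.0, "value": 0.8,
        "left": {"value": 0.6},
        "right": {"value": 1.0},
    },
}
table = build_leaf_table(tree)
pred, contrib = explain(tree, table, [1.0, 2.0])
```

By construction, the root value plus the sum of a leaf's contributions equals that leaf's prediction, which makes the explanation additive.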
Visualizing Results
Movies dataset: Positive or Negative Comment?


Visualizing Results

Newsgroups dataset: Christian or Atheism?
Implementation




Thanks!
Here's the potato

By Amin Mohamadi