Foundations of Interpretable AI
Conference on Parsimony and Learning 2025
CVPR 2025
Part I: Motivations and Post-hoc (9:00-9:45)
Part II: Shapley Values (9:45-10:30)
Part III: Interpretable by Design (11:00-11:45)
What are they?
How are they computed?
(Shapley for local feature importance)
Lloyd S. Shapley. A value for n-person games. Contributions to the Theory of Games, 2(28):307–317, 1953.
Let \(([n], v)\) be an \(n\)-person cooperative game with characteristic function \(v : \mathcal P([n]) \to \mathbb R\)
How important is each player for the outcome of the game?
lung opacity
cardiomegaly
fracture
no finding
inputs
responses
predictor
Question 2:
How (and when) can we compute \(\phi_i(v)\)?
Question 3:
What do \(\phi_i(v)\) say (and not say) about the problem?
inputs
responses
predictor
Question 1:
How should (and can) we choose the function \(v\)?
Question 2:
How (and when) can we compute \(\phi_i(v)\)?
Question 3:
What do \(\phi_i(v)\) say (and not say) about the problem?
inputs
responses
predictor
Question 1:
How should (and can) we choose the function \(v\)?
For any \(S \subseteq [n]\) and a sample \(x\sim p_X\), we need
\(v_f(S,x) : \mathcal P([n])\times \mathcal X \to \mathbb R\)
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022]
[Lundberg and Lee, 2017] [Strumbelj & Kononenko, 2014] [Datta et al, 2016]
Question 1:
How should (and can) we choose the function \(v\)?
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022]
[Lundberg and Lee, 2017] [Strumbelj & Kononenko, 2014] [Datta et al, 2016]
\(v_f(S,x) = f(x_S, x^{b}_{\bar{S}})\)
Question 1:
How should (and can) we choose the function \(v\)?
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022]
[Lundberg and Lee, 2017] [Strumbelj & Kononenko, 2014] [Datta et al, 2016]
\(v_f(S,x) = f(x_S, x^{b}_{\bar{S}})\)
Easy, cheap
\( (x_S,x^b_{\bar{S}})\not\sim p_X\)
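As a concrete illustration, here is a minimal sketch of the baseline game in Python (the predictor, input, and baseline are all illustrative): features in \(S\) keep their observed values and the rest are overwritten with a fixed baseline \(x^b\).

```python
import numpy as np

def v_baseline(f, x, S, x_baseline):
    """Baseline game: features in the coalition S keep their values from x,
    all remaining features are set to the fixed baseline x^b."""
    x_masked = x_baseline.copy()
    x_masked[S] = x[S]
    return f(x_masked[None, :])[0]   # evaluate the fixed predictor f

# Toy usage with an illustrative predictor and a zero baseline
f = lambda X: X.sum(axis=1)
x = np.array([1.0, 2.0, 3.0])
print(v_baseline(f, x, S=[0, 2], x_baseline=np.zeros(3)))  # 1 + 0 + 3 = 4.0
```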
Question 1:
How should (and can) we choose the function \(v\)?
\(v_f(S,x) = \mathbb{E} [f(x_S,\tilde{X}_{\bar{S}})|X_S = x_S]\)
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022] [Aas et al, 2019] [Teneggi et al, 2023] [Frye et al, 2021] [Janzing et al, 2019] [Chen et al, 2020]
Question 1:
How should (and can) we choose the function \(v\)?
\(v_f(S,x) = \mathbb{E} [f(x_S,\tilde{X}_{\bar{S}})|X_S = x_S]\)
\( (x_S,\tilde{X}_{\bar{S}})\sim p_X\)
"True to the data"
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022] [Aas et al, 2019] [Teneggi et al, 2023] [Frye et al, 2021] [Janzing et al, 2019] [Chen et al, 2020]
Question 1:
How should (and can) we choose the function \(v\)?
\(v_f(S,x) = \mathbb{E} [f(x_S,\tilde{X}_{\bar{S}})|X_S = x_S]\)
\( (x_S,\tilde{X}_{\bar{S}})\sim p_X\)
"True to the data"
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022] [Aas et al, 2019] [Teneggi et al, 2023] [Frye et al, 2021] [Janzing et al, 2019] [Chen et al, 2020]
Question 1:
How should (and can) we choose the function \(v\)?
\(v_f(S,x) = \mathbb{E} [f(x_S,\tilde{X}_{\bar{S}})|X_S = x_S]\)
\( (x_S,\tilde{X}_{\bar{S}})\sim p_X\)
"True to the data"
Alternative: learn a model \(g_\theta\) for the conditional expectation
\(v_f(S,x) = \mathbb{E} [f(x_S,\tilde{X}_{\bar{S}})|X_S = x_S] \approx g_\theta (x,S)\)
[Frye et al, 2021]
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022] [Aas et al, 2019] [Teneggi et al, 2023] [Frye et al, 2021] [Janzing et al, 2019] [Chen et al, 2020]
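A minimal sketch of this surrogate idea, assuming a fixed black-box predictor (all names and architecture choices illustrative): since the squared-loss minimizer among functions of \((x_S, S)\) is the conditional expectation, regressing masked inputs onto \(f(x)\) yields \(g_\theta(x,S)\approx \mathbb E[f(X)\mid X_S=x_S]\).

```python
import torch
import torch.nn as nn

n = 16                                  # number of input features
f = lambda x: x.sum(dim=-1)             # stand-in for the fixed black-box predictor

# Surrogate g_theta sees the masked input concatenated with the mask S
g = nn.Sequential(nn.Linear(2 * n, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(g.parameters(), lr=1e-3)

for _ in range(1000):
    x = torch.randn(128, n)                      # a batch of inputs
    S = (torch.rand(128, n) < 0.5).float()       # random coalition masks
    pred = g(torch.cat([x * S, S], dim=-1)).squeeze(-1)
    # The minimizer of E[(g(x_S, S) - f(X))^2] over functions of (x_S, S)
    # is exactly E[f(X) | X_S = x_S]
    loss = ((pred - f(x)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# After training: v_f(S, x) ≈ g([x * S, S])
```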
Question 1:
How should (and can) we choose the function \(v\)?
\(v_f(S,x) = \mathbb{E} [f(x_S,\tilde{X}_{\bar{S}})]\)
\( (x_S,\tilde{X}_{\bar{S}})\not\sim p_X\)
except if the features are independent
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022] [Aas et al, 2019] [Lundberg & Lee, 2017] [Frye et al, 2021] [Janzing et al, 2019] [Chen et al, 2020]
Question 1:
How should (and can) we choose the function \(v\)?
Linear model (approximation)
\(v_f(S,x) = \mathbb{E} [f(x_S,\tilde{X}_{\bar{S}})] \approx f(x_S,\mathbb{E}[\tilde{X}_{\bar{S}}])\)
\( (x_S,\tilde{X}_{\bar{S}})\not\sim p_X\)
except for linear models
(with independent features)
Easiest, popular in practice
[Chen et al, Algorithms to estimate Shapley value feature attributions, 2022]
[Aas et al, 2019] [Lundberg & Lee, 2017] [Frye et al, 2021] [Janzing et al, 2019] [Chen et al, 2020]
inputs
responses
predictor
Question 1:
How should (and can) we choose the function \(v\)?
Question 2:
How (and when) can we compute \(\phi_i(v)\)?
Question 3:
What do \(\phi_i(v)\) say (and not say) about the problem?
Question 2:
How (and when) can we compute \(\phi_i(v)\)?
[Lundberg and Lee, 2017] [Strumbelj & Kononenko, 2014] [Datta et al, 2016]
Weighted Least Squares (kernelSHAP)
Monte Carlo Sampling
[Jethani et al, 2021]
Weighted Least Squares (kernelSHAP)
Weighted Least Squares, amortized (FastSHAP)
... and stochastic versions [Covert et al, 2024]
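Among the estimators above, the Monte Carlo approach is the simplest to sketch (illustrative code; the value function \(v\) is assumed to already encode the model, the input \(x\), and the masking choice from Question 1): average each feature's marginal contribution over random player orderings.

```python
import numpy as np

def shapley_mc(v, n, num_perms=200, seed=0):
    """Monte Carlo permutation estimate of Shapley values for a value
    function v(S) defined on subsets of [n]."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n)
    for _ in range(num_perms):
        perm = rng.permutation(n)
        S, prev = [], v([])
        for i in perm:
            S.append(i)
            cur = v(S)
            phi[i] += cur - prev        # marginal contribution of player i
            prev = cur
    return phi / num_perms

# Toy usage: an additive game, whose Shapley values are the feature values
x = np.array([1.0, 2.0, 3.0])
print(shapley_mc(lambda S: x[list(S)].sum(), n=3))  # ≈ [1, 2, 3]
```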
Question 2:
How (and when) can we compute \(\phi_i(v)\)?
Question 2:
How (and when) can we compute \(\phi_i(v)\) if we know more?
(about the model)
[Lundberg and Lee, 2017] [Strumbelj & Kononenko, 2014] [Chen et al, 2020]
Linear models \(f(x) = \beta^\top x \)
Closed-form expressions (for marginal distributions and baselines)
\( \phi_i(f,x) = \beta_i (x_i-\mu_i ) \)
(also for conditional if assuming Gaussian features)
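A quick numeric check of the closed form, with made-up numbers; note the attributions also satisfy efficiency, summing to \(f(x)-\mathbb E[f(X)]\).

```python
import numpy as np

# Closed-form Shapley values for a linear model f(x) = beta^T x under the
# marginal game: phi_i = beta_i * (x_i - mu_i), with mu = E[X].
beta = np.array([0.5, -2.0, 1.0])
mu   = np.array([0.1,  0.3, 0.0])
x    = np.array([1.0,  2.0, 3.0])
phi  = beta * (x - mu)
# Efficiency: contributions sum to f(x) - E[f(X)] = beta^T (x - mu)
assert np.isclose(phi.sum(), beta @ (x - mu))
print(phi)
```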
Tree models
[Lundberg et al, 2020]
Polynomial-time exact algorithm (TreeSHAP) for \(\phi_i(f)\)
\(\mathcal O(N_\text{trees}N_\text{leaves} \text{Depth}^2)\)
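In practice, exact TreeSHAP is exposed by the `shap` package; a sketch with a toy model (data, model, and hyperparameters illustrative):

```python
import numpy as np
import shap            # pip install shap
import xgboost         # any supported tree ensemble works

# Toy data and model (illustrative)
X = np.random.randn(500, 5)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = xgboost.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

explainer = shap.TreeExplainer(model)   # polynomial-time, exact for trees
phi = explainer.shap_values(X[:10])     # one attribution per feature
```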
Question 2:
How (and when) can we compute \(\phi_i(v)\) if we know more?
Local models
(about the model)
[Chen et al, 2019]
Observation: Restrict computation of \(\phi_i(f)\) to local areas of influence given by a graph structure
L-Shap
C-Shap
\(\Rightarrow\) complexity \(\mathcal O(2^k n)\)
Question 2:
How (and when) can we compute \(\phi_i(v)\) if we know more?
Local models
(about the model)
[Chen et al, 2019]
Observation: Restrict computation of \(\phi_i(f)\) to local areas of influence given by a graph structure
\(\Rightarrow\) complexity \(\mathcal O(2^k n)\)
Correct approximations (informal statement)
Let \(S\subset \mathcal N_k(i)\). If \((X_i \perp\!\!\!\perp X_{[n]\setminus S} \mid X_T) \) and \((X_i \perp\!\!\!\perp X_{[n]\setminus S} \mid X_T,Y) \) for any \(T\subset S\setminus \{i\}\),
then \(\hat{\phi}^k_i(v) = \phi_i(v)\)
(and the approximation error is bounded and controlled otherwise)
Question 2:
How (and when) can we compute \(\phi_i(v)\) if we know more?
Local models
(about the model)
[Chen et al, 2019]
Observation: Restrict computation of \(\phi_i(f)\) to local areas of influence given by a graph structure
\(\Rightarrow\) complexity \(\mathcal O(2^k n)\)
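A minimal sketch of the neighborhood-restricted estimate in the spirit of L-Shap (names illustrative): only the \(2^k\) coalitions inside \(\mathcal N_k(i)\) are enumerated, with the usual Shapley weights of a \(k\)-player game.

```python
from itertools import combinations
from math import factorial
import numpy as np

def l_shapley(v, i, neighborhood):
    """Shapley value of feature i computed only over coalitions inside its
    local neighborhood N_k(i), reducing cost from 2^n to 2^k subsets."""
    others = [j for j in neighborhood if j != i]
    k = len(neighborhood)
    phi = 0.0
    for size in range(len(others) + 1):
        for T in combinations(others, size):
            w = factorial(len(T)) * factorial(k - len(T) - 1) / factorial(k)
            phi += w * (v(list(T) + [i]) - v(list(T)))
    return phi

# Toy usage on a chain graph: neighborhood of feature 2 with k = 3
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = lambda S: x[list(S)].sum()
print(l_shapley(v, i=2, neighborhood=[1, 2, 3]))  # = 3.0 for an additive game
```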
Question 2:
How (and when) can we compute \(\phi_i(v)\) if we know more?
Hierarchical Shapley (h-Shap)
(about the model)
[Teneggi et al, 2022]
Observation: \(f(x) = 1 \Leftrightarrow \exists~ i: f(x_i,\tilde{X}_{-i}) = 1\) (A1)
Example:
\(f(x) = 1\) if the image contains a sick cell
Question 2:
How (and when) can we compute \(\phi_i(v)\) if we know more?
(about the model)
Observation: \(f(x) = 1 \Leftrightarrow \exists~ i: f(x_i,\tilde{X}_{-i}) = 1\) (A1)
Hierarchical Shapley (h-Shap)
[Teneggi et al, 2022]
1. Complexity \(\mathcal O(2^\gamma k \log n)\)
2. Correct approximation (informal):
Under A1, \(\phi^\text{h-Shap}_i(f) = \phi_i(f)\)
Bounded approximation as deviating from A1
Question 2:
How (and when) can we compute \(\phi_i(v)\) if we know more?
(about the model)
Observation: \(f(x) = 1 \Leftrightarrow \exists~ i: f(x_i,\tilde{X}_{-i}) = 1\) (A1)
1. Complexity \(\mathcal O(2^\gamma k \log n)\)
2. Correct approximation (informal):
Under A1, \(\phi^\text{h-Shap}_i(f) = \phi_i(f)\)
Bounded approximation as deviating from A1
Hierarchical Shapley (h-Shap)
[Teneggi et al, 2022]
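A simplified sketch of the hierarchical recursion (illustrative; the actual h-Shap operates on image quadrants with the masking games above): each node splits its region in two, computes the exact 2-player Shapley value of each child, and recurses only into children with non-trivial value, which is where the \(\log n\) factor comes from.

```python
def h_shap(v, region, threshold=0.0, min_size=1):
    """Sketch of h-Shap-style attribution. `v` maps a list of feature
    indices (the coalition that keeps its values; the rest are masked)
    to a game value; `region` is the list of indices at this node."""
    mid = len(region) // 2
    a, b = region[:mid], region[mid:]
    results = {}
    for child, sibling in ((a, b), (b, a)):
        # Exact Shapley value of `child` in the 2-player game {child, sibling}
        phi = 0.5 * ((v(child) - v([])) + (v(child + sibling) - v(sibling)))
        if len(child) <= min_size:
            results[tuple(child)] = phi
        elif phi > threshold:          # under A1, irrelevant regions score 0
            results.update(h_shap(v, child, threshold, min_size))
    return results

# Toy usage: only feature 5 "contains a sick cell"
v = lambda S: float(5 in S)
print(h_shap(v, region=list(range(8))))  # mass concentrates on feature 5
```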
Question 2:
How (and when) can we compute \(\phi_i(v)\) if we know more?
Shapley Approximations for Deep Models
(about the model)
DeepLIFT (Shrikumar et al, 2017): biased estimation of baseline Shap
DeepSHAP (Chen et al, 2021): biased estimation of marginal Shap
DASP (Ancona et al, 2019): uncertainty propagation for baseline (zero) Shap, assuming Gaussianity and independence of features
ShapNets (Wang et al, 2020): computation for small-width networks
... not an exhaustive list!
Transformers (ViTs) [Covert et al, 2023]: leveraging attention to fine-tune a surrogate model for Shap estimation
Question 2:
How (and when) can we compute \(\phi_i(v)\)?
Question 3:
What do \(\phi_i(v)\) say (and not say) about the problem?
inputs
responses
predictor
Question 1:
How should (and can) we choose the function \(v\)?
Question 3:
What do \(\phi_i(v)\) say (and not say) about the problem?
Interpretability as Conditional Independence
Explaining uncertainty via Shapley Values [Watson et al, 2023]
with \(v_\text{KL}(S,x) = -D_\text{KL}(~p_{Y|x} ~||~ p_{Y|x_S}~)\)
Theorem (informal)
\(Y \perp\!\!\!\perp X_i \mid X_S = x_S ~~\Rightarrow~~ v_\text{KL}(S\cup\{i\},x) - v_\text{KL}(S,x) = 0\)
Question 3:
What do \(\phi_i(v)\) say (and not say) about the problem?
Interpretability as Conditional Independence
SHAP-XRT: Shapley meets Hypothesis Testing [Teneggi et al, 2023]
\(\hat{p}_{i,S} \leftarrow \texttt{XRT}: \text{eXplanation Randomization Test}\), via access to \(\tilde{X}_{\bar{S}} \sim p_{X_{\bar{S}}|x_S}\)
Theorem (informal)
For \(f:\mathcal X \to [0,1], ~~ \mathbb E [\hat{p}_{i,S}]\leq 1- \mathbb E [v(S\cup \{i\}) - v(S)] \)
large \(\mathbb E [v(S\cup \{i\}) - v(S)] ~ \Rightarrow \) reject \(H^0_{i,S}\)
Foundations of Interpretable AI
Conference on Parsimony and Learning 2025
Is the piano important for \(\hat Y = \text{cat}\)?
How can we explain black-box predictors with semantic features?
Is the piano important for \(\hat Y = \text{cat}\), given that there is a cute mammal in the image?
Is the presence of \(\color{Blue}\texttt{edema}\) important for \(\hat Y = \text{lung opacity}\)?
How can we explain black-box predictors with semantic features?
Is the presence of \(\color{magenta}\texttt{devices}\) important for \(\hat Y = \texttt{lung opacity}\), given that there is \(\color{blue}\texttt{edema}\) in the image?
lung opacity
cardiomegaly
fracture
no finding
Concept Bank: \(C = [c_1, c_2, \dots, c_m] \in \mathbb R^{d\times m}\)
Embeddings: \(H = f(X) \in \mathbb R^d\)
Semantics: \(Z = C^\top H \in \mathbb R^m\)
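A minimal sketch of the projection step (shapes and values illustrative; in practice the columns of \(C\) come from CAVs or a vision-language text encoder):

```python
import torch

# Projecting embeddings onto a concept bank (all names illustrative).
d, m = 512, 4                        # embedding dim, number of concepts
C = torch.randn(d, m)                # concept bank: one vector per column
C = C / C.norm(dim=0, keepdim=True)  # unit-norm concept directions
H = torch.randn(8, d)                # H = f(X): a batch of image embeddings
Z = H @ C                            # semantics: Z[i, j] = c_j^T f(x_i)
print(Z.shape)                       # (8, m) concept activations
```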
Concept Bank: \(C = [c_1, c_2, \dots, c_m] \in \mathbb R^{d\times m}\)
Concept Activation Vectors
(Kim et al, 2018)
\(c_\text{cute}\)
Vision-language models
(CLIP, BLIP, etc.)
[Bhalla et al, "Splice", 2024]
[Koh et al '20, Yang et al '23, Yuan et al '22]
Need to engineer a (large) concept bank
Performance hit w.r.t. original predictor
\(\tilde{Y} = \hat w^\top Z\)
\(\hat w_j\) is the importance of the \(j^{th}\) concept
Desiderata
Fixed original predictor (post-hoc)
Global and local importance notions
Testing for any concepts (no need for large concept banks)
Precise testing with guarantees (Type 1 error/FDR control)
\(C = \{\text{``cute''}, \text{``whiskers''}, \dots \}\)
Global Importance
\(H^G_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j \)
Global Conditional Importance
\(H^{GC}_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j | Z_{-j}\)
Global Importance
\(C = \{\text{``cute''}, \text{``whiskers''}, \dots \}\)
\(H^G_{0,j} : g(f(X)) \perp\!\!\!\perp c_j^\top f(X) \)
Global Conditional Importance
\(H^{GC}_{0,j} : g(f(X)) \perp\!\!\!\perp c_j^\top f(X) | C_{-j}^\top f(X)\)
\(H^G_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j \)
\(H^{GC}_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j | Z_{-j}\)
"The classifier (its distribution) does not change if we condition
on concepts \(S\) vs on concepts \(S\cup\{j\} \)"
\(C = \{\text{``cute''}, \text{``whiskers''}, \dots \}\)
Local Conditional Importance
\[H^{j,S}_0:~ g({\tilde H_{S \cup \{j\}}}) \overset{d}{=} g(\tilde H_S), \qquad \tilde H_S \sim P_{H|Z_S = C_S^\top f(x)} \]
"The classifier (its distribution) does not change if we condition
on concepts \(S\) vs on concepts \(S\cup\{j\} \)"
[Figure: distribution of \(\hat{Y}_\text{gas pump}\) when conditioning on concepts \(Z_S\) vs. \(Z_S\cup Z_{j}\), for a given concept value \(Z_j\)]
Local Conditional Importance
\[H^{j,S}_0:~ g({\tilde H_{S \cup \{j\}}}) \overset{d}{=} g(\tilde H_S), \qquad \tilde H_S \sim P_{H|Z_S = C_S^\top f(x)} \]
"The classifier (its distribution) does not change if we condition
on concepts \(S\) vs on concepts \(S\cup\{j\} \)"
[Figure: distributions of \(\hat{Y}_\text{gas pump}\) when conditioning on \(Z_S\) vs. \(Z_S\cup Z_{j}\), shown for two different values of \(Z_j\)]
\(H^G_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j \iff P_{\hat{Y},Z_j} = P_{\hat{Y}} \times P_{Z_j}\)
Testing importance via two-sample tests
\(H^{GC}_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j | Z_{-j} \iff P_{\hat{Y},Z_j,Z_{-j}} = P_{\hat{Y},\tilde{Z}_j,Z_{-j}}\)
\(\tilde{Z}_j \sim P_{Z_j|Z_{-j}}\)
[Shaer et al, 2023]
[Teneggi et al, 2023]
\[H^{j,S}_0:~ g({\tilde H_{S \cup \{j\}}}) \overset{d}{=} g(\tilde H_S), \qquad \tilde H_S \sim P_{H|Z_S = C_S^\top f(x)} \]
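For the unconditional null \(H^G_{0,j}\), even a simple permutation two-sample test is valid (a sketch with an illustrative test statistic; the conditional nulls additionally require sampling \(\tilde Z_j \sim P_{Z_j|Z_{-j}}\)):

```python
import numpy as np

def perm_test(y_hat, z_j, n_perm=1000, seed=0):
    """Permutation p-value for H0: y_hat independent of z_j.
    Permuting z_j simulates the null P_{Y_hat} x P_{Z_j}."""
    rng = np.random.default_rng(seed)
    stat = lambda a, b: abs(np.corrcoef(a, b)[0, 1])  # illustrative statistic
    t_obs = stat(y_hat, z_j)
    t_null = np.array([stat(y_hat, rng.permutation(z_j)) for _ in range(n_perm)])
    return (1 + (t_null >= t_obs).sum()) / (1 + n_perm)  # valid p-value

# Toy usage: dependent scores yield a small p-value
rng = np.random.default_rng(1)
z = rng.normal(size=500)
y_hat = z + 0.5 * rng.normal(size=500)
print(perm_test(y_hat, z))
```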
Goal: Test a null hypothesis \(H_0\) at significance level \(\alpha\)
Standard testing by p-values
Collect data, then test, and reject if \(p \leq \alpha\)
Online testing by e-values
Any-time valid inference, monitor online and reject when \(e\geq 1/\alpha\)
[Shaer et al. 2023, Shekhar and Ramdas 2023, Podkopaev et al 2023]
Online testing by e-values
Consider a wealth process:
\(K_0 = 1; \quad \text{for}~ t = 1, \dots: \quad K_t = K_{t-1}(1+\kappa_t v_t)\)
\(v_t \in (0,1):\) betting fraction
\(\kappa_t \in [-1,1]:\) payoff
Fair game: \(~~\mathbb E_{H_0}[\kappa_t \mid \text{Everything seen}_{t-1}] = 0\)
[Shaer et al. 2023, Shekhar and Ramdas 2023, Podkopaev et al 2023]
Lemma: For a fair game, \(\mathbb P_{H_0}[\exists t \in \mathbb N : K_t \geq 1/\alpha ]\leq\alpha\)
Online testing by e-values
\(H_0: ~ P = Q\)
Payoff function: \(\kappa_t = \tanh({\color{teal}\rho(X_t)} - {\color{teal}\rho(Y_t)})\)
with the witness of \({\color{black}\text{MMD}(P,Q)}\), the Maximum Mean Discrepancy:
\({\color{teal}\rho} = \underset{\rho\in \mathcal R:\|\rho\|_\mathcal R\leq 1}{\arg\sup} ~\mathbb E_P [\rho(X)] - \mathbb E_Q[\rho(Y)]\)
\(v_t \in (0,1):\) betting fraction
\( K_t = K_{t-1}(1+\kappa_t v_t)\)
Data efficient
Rank induced by rejection time
[Shaer et al. 2023, Shekhar and Ramdas 2023, Podkopaev et al 2023]
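A minimal sketch of this betting test (illustrative; \(\rho\) is a fixed score here rather than the learned MMD witness): under \(H_0: P=Q\) the \(\tanh\) payoff is symmetric about zero, hence mean zero, so by the lemma above the wealth exceeds \(1/\alpha\) with probability at most \(\alpha\).

```python
import numpy as np

def betting_test(xs, ys, rho, alpha=0.05, bet=0.5):
    """Sequential two-sample test by betting: wealth grows when P != Q and,
    by Ville's inequality, rarely reaches 1/alpha under H0: P = Q."""
    K = 1.0
    for t, (x, y) in enumerate(zip(xs, ys), start=1):
        payoff = np.tanh(rho(x) - rho(y))   # fair game under H0
        K *= 1.0 + bet * payoff             # wealth update K_t
        if K >= 1.0 / alpha:
            return t, K                     # reject H0 at time t
    return None, K                          # never rejected

# Toy usage: distributions differ in mean, rho = identity
rng = np.random.default_rng(0)
print(betting_test(rng.normal(1, 1, 500), rng.normal(0, 1, 500), rho=lambda v: v))
```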
[Figure: rejection time and rejection rate of the betting test]
Important Semantic Concepts
(Reject \(H_0\))
Unimportant Semantic Concepts
(fail to reject \(H_0\))
Type 1 error control
False discovery rate control
What concepts does BiomedVLP find important to predict \(\texttt{lung opacity}\)?
[Figure: hemorrhage-type concepts (intraparenchymal, subdural, subarachnoid, intraventricular, epidural) marked (+) important or (-) unimportant for the Hemorrhage and No Hemorrhage classes, under global and global conditional importance]
Semantic comparison of vision-language models
Question 1) Can we resolve the computational bottleneck (and when)?
Yes: via distributional assumptions and hierarchical extensions
Question 2) What do these coefficients mean statistically?
They allow us to conclude on differences in distributions
Question 3) How can we go beyond input-feature explanations?
Use online testing by betting for semantic concepts
Jacopo Teneggi (JHU)
Beepul Bharti (JHU)
Yaniv Romano (Technion)
Teneggi et al., SHAP-XRT: The Shapley Value Meets Conditional Independence Testing, TMLR (2023).
Teneggi, Luster & S., Fast Hierarchical Games for Image Explanations, IEEE TPAMI (2022).
Teneggi & S., Testing Semantic Importance via Betting, NeurIPS (2024).