Learning to Scaffold:
Optimizing Model Explanations for Teaching
November 28th
NeurIPS 2022
Marcos Treviso*
Patrick Fernandes*
Danish Pruthi
André Martins
Graham Neubig
How should we evaluate explanations?
• Explainability methods generally do not correlate with each other
• Most explanations do not help to predict the model’s outputs and/or failures
• Simulability: "can we recover the model’s output based on the explanation?"
✓ aligns with the goal of communicating the underlying model behavior
✓ is easily measurable (both manually and automatically)
✓ puts all explainability methods under a single perspective
• Pruthi et al. (2021) proposed a framework for measuring simulability that disregards trivial protocols 🥰
Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al., 2021 (TACL)
Simulability
• Training time: the student is trained to match the teacher's outputs with a cross-entropy loss
• Test time: simulability is the agreement between student and teacher predictions (standard simulability)
• Introducing explanations: teacher and student explainers \(E_T(x)\), \(E_S(x)\), tied by a simulability loss and an explainer regularizer (e.g., KL) (scaffolded simulability)
• Can we learn explainers \(\phi(E)\) that optimize simulability? (optim. scaffolded simulability)
Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al., 2021 (TACL)
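The test-time metric above can be made concrete: simulability accuracy is the fraction of inputs on which the student reproduces the teacher's prediction, while at training time the student is fit with a cross-entropy loss against the teacher's (soft) outputs. A minimal NumPy sketch, with function names that are mine rather than the paper's:

```python
import numpy as np

def simulability_accuracy(teacher_logits, student_logits):
    """Fraction of test inputs where the student predicts the teacher's output."""
    return float(np.mean(teacher_logits.argmax(-1) == student_logits.argmax(-1)))

def distillation_loss(teacher_probs, student_logits):
    """Cross-entropy of the student against the teacher's soft output distribution."""
    log_p = student_logits - np.log(np.exp(student_logits).sum(-1, keepdims=True))
    return float(-(teacher_probs * log_p).sum(-1).mean())
```

The scaffolded variant would add a term comparing \(E_T(x)\) and \(E_S(x)\) (e.g., a KL regularizer) to this loss.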
Optimizing Explainers for Teaching
• Scaffold-Maximizing Training (SMaT) framework: parameterized explainers, a simulability loss, and a regularizer
• Bi-level optimization:
- inner opt.: student parameters and student explainer parameters
- outer opt.: teacher explainer parameters
• How can we optimize this?
- Assume the explainers are differentiable
- Explicit differentiation with a truncated gradient update
- Differentiating through a gradient operation ⟹ JAX for Hessian-vector products
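The truncated-gradient idea can be sketched on a toy quadratic in JAX: the inner optimization is cut to a single student gradient step kept inside the computation graph, so the outer gradient with respect to the teacher-explainer parameters flows through that step via a Hessian-vector product, which JAX handles automatically. The losses and names here are illustrative stand-ins, not the paper's actual objectives:

```python
import jax
import jax.numpy as jnp

def inner_loss(theta, phi):
    # Stand-in student loss; it depends on the teacher-explainer parameters phi.
    return 0.5 * jnp.sum((theta - phi) ** 2)

def outer_loss(theta):
    # Stand-in simulability-style loss, evaluated on the updated student.
    target = jnp.array([1.0, -1.0])
    return 0.5 * jnp.sum((theta - target) ** 2)

def scaffold_objective(phi, theta, lr=0.1):
    # Truncated inner optimization: one gradient update of the student,
    # kept differentiable so the outer gradient can pass through it.
    theta_new = theta - lr * jax.grad(inner_loss)(theta, phi)
    return outer_loss(theta_new)

theta0 = jnp.zeros(2)
phi0 = jnp.zeros(2)
# d(outer)/d(phi) requires differentiating through jax.grad(inner_loss),
# i.e., a Hessian-vector product, which jax.grad composes for free.
outer_grad = jax.grad(scaffold_objective)(phi0, theta0)
```

On this toy problem the outer gradient is analytically `lr * (lr * phi - target)`, which at zero initialization gives `[-0.1, 0.1]`, matching what the code returns.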
Differentiable, Parameterized Explainer
• Head-level parameterization: a learned coefficient vector \(\in \mathbb{R}^L\) over the model's attention heads, normalized with sparsemax
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. Martins and Astudillo, 2016 (ICML)
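Sparsemax (Martins and Astudillo, 2016) projects a score vector onto the probability simplex, zeroing out low-scoring entries entirely, which is what lets a learned explainer concentrate on a small subset of heads. A small NumPy sketch of the standard projection algorithm:

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Unlike softmax, coordinates below a data-dependent threshold tau
    receive exactly zero probability (Martins & Astudillo, 2016).
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]           # scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum   # coordinates kept in the support
    k_max = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_max
    return np.maximum(z - tau, 0.0)
```

For example, `sparsemax([2.0, 0.0])` puts all mass on the first coordinate, whereas softmax would still assign the second one nonzero probability.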
Experiments: simulability
• Text classification (IMDB)
• Image classification (CIFAR-100)
• Machine Translation Quality Estimation (MLQE-PE)
Experiments: plausibility
• Plausibility (human-likeness) of the explainers
• Settings: Text Classification, Image Classification, Quality Estimation
(figure: example image explanations for the classes "television" and "butterfly")
Experiments: head projection
• Normalization functions: without normalization, softmax, entmax, sparsemax
(figure: simulability accuracy on CIFAR-100 for each normalization)
• Only a small subset of attention heads are deemed relevant by SMaT
Conclusions
• SMaT is a framework that optimizes explanations for teaching students
- SMaT leads to high simulability
- SMaT learns plausible explanations
• We hope this work motivates the interpretability community to consider scaffolding as a valuable criterion for evaluating and designing new methods
(paper) arxiv.org/abs/2204.10810
Introduction
• Simulability is particularly appealing for evaluating explanations
✓ aligns with the goal of communicating the underlying model behavior
✓ is easily measurable (both manually and automatically)
✓ puts all explainability methods under a single perspective
• Pruthi et al. (2021) proposed a framework for measuring simulability that
⭐️ disregards trivial protocols (e.g., punctuation symbols ⟹ positive, stop words ⟹ negative)
🧶 requires an optimization procedure
Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al., 2021 (TACL)
Learning to Scaffold - NeurIPS
By mtreviso