Learning to Scaffold: Optimizing Model Explanations for Teaching

November 28th

NeurIPS 2022

Marcos Treviso*

Patrick Fernandes*

Danish Pruthi

André Martins

Graham Neubig


How should we evaluate explanations?

•   Explainability methods generally do not correlate with each other

•   Most explanations do not help to predict the model’s outputs and/or failures

 

•   Simulability: "can we recover the model’s output based on the explanation?"

        ✓  aligns with the goal of communicating the underlying model behavior

        ✓  is easily measurable (both manually and automatically)

        ✓  puts all explainability methods under a single perspective

 

•   Pruthi et al. (2021) proposed a framework for measuring simulability that disregards trivial protocols 🥰

Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al., 2021 (TACL)


Simulability

(training time) a student is trained to mimic the teacher's outputs:

\theta^\star = \argmax_\theta \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{train}}} \big[ \mathcal{L}_{\mathrm{sim}}(\, T(x) \,,\, S_\theta(x) \,) \big]

with \(\mathcal{L}_{\mathrm{sim}}\) given by cross entropy between teacher and student predictions.

(test time) simulability is the agreement between teacher and student predictions:

\mathrm{SIM}(T, S_\theta) = \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{test}}} \big[ 1\{\, T(x) = S_\theta(x) \,\} \big]

Introducing explanations: teacher and student explainers \(E_T(x)\), \(E_S(x)\). The student is now trained with the simulability loss plus an explainer regularizer (e.g., KL):

\theta_E^\star = \argmax_\theta \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{train}}} \big[ \mathcal{L}_{\mathrm{sim}}(\, T(x) \,,\, S_\theta(x) \,) + \beta \mathcal{L}_{\mathrm{expl}}(\, E_T(x) \,,\, E_{S_\theta}(x) \,) \big]

Explanations are useful when standard simulability falls below scaffolded simulability:

\mathrm{SIM}(T, S_{\theta^\star}) \,<\, \mathrm{SIM}(T, S_{\theta_E^\star})

Can we learn explainers \(\phi(E)\) that optimize simulability, i.e., push scaffolded simulability even higher?

\mathrm{SIM}(T, S_{\theta_E^\star}) \,<\, \mathrm{SIM}(T, S_{\theta_{\phi(E)}^\star})
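As a concrete reference, here is a minimal sketch of these quantities in JAX, assuming a classification teacher and student that output logits and explanations given as distributions over input tokens (e.g., attention); the sketch treats \(\mathcal{L}_{\mathrm{sim}}\) and \(\mathcal{L}_{\mathrm{expl}}\) as losses to be minimized, and all function names are illustrative, not the paper's implementation.

import jax
import jax.numpy as jnp

def sim_loss(teacher_logits, student_logits):
    # L_sim: cross entropy between the student's distribution and the teacher's predicted label
    teacher_label = jnp.argmax(teacher_logits, axis=-1)
    log_p = jax.nn.log_softmax(student_logits, axis=-1)
    return -jnp.take_along_axis(log_p, teacher_label[:, None], axis=-1).mean()

def expl_loss(teacher_expl, student_expl, eps=1e-8):
    # L_expl: KL divergence between teacher and student explanations (distributions over tokens)
    return jnp.sum(teacher_expl * (jnp.log(teacher_expl + eps) - jnp.log(student_expl + eps)), axis=-1).mean()

def scaffolded_student_loss(teacher_logits, student_logits, teacher_expl, student_expl, beta=1.0):
    # Scaffolded objective: simulability loss + beta * explainer regularizer
    return sim_loss(teacher_logits, student_logits) + beta * expl_loss(teacher_expl, student_expl)

def simulability(teacher_logits, student_logits):
    # SIM: fraction of held-out examples where student and teacher predictions agree
    return jnp.mean(jnp.argmax(teacher_logits, -1) == jnp.argmax(student_logits, -1))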


Optimizing Explainers for Teaching

•   Scaffold-Maximizing Training (SMaT) framework

Starting from the scaffolded student loss,

\mathcal{L}_{\mathrm{student}}(x; T, E_T, S_\theta, E_S) = \mathcal{L}_{\mathrm{sim}}(\, T(x) \,,\, S_\theta(x) \,) + \beta \mathcal{L}_{\mathrm{expl}}(\, E_T(x) \,,\, E_{S_\theta}(x) \,)

we replace the fixed explainers with parameterized explainers \(E_{\phi_T}\), \(E_{\phi_S}\) (simulability loss + regularizer):

\mathcal{L}_{\mathrm{student}}(x; T, E_{\phi_T}, S_\theta, E_{\phi_S}) = \mathcal{L}_{\mathrm{sim}}(\, T(x) \,,\, S_\theta(x) \,) + \beta \mathcal{L}_{\mathrm{expl}}(\, E_{\phi_T}(x) \,,\, E_{\phi_S}(x) \,)

•   Bi-level optimization:

(inner opt.) student parameters and student explainer parameters:

\theta^\star(\phi_T), \, \phi_S^\star(\phi_T) = \argmax_{\theta, \phi_S} \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{train}}} \big[ \mathcal{L}_{\mathrm{student}}(x; T, E_{\phi_T}, S_\theta, E_{\phi_S}) \big]

(outer opt.) teacher explainer parameters:

\phi_T^\star = \argmax_{\phi_T} \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{test}}} \big[ \mathcal{L}_{\mathrm{sim}}(\, T(x) \,,\, S_{\theta^\star(\phi_T)}(x) \,) \big]

How can we optimize this?

•    Assume the explainers are differentiable

•    Explicit differentiation with a truncated gradient update

•    Differentiating through a gradient update requires Hessian-vector products (computed with JAX), as sketched below
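The sketch below illustrates, under simplifying assumptions, how the outer gradient with respect to the teacher-explainer parameters can be obtained by explicitly differentiating through one truncated inner gradient step; the toy linear student, the head-mixture explainers, and all names are illustrative, not the paper's implementation.

import jax
import jax.numpy as jnp

def student_loss(params, phi_T, batch, beta=1.0):
    # Inner objective: simulability loss + explainer regularizer, for a toy linear student.
    theta, phi_S = params
    x, teacher_y, teacher_attn = batch   # teacher predictions and attention maps are precomputed
    log_p = jax.nn.log_softmax(x @ theta, axis=-1)
    l_sim = -jnp.take_along_axis(log_p, teacher_y[:, None], axis=-1).mean()
    # teacher explanation: mixture over attention heads weighted by normalize(phi_T)
    expl_T = jnp.einsum('h,bhl->bl', jax.nn.softmax(phi_T), teacher_attn)
    # student explainer sketched the same way for brevity
    expl_S = jnp.einsum('h,bhl->bl', jax.nn.softmax(phi_S), teacher_attn)
    l_expl = jnp.sum(expl_T * (jnp.log(expl_T + 1e-8) - jnp.log(expl_S + 1e-8)), axis=-1).mean()
    return l_sim + beta * l_expl

def outer_loss(phi_T, params, train_batch, val_batch, inner_lr=0.1):
    # One truncated inner gradient step on (theta, phi_S), then evaluate the student's simulability loss.
    grads = jax.grad(student_loss)(params, phi_T, train_batch)
    params_new = jax.tree_util.tree_map(lambda p, g: p - inner_lr * g, params, grads)
    theta_new, _ = params_new
    x, teacher_y, _ = val_batch
    log_p = jax.nn.log_softmax(x @ theta_new, axis=-1)
    return -jnp.take_along_axis(log_p, teacher_y[:, None], axis=-1).mean()

# Differentiating outer_loss w.r.t. phi_T backpropagates through the inner update,
# which is where the Hessian-vector products appear; JAX handles them automatically.
outer_grad_fn = jax.grad(outer_loss)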


Differentiable, Parameterized Explainer

•   Head-level parameterization: a distribution over the teacher's attention heads

\lambda_T = \mathrm{normalize} (\phi_T) \in \triangle_{H-1}

\mathrm{sparsemax}(z) = \argmin_{p\in \triangle_{H-1}}\|p - z\|_2^2

From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. Martins and Astudillo, 2016 (ICML)
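A minimal sketch of the sparsemax projection (Martins and Astudillo, 2016) as it could be used to normalize the head scores \(\phi_T\) onto the simplex; it assumes a 1-D input vector and is written for clarity rather than efficiency.

import jax.numpy as jnp

def sparsemax(z):
    # Euclidean projection of z onto the probability simplex: argmin_p ||p - z||^2 with p in the simplex.
    z_sorted = jnp.sort(z)[::-1]                 # sort scores in decreasing order
    k = jnp.arange(1, z.shape[0] + 1)
    cssv = jnp.cumsum(z_sorted)                  # cumulative sums of the sorted scores
    support = 1 + k * z_sorted > cssv            # prefix of coordinates kept in the support
    k_z = jnp.sum(support)                       # support size
    tau = (cssv[k_z - 1] - 1) / k_z              # threshold
    return jnp.maximum(z - tau, 0.0)

# Example: a sparse distribution over attention heads, with irrelevant heads at exactly zero.
phi_T = jnp.array([1.5, 1.2, 0.3, -0.8])
print(sparsemax(phi_T))   # [0.65, 0.35, 0., 0.]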


Experiments: simulability

•   Text classification (IMDB)

•   Image classification (CIFAR-100)

•   Machine Translation Quality Estimation (MLQE-PE)



Experiments: plausibility

•   Plausibility (human-likeness) of the explainers

[Figure: example explanations for Text Classification, Image Classification ("television", "butterfly"), and Quality Estimation]


Experiments: head projection

•   Normalization functions:

\lambda_T = \mathrm{normalize} (\phi_T) \in \triangle_{H-1}

•    Only a small subset of attention heads are deemed relevant by SMaT

[Figure: simulability accuracy (y-axis, .70 to .90) without normalization vs. softmax, entmax, and sparsemax]

Experiments: head projection

[Figure: head projection results on CIFAR-100]

Conclusions

•   SMaT is a framework that optimizes explanations for teaching students

        -    SMaT leads to high simulability
        -    SMaT learns plausible explanations

 

•   We hope this work motivates the interpretability community to consider scaffolding as a valuable criterion for evaluating and designing new methods


Introduction

•   Simulability is particularly appealing for evaluating explanations

        ✓  aligns with the goal of communicating the underlying model behavior

        ✓  is easily measurable (both manually and automatically)

        ✓  puts all explainability methods under a single perspective

•   Pruthi et al. (2021) proposed a framework for measuring simulability that

        ⭐️ disregards trivial protocols, e.g.:

            punctuation symbols  ⟹  positive
            stop words           ⟹  negative

        🧶 requires an optimization procedure

Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al., 2021 (TACL)
