Learning to Scaffold:
Optimizing Model Explanations for Teaching
November 28th
NeurIPS 2022
Marcos Treviso*
Patrick Fernandes*
Danish Pruthi
André Martins
Graham Neubig
How should we evaluate explanations?
• Explainability methods generally do not correlate with each other
• Most explanations do not help to predict the model’s outputs and/or failures
• Simulability: "can we recover the model’s output based on the explanation?"
✓ aligns with the goal of communicating the underlying model behavior
✓ is easily measurable (both manually and automatically)
✓ puts all explainability methods under a single perspective
• Pruthi et al. (2021) proposed a framework for measuring simulability that disregards trivial protocols 🥰
Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al. 2021. (TACL)
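Simulability
• Training time: the student learns to match the teacher's predictions (cross entropy)
• Test time: simulability is the agreement between student and teacher predictions (standard simulability)
• Introducing explanations: Teacher and Student explainers \(E_T(x)\), \(E_S(x)\)
• Training time with explanations: simulability loss + explainer regularizer (e.g., KL)
• Test time after training with explanations: agreement (scaffolded simulability)
• Can we learn explainers \(\phi(E)\) that optimize simulability? (optimized scaffolded simulability)
Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al. 2021. (TACL)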
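The training-time and test-time quantities on this slide can be sketched in JAX. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the weight `beta`, the direction of the KL term, and the toy logits are all hypothetical.

```python
import jax
import jax.numpy as jnp

def cross_entropy(student_logits, teacher_labels):
    # Simulability loss: the student is trained on the teacher's predicted labels.
    log_probs = jax.nn.log_softmax(student_logits, axis=-1)
    picked = jnp.take_along_axis(log_probs, teacher_labels[:, None], axis=-1)
    return -jnp.mean(picked)

def kl_regularizer(teacher_expl, student_expl, eps=1e-8):
    # Explainer regularizer, assumed here to be KL(E_T(x) || E_S(x)).
    # Both inputs are assumed to be distributions over input positions.
    log_ratio = jnp.log(teacher_expl + eps) - jnp.log(student_expl + eps)
    return jnp.mean(jnp.sum(teacher_expl * log_ratio, axis=-1))

def student_loss(student_logits, teacher_labels, teacher_expl, student_expl, beta=1.0):
    # Training time with explanations: simulability loss + beta * regularizer.
    sim = cross_entropy(student_logits, teacher_labels)
    return sim + beta * kl_regularizer(teacher_expl, student_expl)

def simulability(student_logits, teacher_logits):
    # Test time: fraction of examples where student and teacher agree.
    # No explanations are used here, only predictions.
    agree = jnp.argmax(student_logits, -1) == jnp.argmax(teacher_logits, -1)
    return jnp.mean(agree)

# Toy logits for four examples and two classes (made up for illustration).
teacher_logits = jnp.array([[2.0, 0.1], [0.3, 1.5], [1.2, 0.4], [0.2, 0.9]])
student_logits = jnp.array([[1.0, 0.2], [0.1, 0.7], [0.3, 0.8], [0.0, 2.0]])
agreement = float(simulability(student_logits, teacher_logits))
```

In this toy case the student agrees with the teacher on three of the four examples. Setting `beta=0` in `student_loss` corresponds to training a student without explanations.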
Optimizing Explainers for Teaching
• Scaffold-Maximizing Training (SMaT) framework
- parameterized explainers
- simulability loss + regularizer
• Bi-level optimization:
- inner opt.: student parameters and student explainer parameters
- outer opt.: teacher explainer parameters
How can we optimize this?
• Assume the explainers are differentiable
• Explicit differentiation with a truncated gradient update
• Differentiating through a gradient operation \(\Leftrightarrow\) JAX for Hessian-vector products
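The bi-level setup with a truncated gradient update can be sketched in JAX on a toy problem. Everything concrete here is an assumption made for illustration: the linear student, the softmax-based explainers, the squared-error stand-in for the simulability loss, the single inner step, and the names `inner_loss`, `outer_loss`, `inner_lr`, and `beta`. It is not the SMaT implementation.

```python
import jax
import jax.numpy as jnp

def sim_loss(theta, x, y_teacher):
    # Toy simulability loss: squared error between a linear student and the teacher's outputs.
    return jnp.mean((x @ theta - y_teacher) ** 2)

def teacher_explainer(phi):
    # Toy parameterized teacher explainer: a distribution over input features.
    return jax.nn.softmax(phi)

def student_explainer(theta):
    # Toy student explainer derived from the student's own parameters.
    return jax.nn.softmax(theta)

def inner_loss(theta, phi, x, y_teacher, beta):
    # Inner objective: simulability loss + beta * KL(E_T || E_S).
    e_t = teacher_explainer(phi)
    log_e_s = jax.nn.log_softmax(theta)
    reg = jnp.sum(e_t * (jnp.log(e_t) - log_e_s))
    return sim_loss(theta, x, y_teacher) + beta * reg

def outer_loss(phi, theta, train, val, inner_lr=0.1, beta=1.0):
    # Truncated inner optimization: a single gradient step on the student.
    g = jax.grad(inner_loss)(theta, phi, *train, beta)
    theta_new = theta - inner_lr * g
    # Outer objective: simulability loss of the updated student on held-out data.
    return sim_loss(theta_new, *val)

# Differentiates through the inner gradient step with respect to phi.
outer_grad = jax.grad(outer_loss)

true_w = jnp.array([1.0, -2.0, 0.5])  # stands in for the teacher (toy)
x_train = jnp.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [2.0, 0.0, 1.0]])
x_val = jnp.array([[1.0, 2.0, 0.0], [0.0, 1.0, 2.0]])
train = (x_train, x_train @ true_w)
val = (x_val, x_val @ true_w)

theta0 = jnp.zeros(3)  # student parameters
phi0 = jnp.zeros(3)    # teacher explainer parameters
g_phi = outer_grad(phi0, theta0, train, val)
phi1 = phi0 - 0.5 * g_phi  # one outer update (step size chosen arbitrarily)
```

Because `outer_loss` calls `jax.grad` internally, `jax.grad(outer_loss)` requires second-order terms. JAX obtains them by composing its autodiff transformations, so the full Hessian is never materialized.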
Differentiable, Parameterized Explainer
• Head-level parameterization:
\(\in \mathbb{R}^L\)
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. Martins and Astudillo, 2016. (ICML)
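One way to picture a head-level parameterization is a sparse, learned weighting over attention heads. The sketch below assumes that each head provides a score vector in \(\mathbb{R}^L\) and that the explanation is a sparsemax-weighted combination of those vectors. That exact form, the names `explain`, `phi`, and `head_scores`, and the toy numbers are assumptions, since the slide's equation was not recovered. The `sparsemax` function follows the sorting-based projection of Martins and Astudillo (2016).

```python
import jax.numpy as jnp

def sparsemax(z):
    # Euclidean projection of z onto the probability simplex.
    z_sorted = jnp.sort(z)[::-1]               # scores in decreasing order
    k = jnp.arange(1, z.shape[0] + 1)
    cumsum = jnp.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum        # True for entries kept in the support
    k_z = jnp.sum(support)                     # support size
    tau = (jnp.take(cumsum, k_z - 1) - 1) / k_z  # threshold
    return jnp.maximum(z - tau, 0.0)

def explain(phi, head_scores):
    # phi: one learnable coefficient per head, shape (H,).
    # head_scores: per-head scores over L input elements, shape (H, L).
    lam = sparsemax(phi)        # sparse head weights that sum to 1
    return lam @ head_scores    # explanation over the L input elements

phi = jnp.array([1.0, 0.8, 0.1, -1.0])  # toy head coefficients (H = 4)
head_scores = jnp.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])                                       # toy per-head scores (L = 3)
lam = sparsemax(phi)
explanation = explain(phi, head_scores)
```

With these toy values, sparsemax assigns exactly zero weight to the last two heads, so only two heads contribute to the explanation. A softmax would instead give every head a nonzero weight.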
Experiments: simulability
• Text classification (IMDB)
• Image classification (CIFAR-100)
• Machine Translation Quality Estimation (MLQE-PE)
Experiments: plausibility
• Plausibility (human-likeness) of the explainers
Text Classification
Image Classification
Quality Estimation
Figure: example explanations for images labeled "television" and "butterfly"
Experiments: head projection
• Normalization functions
• Only a small subset of attention heads is deemed relevant by SMaT
Figure: simulability accuracy (.70 to .90) by normalization function: without normalization, Softmax, Entmax, Sparsemax
Figure: CIFAR-100
Conclusions
• SMaT is a framework that optimizes explanations for teaching students
- SMaT leads to high simulability
- SMaT learns plausible explanations
• We hope this work motivates the interpretability community to consider scaffolding as a valuable criterion for evaluating and designing new methods
(paper) arxiv.org/abs/2204.10810
Introduction
• Simulability is particularly appealing for evaluating explanations
✓ aligns with the goal of communicating the underlying model behavior
✓ is easily measurable (both manually and automatically)
✓ puts all explainability methods under a single perspective
• Pruthi et al. (2021) proposed a framework for measuring simulability that
⭐️ disregards trivial protocols (e.g., punctuation symbols ⟹ positive, stop words ⟹ negative)
🧶 requires an optimization procedure
Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al. 2021. (TACL)