Learning to Scaffold:
Optimizing Model Explanations for Teaching
November 28th
NeurIPS 2022
Marcos Treviso*
Patrick Fernandes*
Danish Pruthi
André Martins
Graham Neubig
• Explainability methods generally do not correlate with each other
• Most explanations do not help to predict the model's outputs and/or failures
• Simulability: "can we recover the model's output based on the explanation?"
• Simulability is particularly appealing for evaluating explanations, since it:
✓ aligns with the goal of communicating the underlying model behavior
✓ is easily measurable (both manually and automatically)
✓ puts all explainability methods under a single perspective
• Pruthi et al. (2021) proposed a framework for measuring simulability that:
⭐️ disregards trivial protocols (e.g., "punctuation symbols ⟹ positive, stop words ⟹ negative")
🧶 requires an optimization procedure
Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al., 2021 (TACL)
[Figure: the simulability framework of Pruthi et al. (2021). At training time, a teacher model is trained with cross entropy and a student model is trained to predict the teacher's outputs; at test time, simulability is measured as the agreement between student and teacher predictions. Explanations are introduced through a teacher explainer \(E_T(x)\) and a student explainer \(E_S(x)\): the student's objective combines the simulability loss with an explainer regularizer (e.g., KL) that pushes \(E_S(x)\) towards \(E_T(x)\), yielding "scaffolded simulability" as opposed to the "standard simulability" obtained without explanations.]
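To make the objective concrete, here is a minimal JAX sketch of the per-example student loss and the test-time agreement metric. This is our own illustration under assumptions: the exact loss forms in the paper may differ, and the trade-off weight `lam` and the toy values are placeholders.

```python
import jax
import jax.numpy as jnp

def student_loss(student_logits, teacher_pred, student_expl, teacher_expl, lam=1.0):
    """Per-example student objective for scaffolded simulability (sketch).
    student_logits: student scores over classes; teacher_pred: teacher's predicted label.
    student_expl / teacher_expl: E_S(x) and E_T(x) as distributions over input tokens."""
    # Simulability term: cross entropy between the student's output and the teacher's prediction.
    ce = -jax.nn.log_softmax(student_logits)[teacher_pred]
    # Explainer regularizer (e.g. KL): pull the student explainer towards the teacher's explanation.
    eps = 1e-9
    kl = jnp.sum(teacher_expl * (jnp.log(teacher_expl + eps) - jnp.log(student_expl + eps)))
    return ce + lam * kl  # lam is a placeholder trade-off weight

def simulability(student_preds, teacher_preds):
    # Test-time metric: agreement between student and teacher predictions.
    return jnp.mean(student_preds == teacher_preds)

# Toy usage (3 classes, 4 input tokens):
s_logits = jnp.array([1.0, 0.2, -0.5])
t_expl = jnp.array([0.7, 0.1, 0.1, 0.1])
s_expl = jnp.array([0.4, 0.3, 0.2, 0.1])
print(student_loss(s_logits, 0, s_expl, t_expl), simulability(jnp.array([0, 1]), jnp.array([0, 2])))
```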
Can we learn explainers (with parameters \(\phi\)) that optimize simulability? (from scaffolded simulability to optimized scaffolded simulability)
• Scaffold-Maximizing Training (SMaT) framework
  - parameterized teacher and student explainers
  - trained against the simulability loss (plus a regularizer)
• Bi-level optimization:
  - inner opt.: student parameters and student explainer parameters
  - outer opt.: teacher explainer parameters
• How can we optimize this?
  - Assume the explainers are differentiable
  - Explicit differentiation with a truncated gradient update
  - Differentiating through a gradient update ⟹ Hessian-vector products (computed automatically with JAX)
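As a concrete illustration of truncated explicit differentiation, below is a minimal JAX sketch. The two quadratic losses are toy stand-ins (assumptions) for the actual student and simulability objectives; the point is only the mechanics of differentiating the outer objective through one inner gradient step.

```python
import jax
import jax.numpy as jnp

def inner_loss(theta, phi):
    # Toy stand-in (assumption) for the student's scaffolded-simulability loss,
    # which couples the student parameters theta with the teacher-explainer parameters phi.
    return jnp.sum((theta - phi) ** 2)

def outer_loss_of_student(theta):
    # Toy stand-in (assumption) for the simulability loss of the trained student.
    return jnp.sum(theta ** 2)

def inner_update(theta, phi, lr=0.1):
    # One truncated inner step: an explicit SGD update on the student, kept inside
    # the computation graph so we can differentiate through it.
    return theta - lr * jax.grad(inner_loss)(theta, phi)

def outer_objective(phi, theta):
    # Outer objective: how well the student does *after* the inner update.
    return outer_loss_of_student(inner_update(theta, phi))

# Hypergradient for the teacher explainer parameters: jax.grad differentiates through
# the inner gradient step; the second-order (Hessian-vector) terms are handled by JAX.
theta0 = jnp.array([1.0, -2.0, 0.5])
phi0 = jnp.zeros(3)
print(jax.grad(outer_objective, argnums=0)(phi0, theta0))
```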
• Head-level parameterization: the teacher explainer learns a score vector \(\in \mathbb{R}^L\) that is normalized with sparsemax to weight the model's attention heads
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. Martins and Astudillo, 2016 (ICML)
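A minimal sketch of such a head-level explainer (our own illustration, not the paper's exact implementation): one learnable score per head is normalized with sparsemax and used to mix the heads' attention maps into a single explanation. Because sparsemax projects onto the probability simplex, most heads get exactly zero weight.

```python
import jax
import jax.numpy as jnp

def sparsemax(z):
    # Sparsemax (Martins & Astudillo, 2016): Euclidean projection onto the probability simplex.
    z_sorted = jnp.sort(z)[::-1]                 # scores in descending order
    k = jnp.arange(1, z.shape[-1] + 1)
    z_cumsum = jnp.cumsum(z_sorted)
    k_z = jnp.sum(1 + k * z_sorted > z_cumsum)   # size of the support
    tau = (z_cumsum[k_z - 1] - 1) / k_z          # threshold
    return jnp.maximum(z - tau, 0.0)

def head_level_explainer(phi, attention_maps):
    """phi: learnable score per attention head, shape [L] (hypothetical name).
    attention_maps: per-head attention over input tokens, shape [L, seq_len]."""
    head_weights = sparsemax(phi)                # sparse distribution over heads
    explanation = head_weights @ attention_maps  # mixture of the selected heads
    return explanation, head_weights

# Example: only the highest-scoring heads receive nonzero weight.
phi = jnp.array([2.0, 0.1, -1.0, 1.8, 0.0, -0.5])
maps = jax.random.uniform(jax.random.PRNGKey(0), (6, 10))
expl, w = head_level_explainer(phi, maps)
print(w)  # [0.6, 0., 0., 0.4, 0., 0.]: two heads selected, the rest exactly zero
```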
• Text classification (IMDB)
• Image classification (CIFAR-100)
• Machine Translation Quality Estimation (MLQE-PE)
• Plausibility (human-likeness) of the explainers
[Figure: example explanations for Text Classification, Image Classification (CIFAR-100 images labeled "television" and "butterfly"), and Quality Estimation]
• Normalization functions
[Plot: simulability accuracy on CIFAR-100 (roughly .70 to .90) for explainers without normalization vs. with softmax, entmax, and sparsemax]
• Only a small subset of attention heads are deemed relevant by SMaT
• SMaT is a framework that optimizes explanations for teaching students
- SMaT leads to high simulability
- SMaT learns plausible explanations
• We hope this work motivates the interpretability community to consider scaffolding as a valuable criterion for evaluating and designing new methods
(paper) arxiv.org/abs/2204.10810