Learning to Scaffold:
Optimizing Model Explanations for Teaching
November 28th
NeurIPS 2022
Marcos Treviso*
Patrick Fernandes*
Danish Pruthi
André Martins
Graham Neubig
• Explainability methods generally do not correlate with each other
• Most explanations do not help to predict the model's outputs and/or failures
• Simulability: "can we recover the model's output based on the explanation?"
• Simulability is particularly appealing for evaluating explanations, since it:
✓ aligns with the goal of communicating the underlying model behavior
✓ is easily measurable (both manually and automatically)
✓ puts all explainability methods under a single perspective
• Pruthi et al. (2021) proposed a framework for measuring simulability that:
⭐️ disregards trivial protocols (e.g., "punctuation symbols ⟹ positive, stop words ⟹ negative")
🧶 requires an optimization procedure
Evaluating Explanations: How much do explanations from the teacher aid students? Pruthi et al., 2021 (TACL)
[Figure: the simulability framework of Pruthi et al. (2021). At training time, a teacher model is trained with cross entropy and a student model is trained to predict the teacher's outputs; at test time, simulability is measured as the agreement between student and teacher predictions. Explanations are introduced through a teacher explainer \(E_T(x)\) and a student explainer \(E_S(x)\): the student's objective combines the simulability loss with an explainer regularizer (e.g., KL) that pushes \(E_S(x)\) towards \(E_T(x)\), yielding "scaffolded simulability" as opposed to the "standard simulability" obtained without explanations.]
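To make the objective concrete, here is a minimal JAX sketch of the per-example student loss and the test-time agreement metric. This is our own illustration under assumptions: the exact loss forms in the paper may differ, and the trade-off weight `lam` and the toy values are placeholders.

```python
import jax
import jax.numpy as jnp

def student_loss(student_logits, teacher_pred, student_expl, teacher_expl, lam=1.0):
    """Per-example student objective for scaffolded simulability (sketch).
    student_logits: student scores over classes; teacher_pred: teacher's predicted label.
    student_expl / teacher_expl: E_S(x) and E_T(x) as distributions over input tokens."""
    # Simulability term: cross entropy between the student's output and the teacher's prediction.
    ce = -jax.nn.log_softmax(student_logits)[teacher_pred]
    # Explainer regularizer (e.g. KL): pull the student explainer towards the teacher's explanation.
    eps = 1e-9
    kl = jnp.sum(teacher_expl * (jnp.log(teacher_expl + eps) - jnp.log(student_expl + eps)))
    return ce + lam * kl  # lam is a placeholder trade-off weight

def simulability(student_preds, teacher_preds):
    # Test-time metric: agreement between student and teacher predictions.
    return jnp.mean(student_preds == teacher_preds)

# Toy usage (3 classes, 4 input tokens):
s_logits = jnp.array([1.0, 0.2, -0.5])
t_expl = jnp.array([0.7, 0.1, 0.1, 0.1])
s_expl = jnp.array([0.4, 0.3, 0.2, 0.1])
print(student_loss(s_logits, 0, s_expl, t_expl), simulability(jnp.array([0, 1]), jnp.array([0, 2])))
```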
Can we learn explainers (with parameters \(\phi\)) that optimize simulability? (from scaffolded simulability to optimized scaffolded simulability)
• Scaffold-Maximizing Training (SMaT) framework
  - parameterized teacher and student explainers
  - trained against the simulability loss (plus a regularizer)
• Bi-level optimization:
  - inner opt.: student parameters and student explainer parameters
  - outer opt.: teacher explainer parameters
• How can we optimize this?
  - Assume the explainers are differentiable
  - Explicit differentiation with a truncated gradient update
  - Differentiating through a gradient update ⟹ Hessian-vector products (computed automatically with JAX)
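As a concrete illustration of truncated explicit differentiation, below is a minimal JAX sketch. The two quadratic losses are toy stand-ins (assumptions) for the actual student and simulability objectives; the point is only the mechanics of differentiating the outer objective through one inner gradient step.

```python
import jax
import jax.numpy as jnp

def inner_loss(theta, phi):
    # Toy stand-in (assumption) for the student's scaffolded-simulability loss,
    # which couples the student parameters theta with the teacher-explainer parameters phi.
    return jnp.sum((theta - phi) ** 2)

def outer_loss_of_student(theta):
    # Toy stand-in (assumption) for the simulability loss of the trained student.
    return jnp.sum(theta ** 2)

def inner_update(theta, phi, lr=0.1):
    # One truncated inner step: an explicit SGD update on the student, kept inside
    # the computation graph so we can differentiate through it.
    return theta - lr * jax.grad(inner_loss)(theta, phi)

def outer_objective(phi, theta):
    # Outer objective: how well the student does *after* the inner update.
    return outer_loss_of_student(inner_update(theta, phi))

# Hypergradient for the teacher explainer parameters: jax.grad differentiates through
# the inner gradient step; the second-order (Hessian-vector) terms are handled by JAX.
theta0 = jnp.array([1.0, -2.0, 0.5])
phi0 = jnp.zeros(3)
print(jax.grad(outer_objective, argnums=0)(phi0, theta0))
```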
• Head-level parameterization: the teacher explainer learns a score vector \(\in \mathbb{R}^L\) that is normalized with sparsemax to weight the model's attention heads
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification. Martins and Astudillo, 2016 (ICML)
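A minimal sketch of such a head-level explainer (our own illustration, not the paper's exact implementation): one learnable score per head is normalized with sparsemax and used to mix the heads' attention maps into a single explanation. Because sparsemax projects onto the probability simplex, most heads get exactly zero weight.

```python
import jax
import jax.numpy as jnp

def sparsemax(z):
    # Sparsemax (Martins & Astudillo, 2016): Euclidean projection onto the probability simplex.
    z_sorted = jnp.sort(z)[::-1]                 # scores in descending order
    k = jnp.arange(1, z.shape[-1] + 1)
    z_cumsum = jnp.cumsum(z_sorted)
    k_z = jnp.sum(1 + k * z_sorted > z_cumsum)   # size of the support
    tau = (z_cumsum[k_z - 1] - 1) / k_z          # threshold
    return jnp.maximum(z - tau, 0.0)

def head_level_explainer(phi, attention_maps):
    """phi: learnable score per attention head, shape [L] (hypothetical name).
    attention_maps: per-head attention over input tokens, shape [L, seq_len]."""
    head_weights = sparsemax(phi)                # sparse distribution over heads
    explanation = head_weights @ attention_maps  # mixture of the selected heads
    return explanation, head_weights

# Example: only the highest-scoring heads receive nonzero weight.
phi = jnp.array([2.0, 0.1, -1.0, 1.8, 0.0, -0.5])
maps = jax.random.uniform(jax.random.PRNGKey(0), (6, 10))
expl, w = head_level_explainer(phi, maps)
print(w)  # [0.6, 0., 0., 0.4, 0., 0.]: two heads selected, the rest exactly zero
```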
• Text classification (IMDB)
• Image classification (CIFAR-100)
• Machine Translation Quality Estimation (MLQE-PE)
• Plausibility (human-likeness) of the explainers
[Figure: example explanations for Text Classification, Image Classification (CIFAR-100 images labeled "television" and "butterfly"), and Quality Estimation]
• Normalization functions
[Plot: simulability accuracy on CIFAR-100 (roughly .70 to .90) for explainers without normalization vs. with softmax, entmax, and sparsemax]
• Only a small subset of attention heads are deemed relevant by SMaT
• SMaT is a framework that optimizes explanations for teaching students
- SMaT leads to high simulability
- SMaT learns plausible explanations
• We hope this work motivates the interpretability community to consider scaffolding as a valuable criterion for evaluating and designing new methods
(paper) arxiv.org/abs/2204.10810