June 1st
Question: how to formally define an explanation?
A black box model \( b: \mathcal{X} \rightarrow \mathcal{Y} \) maps the \(n\)-dimensional feature space \(\mathcal{X}\) to the target space \(\mathcal{Y}\).
A Machine Learning explanation \(E\) is an answer to the question:
Why are (instances of) \(\mathcal{X}_E\) classified as \(y\)?
where \(\mathcal{X}_E \subseteq \mathcal{X}\) and \(y \in \mathcal{Y}\).
Global: \( \mathcal{X}_{E} = \mathcal{X} \)
Local: \( \mathcal{X}_{E} \subset \mathcal{X} \)
Explanations contain statements \(S \in E\) that are either initial conditions \(S_C\) or statistical generalisations \(S_L\) (like Hempel's IS model)
Initial conditions \(S_C\) are constraints on feature space, constituting \(\mathcal{X}_E\)
"Causality"?
Laws \(S_L\) are constraints where \(P(y\ |\ \mathcal{X}_L) > P(y\ |\ \mathcal{X})\) (like Salmon's SR model)
\(\mathcal{X}_E \subseteq \mathcal{X}_L\)?
\(\mathcal{X}_E \cap \mathcal{X}_L \neq \emptyset\)?
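A minimal sketch of these definitions; the model \(b\), the specific constraints, and the Monte Carlo estimation of the probabilities are all assumptions for illustration, not part of the formalism:

```python
import numpy as np

rng = np.random.default_rng(0)

def b(X):
    """Hypothetical black box: class 1 iff x0 + x1 > 1 (stand-in for b)."""
    return (X[:, 0] + X[:, 1] > 1).astype(int)

# Sample the feature space (here: the unit square).
X = rng.uniform(0, 1, size=(100_000, 2))
y = 1  # the class the explanation is about

# Initial conditions S_C: constraints on feature space, constituting X_E.
in_XE = X[:, 0] > 0.8

# Candidate law S_L: a constraint X_L that should raise P(y | X_L) above P(y | X).
in_XL = X[:, 0] + X[:, 1] > 0.9

p_y = (b(X) == y).mean()            # P(y | X), the base rate
p_y_XL = (b(X[in_XL]) == y).mean()  # P(y | X_L)
print(f"P(y | X)   = {p_y:.3f}")
print(f"P(y | X_L) = {p_y_XL:.3f}")
print("S_L qualifies as a law:", p_y_XL > p_y)

# The two open questions above, checked empirically on this sample:
print("X_E subset of X_L:", bool(in_XL[in_XE].all()))
print("X_E and X_L overlap:", bool((in_XE & in_XL).any()))
```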
Consider these various assertions about the statement "X caused Y":
Correlation: ice cream consumption correlates with sunburn (both are effects of sunny weather, yet neither causes the other)
Explanation
Prediction
Machine Learning explanations only capture the cause of the prediction, not the real-world effect!
white box: we can only obtain statistical laws from the decisions made by the model.
black box: we can only approximate the decision boundary, such that decisions not made by the model cannot appear in the explanation.
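A hedged sketch of the black-box case, in the spirit of local surrogate methods such as LIME; the perturbation scheme and the linear surrogate are assumptions, not the only option. Since we can only query \(b\), any law is fitted to the (input, decision) pairs it actually produced:

```python
import numpy as np

rng = np.random.default_rng(1)

def b(X):
    """Same hypothetical black box as above; internals assumed unreadable."""
    return (X[:, 0] + X[:, 1] > 1).astype(int)

# Query the model around an instance to approach its decision boundary.
x = np.array([0.7, 0.6])
perturbations = x + rng.normal(0, 0.1, size=(10_000, 2))
decisions = b(perturbations)

# Fit a local linear surrogate to the queried decisions; decisions the
# model never made cannot enter this approximation of the boundary.
A = np.c_[perturbations, np.ones(len(perturbations))]
w, *_ = np.linalg.lstsq(A, decisions, rcond=None)
print("local surrogate weights (boundary normal estimate):", w.round(2))
```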
We can never observe both the event and the counterfactual event, hence causality can only be inferred.
With inference comes uncertainty.
However, a Machine Learning prediction is deterministic, which ensures we can replay it and observe what happens if the statement were not true.
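Because the model is deterministic, the counterfactual can be replayed exactly; a minimal sketch, with the instance and the intervention chosen arbitrarily:

```python
import numpy as np

def b(X):
    """Same hypothetical deterministic black box as above."""
    return (X[:, 0] + X[:, 1] > 1).astype(int)

x = np.array([[0.7, 0.6]])
print("factual decision:       ", b(x)[0])   # the observed event

# Replay with the statement "x0 = 0.7" made false: unlike real-world
# events, the counterfactual event is directly observable here.
x_cf = x.copy()
x_cf[0, 0] = 0.3
print("counterfactual decision:", b(x_cf)[0])
```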
Explanation / Counterexplanation
We could have a very weak explanation and a very strong counterexplanation, but just showing the explanation will not reveal this.
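A sketch of why this matters, with made-up regions: report both the explanation's lift and the counterexplanation's drop, not just the former.

```python
import numpy as np

rng = np.random.default_rng(3)

def b(X):
    """Same hypothetical black box as above."""
    return (X[:, 0] + X[:, 1] > 1).astype(int)

X = rng.uniform(0, 1, size=(100_000, 2))
y = 1

weak_expl = X[:, 0] > 0.5     # explanation: raises P(y) somewhat
strong_ce = X[:, 1] < 0.1     # counterexplanation: lowers P(y) a lot

p_y = (b(X) == y).mean()
lift = (b(X[weak_expl]) == y).mean() - p_y
drop = p_y - (b(X[strong_ce]) == y).mean()
print(f"explanation lift:        +{lift:.3f}")
print(f"counterexplanation drop: -{drop:.3f}")
# Showing only the lift hides that the counterexplanation is stronger.
```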
Salmon: maximal homogeneous reference class:
No partition of the explanation space into \(\mathcal{X}_{1}, \mathcal{X}_{2}\) exists in which \(P(y\ |\ \mathcal{X}_{1})\) is significantly different from \(P(y\ |\ \mathcal{X}_{2})\)
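A sketch of checking Salmon's condition empirically; the split feature and the cut point are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def b(X):
    """Same hypothetical black box as above."""
    return (X[:, 0] + X[:, 1] > 1).astype(int)

X = rng.uniform(0, 1, size=(100_000, 2))
y = 1
XE = X[X[:, 0] > 0.8]   # the explanation region from the earlier sketch

# Partition X_E on feature 1 and compare P(y | X_1) with P(y | X_2).
X1, X2 = XE[XE[:, 1] <= 0.5], XE[XE[:, 1] > 0.5]
p1, p2 = (b(X1) == y).mean(), (b(X2) == y).mean()
print(f"P(y | X_1) = {p1:.3f}, P(y | X_2) = {p2:.3f}")
# A large gap means X_E is not homogeneous: the reference class should
# be refined with a constraint on feature 1.
```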
Hempel: explanation is true as far as the knowledge of the explainer goes.