Micro-Learning Task

Uncertainty decomposition in Bayesian Neural Networks

Pedagogy of Higher Education Course (Summer, 2020)

 Presented by:

Pavel Temirchev

Instructor:

Magnus Gustafsson

Teaching Assistants:

Dina Bek

Aysylu Askarova

Yash Madhwal

Why did I choose this topic?

I took an online MSc degree course on Uncertainty Quantification, and its treatment of Uncertainty Decomposition was quite unclear.
The author used formal definitions to explain the phenomenon:

\text{Total Uncertainty}(x) = \mathcal{H} \big[ \mathbb{E}_{p(\mathcal{W}|\mathcal{D})}\, p(y|x, \mathcal{W}) \big]

\text{Aleatoric Uncertainty}(x) = \mathbb{E}_{p(\mathcal{W}|\mathcal{D})}\, \mathcal{H} \big[ \,p(y|x, \mathcal{W}) \big]

\text{Epistemic Uncertainty}(x) = \text{Total Uncertainty}(x) - \text{Aleatoric Uncertainty}(x)

Aleatoric Uncertainty is due to the irreducible randomness of the modeled process.
Epistemic Uncertainty is caused by the lack of data during training.

This phenomenon can be explained more clearly and in a more engaging manner!

Plan of the lecture:

Reminder: Neural Networks
Bayesian Neural Networks - how do they work?
Uncertainty decomposition: epistemic and aleatoric uncertainty
How to use this knowledge in practice?

Reminder: Neural Networks

Neural Network:

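\text{oracle}(x|\mathcal{W}): \quad x \text{ (e.g. image)} \;\rightarrow\; \text{probability distribution on } y

\mathcal{W} \text{ are the NN's parameters ("weights")}

[Figure: bar chart of the predicted probabilities for the classes DOG, CAT, FROG, GIRAFFE, FISH]

What are the most probable and the second most probable class labels in this example?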
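Reading the two most probable labels off a predicted distribution can be sketched in a few lines. The probabilities below are hypothetical and are not the values from the slide's chart; `top_two_labels` is an illustrative helper name.

```python
def top_two_labels(probs_by_label):
    """Return the most probable and the second most probable class labels."""
    ranked = sorted(probs_by_label, key=probs_by_label.get, reverse=True)
    return ranked[0], ranked[1]


# Hypothetical oracle output: a probability distribution over five classes.
prediction = {"DOG": 0.05, "CAT": 0.55, "FROG": 0.10, "GIRAFFE": 0.05, "FISH": 0.25}

most_probable, second_most_probable = top_two_labels(prediction)
print(most_probable, second_most_probable)
```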

Bayesian Neural Networks (aka Bayesian Ensemble)

By definition:

p(y|x) = \mathbb{E}_{p(\mathcal{W}|\mathcal{D})}\, \text{oracle}(x|\mathcal{W})

p(y|x) = \mathbb{E}_{p(\mathcal{W}|\mathcal{D})}\, \text{oracle}(x|\mathcal{W}) \approx \frac{1}{N} \sum_{i=1}^N \,\text{oracle}(x|\mathcal{W}_i), \quad \mathcal{W}_i \sim p(\mathcal{W}|\mathcal{D})

An average over an (infinite) number of oracles!
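The finite-sample average can be sketched as follows, assuming each sampled oracle has already produced its class probabilities for the same input x. The three distributions are made-up numbers over (DOG, CAT, FROG), and `ensemble_prediction` is an illustrative name, not part of any library.

```python
def ensemble_prediction(oracle_outputs):
    """Average the per-class probabilities over the N sampled oracles."""
    n = len(oracle_outputs)
    num_classes = len(oracle_outputs[0])
    return [sum(p[c] for p in oracle_outputs) / n for c in range(num_classes)]


# Hypothetical outputs of three oracles for one input, classes (DOG, CAT, FROG).
oracle_outputs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]

p_ensemble = ensemble_prediction(oracle_outputs)
print(p_ensemble)
```

Because every oracle output sums to one, their average is again a valid probability distribution.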

NN - single oracle:

BNN - ensemble of oracles:

Bayesian Ensembling

[Figure: the predictions of three oracles, \(\text{oracle}(x|\mathcal{W}_1)\), \(\text{oracle}(x|\mathcal{W}_2)\) and \(\text{oracle}(x|\mathcal{W}_3)\), each a distribution over DOG, CAT, FROG, are averaged into the ensemble prediction]

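NOTE: How to measure the uncertainty of a prediction?

[Figure: example predictions labelled "low uncertainty" and "high uncertainty"]

Please help me!
Is the prediction uncertainty high or low?

[Figure: a predicted distribution over DOG, CAT, FROG]

And for this one?

[Figure: another predicted distribution over DOG, CAT, FROG]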

Entropy \(\mathcal{H}\) is a measure of uncertainty (assume you can compute it):

\mathcal{H} \big(\text{oracle}(x|\mathcal{W})\big) = \text{high} \rightarrow \texttt{Uncertainty is high}

\mathcal{H} \big(\text{oracle}(x|\mathcal{W})\big) = \text{low} \rightarrow \texttt{Uncertainty is low}
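For a discrete prediction, the entropy can be computed directly from the class probabilities. This is a minimal sketch using the Shannon entropy in nats; the two example distributions are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical predictions over (DOG, CAT, FROG).
confident = [0.98, 0.01, 0.01]     # almost all mass on one class
undecided = [1 / 3, 1 / 3, 1 / 3]  # mass spread evenly

print(entropy(confident), entropy(undecided))
```

The evenly spread prediction attains the largest entropy possible for three classes, log(3), while the confident one stays close to zero.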

Certainty about uncertainty

[Figure: predictions of \(\text{oracle}(x|\mathcal{W}_1)\), \(\text{oracle}(x|\mathcal{W}_2)\) and \(\text{oracle}(x|\mathcal{W}_3)\) over DOG, CAT, FROG, and the ensemble prediction, labelled "high"]

Average uncertainty of the individual oracles:

\text{Aleatoric Uncertainty}(x) = \frac{1}{N} \sum_{i=1}^N \, \mathcal{H} \big[ \,\text{oracle}(x|\mathcal{W}_i) \big]

Ensemble uncertainty:

\text{Total Uncertainty}(x) = \mathcal{H} \big[ \, \frac{1}{N} \sum_{i=1}^N\,\text{oracle}(x|\mathcal{W}_i) \big]
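The two quantities differ only in the order of averaging and taking the entropy. The sketch below illustrates this under the assumption that all oracles give the same spread-out prediction; the numbers and function names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def aleatoric_uncertainty(oracle_outputs):
    """Mean of the entropies of the individual oracle predictions."""
    return sum(entropy(p) for p in oracle_outputs) / len(oracle_outputs)


def total_uncertainty(oracle_outputs):
    """Entropy of the averaged (ensemble) prediction."""
    n = len(oracle_outputs)
    num_classes = len(oracle_outputs[0])
    mean = [sum(p[c] for p in oracle_outputs) / n for c in range(num_classes)]
    return entropy(mean)


# Hypothetical case: every oracle is equally unsure, classes (DOG, CAT, FROG).
agreeing_oracles = [[0.4, 0.3, 0.3], [0.4, 0.3, 0.3], [0.4, 0.3, 0.3]]

aleatoric = aleatoric_uncertainty(agreeing_oracles)
total = total_uncertainty(agreeing_oracles)
print(aleatoric, total)
```

When the oracles agree, the ensemble prediction coincides with each individual prediction, so both uncertainties are high and equal.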

Uncertainty about uncertainty

[Figure: the ensemble prediction and the predictions of \(\text{oracle}(x|\mathcal{W}_1)\), \(\text{oracle}(x|\mathcal{W}_2)\) and \(\text{oracle}(x|\mathcal{W}_3)\) over DOG, CAT, FROG, annotated with \(\text{Aleatoric Uncertainty}(x)\) and \(\text{Total Uncertainty}(x)\)]

\text{Epistemic Uncertainty}(x) = \text{Total Uncertainty}(x) - \text{Aleatoric Uncertainty}(x)
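The decomposition can be tried on two made-up ensembles: one whose oracles agree on an unsure prediction, and one whose oracles are individually confident but disagree. This is a sketch under those assumptions; `decompose_uncertainty` is a hypothetical helper.

```python
import math


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(oracle_outputs):
    """Return (total, aleatoric, epistemic) for one input x."""
    n = len(oracle_outputs)
    num_classes = len(oracle_outputs[0])
    mean = [sum(p[c] for p in oracle_outputs) / n for c in range(num_classes)]
    total = entropy(mean)                                     # entropy of the average
    aleatoric = sum(entropy(p) for p in oracle_outputs) / n   # average of the entropies
    return total, aleatoric, total - aleatoric


# Hypothetical ensembles over (DOG, CAT, FROG).
agree = [[0.4, 0.3, 0.3], [0.4, 0.3, 0.3], [0.4, 0.3, 0.3]]
disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

for name, outputs in (("agree", agree), ("disagree", disagree)):
    total, aleatoric, epistemic = decompose_uncertainty(outputs)
    print(f"{name}: total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```

Entropy is concave, so the entropy of the average is never smaller than the average of the entropies, and the epistemic term is non-negative up to floating-point error. It vanishes when the oracles agree and grows as they disagree.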

How to use Aleatoric Uncertainty?

NN oracles are forced to minimize their aleatoric uncertainty during training.
If aleatoric uncertainty is high, then a confident answer is hardly possible (the process is stochastic).
Humans are not trained like neural nets, so they commonly have high aleatoric uncertainty for no particular reason.

How to use Epistemic Uncertainty?

Commonly, high epistemic uncertainty is caused by a completely new input x.
The oracles have not seen anything similar before, so they make different predictions.
You can add epistemically uncertain inputs to the training dataset in order to improve your ensemble!
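One way to pick inputs worth adding to the training set is to rank a pool of unlabelled inputs by their epistemic uncertainty. The sketch below assumes the oracle predictions for each input are already available; the pool, its input names and `rank_for_labelling` are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def epistemic_uncertainty(oracle_outputs):
    """Total uncertainty minus aleatoric uncertainty for one input."""
    n = len(oracle_outputs)
    num_classes = len(oracle_outputs[0])
    mean = [sum(p[c] for p in oracle_outputs) / n for c in range(num_classes)]
    aleatoric = sum(entropy(p) for p in oracle_outputs) / n
    return entropy(mean) - aleatoric


def rank_for_labelling(pool):
    """Sort input names by decreasing epistemic uncertainty."""
    return sorted(pool, key=lambda name: epistemic_uncertainty(pool[name]), reverse=True)


# Hypothetical pool: input name -> predictions of three oracles over (DOG, CAT, FROG).
pool = {
    "familiar_input": [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05], [0.9, 0.05, 0.05]],
    "noisy_input": [[0.4, 0.3, 0.3], [0.4, 0.3, 0.3], [0.4, 0.3, 0.3]],
    "novel_input": [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]],
}

ranking = rank_for_labelling(pool)
print(ranking)
```

The input on which the oracles disagree is ranked first. The noisy input, on which all oracles agree that the answer is unclear, gains nothing from this criterion because its uncertainty is aleatoric.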

Uncertainty Decomposition: Example #1

Did the birds evolve from dinosaurs?

Uncertainty Decomposition: Example #2

How tall was Napoleon?

Uncertainty Decomposition: Example #3

What will the weather be today at 10:00?

Uncertainty Decomposition: Comparison

What is the difference between the BIRDS experiment and the WEATHER and NAPOLEON experiments?

What can you say about it?

Please, help me with your feedback: https://forms.gle/ri455uJWzqyeAWAZ7

Thank you for your attention!

And learn more about birds and dinosaurs!

Micro-Learning Temirchev Pavel

By cydoroga
