Micro-Learning Task

Uncertainty decomposition in Bayesian Neural Networks

Pedagogy of Higher Education Course (Summer, 2020)

​Presented by:

Pavel Temirchev

Instructor:

Magnus Gustafsson

 

Teaching Assistants:

Dina Bek

Aysylu Askarova

Yash Madhwal

 

 

Why have I chosen this topic?

  • I attended an online MSc course on Uncertainty Quantification.
  • Its explanation of Uncertainty Decomposition was quite unclear.
  • The author used formal definitions to explain the phenomenon:
\text{Total Uncertainty}(x) = \mathcal{H} \big[ \mathbb{E}_{p(\mathcal{W}|\mathcal{D})}\, p(y|x, \mathcal{W}) \big]
\text{Aleatoric Uncertainty}(x) = \mathbb{E}_{p(\mathcal{W}|\mathcal{D})}\, \mathcal{H} \big[ \,p(y|x, \mathcal{W}) \big]
\text{Epistemic Uncertainty}(x) = \text{Total Uncertainty}(x) - \text{Aleatoric Uncertainty}(x)
  • Aleatoric Uncertainty is due to the irreducible randomness of the modeled process.
  • Epistemic Uncertainty is caused by the lack of data during training.

This phenomenon can be explained more clearly and in a more engaging manner!

Plan of the lecture:

  • Reminder: Neural Networks

  • Bayesian Neural Networks - how do they work?

  • Uncertainty decomposition: epistemic and aleatoric uncertainty

  • How to use this knowledge in practice?

Reminder: Neural Networks

Neural Network:

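[Figure: a neural network as an oracle]
x \text{ (e.g. image)} \rightarrow \text{oracle}(x|\mathcal{W}) \rightarrow \text{probability distribution on } y
[Bar chart of the predicted probabilities over the classes DOG, CAT, FROG, GIRAFFE, FISH]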

What are the most probable and the second most probable class labels in this example?

\mathcal{W} - the NN's parameters ("weights")
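As a minimal sketch of what "a probability distribution on y" means in practice, the snippet below turns raw class scores into probabilities with a softmax and reads off the two most probable labels. The scores and the helper names `softmax` and `top_k` are made up for illustration; they are not taken from the lecture.

```python
import math

CLASSES = ["DOG", "CAT", "FROG", "GIRAFFE", "FISH"]

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=2):
    """Return the k most probable (label, probability) pairs."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores standing in for oracle(x | W) on one image.
logits = [2.0, 1.0, 0.1, -1.0, -2.0]
probs = softmax(logits)
best, second = top_k(probs, CLASSES)
print(best[0], second[0])  # most probable and second most probable labels
```

With these assumed scores the most probable label is DOG and the second most probable is CAT.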

Bayesian Neural Networks (aka Bayesian Ensemble)

By definition:
p(y|x) = \mathbb{E}_{p(\mathcal{W}|\mathcal{D})}\, \text{oracle}(x|\mathcal{W})
p(y|x) \approx \frac{1}{N} \sum_{i=1}^N \,\text{oracle}(x|\mathcal{W}_i), \quad \mathcal{W}_i \sim p(\mathcal{W}|\mathcal{D})

An average over an (infinite) number of oracles!
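The finite-sample average can be sketched in a few lines. The three prediction vectors below are invented stand-ins for the outputs of three oracles with weights sampled from the posterior; `ensemble_predict` is a hypothetical helper name.

```python
def ensemble_predict(oracle_probs):
    """Average the class distributions predicted by N oracles."""
    n = len(oracle_probs)
    num_classes = len(oracle_probs[0])
    return [sum(p[c] for p in oracle_probs) / n for c in range(num_classes)]

# Hypothetical predictions of three oracles over (DOG, CAT, FROG).
oracle_probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
ensemble = ensemble_predict(oracle_probs)
print(ensemble)  # still a valid probability distribution
```

Because each oracle outputs a distribution that sums to one, their average does too.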

NN - single oracle:

BNN - ensemble of oracles:

Bayesian Ensembling

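[Figure: Bayesian ensembling. The input x is passed to three oracles, \text{oracle}(x|\mathcal{W}_1), \text{oracle}(x|\mathcal{W}_2) and \text{oracle}(x|\mathcal{W}_3). Each oracle's prediction is a bar chart over DOG, CAT, FROG; averaging them gives the ensemble prediction, also a bar chart over DOG, CAT, FROG.]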

NOTE: How to measure the uncertainty of a prediction?

[Legend: low uncertainty / high uncertainty]

Please help me!
Is the prediction uncertainty high or low?

[Figure: bar chart of a prediction over DOG, CAT, FROG]

And for this one?

[Figure: bar chart of a prediction over DOG, CAT, FROG]

Entropy \(\mathcal{H}\) is a measure of uncertainty (assume you can compute it):

\mathcal{H} \big(\text{oracle}(x|\mathcal{W})\big) = \text{high} \rightarrow \texttt{Uncertainty is high}
\mathcal{H} \big(\text{oracle}(x|\mathcal{W})\big) = \text{low} \rightarrow \texttt{Uncertainty is low}
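For readers who want to compute it, here is a small sketch of the Shannon entropy of a categorical prediction, measured in nats. The two example distributions are invented: one confident prediction and one that spreads its probability evenly.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical predictions over (DOG, CAT, FROG).
confident = [0.98, 0.01, 0.01]   # almost all mass on one class
uniform = [1 / 3, 1 / 3, 1 / 3]  # no preference at all

print(f"confident: {entropy(confident):.3f}")
print(f"uniform:   {entropy(uniform):.3f}")
```

The uniform prediction attains the maximum possible entropy for three classes, ln 3, while the confident one stays close to zero.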

Certainty about uncertainty

[Figure: the input x is passed to three oracles, \text{oracle}(x|\mathcal{W}_1), \text{oracle}(x|\mathcal{W}_2) and \text{oracle}(x|\mathcal{W}_3). Each oracle's prediction and the ensemble prediction are shown as bar charts over DOG, CAT, FROG.]

Averaged oracle's uncertainty (high):
\text{Aleatoric Uncertainty}(x) = \frac{1}{N} \sum_{i=1}^N \, \mathcal{H} \big[ \,\text{oracle}(x|\mathcal{W}_i) \big]
Ensemble uncertainty (high):
\text{Total Uncertainty}(x) = \mathcal{H} \big[ \, \frac{1}{N} \sum_{i=1}^N\,\text{oracle}(x|\mathcal{W}_i) \big]
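The two quantities differ only in the order of averaging and taking the entropy. The sketch below makes that explicit for an invented case in which every oracle is unsure in nearly the same way; the function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def aleatoric_uncertainty(oracle_probs):
    """Average of the individual oracles' entropies."""
    return sum(entropy(p) for p in oracle_probs) / len(oracle_probs)

def total_uncertainty(oracle_probs):
    """Entropy of the averaged (ensemble) prediction."""
    n = len(oracle_probs)
    mean = [sum(column) / n for column in zip(*oracle_probs)]
    return entropy(mean)

# Hypothetical case: all three oracles are almost uniformly unsure.
oracle_probs = [
    [0.34, 0.33, 0.33],
    [0.33, 0.34, 0.33],
    [0.33, 0.33, 0.34],
]
aleatoric = aleatoric_uncertainty(oracle_probs)
total = total_uncertainty(oracle_probs)
print(f"aleatoric={aleatoric:.3f}, total={total:.3f}")
```

When the oracles agree that the answer is unclear, both numbers come out close to the maximum ln 3 and almost coincide.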

Uncertainty about uncertainty

[Figure: the input x is passed to three oracles, \text{oracle}(x|\mathcal{W}_1), \text{oracle}(x|\mathcal{W}_2) and \text{oracle}(x|\mathcal{W}_3). Each oracle's prediction and the ensemble prediction are shown as bar charts over DOG, CAT, FROG.]
\text{Aleatoric Uncertainty}(x) \text{ vs. } \text{Total Uncertainty}(x)?
\text{Total Uncertainty}(x) > \text{Aleatoric Uncertainty}(x)
\text{Epistemic Uncertainty}(x) = \text{Total Uncertainty}(x) - \text{Aleatoric Uncertainty}(x)
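A rough sketch of the full decomposition, assuming an invented case in which each oracle is individually confident but the three of them prefer different classes. The helper `decompose` is a hypothetical name used only here.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def decompose(oracle_probs):
    """Return (total, aleatoric, epistemic) uncertainty for one input."""
    n = len(oracle_probs)
    mean = [sum(column) / n for column in zip(*oracle_probs)]
    total = entropy(mean)                                  # entropy of the average
    aleatoric = sum(entropy(p) for p in oracle_probs) / n  # average of the entropies
    return total, aleatoric, total - aleatoric

# Hypothetical case: confident oracles that disagree with each other.
oracle_probs = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]
total, aleatoric, epistemic = decompose(oracle_probs)
print(f"total={total:.3f}, aleatoric={aleatoric:.3f}, epistemic={epistemic:.3f}")
```

Here the ensemble prediction is uniform, so the total uncertainty is large, while each oracle alone is fairly sure; the gap between the two is the epistemic part.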

How to use Aleatoric Uncertainty?

  • NN oracles are forced to minimize their aleatoric uncertainty during training.

  • If aleatoric uncertainty is still high, a certain answer is practically impossible (the process is stochastic).

  • Humans are not trained the way neural nets are. Humans commonly have high aleatoric uncertainty just because.

How to use Epistemic Uncertainty?

  • Commonly, high epistemic uncertainty is caused by a completely new input x.

  • The oracles have not seen anything similar before, so they make different predictions.

  • You can add epistemically uncertain inputs to the training dataset in order to improve your ensemble!
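One way to act on the last point is to rank unlabeled candidates by epistemic uncertainty and send the top ones for labeling. The sketch below assumes a tiny invented pool of three inputs with made-up oracle predictions; `rank_by_epistemic` and the input names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def epistemic_uncertainty(oracle_probs):
    """Total uncertainty minus aleatoric uncertainty for one input."""
    n = len(oracle_probs)
    mean = [sum(column) / n for column in zip(*oracle_probs)]
    aleatoric = sum(entropy(p) for p in oracle_probs) / n
    return entropy(mean) - aleatoric

def rank_by_epistemic(pool):
    """Sort candidate inputs from most to least epistemically uncertain."""
    scores = {name: epistemic_uncertainty(preds) for name, preds in pool.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical pool: oracle predictions over (DOG, CAT, FROG) per input.
pool = {
    "familiar_photo": [[0.90, 0.05, 0.05]] * 3,   # oracles agree and are sure
    "blurry_photo": [[0.34, 0.33, 0.33]] * 3,     # oracles agree they are unsure
    "unseen_animal": [                            # oracles disagree
        [0.90, 0.05, 0.05],
        [0.05, 0.90, 0.05],
        [0.05, 0.05, 0.90],
    ],
}
ranked = rank_by_epistemic(pool)
to_label = [name for name, _ in ranked[:1]]
print(to_label)
```

Only the input on which the oracles disagree gets a high score; the blurry input is uncertain too, but its uncertainty is aleatoric, so labeling more examples like it would help less.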

Uncertainty Decomposition: Example #1

Did birds evolve from dinosaurs?
 

Uncertainty Decomposition: Example #2

How tall was Napoleon?

Uncertainty Decomposition: Example #3

What will the weather be today at 10:00?

Uncertainty Decomposition: Comparison

What is the difference between the BIRDS experiment and the (WEATHER, NAPOLEON) experiments?

What can you say about it?

Please help me with your feedback: https://forms.gle/ri455uJWzqyeAWAZ7

Thank you for your attention!

And learn more about birds and dinosaurs!
