Report by Pavel Temirchev
Two types of uncertainty: aleatoric (irreducible noise in the data) and epistemic (reducible uncertainty about the model parameters)
Authors' proposition:
Given a properly trained Bayesian Neural Network,
one can decompose its uncertainty into aleatoric and epistemic terms
(Depeweg et al., 2018)
Decomposition of uncertainty in Bayesian Deep Learning
Probabilistic model: \( p(y \mid x, \theta) \)

Posterior distribution over model parameters (not tractable):
\[ p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})} \]

We want to approximate the posterior with a tractable distribution \( q(\theta) \)

The common optimization routine is the following (Variational Bayes):
\[ q^{*} = \arg\min_{q} \mathrm{KL}\big( q(\theta) \,\big\|\, p(\theta \mid \mathcal{D}) \big) \]
The authors propose to minimize a different objective, the \(\alpha\)-divergence, since it yields better results
(Hernandez-Lobato et al., 2016) Black-Box \(\alpha\)-Divergence Minimization
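For reference, one common parameterization of the \(\alpha\)-divergence (following Hernández-Lobato et al., 2016) is:
\[ D_{\alpha}\big( p \,\|\, q \big) = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, d\theta \right) \]
In this parameterization, the Variational Bayes objective \(\mathrm{KL}(q \,\|\, p)\) is recovered in the limit \(\alpha \to 0\), and \(\mathrm{KL}(p \,\|\, q)\) (the Expectation Propagation objective) as \(\alpha \to 1\).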
Commonly, the model is chosen to have the following form:
\[ y = f(x; W) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2) \]

The approximate posterior is chosen to be a fully-factorized Gaussian:
\[ q(W) = \prod_{i} \mathcal{N}(w_i \mid m_i, v_i) \]

and the prior on parameters has a similar form:
\[ p(W) = \prod_{i} \mathcal{N}(w_i \mid 0, \lambda) \]

where \(\lambda\) is the prior variance, commonly chosen to be 1
Classical BNNs assume only additive Gaussian noise, which is restrictive
Idea: feed the noise into the network as an input,
treating it as a latent variable \(z\)

The model becomes:
\[ y = f(x, z; W) + \epsilon, \qquad z \sim \mathcal{N}(0, \gamma) \]

Approximate posterior:
\[ q(W, z) = q(W) \prod_{n=1}^{N} \mathcal{N}(z_n \mid m_{z_n}, v_{z_n}) \]

Prior:
\[ p(W, z) = p(W) \prod_{n=1}^{N} \mathcal{N}(z_n \mid 0, \gamma) \]
(Depeweg et al., 2017) Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks
The variance of the predictive model is fixed!
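A minimal sketch of how the latent variable can be fed to the network (assuming, as is common, that \(z\) is simply concatenated to the input; the network and variational parameters below are illustrative stand-ins):

```python
import torch
import torch.nn as nn

# Illustrative network f(x, z): takes the input x and the latent z together
f = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 1))

x = torch.linspace(-4, 4, 100).unsqueeze(-1)   # inputs, shape (N, 1)
mu_z = torch.zeros(100, 1)                     # variational means for z_n (illustrative)
v_z = torch.ones(100, 1)                       # variational variances for z_n (illustrative)

# Reparameterized sample of the per-datapoint latent z_n ~ N(mu_z, v_z)
z = mu_z + v_z.sqrt() * torch.randn_like(mu_z)

# The noise is fed into the network as an extra input dimension
y_pred = f(torch.cat([x, z], dim=-1))
```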
Once we have trained a proper BNN, we are interested in decomposing its uncertainty into aleatoric and epistemic components.
Total uncertainty:
\[ \mathcal{H}\big[ p(y \mid x) \big], \qquad p(y \mid x) = \mathbb{E}_{q(W)}\big[ p(y \mid x, W) \big] \]

Aleatoric uncertainty:
\[ \mathbb{E}_{q(W)} \Big[ \mathcal{H}\big[ p(y \mid x, W) \big] \Big] \]

Epistemic uncertainty (their difference, i.e. the mutual information between \(y\) and \(W\)):
\[ I(y; W) = \mathcal{H}\big[ p(y \mid x) \big] - \mathbb{E}_{q(W)} \Big[ \mathcal{H}\big[ p(y \mid x, W) \big] \Big] \]
The predictive distribution of a BNN commonly has no closed form
So we must estimate its entropy from samples.

Assume \( \{y_i\}_{i=1}^{N} \) is a set of samples from the distribution of interest,
and the set is sorted: \( y_1 \le y_2 \le \dots \le y_N \)

Then we can approximate the entropy with the nearest-neighbor estimator (here in its 1-D, \(k = 1\) form):
\[ \hat{\mathcal{H}} \approx \frac{1}{N} \sum_{i=1}^{N} \log \rho_i + \log 2 + \psi(N) - \psi(1), \qquad \rho_i = \min\big( y_i - y_{i-1},\; y_{i+1} - y_i \big) \]

where \(\psi\) is the digamma function
(Kraskov et al., 2003) Estimating Mutual Information
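A minimal sketch of this estimator and its use for the decomposition (assumptions: 1-D samples, \(k = 1\) nearest neighbor; the helper names are illustrative):

```python
import numpy as np
from scipy.special import digamma

def nn_entropy(samples):
    """Kozachenko-Leonenko nearest-neighbor entropy estimate (1-D, k=1)."""
    y = np.sort(np.asarray(samples, dtype=float))
    n = len(y)
    spacings = np.diff(y)  # y_{i+1} - y_i for the sorted sample
    # Distance from each point to its nearest neighbor (one-sided at the ends)
    rho = np.minimum(np.r_[np.inf, spacings], np.r_[spacings, np.inf])
    rho = np.clip(rho, 1e-12, None)  # guard against ties / zero spacings
    return np.mean(np.log(rho)) + np.log(2) + digamma(n) - digamma(1)

# Monte Carlo decomposition sketch (given some sampler for p(y | x, W)):
#   total     = nn_entropy(y samples pooled over many posterior draws W ~ q(W))
#   aleatoric = average of nn_entropy(y samples) over individual posterior draws
#   epistemic = total - aleatoric
```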
Dataset

| \(i\) | 0 | 1 | 2 |
|---|---|---|---|
| \(p_i\) | 1/3 | 1/3 | 1/3 |
| \(\mu_i\) | \(-4\) | 0 | 4 |
| \(\sigma_i\) | 2/5 | 0.9 | 2/5 |

where \(p_i\), \(\mu_i\), \(\sigma_i\) are the component weights, means, and standard deviations of a Gaussian mixture.
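A minimal sketch of drawing samples from this mixture (assuming the table specifies a three-component Gaussian mixture as above; the sample size is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

p     = np.array([1/3, 1/3, 1/3])    # component weights p_i
mu    = np.array([-4.0, 0.0, 4.0])   # component means mu_i
sigma = np.array([0.4, 0.9, 0.4])    # component standard deviations (2/5 = 0.4)

N = 1000                                 # illustrative sample size
comp = rng.choice(3, size=N, p=p)        # pick a mixture component per sample
x = rng.normal(mu[comp], sigma[comp])    # draw from the chosen Gaussian
```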
Training of the neural networks
[Figure: model fits on the toy dataset; panels "BNN" and "BNN + Latent Variable"]
The classical BNN produces satisfactory results, whereas the BNN+LV performs much worse.
Both NNs were implemented in PyTorch from scratch with the Variational Bayes approach, as sketched below.
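A minimal sketch of such a variational Bayesian layer (a mean-field Gaussian over the weights, trained via the reparameterization trick; all names, initializations, and the omission of biases are illustrative choices, not the report's actual code):

```python
import math
import torch
import torch.nn as nn

class BayesLinear(nn.Module):
    """Linear layer with a fully-factorized Gaussian posterior over weights."""
    def __init__(self, n_in, n_out, prior_var=1.0):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(n_in, n_out))      # posterior means m_i
        self.log_v = nn.Parameter(-6.0 * torch.ones(n_in, n_out))   # log posterior variances v_i
        self.prior_var = prior_var                                  # prior variance lambda

    def forward(self, x):
        # Reparameterization trick: W = m + sqrt(v) * eps, eps ~ N(0, I)
        w = self.mu + self.log_v.exp().sqrt() * torch.randn_like(self.mu)
        return x @ w

    def kl(self):
        # KL( N(m, v) || N(0, lambda) ), summed over all weights
        v = self.log_v.exp()
        return 0.5 * torch.sum(
            (v + self.mu ** 2) / self.prior_var - 1.0
            - self.log_v + math.log(self.prior_var)
        )
```

Training then maximizes the ELBO: the expected log-likelihood of the data under sampled weights, minus the sum of `kl()` terms over all layers.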
Uncertainty decomposition in BNN without (!) latent variables
Total and aleatoric uncertainty capture the region with the largest predictive standard deviation of the model.
The epistemic component is too noisy (and unstable from realization to realization).
Uncertainty decomposition in BNN with Latent Variable
The distribution of uncertainties is not informative!
This is due to the poor model performance. A learnable additive noise term would probably help.
(Depeweg et al., 2018)
Decomposition of Uncertainty in Bayesian Deep Learning
for Efficient and Risk-sensitive Learning
The results from the paper show that the maxima of the epistemic uncertainty match the least observed regions of \(\mathcal{X}\).
The results are quite hard to reproduce
For classical BNN without (!) latent variables
Total and epistemic uncertainty grow quickly outside of the data domain.
The plot for the epistemic component shows that the in-domain variation of uncertainty is comparatively low.
For active learning purposes
A common technique for active learning is to maximize the epistemic uncertainty (EU) over \(x\).
However, as we've seen previously, the EU is not a good maximization objective: it is noisy in-domain and grows without bound outside of the domain.
One possible way to overcome these problems is to treat the data generation procedure
as a task of sampling from a distribution.
The samples should be close to the in-domain data,
and the model should be uncertain about them (in the epistemic sense).
Assume we have a probability distribution of in-domain data points \( p(x) \) (a generative distribution).

We can use the fact that Metropolis-Hastings requires only an unnormalized density for constructing the following target distribution:
\[ \pi(x) \propto p(x) \cdot \mathrm{EU}(x) \]

And we can sample from \( \pi(x) \) using the
Metropolis-Hastings algorithm:

proposal distribution:
\[ x' \sim q(x' \mid x_t) \]

acceptance ratio:
\[ \alpha(x', x_t) = \min\left( 1,\; \frac{\pi(x')\, q(x_t \mid x')}{\pi(x_t)\, q(x' \mid x_t)} \right) \]
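A minimal sketch of this sampler (assumptions: a symmetric Gaussian random-walk proposal, and illustrative stand-ins for \(p(x)\) and \(\mathrm{EU}(x)\); in practice the EU would come from the trained BNN):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_x(x):
    """Illustrative in-domain density (stand-in for the data distribution)."""
    return np.exp(-0.5 * x ** 2)

def eu(x):
    """Illustrative epistemic-uncertainty surrogate (stand-in for the BNN's EU)."""
    return 1.0 + np.abs(x)

def pi(x):
    return p_x(x) * eu(x)  # unnormalized target: pi(x) ∝ p(x) * EU(x)

x = 0.0
samples = []
for _ in range(10_000):
    x_new = x + 0.5 * rng.normal()   # symmetric random-walk proposal
    # Symmetric proposal => the q terms cancel in the acceptance ratio
    if rng.uniform() < min(1.0, pi(x_new) / pi(x)):
        x = x_new
    samples.append(x)
```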