Decomposition of Uncertainty in Bayesian Deep Learning
A report by Pavel Temirchev
Uncertainty Decomposition
Two types of uncertainty:
- Aleatoric - caused by the stochasticity of the modelled process
- Epistemic - caused by the lack of training data available
The authors' proposition:
Given a properly trained Bayesian Neural Network,
one can decompose its uncertainty into aleatoric and epistemic terms
(Depeweg et al., 2018) Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning
Bayesian Neural Networks
Probabilistic model: a likelihood \(p(y \mid x, \mathcal{W})\) defined by a neural network with weights \(\mathcal{W}\)
Posterior distribution over model parameters (not tractable):
\( p(\mathcal{W} \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \mathcal{W})\, p(\mathcal{W})}{p(\mathcal{D})} \)
We want to approximate the posterior with a tractable distribution \(q(\mathcal{W})\).
The common optimization routine is the following (Variational Bayes): maximize the ELBO
\( \mathcal{L}(q) = \mathbb{E}_{q(\mathcal{W})}\left[\log p(\mathcal{D} \mid \mathcal{W})\right] - \mathrm{KL}\left(q(\mathcal{W}) \,\|\, p(\mathcal{W})\right), \)
which is equivalent to minimizing \(\mathrm{KL}\left(q(\mathcal{W}) \,\|\, p(\mathcal{W} \mid \mathcal{D})\right)\).
The authors propose to minimize a different divergence instead, the \(\alpha\)-divergence, since it yields better results in practice
(Hernandez-Lobato et al., 2016) Black-Box \(\alpha\)-Divergence Minimization
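For reference, the divergence minimized in that paper is Amari's \(\alpha\)-divergence (the display below is its standard definition, not a formula from this report):
\[
D_{\alpha}(p \,\|\, q) = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx \right),
\]
which recovers \(\mathrm{KL}(q \,\|\, p)\) as \(\alpha \to 0\) and \(\mathrm{KL}(p \,\|\, q)\) as \(\alpha \to 1\), so Variational Bayes is the \(\alpha \to 0\) special case.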
Bayesian Neural Networks
Commonly, the model is chosen to have the following form:
\( y = f(x; \mathcal{W}) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2) \)
The approximate posterior is chosen to be a fully-factorized Gaussian:
\( q(\mathcal{W}) = \prod_{ij} \mathcal{N}(w_{ij} \mid m_{ij}, v_{ij}) \)
and the prior on the parameters has a similar form:
\( p(\mathcal{W}) = \prod_{ij} \mathcal{N}(w_{ij} \mid 0, \lambda) \)
where \(\lambda\) is the prior variance and is commonly chosen to be 1
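A minimal PyTorch sketch of such a layer (a sketch under the assumptions above, not the report's exact implementation; names and initializations are illustrative): weights are sampled with the reparameterization trick, and the KL term against the \(\mathcal{N}(0, \lambda)\) prior is available in closed form.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanFieldLinear(nn.Module):
    """Linear layer with a fully-factorized Gaussian posterior over weights.

    Biases are omitted for brevity; `prior_var` plays the role of lambda.
    """

    def __init__(self, d_in, d_out, prior_var=1.0):
        super().__init__()
        self.prior_var = prior_var
        self.mean = nn.Parameter(0.1 * torch.randn(d_out, d_in))
        # Unconstrained parametrization of the posterior std: std = softplus(rho).
        self.rho = nn.Parameter(-3.0 * torch.ones(d_out, d_in))

    def forward(self, x):
        std = F.softplus(self.rho)
        # Reparameterization trick: W = m + std * eps, eps ~ N(0, I).
        w = self.mean + std * torch.randn_like(std)
        return x @ w.t()

    def kl(self):
        """Closed-form KL( q(W) || N(0, prior_var * I) ), summed over weights."""
        var = F.softplus(self.rho) ** 2
        return 0.5 * torch.sum(
            (var + self.mean ** 2) / self.prior_var
            - 1.0
            + math.log(self.prior_var)
            - torch.log(var)
        )
```

Training then minimizes the negative ELBO: an expected negative log-likelihood term estimated with sampled weights, plus the sum of `kl()` over all layers.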
Bayesian Neural Networks
with Latent Variables
Classical BNNs assume only additive Gaussian noise, which is restrictive
Idea: feed the noise into the network as an input,
thinking of it as a latent variable \(z\)
The model becomes:
\( y = f(x, z; \mathcal{W}) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2) \)
Approximate posterior (with one latent variable \(z_n\) per data point):
\( q(\mathcal{W}, z) = q(\mathcal{W}) \prod_{n} \mathcal{N}(z_n \mid m_n, v_n) \)
Prior:
\( p(z_n) = \mathcal{N}(z_n \mid 0, \gamma) \)
(Depeweg et al., 2017) Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks
Note: the variance \(\sigma^2\) of the additive noise in the predictive model is fixed, not learned!
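A sketch of the latent-variable idea (shapes, sizes and names are illustrative assumptions; the Bayesian treatment of the weights is omitted to keep the example short): every training point \(x_n\) owns variational parameters \((m_n, v_n)\) of \(q(z_n)\), and a reparameterized sample of \(z_n\) is concatenated to the network input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentInputNet(nn.Module):
    """Feeds a per-data-point latent variable z_n into the network as an input."""

    def __init__(self, n_data, d_in=1, d_hidden=50):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(d_in + 1, d_hidden), nn.ReLU(), nn.Linear(d_hidden, 1)
        )
        # Variational parameters of q(z_n) = N(m_n, v_n), one pair per data point.
        self.z_mean = nn.Parameter(torch.zeros(n_data))
        self.z_rho = nn.Parameter(-3.0 * torch.ones(n_data))

    def forward(self, x, idx):
        # Reparameterized sample of z_n for the mini-batch indices `idx`;
        # at test time one would instead sample z from the prior N(0, gamma).
        std = F.softplus(self.z_rho[idx])
        z = self.z_mean[idx] + std * torch.randn_like(std)
        return self.f(torch.cat([x, z.unsqueeze(-1)], dim=-1))
```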
Uncertainty Decomposition in BNNs
Once we have trained a proper BNN, we are interested in decomposing its uncertainty into aleatoric and epistemic components.
Total Uncertainty: the entropy of the predictive distribution,
\( H[y \mid x], \qquad p(y \mid x) = \int p(y \mid x, \mathcal{W})\, q(\mathcal{W})\, d\mathcal{W} \)
Aleatoric Uncertainty: the expected entropy of the predictions for fixed weights,
\( \mathbb{E}_{q(\mathcal{W})}\left[ H[y \mid x, \mathcal{W}] \right] \)
Epistemic Uncertainty: the difference between the two, i.e. the mutual information between \(y\) and \(\mathcal{W}\),
\( I(y, \mathcal{W}) = H[y \mid x] - \mathbb{E}_{q(\mathcal{W})}\left[ H[y \mid x, \mathcal{W}] \right] \)
Nearest Neighbor Entropy Estimation
The predictive distribution of a BNN commonly has no closed form,
so we have to estimate its entropy from samples.
Assume \(\{y_1, \dots, y_N\}\) is a set of samples from the distribution of interest,
and the set is sorted: \(y_1 \le y_2 \le \dots \le y_N\).
Then we can approximate the entropy with the Nearest Neighbor Estimator:
\( \hat{H} \approx \frac{1}{N-1} \sum_{i=1}^{N-1} \log\left(y_{i+1} - y_i\right) + \psi(N) - \psi(1) \)
where \(\psi\) is the digamma function
(Kraskov et al., 2003) Estimating Mutual Information
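A sketch of the estimator and of the resulting Monte Carlo decomposition (`sample_y_given_w` is a hypothetical closure, not from the report: each call should draw fresh weights \(\mathcal{W} \sim q(\mathcal{W})\) and return samples of \(y \mid x, \mathcal{W}\) for a fixed \(x\)):

```python
import numpy as np
from scipy.special import digamma

def nn_entropy(samples):
    """1d spacing-based nearest neighbor entropy estimate (Kraskov et al.)."""
    y = np.sort(np.asarray(samples))
    gaps = np.maximum(np.diff(y), 1e-12)  # guard against duplicate samples
    n = len(y)
    return np.mean(np.log(gaps)) + digamma(n) - digamma(1)

def decompose_uncertainty(sample_y_given_w, n_w=100, n_y=200):
    """Monte Carlo decomposition: total = aleatoric + epistemic.

    `sample_y_given_w(n)` draws one posterior sample of the weights W ~ q(W)
    and then returns n samples y ~ p(y | x, W) for a fixed input x.
    """
    per_w = [sample_y_given_w(n_y) for _ in range(n_w)]
    total = nn_entropy(np.concatenate(per_w))              # H[y | x]
    aleatoric = np.mean([nn_entropy(ys) for ys in per_w])  # E_q H[y | x, W]
    return total, aleatoric, total - aleatoric             # epistemic = I(y, W)
```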
Aims of the Final Project
- Implement both BNN and BNN+LV models using PyTorch
- Reproduce the results of neural network training on the 1d problem with heteroscedastic noise
- Reproduce the results of uncertainty decomposition presented in the paper
- Analyse the behaviour of epistemic uncertainty on out-of-domain data
- Propose a technique for data generation in the context of active learning
Experimental Results
Dataset
Inputs \(x\) are drawn from a mixture of three Gaussian components:

| \(i\) | 0 | 1 | 2 |
|---|---|---|---|
| weight \(\pi_i\) | 1/3 | 1/3 | 1/3 |
| mean \(\mu_i\) | -4 | 0 | 4 |
| std \(\sigma_i\) | 2/5 | 0.9 | 2/5 |
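A sketch of the data generation under these assumptions; the heteroscedastic target \(y = 7\sin x + 3\,|\cos(x/2)|\,\epsilon\) is the one used for this benchmark in Depeweg et al. (2018):

```python
import numpy as np

def sample_dataset(n=1000, seed=0):
    """Sample the 1d heteroscedastic toy dataset of Depeweg et al. (2018)."""
    rng = np.random.default_rng(seed)
    means = np.array([-4.0, 0.0, 4.0])
    stds = np.array([0.4, 0.9, 0.4])
    # Pick a mixture component for every sample (equal weights 1/3).
    comp = rng.integers(0, 3, size=n)
    x = rng.normal(means[comp], stds[comp])
    # Heteroscedastic noise: its scale depends on x.
    eps = rng.normal(size=n)
    y = 7.0 * np.sin(x) + 3.0 * np.abs(np.cos(x / 2.0)) * eps
    return x, y

x, y = sample_dataset()
```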
Experimental Results
Training of the neural networks
[Figure: predictive fits of the two models, BNN and BNN + Latent Variable]
The classical BNN produces satisfactory results, whereas the BNN+LV performs much worse
Both networks were implemented in PyTorch from scratch using the Variational Bayes approach
Experimental Results
Uncertainty decomposition in BNN without (!) latent variables
Total and aleatoric uncertainty capture the region with the largest STD of the modelled noise
The epistemic one is too noisy (and unstable from realization to realization)
Experimental Results
Uncertainty decomposition in BNN with Latent Variable
The distribution of uncertainties is not informative!
This is due to the poor model performance; learnable additive noise would probably help.
Stated results from the paper
(Depeweg et al., 2018)
Decomposition of Uncertainty in Bayesian Deep Learning
for Efficient and Risk-sensitive Learning
The results from the paper show that the maxima of the epistemic uncertainty match the least observed regions of \(\mathcal{X}\)
The results are quite hard to reproduce
Uncertainty outside of domain
For classical BNN without (!) latent variables
Total and epistemic uncertainty grow quickly outside of the domain.
The plot for the epistemic one shows that the in-domain variation of uncertainty is comparatively low.
Uncertainty-based data generation
For active learning purposes
A common technique for active learning is to maximize the Epistemic Uncertainty (EU) over \(x\)
However, as we've seen previously, the EU is not a good maximization objective:
- it is unstable at capturing unobserved regions inside the domain
- it grows rapidly outside of the domain
- it is unbounded from above (so the optimization problem is ill-posed)
One possible way to overcome these problems is to treat the data-generation procedure
as a sampling-from-a-distribution task.
The samples should be close to the in-domain data,
and the model should be uncertain about them (in the epistemic sense)
Uncertainty-based data generation
For active learning purposes
Assume we have a probability distribution of in-domain data points:
a generative distribution \(q(x)\)
Since the epistemic uncertainty is non-negative, we can use it to construct the unnormalized density of the following distribution:
\( \pi(x) \propto q(x)\, EU(x) \)
And we can sample from \(\pi(x)\) using the
Metropolis - Hastings algorithm
proposal distribution: \( x' \sim q(x') \), i.e. independent proposals from the generative model
acceptance ratio: \( A(x', x) = \min\left(1, \frac{\pi(x')\, q(x)}{\pi(x)\, q(x')}\right) = \min\left(1, \frac{EU(x')}{EU(x)}\right) \)
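A minimal sketch of this sampler (`q_sample` and `eu` are hypothetical stand-ins for the generative model and the epistemic-uncertainty estimate): since proposals come from \(q\) itself, the acceptance ratio reduces to \(EU(x')/EU(x)\).

```python
import numpy as np

def metropolis_hastings_eu(q_sample, eu, n_samples=1000, seed=0):
    """Sample from pi(x) ~ q(x) * EU(x) with an independence proposal x' ~ q."""
    rng = np.random.default_rng(seed)
    x = q_sample(rng)
    samples = []
    for _ in range(n_samples):
        x_new = q_sample(rng)  # propose from the generative model q(x)
        # For independence proposals the q-terms cancel in the MH ratio.
        accept = min(1.0, eu(x_new) / max(eu(x), 1e-12))
        if rng.random() < accept:
            x = x_new
        samples.append(x)
    return np.array(samples)

# Toy usage with stand-ins for q(x) and EU(x):
draws = metropolis_hastings_eu(
    q_sample=lambda rng: rng.normal(0.0, 3.0),            # stand-in q(x)
    eu=lambda x: np.exp(-0.5 * (x - 2.0) ** 2) + 1e-3,    # stand-in EU(x)
)
```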
Discussion
- Two types of Bayesian Neural Networks were implemented from scratch using PyTorch.
- The training procedure for BNNs turned out to be unstable.
- The \(\alpha\)-divergence minimization procedure should be tested instead of Variational Bayes.
- In practice, there is no guarantee that all of the epistemic uncertainty will be estimated as epistemic rather than aleatoric.
- A better entropy estimation procedure is needed. A possible solution in 1-d is to use the K-Nearest Neighbour estimator from (link).
- A method for uncertainty-based exploration was proposed. The method is based on sampling from the generative model, while the epistemic uncertainty acts as a critic. Further investigation is required.