Bayesian Exploration

(for Petroleum Industry)

Speaker: Pavel Temirchev

4th year Ph.D. student

Reminder: supervised ML

"Cat"

"Cat"

"Dog"

"Giraffe"

Object
Target variable

\(x\)

\(y\)

you need a dataset of examples:
\(\mathcal{D}=\{x_i, y_i\}_i \)

Motivation

Assume you have an ML model (e.g., a model that mimics tNavigator):

\hat y = f_\theta(x)

And you collect your dataset as follows:

  • Either you generate objects \(x\) and compute targets \(y\) by software or by hand
  • Or, you already have the dataset \(\mathcal{D}=\{x_i, y_i\}_i\), but it is possible to extend it by software or by hand

The model is trained by minimizing the prediction error:

\|y - \hat y\| \rightarrow \min_\theta

An example

  • The NN works poorly in the zone of unseen BHPs (bottom-hole pressures)
     
  • We need more data there
     
  • How to find this zone without plotting it?

Theoretical reminder

some useful formulas

The entropy:

\mathcal{H}(p) = - \int p(x) \log p(x) dx

(the measure of uncertainty)

The Kullback-Leibler divergence:

KL(p||q) = \int p(x) \log \frac{p(x)}{q(x)} dx

(a kind of distance between two probability distributions)

(not a metric!)
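
For univariate Gaussians both quantities have well-known closed forms, which makes the "not a metric" remark easy to check numerically. The sketch below is only an illustration: the helper names `gaussian_entropy` and `gaussian_kl` and the chosen parameter values are assumptions, not part of the talk.

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy of N(mu, sigma^2) in nats (it does not depend on mu)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) in nats for univariate Gaussians p and q."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

# Swapping the arguments changes the value, so KL is not symmetric.
kl_pq = gaussian_kl(0.0, 1.0, 1.0, 2.0)
kl_qp = gaussian_kl(1.0, 2.0, 0.0, 1.0)
print(f"KL(p||q) = {kl_pq:.4f}, KL(q||p) = {kl_qp:.4f}")

# A wider distribution is more uncertain, hence has larger entropy.
print(gaussian_entropy(1.0) < gaussian_entropy(2.0))
```

The two KL values differ, and the divergence of a distribution from itself is zero.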

Bayes' rule:

p(\theta|x) = \frac{p(x|\theta) p(\theta)}{p(x)}

Can we measure the
uncertainty of the model?

  • How to measure it?
     
  • What if the modelled process is uncertain itself?

(high uncertainty does not necessarily signify unobserved data)

(any ideas on how to embed uncertainty into the model?)

Probability distributions
parametrised by a NN

One way: train a NN to predict parameters of a distribution. E.g.:

p(y|x) = \mathcal{N}\big(y | \mu_\theta(x), \sigma_\theta(x)\big)

\(\big(\mu_\theta(x), \sigma_\theta(x)\big)\) - a NN with two outputs

Use the maximum likelihood approach to train it 
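
For the Gaussian case, maximum likelihood amounts to minimizing the average negative log-likelihood of the targets under the predicted distributions. The sketch below assumes that \(\sigma_\theta(x)\) is a standard deviation and that the network outputs are already available as plain lists; the function names and numbers are hypothetical. In practice a network would often output \(\log \sigma\) to keep \(\sigma\) positive, which is also an assumption here.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2); sigma is a standard deviation."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)

def mean_nll(ys, mus, sigmas):
    """Training loss: average NLL over the dataset (to be minimized over theta)."""
    return sum(gaussian_nll(y, m, s) for y, m, s in zip(ys, mus, sigmas)) / len(ys)

ys = [1.0, 2.0, 3.0]
# Predictions close to the targets give a lower loss than predictions far away.
good = mean_nll(ys, mus=[1.1, 1.9, 3.0], sigmas=[0.2, 0.2, 0.2])
bad = mean_nll(ys, mus=[0.0, 0.0, 0.0], sigmas=[0.2, 0.2, 0.2])
print(good < bad)
```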

Back to the example

with stochastic Neural Networks

  • Here the predicted uncertainty does not signify the importance of a point
     
  • It is merely a by-product of overfitting

An example of a stochastic process

Is petroleum engineering deterministic?

  • no certainty about the correlation
     
  • errors in the data collection workflow
     
  • now the uncertainty is useless

Bayesian Neural Networks

can help us

Now the parameters \(\theta\) are also random variables.
We need to find the posterior given the dataset and a prior:

p(\theta|\mathcal{D}) = \frac{p(\mathcal{D}|\theta) p(\theta)}{p(\mathcal{D})}

And the prediction of the BNN:

p(y|x) = \int p_\theta(y|x)p(\theta|\mathcal{D})d\theta

\(p(\mathcal{D}|\theta)\) - easy

\(p(\theta)\) - is given

\(p(\mathcal{D}) = \int p(\mathcal{D}|\theta) p(\theta) d\theta \) - very hard

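So the integral is approximated by an average over \(n\) parameter samples:

p(y|x) \approx \frac{1}{n}\sum_{i=1}^n p_{\theta_i}(y|x)

\(\theta_i\), \(i = 1, \dots, n\) - the parameters of \(n\) separately trained SNNs (stochastic NNs)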

it is just an ensemble of NNs
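
A minimal sketch of this ensemble prediction, assuming each member returns a Gaussian \((\mu, \sigma)\) at a fixed input \(x\). The member values and the helper names are made up for illustration.

```python
import math

def gaussian_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def ensemble_pdf(y, members):
    """Equal-weight mixture of the members' Gaussian predictions at one input x."""
    return sum(gaussian_pdf(y, mu, s) for mu, s in members) / len(members)

def ensemble_mean_std(members):
    """Mean and standard deviation of the mixture (law of total variance)."""
    n = len(members)
    mean = sum(mu for mu, _ in members) / n
    second_moment = sum(s * s + mu * mu for mu, s in members) / n
    return mean, math.sqrt(second_moment - mean * mean)

# Hypothetical (mu, sigma) outputs of three ensemble members at the same x.
members = [(1.0, 0.5), (1.2, 0.4), (0.8, 0.6)]
mean, std = ensemble_mean_std(members)
print(f"predictive mean = {mean:.3f}, predictive std = {std:.3f}")
print(f"density at y = 1: {ensemble_pdf(1.0, members):.3f}")
```

The mixture is a valid density by construction, since it is an average of densities.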

The decomposition of uncertainty

The total uncertainty is the entropy of the BNN's predictive distribution:

\mathcal{H} \big[p(y|x)\big]
\approx \mathcal{H} \Big[ \frac{1}{n}\sum_{i=1}^n p_{\theta_i}(y|x)\Big]

The aleatoric uncertainty is the average entropy of the ensemble members:

\mathbb{E}_\theta\mathcal{H} \big[p_\theta(y|x)\big]
\approx \frac{1}{n}\sum_{i=1}^n \mathcal{H} \big[p_{\theta_i}(y|x)\big]

The epistemic uncertainty is the difference between them:

EU = TU - AU

It measures the disagreement between ensemble members
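
The decomposition can be sketched for an ensemble of Gaussian predictions. The entropy of a Gaussian mixture has no closed form, so this sketch estimates it with a simple Riemann sum on a grid. The grid settings, function names and member values are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def total_uncertainty(members, n_grid=20001, width=10.0):
    """Entropy of the equal-weight mixture, estimated by a Riemann sum."""
    lo = min(mu - width * s for mu, s in members)
    hi = max(mu + width * s for mu, s in members)
    dy = (hi - lo) / (n_grid - 1)
    h = 0.0
    for k in range(n_grid):
        y = lo + k * dy
        p = sum(gaussian_pdf(y, mu, s) for mu, s in members) / len(members)
        if p > 0.0:  # skip points where the density underflows to zero
            h -= p * math.log(p) * dy
    return h

def aleatoric_uncertainty(members):
    """Average closed-form entropy of the Gaussian members."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s)
               for _, s in members) / len(members)

def epistemic_uncertainty(members):
    return total_uncertainty(members) - aleatoric_uncertainty(members)

# Members that agree vs. members that disagree on the mean (same sigma).
agree = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5)]
disagree = [(0.0, 0.5), (1.0, 0.5), (3.0, 0.5)]
eu_agree = epistemic_uncertainty(agree)
eu_disagree = epistemic_uncertainty(disagree)
print(f"EU (agree) = {eu_agree:.4f}, EU (disagree) = {eu_disagree:.4f}")
```

Both ensembles have the same aleatoric uncertainty, since every member has the same \(\sigma\). Only the disagreeing ensemble has a noticeably positive epistemic part.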

The decomposition of uncertainty

Some results from the article 

https://arxiv.org/pdf/1710.07283.pdf

Conclusions

We have proposed a possible solution to the dataset acquisition problem for ROM training:

 

  • the approach is based on Bayesian exploration
     
  • we argue that examples with high epistemic uncertainty should be added to the dataset
     
  • given an NN-based ROM, the proposed method does not require major changes in the model architecture - it is enough to train a set of NNs on bootstrapped data instead of a single one
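
The selection step summarized above can be sketched end to end on a toy problem. To stay self-contained, the sketch swaps each NN for a least-squares line with a constant residual standard deviation; this stand-in, the synthetic data, and all function names are assumptions for illustration and not the model used in the talk.

```python
import math
import random

def fit_linear_gaussian(xs, ys):
    """Toy stand-in for one stochastic NN: least-squares line plus residual std."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = max(math.sqrt(sum(r * r for r in resid) / n), 1e-6)
    return intercept, slope, sigma

def bootstrap_ensemble(xs, ys, n_members, rng):
    """Train each member on a resample (with replacement) of the dataset."""
    models = []
    for _ in range(n_members):
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit_linear_gaussian([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def epistemic_uncertainty(members, n_grid=4001, width=8.0):
    """EU = H[mixture] - mean H[member] for Gaussian members (mu, sigma)."""
    lo = min(mu - width * s for mu, s in members)
    hi = max(mu + width * s for mu, s in members)
    dy = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        y = lo + k * dy
        p = sum(math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                for mu, s in members) / len(members)
        if p > 0.0:
            total -= p * math.log(p) * dy
    aleatoric = sum(0.5 * math.log(2.0 * math.pi * math.e * s * s)
                    for _, s in members) / len(members)
    return total - aleatoric

def select_candidates(models, candidates, k):
    """Return the k candidate inputs with the highest epistemic uncertainty."""
    def score(x):
        return epistemic_uncertainty([(a + b * x, s) for a, b, s in models])
    return sorted(candidates, key=score, reverse=True)[:k]

rng = random.Random(0)
train_x = [rng.random() for _ in range(30)]            # data only covers [0, 1]
train_y = [2.0 * x + rng.gauss(0.0, 0.1) for x in train_x]
models = bootstrap_ensemble(train_x, train_y, n_members=10, rng=rng)
candidates = [0.5 * i for i in range(11)]              # candidate inputs in [0, 5]
chosen = select_candidates(models, candidates, k=3)
print("points to label next:", chosen)
```

In this toy setting the members agree inside the training range and diverge when extrapolating, so the selected points lie outside \([0, 1]\). The chosen inputs would then be labeled (by software or by hand) and added to the dataset before retraining.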