Bayesian Exploration

(for Petroleum Industry)

Speaker: Pavel Temirchev

4th year Ph.D. student

Reminder: supervised ML

"Cat"

"Dog"

"Giraffe"

Object

Target variable

\(x\)

\(y\)

you need a dataset of examples:
\(\mathcal{D}=\{x_i, y_i\}_i \)

Motivation

Assume you have an ML model (e.g., a model that mimics tNavigator):

\hat y = f_\theta(x)

And you collect your dataset as follows:

Either you generate objects \(x\) and compute targets \(y\) by software or by hand
Or, you already have the dataset \(\mathcal{D}=\{x_i, y_i\}_i\), but it is possible to extend it by software or by hand

||y - \hat y|| \rightarrow \min_\theta

An example

The NN works poorly in the zone of unseen BHPs (bottom-hole pressures)
We need more data there
How to find this zone without plotting it?

Theoretical reminder

some useful formulas

The entropy:

\mathcal{H}(p) = - \int p(x) \log p(x) dx

(the measure of uncertainty)

The Kullback-Leibler divergence:

KL(p||q) = \int p(x) \log \frac{p(x)}{q(x)} dx

(a kind of distance between two probability distributions)

(not a metric!)
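A minimal numerical check of the entropy and KL formulas, as a sketch. It assumes two 1-D Gaussians chosen only for illustration and a midpoint-rule integral truncated to [-20, 20]; the helper names (`gaussian_pdf`, `integrate`, `entropy`, `kl`) are hypothetical. Printing both KL(p||q) and KL(q||p) shows the asymmetry behind the "not a metric" remark.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (sigma is the standard deviation)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def entropy(p, lo=-20.0, hi=20.0):
    """H(p) = -int p(x) log p(x) dx, truncated to [lo, hi]."""
    def integrand(x):
        px = p(x)
        return -px * math.log(px) if px > 0.0 else 0.0
    return integrate(integrand, lo, hi)

def kl(p, q, lo=-20.0, hi=20.0):
    """KL(p||q) = int p(x) log(p(x)/q(x)) dx, truncated to [lo, hi]."""
    def integrand(x):
        px, qx = p(x), q(x)
        return px * math.log(px / qx) if px > 0.0 and qx > 0.0 else 0.0
    return integrate(integrand, lo, hi)

# Illustrative distributions (an assumption, not taken from the slides).
p = lambda x: gaussian_pdf(x, 0.0, 1.0)
q = lambda x: gaussian_pdf(x, 1.0, 2.0)

print("H(p)     =", entropy(p))
print("KL(p||q) =", kl(p, q))
print("KL(q||p) =", kl(q, p))  # differs from KL(p||q): KL is not symmetric
```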

Bayes' rule:

p(\theta|x) = \frac{p(x|\theta) p(\theta)}{p(x)}

Can we measure the uncertainty of the model?

How to measure it?
What if the modelled process is uncertain itself?

(high uncertainty does not necessarily signify unobserved data)

(any ideas on how to embed uncertainty into the model?)

Probability distributions parametrised by a NN

One way: train a NN to predict parameters of a distribution. E.g.:

p(y|x) = \mathcal{N}\big (y| \mu_\theta(x), \sigma_\theta(x) \big)

\(\big( \mu_\theta(x), \sigma_\theta(x) \big)\) - a NN with two outputs

Use the maximum likelihood approach to train it
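For a Gaussian output, maximum likelihood training amounts to minimising the negative log-likelihood of the targets. A minimal sketch of that loss follows. It assumes that \(\sigma_\theta(x)\) is a standard deviation and that the network outputs \(\log \sigma\) to keep it positive (a common choice, not stated in the slides). The function names are hypothetical.

```python
import math

def gaussian_nll(y, mu, log_sigma):
    """Negative log-likelihood of y under N(mu, sigma^2), with sigma = exp(log_sigma)."""
    sigma = math.exp(log_sigma)
    return 0.5 * math.log(2.0 * math.pi) + log_sigma + 0.5 * ((y - mu) / sigma) ** 2

def mean_nll(targets, mus, log_sigmas):
    """Training loss: average NLL over a batch of (target, predicted mu, predicted log sigma)."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, mus, log_sigmas)]
    return sum(losses) / len(losses)

# A target far from the predicted mean is penalised less
# when the network also predicts a larger sigma.
print(gaussian_nll(3.0, 0.0, 0.0))            # sigma = 1
print(gaussian_nll(3.0, 0.0, math.log(3.0)))  # sigma = 3
```

In a real setup the loss would be minimised over \(\theta\) with a gradient-based optimiser; the sketch only shows the quantity being minimised.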

Back to the example

with stochastic Neural Networks

Here the uncertainty does not signify the importance of the point, since it is obtained from overfitting

An example of a stochastic process

Is petroleum engineering deterministic?

no certainty about the correlation
errors in the data collection workflow
now the uncertainty is useless

Bayesian Neural Networks

can help us

Now the parameters \(\theta\) are also random variables.
We need to find the posterior given the dataset and a prior:

p(\theta|\mathcal{D}) = \frac{p(\mathcal{D}|\theta) p(\theta)}{p(\mathcal{D})}

And the prediction of the BNN:

p(y|x) = \int p_\theta(y|x)p(\theta|\mathcal{D})d\theta

\(p(\mathcal{D}|\theta)\) - easy

\(p(\theta)\) - is given

\(p(\mathcal{D}) = \int p(\mathcal{D}|\theta) p(\theta) d\theta \) - very hard

p(y|x) \approx \frac{1}{n}\sum_{i=1}^n p_{\theta_i}(y|x)

\(\theta_i\), \(i = 1, \dots, n\) - the parameters of \(n\) trained SNNs (stochastic NNs)

it is just an ensemble of NNs
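The ensemble approximation can be sketched as an equally weighted mixture of the members' predictive densities. The sketch assumes each member predicts a Gaussian \((\mu_i, \sigma_i)\) at a fixed input \(x\), with \(\sigma_i\) a standard deviation; the function names and the example numbers are hypothetical.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y (sigma is the standard deviation)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def ensemble_pdf(y, members):
    """p(y|x) ~ (1/n) * sum_i p_{theta_i}(y|x); members is a list of (mu_i, sigma_i)."""
    return sum(gaussian_pdf(y, mu, s) for mu, s in members) / len(members)

def ensemble_mean_and_variance(members):
    """Mean and variance of the equally weighted Gaussian mixture."""
    n = len(members)
    mean = sum(mu for mu, _ in members) / n
    # Second moment of the mixture minus the squared mean.
    var = sum(s * s + mu * mu for mu, s in members) / n - mean * mean
    return mean, var

# Two members that disagree about the mean (illustrative numbers only).
members = [(-1.0, 1.0), (1.0, 1.0)]
print(ensemble_pdf(0.0, members))
print(ensemble_mean_and_variance(members))
```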

The decomposition of uncertainty

The total uncertainty is the entropy of the BNN's predictive distribution:

\mathcal{H} \big[p(y|x)\big]

\approx \mathcal{H} \big[ \frac{1}{n}\sum_{i=1}^n p_{\theta_i}(y|x)\big]

The aleatoric uncertainty is the average entropy of the ensemble members:

\mathbb{E}_\theta\mathcal{H} \big[p_\theta(y|x)\big]

\approx \frac{1}{n}\sum_{i=1}^n \mathcal{H} \big[p_{\theta_i}(y|x)\big]

The epistemic uncertainty is the difference between them:

EU = TU - AU

It shows the difference between ensemble members
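The decomposition can be sketched numerically for one input \(x\). The sketch assumes Gaussian ensemble members given as \((\mu_i, \sigma_i)\) pairs with \(\sigma_i\) a standard deviation. The entropy of the mixture has no closed form, so it is integrated on a grid, while each member's entropy uses the Gaussian closed form \(\frac{1}{2}\log(2\pi e \sigma^2)\). All function names are hypothetical.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y (sigma is the standard deviation)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_entropy(sigma):
    """Closed-form entropy of N(mu, sigma^2); it does not depend on mu."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma * sigma)

def mixture_entropy(members, n_grid=20000, pad=10.0):
    """Total uncertainty: entropy of the equally weighted mixture (midpoint rule)."""
    lo = min(mu - pad * s for mu, s in members)
    hi = max(mu + pad * s for mu, s in members)
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        y = lo + (k + 0.5) * h
        p = sum(gaussian_pdf(y, mu, s) for mu, s in members) / len(members)
        if p > 0.0:
            total -= p * math.log(p) * h
    return total

def decompose(members):
    """Return (TU, AU, EU) with EU = TU - AU."""
    tu = mixture_entropy(members)
    au = sum(gaussian_entropy(s) for _, s in members) / len(members)
    return tu, au, tu - au

# Members that agree: EU is close to zero.
print(decompose([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]))
# Members that disagree about the mean: EU is clearly positive.
print(decompose([(-3.0, 1.0), (0.0, 1.0), (3.0, 1.0)]))
```

With equal weights, EU is bounded above by \(\log n\), so it saturates once the members are fully separated.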

The decomposition of uncertainty

Some results from the article

https://arxiv.org/pdf/1710.07283.pdf

Conclusions

We have proposed a possible solution to the dataset acquisition problem for ROM training:

the approach is based on Bayesian exploration
we state that examples with high epistemic uncertainty should be added to the dataset
given an NN-based ROM, the proposed method does not require major changes in the model architecture - it is enough to train a set of NNs on bootstrapped data instead of a single one
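The two practical steps in these conclusions can be sketched as follows: draw a bootstrap sample of the dataset for each ensemble member, then rank candidate inputs by epistemic uncertainty and keep the top few for labelling. This is a sketch under assumptions. The function names, the toy dataset and the `epistemic_uncertainty` callable (any function mapping a candidate to its EU score) are hypothetical, and training itself is omitted.

```python
import random

def bootstrap_resample(dataset, rng):
    """Sample len(dataset) examples with replacement (one bootstrap replicate)."""
    return [dataset[rng.randrange(len(dataset))] for _ in dataset]

def select_most_uncertain(candidates, epistemic_uncertainty, k):
    """Return the k candidates with the highest epistemic uncertainty."""
    ranked = sorted(candidates, key=epistemic_uncertainty, reverse=True)
    return ranked[:k]

# Toy dataset of (x, y) pairs, for illustration only.
dataset = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2)]
rng = random.Random(0)

# One bootstrap replicate per ensemble member; each member would be trained on its own.
replicates = [bootstrap_resample(dataset, rng) for _ in range(3)]

# Stand-in EU score: here simply the squared value of the candidate.
candidates = [0.0, 1.0, 2.0, 3.0]
to_label = select_most_uncertain(candidates, lambda x: x * x, k=2)
print(to_label)
```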
