Bayesian Exploration
(for Petroleum Industry)
Speaker: Pavel Temirchev
4th year Ph.D. student
Reminder: supervised ML
Object \(x\) → target variable \(y\)
(illustration: images as objects, labels as targets: "Cat", "Cat", "Dog", "Giraffe")
To train a model, you need a dataset of examples:
\(\mathcal{D}=\{x_i, y_i\}_i \)
Motivation
Assume you have an ML model (e.g., a model that mimics tNavigator):
And you collect your dataset as follows:
- Either you generate objects \(x\) and compute targets \(y\) by software or by hand
- Or, you already have the dataset \(\mathcal{D}=\{x_i, y_i\}_i\), but it is possible to extend it by software or by hand
An example
- The NN works poorly in the zone of unseen BHPs (bottomhole pressures)
- We need more data there
- How do we find this zone without plotting it?
Theoretical reminder
some useful formulas
The entropy:
(the measure of uncertainty)
\(\mathbb{H}[p] = -\mathbb{E}_{p(x)} \log p(x)\)
The Kullback-Leibler divergence:
(a kind of distance between two distributions)
(not a metric!)
\(KL(p \,\|\, q) = \mathbb{E}_{p(x)} \log \frac{p(x)}{q(x)}\)
The Bayes' rule:
\(p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}\)
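As a minimal illustration (my own sketch, not from the slides), both quantities for discrete distributions:

```python
import math

def entropy(p):
    """Entropy H[p] = -sum_x p(x) log p(x), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) log(p(x) / q(x)).
    Asymmetric in its arguments, hence not a metric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximal uncertainty over 4 outcomes
peaked  = [0.97, 0.01, 0.01, 0.01]   # near-certain outcome

print(entropy(uniform))              # log(4) ≈ 1.386, the maximum
print(entropy(peaked))               # close to 0
print(kl_divergence(peaked, uniform))
print(kl_divergence(uniform, peaked))  # differs: KL is asymmetric
```

The `if pi > 0` guard implements the usual convention \(0 \log 0 = 0\).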
Can we measure the
uncertainty of the model?
- How to measure it?
- What if the modelled process is uncertain itself?
(high uncertainty does not necessarily signify unobserved data)
(any ideas on how to embed uncertainty into the model?)
Probability distributions
parametrised by a NN
One way: train a NN to predict the parameters of a distribution. E.g., a Gaussian:
- a NN with two outputs: the mean \(\mu(x)\) and the variance \(\sigma^2(x)\)
Use the maximum likelihood approach to train it
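Maximising the likelihood is the same as minimising the Gaussian negative log-likelihood. A sketch of that loss (assuming the two-output Gaussian parametrisation above):

```python
import math

def gaussian_nll(y, mu, sigma2):
    """Negative log-likelihood of y under N(mu, sigma2).
    Minimising this trains both outputs: mu is pulled toward y,
    while sigma2 grows where the squared error stays large."""
    return 0.5 * (math.log(2 * math.pi * sigma2) + (y - mu) ** 2 / sigma2)

# The same prediction error is penalised less when the network
# also predicts a larger variance for that input:
confident = gaussian_nll(y=1.0, mu=0.0, sigma2=0.1)
cautious  = gaussian_nll(y=1.0, mu=0.0, sigma2=2.0)
print(confident > cautious)  # True: claiming certainty while wrong costs more
```

This is how the network learns to output high \(\sigma^2(x)\) in regions where its mean prediction is unreliable.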
Back to the example
with stochastic Neural Networks
- High uncertainty does not signify that a point is important
- It can merely be an artefact of overfitting
An example of a stochastic process
Is petroleum engineering deterministic?
- no certainty about the correlations
- errors in the data collection workflow
- so the predicted uncertainty alone is useless for exploration
Bayesian Neural Networks
can help us
Now the parameters \(\theta\) are also random variables.
We need to find the posterior given the dataset and a prior:
\(p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}\)
And the prediction of the BNN:
\(p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta\)
\(p(\mathcal{D}|\theta)\) - easy
\(p(\theta)\) - is given
\(p(\mathcal{D}) = \int p(\mathcal{D}|\theta) p(\theta) d\theta \) - very hard
\(p(y \mid x, \mathcal{D}) \approx \frac{1}{n} \sum_{i=1}^{n} p(y \mid x, \theta_i)\), where \(\theta_i\) are the parameters of \(n\) trained SNNs -
it is just an ensemble of NNs
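A sketch of the ensemble-as-BNN idea (my own illustration, not from the slides): each SNN predicts a Gaussian \((\mu_i, \sigma^2_i)\) for an input, and the BNN predictive is the equal-weight mixture of those Gaussians.

```python
def mixture_moments(members):
    """members: list of (mu, sigma2) predictions from n trained SNNs.
    Returns the mean and variance of the mixture (1/n) * sum_i N(mu_i, sigma2_i)."""
    n = len(members)
    mean = sum(mu for mu, _ in members) / n
    # Var[y] = E[y^2] - (E[y])^2, with E[y^2] = mean_i (sigma2_i + mu_i^2)
    second = sum(s2 + mu ** 2 for mu, s2 in members) / n
    return mean, second - mean ** 2

# Members that agree: predictive variance equals their shared noise level.
agree = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5)]
# Members that disagree about the mean: extra variance appears.
disagree = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]

print(mixture_moments(agree))     # (1.0, 0.5)
print(mixture_moments(disagree))  # (1.0, 0.5 + variance of the means)
```

The extra variance contributed by member disagreement is exactly what the epistemic term in the next slide isolates.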
The decomposition of uncertainty
The total uncertainty is the entropy of the BNN's predictive distribution:
\(\mathbb{H}\big[p(y \mid x, \mathcal{D})\big] = \mathbb{H}\big[\frac{1}{n}\sum_{i=1}^{n} p(y \mid x, \theta_i)\big]\)
The aleatoric uncertainty is the average entropy of the ensemble members:
\(\frac{1}{n}\sum_{i=1}^{n} \mathbb{H}\big[p(y \mid x, \theta_i)\big]\)
The epistemic uncertainty is the difference between them:
\(\mathbb{H}\big[p(y \mid x, \mathcal{D})\big] - \frac{1}{n}\sum_{i=1}^{n} \mathbb{H}\big[p(y \mid x, \theta_i)\big]\)
It shows the disagreement between ensemble members
The decomposition of uncertainty
Some results from the article
Conclusions
We have proposed a possible solution to the dataset acquisition problem for ROM training:
- the approach is based on Bayesian exploration
- we state that examples with a high value of epistemic uncertainty should be added to the dataset
- given an NN-based ROM, the proposed method does not require major changes in the model architecture - it is enough to train a set of NNs on bootstrapped data instead of a single one