From this excellent tutorial: https://medium.com/tensorflow/regression-with-probabilistic-layers-in-tensorflow-probability-e46ff5d37baf
Predicting cluster masses from velocity dispersion
Maximum likelihood:
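As a minimal sketch of the maximum-likelihood step, the block below fits a straight line with Gaussian noise to log velocity dispersion versus log mass. The data are synthetic, and the slope of 3, intercept of 5.5, and scatter of 0.1 dex are illustrative assumptions, not values from the tutorial or from any catalogue. The function names `fit_gaussian_mle` and `negative_log_likelihood` are hypothetical helpers introduced here.

```python
import math
import random


def fit_gaussian_mle(x, y):
    """Closed-form maximum-likelihood fit of y ~ Normal(slope * x + intercept, sigma)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The MLE of the noise variance divides by n, not n - 2.
    resid_sq = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(resid_sq / n)
    return slope, intercept, sigma


def negative_log_likelihood(x, y, slope, intercept, sigma):
    """Gaussian negative log-likelihood, the loss that maximum likelihood minimizes."""
    n = len(x)
    resid_sq = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 0.5 * n * math.log(2.0 * math.pi * sigma ** 2) + resid_sq / (2.0 * sigma ** 2)


# Synthetic data (assumed values): log10 of velocity dispersion and log10 of mass.
rng = random.Random(0)
log_sigma_v = [rng.uniform(2.5, 3.2) for _ in range(500)]
log_mass = [3.0 * s + 5.5 + rng.gauss(0.0, 0.1) for s in log_sigma_v]

slope, intercept, sigma = fit_gaussian_mle(log_sigma_v, log_mass)
print(f"slope={slope:.3f} intercept={intercept:.3f} sigma={sigma:.3f}")
```

For a linear mean and constant noise the maximum-likelihood solution is ordinary least squares, so no optimizer is needed here. A neural-network version would minimize the same negative log-likelihood by gradient descent.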
Bayesian posterior by variational inference:
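For reference, variational inference replaces the intractable posterior over the weights with an approximating distribution and maximizes the evidence lower bound (ELBO). The notation below is an assumption made for this note: $q_\theta(w)$ is the approximate posterior with variational parameters $\theta$, $p(w)$ is the prior, and $(x_n, y_n)$ are the $N$ training pairs.

```latex
\mathcal{L}(\theta)
  = \sum_{n=1}^{N} \mathbb{E}_{q_\theta(w)}\big[\log p(y_n \mid x_n, w)\big]
  - \mathrm{KL}\big(q_\theta(w) \,\|\, p(w)\big)
```

The first term is the expected log-likelihood, which is the maximum-likelihood objective averaged over sampled weights. The second term penalizes approximate posteriors that drift far from the prior.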
Dropout: Hinton et al. 2012, Srivastava et al. 2014
Let's express the predictive probability of the model:
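A standard way to write this, sketched under assumed notation: $X, Y$ are the training inputs and targets, $x^*$ is a new input, $q(w)$ is the approximate posterior over the weights, and $T$ is the number of Monte Carlo samples.

```latex
p(y^* \mid x^*, X, Y)
  = \int p(y^* \mid x^*, w)\, p(w \mid X, Y)\, dw
  \approx \int p(y^* \mid x^*, w)\, q(w)\, dw
  \approx \frac{1}{T} \sum_{t=1}^{T} p(y^* \mid x^*, \hat{w}_t),
  \qquad \hat{w}_t \sim q(w)
```

The first approximation swaps the true posterior for $q(w)$. The second replaces the integral with an average over $T$ sampled weight settings.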
Parameterize q(w) in the following way:
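One common choice, given the dropout references above, is the Bernoulli parameterization of Gal and Ghahramani (2016). Treating it as the parameterization intended here is an assumption, as are the symbols: $M_i$ is a matrix of variational parameters for layer $i$, of size $K_i \times K_{i-1}$, $p_i$ is the probability that a unit is kept, and $L$ is the number of layers.

```latex
W_i = M_i \cdot \mathrm{diag}\big([z_{i,j}]_{j=1}^{K_{i-1}}\big),
\qquad
z_{i,j} \sim \mathrm{Bernoulli}(p_i),
\qquad
i = 1, \dots, L, \quad j = 1, \dots, K_{i-1}
```

Under this form, $z_{i,j} = 0$ corresponds to dropping unit $j$ of layer $i-1$ as an input to layer $i$, so sampling from $q(w)$ amounts to a forward pass with dropout left on.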