Don't just sample, optimize
Peadar Coyle
Image: Bayesian Neural Networks (Thomas Wiecki, PyMC3 Docs)
Challenges in Bayesian Inference
 1. Tradeoffs. How do we formalize statistical and computational tradeoffs for inference?
 2. Software. How do we design efficient and flexible software for generative models?
Why do we need Variational Inference?
 Inferring hidden variables
 Unlike MCMC:
 Deterministic
 Easy to gauge convergence
 Requires dozens of iterations
 Doesn't require conjugacy
 Slightly hairier math
Background
Given
 Data set $\mathbf{x}$
 Generative model $p(\mathbf{x}, \mathbf{z})$ with latent variable $\mathbf{z} \in \mathbb{R}^{d}$
Goal
 Infer the posterior $p(\mathbf{z} \mid \mathbf{x})$
That is the key problem in Bayesian inference
Let's look at the posterior
 We can write the conditional or posterior distribution as
 The denominator is the marginal distribution of the observations (also called the evidence); it is obtained by marginalizing the latent variables out of the joint distribution
 Often this integral is intractable (see the numerical sketch after the equations below)
$p(\mathbf{z} \mid \mathbf{x}) = \dfrac{p(\mathbf{z},\mathbf{x})}{p(\mathbf{x})}$
$p(\mathbf{x}) = \int_{\mathbf{z}} p(\mathbf{z}, \mathbf{x}) \, d\mathbf{z}$
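Not from the original slides: a minimal numerical sketch of why the evidence matters, using an illustrative one-dimensional model (standard-normal prior, unit-noise Gaussian likelihood; all modeling choices here are assumptions for the demo). With one latent variable we can brute-force $p(\mathbf{x})$ on a grid; in higher dimensions this is exactly the integral that becomes intractable.

```python
import numpy as np
from scipy import stats

# Illustrative model (an assumption, not from the talk):
# z ~ N(0, 1),  x_i | z ~ N(z, 1)
x_obs = np.array([0.3, -0.1, 0.8])      # observed data
z_grid = np.linspace(-10, 10, 10001)    # grid over the 1-D latent z

# log joint: log p(z, x) = log p(z) + sum_i log p(x_i | z)
log_joint = (stats.norm.logpdf(z_grid, 0.0, 1.0)
             + stats.norm.logpdf(x_obs[:, None], z_grid, 1.0).sum(axis=0))

# Evidence p(x) = integral of p(z, x) dz, via the trapezoid rule
evidence = np.trapz(np.exp(log_joint), z_grid)

# Posterior p(z | x) = p(z, x) / p(x), now properly normalized
posterior = np.exp(log_joint) / evidence
print(evidence)
```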
What do we approximate?
 We create a variational distribution over the latent variables, $q(z_{1:m} \mid \nu)$
 We want to find settings of the parameters $\nu$ so that $q$ is close to $p$ (a sketch of such a family follows below)
 When $q = p$ this is plain Expectation Maximization
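As an illustration (this is a sketch, not PyMC3 internals), here is what such a family can look like: a fully factorized "mean-field" Gaussian over $m$ latents, with $\nu$ collecting the means and log standard deviations. The class name and parameterization are assumptions for the example.

```python
import numpy as np

class MeanFieldGaussian:
    """Mean-field Gaussian family q(z_{1:m} | nu), nu = (mu, log_sigma)."""
    def __init__(self, m):
        self.mu = np.zeros(m)         # variational means
        self.log_sigma = np.zeros(m)  # variational log standard deviations

    def sample(self, n, rng=np.random):
        # Reparameterized draws: z = mu + sigma * eps, eps ~ N(0, I)
        eps = rng.standard_normal((n, len(self.mu)))
        return self.mu + np.exp(self.log_sigma) * eps

    def log_q(self, z):
        # log q(z | nu) for a diagonal Gaussian
        sigma2 = np.exp(2 * self.log_sigma)
        return (-0.5 * ((z - self.mu) ** 2 / sigma2
                        + np.log(2 * np.pi * sigma2))).sum(axis=-1)

q = MeanFieldGaussian(m=2)
z = q.sample(5)
print(z.shape, q.log_q(z))
```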
What does closeness mean?
 We measure the closeness of the two distributions with the Kullback-Leibler (KL) divergence
 If $q$ is high where $p$ is high, we're happy
 If KL = 0, then the distributions are equal
 If $q$ is low, we don't care. If $q$ is high where $p$ is low, we pay a price
 http://bit.ly/2oROYAw
$\mathrm{KL}(q \,\|\, p) = \mathbb{E}_{q} \left[ \log \dfrac{q(Z)}{p(Z \mid x)} \right]$
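A hedged illustration of this quantity: estimate $\mathbb{E}_{q}[\log q(Z) - \log p(Z)]$ by Monte Carlo and check it against the closed form for two Gaussians. The particular distributions are illustrative stand-ins; in practice $p(Z \mid x)$ is only known up to the evidence, which is why we work with the ELBO below instead.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
q = stats.norm(0.0, 1.0)   # variational distribution q(z)
p = stats.norm(1.0, 2.0)   # stand-in for the target p(z | x)

# Monte Carlo estimate: KL(q || p) = E_q[log q(Z) - log p(Z)]
z = q.rvs(100_000, random_state=rng)
kl_mc = np.mean(q.logpdf(z) - p.logpdf(z))

# Closed form for KL(N(m1, s1) || N(m2, s2))
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0
kl_exact = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

print(kl_mc, kl_exact)  # both should be close to 0.44
```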
We can do some math...
The KL divergence is the negative of the ELBO (evidence lower bound) plus a constant:
$\mathrm{KL}(q \,\|\, p) = -\underbrace{\left( \mathbb{E}_{q}[\log p(z, x)] - \mathbb{E}_{q}[\log q(z)] \right)}_{\text{ELBO}} + \underbrace{\log p(x)}_{\text{constant}}$
The bracketed term is the ELBO; $\log p(x)$ does not depend on $\nu$, so it is constant.
Key points
 Minimizing the KL divergence is the same as maximizing the ELBO
 This turns a sampling problem into an optimization problem (a PyMC3 sketch follows below)
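A minimal PyMC3 sketch of exactly that trade, on a toy Gaussian model (the model and data are illustrative assumptions): `pm.fit` maximizes the ELBO with ADVI, and the fitted approximation can then be sampled cheaply.

```python
import numpy as np
import pymc3 as pm

data = np.random.randn(100) * 2.0 + 1.0  # synthetic data for the demo

with pm.Model():
    mu = pm.Normal('mu', mu=0.0, sd=10.0)
    sd = pm.HalfNormal('sd', sd=10.0)
    pm.Normal('obs', mu=mu, sd=sd, observed=data)

    # Optimize, don't sample: maximize the ELBO with ADVI
    # (equivalently, minimize the KL divergence)
    approx = pm.fit(n=20_000, method='advi')

# Draw from the fitted variational approximation q(z | nu)
trace = approx.sample(1_000)
print(trace['mu'].mean(), trace['sd'].mean())
```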
What's new in PyMC3
 Release of the first stable version in early 2017
 Variational Inference
 Advanced Hamiltonian Monte Carlo samplers
 Easy optimization for finding the MAP point (both HMC and MAP are sketched after this list)
 Theano support for fast compilation
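As referenced in the list above, a short sketch of MAP optimization and the Hamiltonian Monte Carlo sampler (NUTS is PyMC3's default) on an intentionally simple toy model:

```python
import numpy as np
import pymc3 as pm

data = np.random.randn(50)  # synthetic data for the demo

with pm.Model():
    mu = pm.Normal('mu', mu=0.0, sd=1.0)
    pm.Normal('obs', mu=mu, sd=1.0, observed=data)

    map_point = pm.find_MAP()           # fast point estimate of the mode
    trace = pm.sample(1000, tune=1000)  # NUTS is the default sampler

print(map_point['mu'], trace['mu'].mean())
```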
What else is new
 Gaussian process kernels
 New variants of Variational Inference (including Operator)
 Speed improvements
 API and documentation improvements
 Bayesian Methods for Hackers, now in PyMC3 too
First gather data from some real-world phenomenon. Then cycle through Box's loop:
 Build a probabilistic model of the phenomenon.
 Reason about the phenomenon given the model and data.
 Criticize the model, revise and repeat (a one-pass sketch follows below).
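A hedged one-pass sketch of the loop in PyMC3, with an intentionally simple model (all modeling choices are assumptions for the demo); criticism is done with a posterior predictive check via `pm.sample_ppc`, the PyMC3 API of that era.

```python
import numpy as np
import pymc3 as pm

data = np.random.randn(200) + 0.5  # stand-in for real-world data

# 1. Build a probabilistic model of the phenomenon
with pm.Model():
    mu = pm.Normal('mu', mu=0.0, sd=5.0)
    pm.Normal('obs', mu=mu, sd=1.0, observed=data)

    # 2. Reason about the phenomenon given the model and data
    trace = pm.sample(1000, tune=1000)

    # 3. Criticize: simulate replicated data, compare with the observed
    ppc = pm.sample_ppc(trace, samples=500)

print(data.mean(), ppc['obs'].mean())  # revise the model and repeat
```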