Don't just sample, optimize

Peadar Coyle

Bayesian Neural Networks - Thomas Wiecki - PyMC3 Docs

Challenges in Bayesian Inference

1. Tradeoffs. How do we formalize statistical and computational tradeoffs for inference?
2. Software. How do we design efficient and flexible software for generative models?

Why do we need Variational Inference?

Inferring hidden variables
Unlike MCMC:
- Deterministic
- Easy to gauge convergence
- Requires dozens of iterations
Doesn't require conjugacy
Slightly hairier math

Background

Given

Data set: \mathbf{x}
Generative model: p(\mathbf{x}, \mathbf{z}), with latent variable \mathbf{z} \in \mathbb{R}^{d}

Goal

Infer posterior: p(\mathbf{z} | \mathbf{x})

That is the key problem in Bayesian inference

Let's look at the posterior

We can write the conditional or posterior distribution as

p(\mathbf{z} | \mathbf{x}) = \dfrac{p(\mathbf{z},\mathbf{x})}{p(\mathbf{x})}

The denominator of the posterior is the marginal distribution of the observations (also called the evidence). It is calculated by marginalizing the latent variables out of the joint distribution

p(\mathbf{x}) = \int_{\mathbf{z}} p(\mathbf{z}, \mathbf{x}) d\mathbf{z}

Often this integral is intractable
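
To make the evidence integral concrete, here is a minimal sketch for a case where it is still tractable: a coin with unknown bias z, a uniform prior on z, and k heads in n flips. The model, the data (k = 7, n = 10) and the function names are assumptions chosen for illustration; they are not from the talk. The sketch compares a brute-force grid approximation of p(x) with the known closed form.

```python
import math


def joint(z, k, n):
    """p(x, z) for a uniform prior on z and one sequence of n flips with k heads."""
    return z**k * (1.0 - z) ** (n - k)


def evidence_grid(k, n, num_points=10_000):
    """Approximate p(x) = integral over z of p(x, z) with the midpoint rule."""
    width = 1.0 / num_points
    return sum(joint((i + 0.5) * width, k, n) for i in range(num_points)) * width


def evidence_exact(k, n):
    """Closed form of the same integral: the Beta function B(k + 1, n - k + 1)."""
    return math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)


k, n = 7, 10  # hypothetical data
approx = evidence_grid(k, n)
exact = evidence_exact(k, n)
print(approx, exact)
```

A grid like this works for a single latent variable, but the number of grid points grows exponentially with the dimension d of z, which is one way to see why the integral is usually out of reach.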

What do we approximate?

We create a variational distribution over the latent variables

q(z_{1:m} | \nu)

We want to find settings of the variational parameters \nu so that q is close to the posterior p
When q can match the posterior exactly (p == q), this reduces to plain Expectation Maximization

What does closeness mean?

We measure the closeness of distributions using Kullback-Leibler Divergence

\mathrm{KL}(q \| p) = \mathbb{E}_{q} [\log \dfrac{q(Z)}{p(Z|x)}]

If q and p are both high we're happy
If KL = 0, then the distributions are equal
If q is low we don't care. If q is high but p isn't we pay a price
http://bit.ly/2oROYAw

We can do some math...
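
The steps can be written out as a short derivation. It uses only the definition of the KL divergence and the product rule p(z, x) = p(z | x) p(x); the shorthand \mathrm{ELBO}(q) is introduced here for the bracketed term.

```latex
\begin{align*}
\mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)
  &= \mathbb{E}_{q}[\log q(z)] - \mathbb{E}_{q}[\log p(z \mid x)] \\
  &= \mathbb{E}_{q}[\log q(z)] - \mathbb{E}_{q}[\log p(z, x)] + \log p(x) \\
  &= -\big(\mathbb{E}_{q}[\log p(z, x)] - \mathbb{E}_{q}[\log q(z)]\big) + \log p(x) \\
  &= -\mathrm{ELBO}(q) + \log p(x)
\end{align*}
```

Because the KL divergence is never negative, rearranging gives \log p(x) \geq \mathrm{ELBO}(q), which is where the name "evidence lower bound" comes from.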

Negative of ELBO (evidence lower bound) + a constant is equal to KL divergence

\mathrm{KL}(q(z) \| p(z | x)) = -(\mathbb{E}_{q} [\log p(z, x)] - \mathbb{E}_{q}[\log q(z)]) + \log p(x)

ELBO: the term in brackets
Constant: \log p(x), which does not depend on q
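
The relationship between the ELBO, the KL divergence and the log evidence can be verified numerically. The sketch below assumes a latent variable with three possible values and a made-up table of joint probabilities for one fixed observation; none of the numbers come from the talk.

```python
import math

# Hypothetical joint probabilities p(z, x) for one fixed observation x
# and a latent z that takes three values.
joint = [0.10, 0.06, 0.04]

evidence = sum(joint)                      # p(x) = sum over z of p(z, x)
posterior = [j / evidence for j in joint]  # p(z | x) = p(z, x) / p(x)

# An arbitrary variational distribution q(z); deliberately not the posterior.
q = [0.2, 0.5, 0.3]

# ELBO = E_q[log p(z, x)] - E_q[log q(z)]
elbo = sum(qi * (math.log(ji) - math.log(qi)) for qi, ji in zip(q, joint))

# KL(q || p(z | x)) = E_q[log q(z) - log p(z | x)]
kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, posterior))

log_evidence = math.log(evidence)
print(log_evidence, elbo + kl)
```

Changing q changes `elbo` and `kl` but not their sum, so pushing the ELBO up is the same as pushing the KL divergence down.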

Key points

Minimizing KL divergence is the same as maximizing ELBO
This allows us to change a sampling problem into an optimization problem
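
As an illustration of inference by optimization, here is a minimal sketch that fits a Gaussian q(z) = N(m, s^2) by gradient ascent on the ELBO. It assumes a toy model (z ~ N(0, 1), x_i | z ~ N(z, 1)) for which the ELBO and the exact posterior are available in closed form, so the result can be checked. The data, step size and iteration count are hypothetical, and this hand-written loop is not how PyMC3 implements variational inference.

```python
import math

x = [1.2, 0.8, 1.9, 1.4, 0.7]  # hypothetical observations
n = len(x)


def elbo(m, log_s):
    """ELBO of q = N(m, s^2) for the toy model, up to an additive constant."""
    s2 = math.exp(2.0 * log_s)
    expected_log_prior = -0.5 * (m**2 + s2)
    expected_log_lik = -0.5 * sum((xi - m) ** 2 + s2 for xi in x)
    entropy = log_s
    return expected_log_prior + expected_log_lik + entropy


m, log_s = 0.0, 0.0  # variational parameters nu = (m, log s)
step = 0.05
elbo_start = elbo(m, log_s)

for _ in range(500):
    # Analytic gradients of the ELBO with respect to m and log s.
    grad_m = sum(x) - (n + 1) * m
    grad_log_s = 1.0 - (n + 1) * math.exp(2.0 * log_s)
    m += step * grad_m
    log_s += step * grad_log_s

elbo_end = elbo(m, log_s)
s2 = math.exp(2.0 * log_s)

# The exact posterior for this conjugate model is N(sum(x) / (n + 1), 1 / (n + 1)).
print(m, s2, elbo_start, elbo_end)
```

The loop is deterministic, and the ELBO value gives a single number to watch for convergence. Here q can represent the posterior exactly, so the optimum recovers it; in general the fitted q is only the closest member of the chosen family.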

What's new in PyMC3

Release of the first stable version in early 2017
Variational Inference
Advanced Hamiltonian Monte Carlo samplers
Easy optimization for finding the MAP point.
Theano support for fast compilation

What else is new

Gaussian process kernels
New variants of Variational Inference (including Operator)
Speed improvements
API and documentation improvements
Bayesian Methods for Hackers - in PyMC3 too

First gather data from some real-world phenomena. Then cycle through Box’s loop:

Build a probabilistic model of the phenomena.
Reason about the phenomena given model and data.
Criticize the model, revise and repeat.
