Importance Weighted Hierarchical Variational Inference

Artem Sobolev and Dmitry Vetrov

Variational Inference

  • The Evidence Lower Bound (ELBO): $$ \log p(x) \ge \mathbb{E}_{q(z|x)} \log \frac{p(x, z)}{q(z|x)} $$
  • The gap: $$ \text{ELBO} = \log p(x) - D_{KL}(q(z|x) \mid\mid p(z|x)) $$
    • The more expressive \(q(z|x)\) is, the more complicated posteriors \(p(z|x)\) it can match, shrinking the gap
  • We need from \(q(z|x)\):
    • Sample for Monte Carlo estimation
    • Evaluate \(\log q(z|x)\) on these samples (both steps are sketched below)
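
Below is a minimal sketch of both steps, assuming a toy Gaussian model with hand-picked parameters (all names and values are hypothetical, not from the paper): it samples from \(q(z|x)\) and evaluates \(\log q(z|x)\) to form a Monte Carlo ELBO estimate.

```python
# Minimal sketch (toy model, hypothetical values): Monte Carlo ELBO estimate.
import numpy as np

def log_normal(x, mean, std):
    """Log-density of a diagonal Gaussian (summed over dimensions)."""
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                  - 0.5 * ((x - mean) / std) ** 2)

rng = np.random.default_rng(0)
x = np.array([1.5, -0.3])  # an observed data point (toy values)

# q(z|x): a diagonal Gaussian; in a VAE these parameters would come from
# an encoder network, here they are fixed toy values.
q_mean, q_std = np.array([0.8, -0.1]), np.array([0.5, 0.7])

def elbo_estimate(n_samples=1000):
    est = 0.0
    for _ in range(n_samples):
        z = q_mean + q_std * rng.standard_normal(2)  # sample z ~ q(z|x)
        log_p = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)  # log p(z) + log p(x|z)
        log_q = log_normal(z, q_mean, q_std)         # evaluate log q(z|x)
        est += (log_p - log_q) / n_samples
    return est

print("ELBO estimate:", elbo_estimate())
```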

Neural Samplers

  • Let \(q(z|x) = \int q(z|\psi, x) q(\psi) d\psi\) where
    • \(q(z|\psi, x)\) is generated using a neural network taking \(\psi\) and \(x\) as inputs
    • \(q(\psi)\) is some simple distribution, say, \(\mathcal{N}(0, I)\)
    • Very similar to VAE's generative model
  • The marginal density \(q(z|x)\) is now intractable (see the sketch below)
    • The ELBO contains \(-\log q(z|x)\), which we need to lower-bound
      • Equivalently, we need an upper bound on \(\log q(z|x)\)
      • The standard ELBO-like lower bound won't help here
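
As a rough illustration of why sampling stays easy while the density becomes intractable, here is a minimal sketch of such a neural sampler; the MLP is randomly initialized to stand in for a trained network, and all dimensions and names are hypothetical.

```python
# Minimal sketch (assumed architecture): a neural sampler q(z|x).
import numpy as np

rng = np.random.default_rng(1)
dim_psi, dim_x, dim_h, dim_z = 4, 2, 16, 2

# Random weights stand in for a trained network.
W1 = rng.standard_normal((dim_psi + dim_x, dim_h)) * 0.1
W2 = rng.standard_normal((dim_h, 2 * dim_z)) * 0.1

def conditional_params(psi, x):
    """MLP mapping (psi, x) to the mean and std of q(z | psi, x)."""
    h = np.tanh(np.concatenate([psi, x]) @ W1)
    out = h @ W2
    return out[:dim_z], np.exp(out[dim_z:])  # mean, std

def sample_z(x):
    psi = rng.standard_normal(dim_psi)  # psi ~ q(psi) = N(0, I)
    mean, std = conditional_params(psi, x)
    return mean + std * rng.standard_normal(dim_z)  # z ~ q(z | psi, x)

x = np.array([1.5, -0.3])
print("a sample from the marginal q(z|x):", sample_z(x))
# Sampling z is easy, but evaluating the marginal density
# log q(z|x) = log ∫ q(z|psi, x) q(psi) dpsi is intractable.
```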

Upper Bounds

  • Hierarchical Variational Models (HVM, Ranganath et al. 2016): $$ \log q(z|x) \le \mathbb{E}_{\color{red} q(\psi|x,z)} \log \frac{\color{red} q(z, \psi|x)}{\tau(\psi|x,z)} $$
    • \(\tau(\psi|x,z)\) is auxiliary variational distribution
    • Similar to ELBO: $$ \log q(z|x) \ge \mathbb{E}_{\color{blue} \tau(\psi|x,z)} \log \frac{q(z, \psi|x)}{\color{blue} \tau(\psi|x,z)} $$
  • Semi-Implicit Variational Inference (SIVI, Yin and Zhou 2018): $$ \log q(z|x) \le \mathbb{E}_{q(\psi_0|x,z)} \mathbb{E}_{q(\psi_{1:K}|x)} \log \left[ \frac{1}{K+1} \sum_{k=0}^K q(z|\psi_k, x) \right] $$
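
As a rough illustration, here is a minimal sketch of estimating the SIVI bound, with a toy linear "network" and Gaussian conditionals standing in for real components (everything here is hypothetical). It relies on a standard observation: sampling \(\psi_0 \sim q(\psi)\) and then \(z \sim q(z|\psi_0, x)\) makes \(\psi_0\) a valid sample from \(q(\psi_0|x,z)\), so the estimate below is for a \(z\) drawn from \(q(z|x)\) itself.

```python
# Minimal sketch (toy densities): one-sample estimate of the SIVI upper bound.
import numpy as np

rng = np.random.default_rng(2)
dim_psi = dim_z = 2
A = rng.standard_normal((dim_psi, dim_z)) * 0.5  # toy "network": mean = psi @ A

def log_normal(x, mean, std):
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                  - 0.5 * ((x - mean) / std) ** 2)

def sivi_upper_bound(K=100):
    # Joint-sampling trick: psi_0 ~ q(psi), then z ~ q(z|psi_0, x),
    # so psi_0 is conditionally distributed as q(psi_0 | x, z).
    psi0 = rng.standard_normal(dim_psi)
    z = psi0 @ A + 0.3 * rng.standard_normal(dim_z)
    psis = rng.standard_normal((K, dim_psi))  # psi_{1:K} ~ q(psi)
    all_psis = np.vstack([psi0[None, :], psis])
    # log q(z | psi_k, x) for k = 0..K (the toy density ignores x for brevity)
    log_q = np.array([log_normal(z, p @ A, 0.3) for p in all_psis])
    # log [ (1/(K+1)) * sum_k q(z | psi_k, x) ], computed stably in log-space
    return np.logaddexp.reduce(log_q) - np.log(K + 1)

print("one-sample SIVI bound estimate:", sivi_upper_bound())
```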

Importance Weighted Hierarchical VI

$$ \boxed{ \log q(z|x) \le \mathbb{E}_{q(\psi_0|x,z)} \mathbb{E}_{\tau(\psi_{1:K}|x)} \log \left[ \frac{1}{K+1} \sum_{k=0}^K \frac{q(z, \psi_k \mid x)}{\tau(\psi_k|x,z)} \right] }$$

  • Generalizes both SIVI and HVM: taking \(\tau(\psi|x,z) = q(\psi|x)\) recovers SIVI, and \(K = 0\) recovers HVM
  • Upper-bound analogue of the IWAE lower bound: $$ \log q(z|x) \ge\mathbb{E}_{\tau(\psi_{1:K}|x)} \log \left[ \frac{1}{K} \sum_{k=1}^K \frac{q(z, \psi_k \mid x)}{\tau(\psi_k|x,z)} \right] $$
  • Has similar theoretical guarantees:
    • Always an upper bound
    • Monotonically improves as \(K\) increases
    • Exact in the limit of infinite \(K\)
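
Under the same toy setup as above, here is a minimal sketch of a single-sample estimator of the boxed bound; the Gaussian \(\tau(\psi|x,z)\) centered at a linear function of \(z\) is a hypothetical stand-in for a learned proposal.

```python
# Minimal sketch (toy densities, hypothetical tau): estimating the IWHVI bound.
import numpy as np

rng = np.random.default_rng(3)
dim_psi = dim_z = 2
A = rng.standard_normal((dim_psi, dim_z)) * 0.5  # toy "network": mean = psi @ A

def log_normal(x, mean, std):
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                  - 0.5 * ((x - mean) / std) ** 2)

def iwhvi_upper_bound(K=100):
    psi0 = rng.standard_normal(dim_psi)              # psi_0 ~ q(psi)
    z = psi0 @ A + 0.3 * rng.standard_normal(dim_z)  # z ~ q(z|psi_0, x)
    # tau(psi|x,z): a toy Gaussian proposal centered at a linear function of z.
    tau_mean, tau_std = z @ A.T, 0.8
    psis = tau_mean + tau_std * rng.standard_normal((K, dim_psi))  # psi_{1:K} ~ tau
    all_psis = np.vstack([psi0[None, :], psis])
    # log [ q(z, psi_k | x) / tau(psi_k | x, z) ] for k = 0..K
    log_w = np.array([
        log_normal(z, p @ A, 0.3)           # log q(z | psi_k, x)
        + log_normal(p, 0.0, 1.0)           # log q(psi_k)
        - log_normal(p, tau_mean, tau_std)  # log tau(psi_k | x, z)
        for p in all_psis])
    # log of the average importance-weighted ratio, stably in log-space
    return np.logaddexp.reduce(log_w) - np.log(K + 1)

print("one-sample IWHVI bound estimate:", iwhvi_upper_bound())
```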

And more

  • In the paper:
    • IWHVI ⇒ better inference models \(q(z|x)\) ⇒ better generative models \(p(x, z)\)
    • Signal-to-noise ratio analysis: are tighter bounds always better?
    • Multisample Variational Sandwich bounds on the Mutual Information
