Importance Weighted Hierarchical Variational Inference

Artem Sobolev and Dmitry Vetrov

Variational Inference

  • The Evidence Lower Bound (ELBO): $$ \log p(x) \ge \mathbb{E}_{q(z|x)} \log \frac{p(x, z)}{q(z|x)} $$
  • The gap: $$ \text{ELBO} = \log p(x) - D_{KL}(q(z|x) \mid\mid p(z|x)) $$
    • The more expressive \(q(z|x)\) is, the more complicated posteriors \(p(z|x)\) it can match, shrinking the gap
  • We need from \(q(z|x)\):
    • Sample for Monte Carlo estimation
    • Evaluate \(\log q(z|x)\) on these samples (both steps are sketched below)
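
Below is a minimal sketch of both steps, assuming a toy Gaussian model with hand-picked parameters (all names and values are hypothetical, not from the paper): it samples from \(q(z|x)\) and evaluates \(\log q(z|x)\) to form a Monte Carlo ELBO estimate.

```python
# Minimal sketch (toy model, hypothetical values): Monte Carlo ELBO estimate.
import numpy as np

def log_normal(x, mean, std):
    """Log-density of a diagonal Gaussian (summed over dimensions)."""
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                  - 0.5 * ((x - mean) / std) ** 2)

rng = np.random.default_rng(0)
x = np.array([1.5, -0.3])  # an observed data point (toy values)

# q(z|x): a diagonal Gaussian; in a VAE these parameters would come from
# an encoder network, here they are fixed toy values.
q_mean, q_std = np.array([0.8, -0.1]), np.array([0.5, 0.7])

def elbo_estimate(n_samples=1000):
    est = 0.0
    for _ in range(n_samples):
        z = q_mean + q_std * rng.standard_normal(2)  # sample z ~ q(z|x)
        log_p = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)  # log p(z) + log p(x|z)
        log_q = log_normal(z, q_mean, q_std)         # evaluate log q(z|x)
        est += (log_p - log_q) / n_samples
    return est

print("ELBO estimate:", elbo_estimate())
```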

Neural Samplers

  • Let \(q(z|x) = \int q(z|\psi, x) q(\psi) d\psi\) where
    • \(q(z|\psi, x)\) is generated using a neural network taking \(\psi\) and \(x\) as inputs
    • \(q(\psi)\) is some simple distribution, say, \(\mathcal{N}(0, I)\)
    • Very similar to VAE's generative model
  • The marginal density \(q(z|x)\) is now intractable (see the sketch below)
    • The ELBO contains \(-\log q(z|x)\), which we need to lower-bound
      • Equivalently, we need an upper bound on \(\log q(z|x)\)
      • The standard ELBO-like lower bound won't help here
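
As a rough illustration of why sampling stays easy while the density becomes intractable, here is a minimal sketch of such a neural sampler; the MLP is randomly initialized to stand in for a trained network, and all dimensions and names are hypothetical.

```python
# Minimal sketch (assumed architecture): a neural sampler q(z|x).
import numpy as np

rng = np.random.default_rng(1)
dim_psi, dim_x, dim_h, dim_z = 4, 2, 16, 2

# Random weights stand in for a trained network.
W1 = rng.standard_normal((dim_psi + dim_x, dim_h)) * 0.1
W2 = rng.standard_normal((dim_h, 2 * dim_z)) * 0.1

def conditional_params(psi, x):
    """MLP mapping (psi, x) to the mean and std of q(z | psi, x)."""
    h = np.tanh(np.concatenate([psi, x]) @ W1)
    out = h @ W2
    return out[:dim_z], np.exp(out[dim_z:])  # mean, std

def sample_z(x):
    psi = rng.standard_normal(dim_psi)  # psi ~ q(psi) = N(0, I)
    mean, std = conditional_params(psi, x)
    return mean + std * rng.standard_normal(dim_z)  # z ~ q(z | psi, x)

x = np.array([1.5, -0.3])
print("a sample from the marginal q(z|x):", sample_z(x))
# Sampling z is easy, but evaluating the marginal density
# log q(z|x) = log ∫ q(z|psi, x) q(psi) dpsi is intractable.
```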

Upper Bounds

  • Hierarchical Variational Models (HVM, Ranganath et al. 2016): $$ \log q(z|x) \le \mathbb{E}_{\color{red} q(\psi|x,z)} \log \frac{\color{red} q(z, \psi|x)}{\tau(\psi|x,z)} $$
    • \(\tau(\psi|x,z)\) is auxiliary variational distribution
    • Similar to ELBO: $$ \log q(z|x) \ge \mathbb{E}_{\color{blue} \tau(\psi|x,z)} \log \frac{q(z, \psi|x)}{\color{blue} \tau(\psi|x,z)} $$
  • Semi-Implicit Variational Inference (SIVI, Yin and Zhou 2018): $$ \log q(z|x) \le \mathbb{E}_{q(\psi_0|x,z)} \mathbb{E}_{q(\psi_{1:K}|x)} \log \left[ \frac{1}{K+1} \sum_{k=0}^K q(z|\psi_k, x) \right] $$
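
As a rough illustration, here is a minimal sketch of estimating the SIVI bound, with a toy linear "network" and Gaussian conditionals standing in for real components (everything here is hypothetical). It relies on a standard observation: sampling \(\psi_0 \sim q(\psi)\) and then \(z \sim q(z|\psi_0, x)\) makes \(\psi_0\) a valid sample from \(q(\psi_0|x,z)\), so the estimate below is for a \(z\) drawn from \(q(z|x)\) itself.

```python
# Minimal sketch (toy densities): one-sample estimate of the SIVI upper bound.
import numpy as np

rng = np.random.default_rng(2)
dim_psi = dim_z = 2
A = rng.standard_normal((dim_psi, dim_z)) * 0.5  # toy "network": mean = psi @ A

def log_normal(x, mean, std):
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                  - 0.5 * ((x - mean) / std) ** 2)

def sivi_upper_bound(K=100):
    # Joint-sampling trick: psi_0 ~ q(psi), then z ~ q(z|psi_0, x),
    # so psi_0 is conditionally distributed as q(psi_0 | x, z).
    psi0 = rng.standard_normal(dim_psi)
    z = psi0 @ A + 0.3 * rng.standard_normal(dim_z)
    psis = rng.standard_normal((K, dim_psi))  # psi_{1:K} ~ q(psi)
    all_psis = np.vstack([psi0[None, :], psis])
    # log q(z | psi_k, x) for k = 0..K (the toy density ignores x for brevity)
    log_q = np.array([log_normal(z, p @ A, 0.3) for p in all_psis])
    # log [ (1/(K+1)) * sum_k q(z | psi_k, x) ], computed stably in log-space
    return np.logaddexp.reduce(log_q) - np.log(K + 1)

print("one-sample SIVI bound estimate:", sivi_upper_bound())
```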

Importance Weighted Hierarchical VI

$$ \boxed{ \log q(z|x) \le \mathbb{E}_{q(\psi_0|x,z)} \mathbb{E}_{\tau(\psi_{1:K}|x)} \log \left[ \frac{1}{K+1} \sum_{k=0}^K \frac{q(z, \psi_k \mid x)}{\tau(\psi_k|x,z)} \right] }$$

  • Generalizes both SIVI and HVM: taking \(\tau(\psi|x,z) = q(\psi|x)\) recovers SIVI, and \(K = 0\) recovers HVM
  • Upper-bound analogue of the IWAE lower bound: $$ \log q(z|x) \ge\mathbb{E}_{\tau(\psi_{1:K}|x)} \log \left[ \frac{1}{K} \sum_{k=1}^K \frac{q(z, \psi_k \mid x)}{\tau(\psi_k|x,z)} \right] $$
  • Has similar theoretical guarantees:
    • Always an upper bound
    • Monotonically improves as \(K\) increases
    • Exact in the limit of infinite \(K\)
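
Under the same toy setup as above, here is a minimal sketch of a single-sample estimator of the boxed bound; the Gaussian \(\tau(\psi|x,z)\) centered at a linear function of \(z\) is a hypothetical stand-in for a learned proposal.

```python
# Minimal sketch (toy densities, hypothetical tau): estimating the IWHVI bound.
import numpy as np

rng = np.random.default_rng(3)
dim_psi = dim_z = 2
A = rng.standard_normal((dim_psi, dim_z)) * 0.5  # toy "network": mean = psi @ A

def log_normal(x, mean, std):
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                  - 0.5 * ((x - mean) / std) ** 2)

def iwhvi_upper_bound(K=100):
    psi0 = rng.standard_normal(dim_psi)              # psi_0 ~ q(psi)
    z = psi0 @ A + 0.3 * rng.standard_normal(dim_z)  # z ~ q(z|psi_0, x)
    # tau(psi|x,z): a toy Gaussian proposal centered at a linear function of z.
    tau_mean, tau_std = z @ A.T, 0.8
    psis = tau_mean + tau_std * rng.standard_normal((K, dim_psi))  # psi_{1:K} ~ tau
    all_psis = np.vstack([psi0[None, :], psis])
    # log [ q(z, psi_k | x) / tau(psi_k | x, z) ] for k = 0..K
    log_w = np.array([
        log_normal(z, p @ A, 0.3)           # log q(z | psi_k, x)
        + log_normal(p, 0.0, 1.0)           # log q(psi_k)
        - log_normal(p, tau_mean, tau_std)  # log tau(psi_k | x, z)
        for p in all_psis])
    # log of the average importance-weighted ratio, stably in log-space
    return np.logaddexp.reduce(log_w) - np.log(K + 1)

print("one-sample IWHVI bound estimate:", iwhvi_upper_bound())
```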

And more

  • In the paper:
    • IWHVI ⇒ better inference models \(q(z|x)\) ⇒ better generative models \(p(x, z)\)
    • Signal-to-noise ratio analysis: are tighter bounds always better?
    • Multisample Variational Sandwich bounds on the Mutual Information
