### Importance Weighted Hierarchical Variational Inference

Artem Sobolev and Dmitry Vetrov

### Variational Inference

• The Evidence Lower Bound (ELBO): $$\log p(x) \ge \mathbb{E}_{q(z|x)} \log \frac{p(x, z)}{q(z|x)}$$
• The gap is the posterior KL divergence: $$\log p(x) - \text{ELBO} = D_{KL}(q(z|x) \mid\mid p(z|x))$$
• A more expressive $$q(z|x)$$ can match a more complicated posterior $$p(z|x)$$, tightening the bound
• We need two things from $$q(z|x)$$:
  • Samples for Monte Carlo estimation
  • Evaluation of $$\log q(z|x)$$ at those samples
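Both requirements show up directly in the single-sample Monte Carlo ELBO estimator. A minimal sketch on a hypothetical conjugate toy model (the Gaussian prior, likelihood, and variational family below are illustrative assumptions, not from the poster):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p_xz(x, z):
    # Hypothetical joint: p(z) = N(0, 1), p(x|z) = N(z, 1)
    return -0.5 * (z**2 + (x - z) ** 2) - np.log(2 * np.pi)

def elbo_estimate(x, mu, sigma, n_samples=10_000):
    # q(z|x) = N(mu, sigma^2): sample z, then average log p(x, z) - log q(z|x).
    # This is exactly where we need both samples from q and log q at those samples.
    z = mu + sigma * rng.standard_normal(n_samples)
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return np.mean(log_p_xz(x, z) - log_q)
```

For this conjugate pair the true posterior is $$\mathcal{N}(x/2, 1/2)$$, so the optimal `q` attains the ELBO $$= \log p(x)$$ exactly, while any other choice falls short by the KL gap.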

### Neural Samplers

• Let $$q(z|x) = \int q(z|\psi, x) q(\psi) d\psi$$ where
  • $$q(z|\psi, x)$$ is generated using a neural network taking $$\psi$$ and $$x$$ as inputs
  • $$q(\psi)$$ is some simple distribution, say, $$\mathcal{N}(0, I)$$
• Very similar to VAE's generative model
• Marginal likelihood $$q(z|x)$$ is now intractable
• The ELBO needs a lower bound on $$-\log q(z|x)$$, i.e. an upper bound on $$\log q(z|x)$$
• The standard lower bound won't help
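The tension is easy to see in code: sampling from such a hierarchical $$q$$ is trivial, but its marginal density is not. A hypothetical sketch (the tiny network `mu` and its shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_q(x, W, b, n_samples=1000):
    """Draw z ~ q(z|x) = ∫ q(z|ψ, x) q(ψ) dψ  (hypothetical sketch).

    q(ψ) = N(0, I) and q(z|ψ, x) = N(mu(ψ, x), 1), where mu is a tiny
    one-hidden-layer network.  Sampling is trivial, but the marginal
    density q(z|x) is intractable: the integral over ψ has no closed
    form once mu is nonlinear in ψ.
    """
    psi = rng.standard_normal((n_samples, W.shape[1] - 1))           # ψ ~ N(0, I)
    inp = np.concatenate([psi, np.full((n_samples, 1), x)], axis=1)  # net input (ψ, x)
    mu = np.tanh(inp @ W.T) @ b                                      # mu(ψ, x)
    return mu + rng.standard_normal(n_samples)                       # z ~ N(mu(ψ, x), 1)

W = rng.standard_normal((8, 3))  # 8 hidden units; inputs: 2-dim ψ plus scalar x
b = rng.standard_normal(8)
z = sample_q(0.5, W, b)
```

With a linear `mu` the integral would collapse to a Gaussian; the nonlinearity is what buys expressiveness and loses tractability.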

### Upper Bounds

• Hierarchical Variational Models (HVM, Ranganath et al. 2016): $$\log q(z|x) \le \mathbb{E}_{\color{red} q(\psi|x,z)} \log \frac{\color{red} q(z, \psi|x)}{\tau(\psi|x,z)}$$
  • $$\tau(\psi|x,z)$$ is an auxiliary variational distribution
  • Similar to the ELBO: $$\log q(z|x) \ge \mathbb{E}_{\color{blue} \tau(\psi|x,z)} \log \frac{q(z, \psi|x)}{\color{blue} \tau(\psi|x,z)}$$
• Semi-Implicit Variational Inference (SIVI, Yin and Zhou 2018): $$\log q(z|x) \le \mathbb{E}_{q(\psi_0|x,z)} \mathbb{E}_{q(\psi_{1:K}|x)} \log \left[ \frac{1}{K+1} \sum_{k=0}^K q(z|\psi_k, x) \right]$$
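On a toy model where everything is tractable, the HVM upper bound can be checked numerically. A sketch under assumed Gaussian choices (the model and $$\tau$$ below are illustrative, not from the poster); the bound is tight exactly when $$\tau$$ matches the true posterior over $$\psi$$:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mu, var):
    return -0.5 * (x - mu) ** 2 / var - 0.5 * np.log(2 * np.pi * var)

def hvm_upper_bound(z, tau_mu, tau_var, n_samples=100_000):
    """HVM upper bound on log q(z) for a tractable toy model (a sketch).

    q(ψ) = N(0, 1), q(z|ψ) = N(ψ, 1), so the true marginal is q(z) = N(0, 2)
    and the exact posterior is q(ψ|z) = N(z/2, 1/2).  τ(ψ|z) = N(tau_mu, tau_var).
    """
    psi = z / 2 + np.sqrt(0.5) * rng.standard_normal(n_samples)  # ψ ~ q(ψ|z)
    log_joint = log_normal(psi, 0.0, 1.0) + log_normal(z, psi, 1.0)
    return np.mean(log_joint - log_normal(psi, tau_mu, tau_var))

z = 1.0
true_log_q = log_normal(z, 0.0, 2.0)
tight = hvm_upper_bound(z, z / 2, 0.5)  # τ = true posterior: bound is exact
loose = hvm_upper_bound(z, 0.0, 1.0)    # mismatched τ: looser by KL(q(ψ|z) || τ)
```

The gap of the loose bound is precisely $$D_{KL}(q(\psi|z) \mid\mid \tau(\psi|z))$$, mirroring the ELBO's gap.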

### Importance Weighted Hierarchical VI

$$\boxed{ \log q(z|x) \le \mathbb{E}_{q(\psi_0|x,z)} \mathbb{E}_{\tau(\psi_{1:K}|x)} \log \left[ \frac{1}{K+1} \sum_{k=0}^K \frac{q(z, \psi_k \mid x)}{\tau(\psi_k|x,z)} \right] }$$

• Generalizes both SIVI and HVM
• Upper-bound analogue of the IWAE lower bound: $$\log q(z|x) \ge \mathbb{E}_{\tau(\psi_{1:K}|x)} \log \left[ \frac{1}{K} \sum_{k=1}^K \frac{q(z, \psi_k \mid x)}{\tau(\psi_k|x,z)} \right]$$
• Has similar theoretical guarantees:
  • Always an upper bound
  • Monotonically improves as $$K$$ increases
  • Exact in the limit of infinite $$K$$
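The boxed bound can be sanity-checked on the same kind of tractable toy model: with $$K = 0$$ it reduces to the HVM bound, and increasing $$K$$ tightens it toward $$\log q(z)$$. The toy model and the deliberately imperfect $$\tau$$ below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mu, var):
    return -0.5 * (x - mu) ** 2 / var - 0.5 * np.log(2 * np.pi * var)

def iwhvi_upper_bound(z, K, n_outer=20_000):
    """IWHVI upper bound on log q(z) in a tractable toy model (a sketch).

    q(ψ) = N(0, 1), q(z|ψ) = N(ψ, 1) ⇒ q(z) = N(0, 2), q(ψ|z) = N(z/2, 1/2).
    τ(ψ|z) = N(z/2, 1) has the right mean but the wrong variance, so the
    K-sample correction has visible work to do.
    """
    tau_mu, tau_var = z / 2, 1.0
    # ψ0 ~ q(ψ|z) (exact here; in practice ψ0 is the ψ that generated z),
    # ψ1..ψK ~ τ(ψ|z)
    psi0 = z / 2 + np.sqrt(0.5) * rng.standard_normal((n_outer, 1))
    psik = tau_mu + np.sqrt(tau_var) * rng.standard_normal((n_outer, K))
    psi = np.concatenate([psi0, psik], axis=1)
    log_w = (log_normal(psi, 0.0, 1.0) + log_normal(z, psi, 1.0)
             - log_normal(psi, tau_mu, tau_var))
    # E log [ 1/(K+1) Σ_k w_k ] via a stable log-mean-exp over the K+1 weights
    m = log_w.max(axis=1, keepdims=True)
    return np.mean(m.squeeze() + np.log(np.exp(log_w - m).mean(axis=1)))

z = 1.0
true_log_q = log_normal(z, 0.0, 2.0)
bounds = [iwhvi_upper_bound(z, K) for K in (0, 1, 10)]
```

`bounds[0]` is the plain HVM bound (gap = KL between the true posterior and $$\tau$$); `bounds[2]` sits much closer to `true_log_q`, illustrating the monotone improvement in $$K$$.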

### And more

• In the paper:
  • IWHVI ⇒ better inference models $$q(z|x)$$ ⇒ better generative models $$p(x, z)$$
  • Signal-to-noise ratio analysis: are tighter bounds always better?
  • Multisample variational sandwich bounds on the mutual information
