Variance Reduction and Convergence Analysis of BBVI
UBC CPSC 532F - April 2021
Mohamad Amin Mohamadi
Variational Inference
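For reference, the objective maximized throughout this project is the evidence lower bound (ELBO); in the usual notation, with a variational proposal q_λ(z) over the latent variables z and observed data x:

```latex
\mathcal{L}(\lambda)
  = \mathbb{E}_{q_\lambda(z)}\bigl[\log p(x, z) - \log q_\lambda(z)\bigr]
  = \log p(x) - \mathrm{KL}\bigl(q_\lambda(z) \,\|\, p(z \mid x)\bigr)
```

Maximizing the ELBO over λ is equivalent to minimizing the KL divergence from the proposal to the true posterior.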

Blackbox Variational Inference
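What makes the method "black box" is that the ELBO gradient is estimated with the score-function (REINFORCE) estimator, which only requires sampling from the proposal and evaluating the joint log likelihood pointwise:

```latex
\nabla_\lambda \mathcal{L}
  = \mathbb{E}_{q_\lambda(z)}\Bigl[\nabla_\lambda \log q_\lambda(z)\,
      \bigl(\log p(x, z) - \log q_\lambda(z)\bigr)\Bigr]
  \approx \frac{1}{S}\sum_{s=1}^{S}
      \nabla_\lambda \log q_\lambda(z_s)\,
      \bigl(\log p(x, z_s) - \log q_\lambda(z_s)\bigr),
  \qquad z_s \sim q_\lambda
```

This estimator is unbiased but typically has high variance, which is what motivates the variance-reduction study below.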

Implementation
- Using PyTorch's AutoGrad to compute the gradient of the ELBO
- Using PyTorch's Adam optimizer to update the variational parameters of the proposals
- Using PyTorch's distributions as proposals for the posterior distribution (a rough sketch of how these pieces fit together follows this list)
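The sketch below assumes a single Gaussian proposal and a user-supplied log_joint function; names, learning rate, and sample count are illustrative rather than the project's exact code:

```python
import torch

def bbvi_step(log_joint, make_q, params, optimizer, num_samples=10):
    """One BBVI update: score-function (REINFORCE) estimate of the ELBO
    gradient, applied to the variational parameters by Adam."""
    optimizer.zero_grad()
    q = make_q(*params)                         # torch.distributions proposal
    z = q.sample((num_samples,))                # samples without reparameterization
    log_q = q.log_prob(z)                       # differentiable w.r.t. params via AutoGrad
    weight = (log_joint(z) - log_q).detach()    # log p(x, z) - log q(z), held constant
    surrogate = -(log_q * weight).mean()        # its gradient equals minus the ELBO gradient estimate
    surrogate.backward()                        # AutoGrad fills the .grad fields
    optimizer.step()                            # Adam update

# Gaussian proposal q(z) = Normal(m, softplus(rho)) with learnable m, rho.
m = torch.tensor(0.0, requires_grad=True)
rho = torch.tensor(0.0, requires_grad=True)
make_q = lambda m_, r_: torch.distributions.Normal(m_, torch.nn.functional.softplus(r_))
optimizer = torch.optim.Adam([m, rho], lr=0.05)
# for _ in range(num_iters): bbvi_step(log_joint, make_q, [m, rho], optimizer)
```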
Variance Reduction Hypothesis
- Does the control variate actually help reduce the variance of the gradient estimate? (See the estimator form after this list.)
- Is there a significant difference in convergence behaviour between the variance-reduced estimator and the plain estimator?
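For context, a standard control variate for the score-function estimator, and presumably the one tested here, is the one from the original BBVI paper (Ranganath et al., 2014): because the score function has zero mean under q, a scaled copy of it can be subtracted without biasing the gradient:

```latex
\hat{g} = \frac{1}{S}\sum_{s=1}^{S}
  \nabla_\lambda \log q_\lambda(z_s)
  \bigl(\log p(x, z_s) - \log q_\lambda(z_s) - \hat{a}\bigr),
\qquad
\hat{a} = \frac{\widehat{\mathrm{Cov}}(f, h)}{\widehat{\mathrm{Var}}(h)}
```

Here f denotes the per-sample score-weighted term, h the score itself, and â a per-component scaling estimated from the samples; since the score has expectation zero, subtracting â·h leaves the estimator unbiased while removing much of its variance.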
Convergence Analysis Hypothesis
- Does the Blackbox Variational Inference algorithm actually converge?
- How does its convergence rate compare to classic MCMC sampling methods?
Test 1: Simple Gaussian

Prior: mu ~ Gaussian(1, sqrt(5))
Likelihood: y ~ Gaussian(mu, sqrt(2))
Observations: 7, 8
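A sketch of this test case with PyTorch distributions, a Gaussian proposal, and the plain score-function estimator; the model and observations follow the slide above, while the step size and iteration count are assumptions:

```python
import torch
from torch.distributions import Normal

data = torch.tensor([7.0, 8.0])                 # observations as listed above

def log_joint(mu):
    # log p(mu, y): Gaussian(1, sqrt(5)) prior plus Gaussian(mu, sqrt(2)) likelihoods
    log_prior = Normal(1.0, 5.0 ** 0.5).log_prob(mu)
    log_lik = Normal(mu.unsqueeze(-1), 2.0 ** 0.5).log_prob(data).sum(-1)
    return log_prior + log_lik

m = torch.tensor(0.0, requires_grad=True)       # proposal mean
rho = torch.tensor(0.0, requires_grad=True)     # proposal scale (through softplus)
opt = torch.optim.Adam([m, rho], lr=0.05)

for _ in range(3000):
    opt.zero_grad()
    q = Normal(m, torch.nn.functional.softplus(rho))
    z = q.sample((20,))                         # proposal samples, no reparameterization
    log_q = q.log_prob(z)
    weight = (log_joint(z) - log_q).detach()    # score-function weights
    (-(log_q * weight).mean()).backward()       # negative ELBO surrogate
    opt.step()

# Because the model is conjugate, the fitted (m, softplus(rho)) can be compared
# against the exact posterior mean and standard deviation.
```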
Model Negative Joint Log Likelihood Score (Variance-Reduced Estimator)

Model Negative Joint Log Likelihood Score (Standard Estimator)

Posterior Samples

True Posterior Comparison
| | True Value | Estimated |
|---|---|---|
| Mean | 7.25 | 7.19 |
| STD | 0.91 | 0.88 |
Test 2: Simple Bayesian Linear Regression

Priors: slope ~ Gaussian(0, 10), bias ~ Gaussian(0, 10)

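A corresponding sketch for this test case with a mean-field Gaussian proposal over slope and bias; only the Gaussian(0, 10) priors come from the slide, while the synthetic data, unit observation noise, and optimizer settings are assumptions for illustration:

```python
import torch
from torch.distributions import Normal

# Synthetic regression data (assumed, not from the slides).
xs = torch.linspace(0.0, 5.0, 30)
ys = 2.0 * xs + 1.0 + 0.5 * torch.randn(30)

def log_joint(slope, bias):
    # Gaussian(0, 10) priors on slope and bias, Gaussian likelihood with assumed noise std 1
    log_prior = Normal(0.0, 10.0).log_prob(slope) + Normal(0.0, 10.0).log_prob(bias)
    mean = slope.unsqueeze(-1) * xs + bias.unsqueeze(-1)
    log_lik = Normal(mean, 1.0).log_prob(ys).sum(-1)
    return log_prior + log_lik

loc = torch.zeros(2, requires_grad=True)        # proposal means for (slope, bias)
scale_raw = torch.zeros(2, requires_grad=True)  # unconstrained scales, mapped through softplus
opt = torch.optim.Adam([loc, scale_raw], lr=0.05)

for _ in range(5000):
    opt.zero_grad()
    q = Normal(loc, torch.nn.functional.softplus(scale_raw))
    z = q.sample((20,))                         # shape (samples, 2): columns are slope, bias
    log_q = q.log_prob(z).sum(-1)
    weight = (log_joint(z[:, 0], z[:, 1]) - log_q).detach()
    (-(log_q * weight).mean()).backward()       # score-function ELBO gradient
    opt.step()
```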

Model Negative Joint Log Likelihood Score (Variance-Reduced Estimator)
Model Negative Joint Log Likelihood Score (Standard Estimator)

Posterior Samples


Slope
Bias
Test 3: Simple HMM


Model Negative Joint Log Likelihood Score (Variance-Reduced Estimator)
Does not converge!
Importance Sampling Results

Metropolis-Hastings Results

Conclusion
- As shown, control variates are a promising way to reduce the variance of the score-function estimator, but in larger models such as the HMM test case they cannot rescue the optimization from regions of extremely small likelihood.
- BBVI outperforms the classic sampling methods on small to moderate models, and thanks to AutoGrad it is fully automatic; on the HMM, however, it fails to converge to the posterior, whereas the classic methods still work.
Thank you!