Srijith Rajamohan, Ph.D.
Picture courtesy of Wikipedia
X: the input data
z: the latent representation
P(X|z): conditional probability distribution of the input given the latent variable
P(z): probability of the latent space variable
P(z|X): conditional probability of the latent space variable given the input
Using Bayes' theorem to compute P(z|X) directly is intractable, since computing the evidence P(X) is usually not feasible
However, we can approximate P(z|X) with a simpler distribution Q(z|X) using variational inference
In most cases, Q(z|X) is a Normal distribution, and we minimize the difference between Q(z|X) and P(z|X) using the KL divergence
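As a small illustration of the KL divergence between two Normal distributions, here is a minimal sketch using the closed-form expression for univariate Gaussians. The function name `kl_normal` and the example parameters are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for scalars."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

# Identical distributions have zero divergence.
kl_same = kl_normal(0.0, 1.0, 0.0, 1.0)

# Shifting the mean by one standard deviation gives a divergence of 0.5.
kl_shift = kl_normal(1.0, 1.0, 0.0, 1.0)

# KL divergence is not symmetric: swapping the arguments changes the value.
kl_forward = kl_normal(0.0, 1.0, 0.0, 2.0)
kl_reverse = kl_normal(0.0, 2.0, 0.0, 1.0)
```

The asymmetry is worth keeping in mind: the quantity minimized here is KL(Q || P), with the approximate distribution Q as the first argument.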
Q(z|X): encoder output
We don't have P(z|X), so we use Bayes' theorem to replace it: P(z|X) = P(X|z) P(z) / P(X)
The log Q(z|X) and log P(z) terms combine to give another KL divergence, KL(Q(z|X) || P(z))
This is the objective function for the VAE
The right-hand side of the equation is called the Evidence Lower Bound (ELBO)
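For reference, the derivation these slides walk through can be written out as below. This is a sketch in standard notation, not a copy of the slide equations: the symbols D_KL for the KL divergence and the expectation over z drawn from Q(z|X) are assumptions about notation.

```latex
\begin{align}
D_{KL}\big(Q(z|X)\,\|\,P(z|X)\big)
  &= \mathbb{E}_{z \sim Q}\big[\log Q(z|X) - \log P(z|X)\big] \\
  &= \mathbb{E}_{z \sim Q}\big[\log Q(z|X) - \log P(X|z) - \log P(z)\big] + \log P(X) \\
\log P(X) - D_{KL}\big(Q(z|X)\,\|\,P(z|X)\big)
  &= \mathbb{E}_{z \sim Q}\big[\log P(X|z)\big] - D_{KL}\big(Q(z|X)\,\|\,P(z)\big)
\end{align}
```

The second line substitutes Bayes' theorem for P(z|X); log P(X) does not depend on z, so it moves outside the expectation. Because the KL divergence on the left is non-negative, the right-hand side of the last line is a lower bound on log P(X).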
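log P(X): the evidence
KL(Q(z|X) || P(z|X)): the term we want to minimize
E[log P(X|z)], with z drawn from Q(z|X): the reconstruction term
KL(Q(z|X) || P(z)): KL divergence between the approximate distribution and the prior distribution of the latent variable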
Decoder loss
Encoder loss
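The two loss terms can be sketched for a single example as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the decoder loss is taken to be a squared reconstruction error (a Gaussian likelihood with fixed unit variance, up to a constant), the encoder is assumed to output a mean vector `mu` and a log-variance vector `log_var` for a diagonal Gaussian Q(z|X), and the prior P(z) is N(0, I). The function name `vae_loss` is hypothetical.

```python
import math

def vae_loss(x, x_recon, mu, log_var):
    """Return (decoder_loss, encoder_loss, total) for one example."""
    # Decoder loss: squared reconstruction error between input and output.
    decoder_loss = 0.5 * sum((xi - ri) ** 2 for xi, ri in zip(x, x_recon))
    # Encoder loss: closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).
    encoder_loss = 0.5 * sum(
        math.exp(lv) + m ** 2 - 1.0 - lv for m, lv in zip(mu, log_var)
    )
    return decoder_loss, encoder_loss, decoder_loss + encoder_loss

# Perfect reconstruction and Q(z|X) equal to the prior: both terms vanish.
perfect = vae_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [0.0, 0.0])

# An imperfect reconstruction and a shifted latent mean are both penalized.
imperfect = vae_loss([1.0, 2.0], [0.0, 2.0], [1.0, 0.0], [0.0, 0.0])
```

Minimizing the total corresponds to maximizing the Evidence Lower Bound: the first term rewards faithful reconstruction and the second keeps Q(z|X) close to P(z).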
Example architecture of a Variational Autoencoder.
Image taken from Jeremy Jordan's excellent blog on the topic https://www.jeremyjordan.me/variational-autoencoders/
The dimensionality of the latent mean and variance vectors is a hyperparameter
We want our distributions to be broad enough to cover the solution space; otherwise the VAE would suffer from the same problem as a regular autoencoder, i.e. a discontinuous solution space
Picture from Jeremy Jordan's blog
Sampling
To generate data, we sample z from a unit normal, since we assumed P(z) = N(0, 1)
This works because training made Q(z|X) similar to P(z)
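The generation step can be sketched as follows: draw z from a unit normal and pass it through the decoder. Everything here is illustrative. `toy_decoder` is a hypothetical stand-in (a single linear layer with made-up weights) for a trained decoder network, and the latent and output dimensions are arbitrary.

```python
import random

def sample_latent(dim, rng):
    """Draw z from the prior P(z) = N(0, I), one unit normal per dimension."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_decoder(z, weights, bias):
    """Hypothetical decoder: one linear layer mapping z to data space."""
    return [
        sum(w * zi for w, zi in zip(row, z)) + b
        for row, b in zip(weights, bias)
    ]

rng = random.Random(0)            # seeded so the sketch is reproducible
latent_dim = 2
weights = [[1.0, 0.0],            # made-up parameters; a real decoder
           [0.0, 1.0],            # would learn these during training
           [0.5, -0.5]]
bias = [0.0, 0.0, 1.0]

z = sample_latent(latent_dim, rng)
x_generated = toy_decoder(z, weights, bias)
```

No encoder is involved at generation time; only the prior and the decoder are used.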