DOI: 10.1214/21-EJS1924
(Joint work with Xiaotong Shen and Wei Pan)
The Chinese University of Hong Kong
User/Item | Item 1 | Item 2 | Item 3 | Item 4 |
---|---|---|---|---|
User 1 | ❓ | 👍 | ❓ | ❓ |
User 2 | 👎🏻 | 👍 | 👍 | ❓ |
User 3 | ❓ | 👍 | 👎🏻 | 👎🏻 |
User 4 | 👎🏻 | ❓ | 👍 | ❓ |
User 5 | ❓ | ❓ | ❓ | ❓ |
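A matrix like the one above can be sketched as a sparse mapping from (user, item) pairs to observed feedback; a minimal sketch, assuming 1 = like, 0 = dislike, and absent pairs = unobserved (the ❓ cells):

```python
# Sparse binary feedback: (user, item) -> 1 (like) or 0 (dislike);
# missing pairs are unobserved.
feedback = {
    (1, 2): 1,
    (2, 1): 0, (2, 2): 1, (2, 3): 1,
    (3, 2): 1, (3, 3): 0, (3, 4): 0,
    (4, 1): 0, (4, 3): 1,
    # User 5 has no observed feedback (cold start).
}

def observed(user, item):
    """Return the feedback if observed, else None."""
    return feedback.get((user, item))
```

The dict-of-observed-entries layout mirrors how such data is stored in practice: only 9 of the 20 cells are observed, and the task is to predict the rest.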
Example: binary Last.fm dataset
Side info: User/item feats
Feedback (FB)
user feats: age, gender, country, ...
Notation: binary recommender systems
user-i feats: continuous + categorical
item-j feats: continuous + categorical
feedback
all feature info for user i and item j
Evaluation: misclassification error
decision function
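The misclassification error over observed entries is the fraction of (user, item) pairs where the sign of the decision function disagrees with the feedback; a minimal sketch, where the decision function `f` is a hypothetical stand-in:

```python
def misclassification_error(pairs, labels, f):
    """Fraction of observed (user, item) pairs where sign(f) != label.

    labels are +1 (like) / -1 (dislike); f returns a real-valued score.
    """
    wrong = sum(1 for (u, i), y in zip(pairs, labels)
                if (1 if f(u, i) >= 0 else -1) != y)
    return wrong / len(pairs)

# Toy decision function that always predicts "like" (+1):
pairs = [(1, 2), (2, 1), (2, 2)]
labels = [+1, -1, +1]
err = misclassification_error(pairs, labels, lambda u, i: 1.0)  # wrong on 1 of 3
```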
Example: binary Last.fm dataset
Side info: User/item feats
FB1
FB2
FB3
Example: DeskDrop dataset
View (100%)
Like (9.4%)
Follow (2.3%)
user feats: agent, country, ...
View = False
Like = False
Follow = False
Srijith Rajamohan, Ph.D.
Picture courtesy of Wikipedia
The input data, X
The latent representation, z
Conditional probability distribution of the input, P(X|z)
Probability of the latent space variable, P(z)
Conditional probability of the latent space variable given the input, P(z|X)
Using Bayes' theorem to compute P(z|X) directly is intractable, since computing the evidence P(X) is usually not feasible
However, we can approximate P(z|X) using variational inference, with an approximating distribution Q(z|X)
In most cases Q(z|X) is chosen to be a Normal distribution, and we minimize the difference between Q(z|X) and P(z|X) using the KL divergence
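When Q(z|X) is a diagonal Gaussian N(mu, sigma^2) and the prior P(z) is a standard normal N(0, I), the KL divergence between them has a closed form; a minimal sketch of that formula (function name is my own):

```python
import math

def kl_diag_gaussian_to_std_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Closed form: 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

# When Q already equals the prior (mu = 0, log_var = 0), the divergence is zero:
kl = kl_diag_gaussian_to_std_normal([0.0, 0.0], [0.0, 0.0])
```

Working with log-variance rather than variance is the usual convention, since a network output can then be any real number.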
Encoder output
We don't have P(z|X), so we use Bayes' theorem to replace it:
These two combine to give another KL Divergence
This is the objective function for the VAE
Theorem 1 (Bayes rule for classification). A classifier f(x) is a global minimizer of (1) if and only if
The right-hand side of the equation is called the Evidence Lower Bound (ELBO)
Evidence
The term we want to minimize
Reconstruction error
KL divergence between the approximate posterior Q(z|X) and the prior distribution of the latent variable, P(z)
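Putting the two terms together, the negative ELBO used as a training loss is reconstruction error plus the KL term; a minimal sketch assuming a Bernoulli (binary cross-entropy) reconstruction model, with names of my own choosing:

```python
import math

def negative_elbo(x, x_recon, mu, log_var):
    """Negative ELBO = reconstruction error + KL term.

    x, x_recon: input and its reconstruction, entries in [0, 1];
    mu, log_var: parameters of the approximate posterior Q(z|X).
    """
    eps = 1e-12  # guard against log(0)
    # Binary cross-entropy reconstruction error.
    bce = -sum(xi * math.log(ri + eps) + (1 - xi) * math.log(1 - ri + eps)
               for xi, ri in zip(x, x_recon))
    # Closed-form KL( N(mu, exp(log_var)) || N(0, I) ).
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, log_var))
    return bce + kl

# Near zero when reconstruction is perfect and Q matches the prior:
loss = negative_elbo([1.0, 0.0], [1.0, 0.0], [0.0], [0.0])
```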
Decoder loss
Encoder loss
Example architecture of a Variational Autoencoder.
Image taken from Jeremy Jordan's excellent blog on the topic https://www.jeremyjordan.me/variational-autoencoders/
Dimensionality of the vectors is a hyperparameter
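The encoder emits two vectors of that chosen dimensionality, a mean and a log-variance; a minimal sketch with a single dense layer standing in for the encoder (all names and sizes hypothetical):

```python
import random

LATENT_DIM = 2  # hyperparameter: size of the mu and log_var vectors

def make_linear(n_in, n_out, rng):
    """One dense layer with small random weights (a stand-in for a real encoder)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

rng = random.Random(0)
encode_mu = make_linear(4, LATENT_DIM, rng)       # mean head
encode_log_var = make_linear(4, LATENT_DIM, rng)  # log-variance head

x = [1.0, 0.0, 1.0, 1.0]
mu, log_var = encode_mu(x), encode_log_var(x)     # each has LATENT_DIM entries
```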
We want our distributions to be broad so that they cover the solution space; otherwise the VAE would suffer from the same problem as a regular autoencoder, i.e. a discontinuous latent space
Picture from Jeremy Jordan's blog
Sampling
To generate data, we sample z from a unit normal, since our assumed prior P(z) is N(0, 1)
Also, training made Q(z|X) similar to P(z), so the decoder produces realistic data from such samples
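Generation then reduces to drawing z ~ N(0, I) and decoding it; the reparameterization trick used during training, z = mu + sigma * eps, is the same idea. A minimal sketch (the decoder here is a hypothetical stand-in):

```python
import math
import random

def sample_latent(mu, log_var, rng=random):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def generate(decoder, latent_dim, rng=random):
    """Sample z from the unit-normal prior and decode it into a new data point."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return decoder(z)

# With a near-zero variance, the sample collapses to the mean:
z = sample_latent([1.0, -1.0], [-100.0, -100.0])
# Identity "decoder" just returns z, so the output has latent_dim entries:
x_new = generate(lambda z: z, latent_dim=3)
```

Keeping the noise `eps` outside the network parameters is what makes the sampling step differentiable with respect to `mu` and `log_var` during training.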