Deep Generative AI

Carol Cuesta-Lazaro
IAIFI Fellow

HerWILL 2/8/24

florpi
https://florpi.github.io/

A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon

BEFORE

Artificial General Intelligence?

AFTER

Generative: p(x)
Discriminative: p(y|x)
Bayes' rule connects the two:
p(x|y) = \frac{p(y|x)p(x)}{p(y)}
Conditional generation: p(x|y)

https://vitalflux.com/generative-vs-discriminative-models-examples/

Generation vs Discrimination
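A toy numerical check of Bayes' rule (all numbers illustrative): starting from a discriminative p(y|x) and a prior p(x), we recover the generative direction p(x|y).

import numpy as np

# Illustrative distributions over binary x and y
p_x = np.array([0.6, 0.4])            # p(x)
p_y_given_x = np.array([[0.9, 0.1],   # p(y | x=0)
                        [0.2, 0.8]])  # p(y | x=1)

# p(y) = sum_x p(y|x) p(x)
p_y = p_y_given_x.T @ p_x

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y); each row (fixed y) sums to 1
p_x_given_y = (p_y_given_x * p_x[:, None]).T / p_y[:, None]
print(p_y)          # [0.62 0.38]
print(p_x_given_y)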

Maximize the likelihood of the training samples

\hat \theta = \argmax_\theta \left[ \log p_\theta (x_\mathrm{train}) \right]
[Figure: training samples x_\mathrm{train} scattered in the (x_1, x_2) plane, fit by the model density p_\theta(x)]

Generative Models 101
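A minimal sketch of maximum-likelihood training, assuming a 1D Gaussian model p_\theta(x) with \theta = (\mu, \log \sigma) and made-up training data; gradients are written by hand instead of using autodiff.

import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(loc=2.0, scale=0.5, size=1000)  # toy training samples

mu, log_sigma, lr = 0.0, 0.0, 1e-4
for _ in range(5000):
    sigma = np.exp(log_sigma)
    # Gradients of sum_i log N(x_i | mu, sigma^2)
    grad_mu = np.sum((x_train - mu) / sigma**2)
    grad_log_sigma = np.sum((x_train - mu) ** 2 / sigma**2 - 1.0)
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

print(mu, np.exp(log_sigma))  # approaches the MLE, here ~(2.0, 0.5)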

The curse of dimensionality

p_\theta(x) = \frac{e^{f_\theta(x)}}{Z_\theta}
Z_\theta = \int e^{f_\theta(x)}dx
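A sketch of why Z_\theta is the hard part: naive grid quadrature needs n^d evaluations in d dimensions, so it only works for tiny d (the energy f here is an illustrative quadratic with a known analytic answer).

import numpy as np

f = lambda x: -0.5 * np.sum(x**2, axis=-1)  # illustrative energy f_theta

def grid_partition_function(d, n=50, lim=5.0):
    # Z = int e^{f(x)} dx approximated on a grid of n**d points
    axes = [np.linspace(-lim, lim, n)] * d
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    return np.sum(np.exp(f(grid))) * (2 * lim / (n - 1)) ** d

for d in (1, 2, 3):
    print(d, grid_partition_function(d), (2 * np.pi) ** (d / 2))  # vs analytic Z
# For images, d can be ~10^5: a grid would need 50**100000 evaluations.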
[Figure: the trained model density p_\theta(x) over the (x_1, x_2) plane]

Generate Novel Samples

Evaluate probabilities

Anomaly detection, model comparison...

https://parti.research.google

A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!

Scaling laws and emergent abilities

"Scaling Laws for Neural Language Models" Kaplan et al

  • Explicit density
      ◦ Tractable density: Normalizing flows
      ◦ Approximate density: Variational Autoencoders, Diffusion models
  • Implicit density: Generative Adversarial Networks

The zoo of generative models

Base distribution: z \sim p_Z(z)

Invertible transformation: Z \sim \mathcal{N} (0,1) \rightarrow g(z) \rightarrow X

Target distribution: p_X(x) = p_Z(z) \left| \frac{dz}{dx}\right|

Normalizing flows

x = f(z), \, z = f^{-1}(x)
p(\mathbf{x}) = p_z(f^{-1}(\mathbf{x})) \left\vert \det J(f^{-1}) \right\vert

(Image Credit: Phillip Lippe)
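A minimal sketch of the change of variables with a single affine flow x = g(z) = az + b (invertible for a ≠ 0); real flows stack many learned invertible layers.

import numpy as np

a, b = 2.0, 1.0
g = lambda z: a * z + b
g_inv = lambda x: (x - b) / a

def log_prob(x):
    # log p(x) = log p_z(g^{-1}(x)) + log |d g^{-1} / dx|
    z = g_inv(x)
    log_pz = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)  # standard normal base
    return log_pz - np.log(abs(a))

z = np.random.default_rng(0).normal(size=5)
x = g(z)             # sampling: push base samples through g
print(log_prob(x))   # density of N(b, a^2) evaluated at x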

p(x) = \int dz \, p(x|z) \, p(z)

z: Latent variables
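The integral over latents can be estimated by Monte Carlo with z drawn from its prior; a sketch with an assumed toy model z ~ N(0,1), x|z ~ N(z, 0.5²):

import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5

def p_x_given_z(x, z):
    return np.exp(-0.5 * ((x - z) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# p(x) = E_{z ~ p(z)}[ p(x|z) ] with p(z) = N(0, 1)
z = rng.normal(size=100_000)
print(p_x_given_z(1.0, z).mean())  # ~0.24, the analytic N(0, 1 + sigma^2) density at x=1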

Invertible functions aren't that common!

Splines

arXiv:1911.01429

Simulation-based inference

But ODE solutions are always invertible!

z = x + \int_0^1 \phi (x(t)) dt
x = z + \int_1^0 \phi (x(t)) dt
\log p_X(x) = \log p_Z(z) + \int_0^1 \mathrm{Tr} J_\phi (x(t)) dt

Issues with NFs: lack of flexibility

  • Require invertible functions
  • Require tractable Jacobians

 

Chen et al. (2018), Grathwohl et al. (2018)
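A sketch of both integrals with a forward-Euler solver, using an illustrative 1D field \phi(x) = \tanh(wx + c) whose Jacobian trace is known in closed form (real CNFs use a neural \phi and a black-box ODE solver).

import numpy as np

w, c = 0.8, 0.1
phi = lambda x: np.tanh(w * x + c)
trace_jac = lambda x: w * (1.0 - np.tanh(w * x + c) ** 2)  # d phi / dx

def data_to_base(x, n_steps=1000):
    # Euler steps for z = x + int_0^1 phi dt and int_0^1 Tr J_phi dt
    dt = 1.0 / n_steps
    delta_logp = 0.0
    for _ in range(n_steps):
        delta_logp += trace_jac(x) * dt
        x = x + phi(x) * dt
    return x, delta_logp

z, delta_logp = data_to_base(0.5)
log_px = (-0.5 * z**2 - 0.5 * np.log(2 * np.pi)) + delta_logp  # N(0,1) base
print(z, log_px)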

Reverse diffusion: Denoise previous step

Forward diffusion: Add Gaussian noise (fixed)

Forward: p(z_0) \xrightarrow{p(z_1|z_0)} p(z_1) \xrightarrow{p(z_2|z_1)} p(z_2) \rightarrow \dots \rightarrow p(z_T)

Reverse: p(z_T) \rightarrow \dots \xrightarrow{q_\theta(z_1|z_2)} p(z_1) \xrightarrow{q_\theta(z_0|z_1)} p(z_0)

Diffusion generative models

s_\theta(x,t) = \nabla_x \log p_t(x)

Score
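A sketch of the forward-noising chain and the score on a toy 1D dataset; because everything here is Gaussian, the score of p_t is available in closed form (in practice s_\theta is a neural network fit by denoising score matching).

import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(loc=3.0, scale=0.2, size=2048)  # toy "data"

beta, T = 0.02, 300
z = x0.copy()
for _ in range(T):  # forward diffusion: z_t = sqrt(1-beta) z_{t-1} + sqrt(beta) eps
    z = np.sqrt(1 - beta) * z + np.sqrt(beta) * rng.normal(size=z.shape)
print(z.mean(), z.std())  # ~ N(0, 1): the data structure is destroyed

# p_T = N(mu_T, var_T) in closed form, so the score is analytic here
alpha_bar = (1 - beta) ** T
mu_T = np.sqrt(alpha_bar) * 3.0
var_T = alpha_bar * 0.2**2 + (1 - alpha_bar)
score = lambda x: -(x - mu_T) / var_T  # grad_x log p_T(x)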

"Equivariant Diffusion for Molecule Generation in 3D" Hongeboom et al

 

Speeding up drug discovery

A person half Yoda half Gandalf

Desired molecule properties

Students at MIT are

Large Language Models

Pre-trained on next word prediction

... candidate next words: OVER-CAFFEINATED, NERDS, SMART, ATHLETIC
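A toy version of next-word prediction (vocabulary and logits made up): the model outputs scores over the vocabulary, a softmax turns them into probabilities, and pre-training minimizes the cross-entropy of the observed next word.

import numpy as np

vocab = ["over-caffeinated", "nerds", "smart", "athletic"]
logits = np.array([2.0, 1.5, 1.0, 0.5])  # model scores for "Students at MIT are ..."

probs = np.exp(logits - logits.max())
probs /= probs.sum()                     # softmax over the next word

loss = -np.log(probs[2])                 # cross-entropy if the true next word is "smart"
next_word = np.random.default_rng(0).choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), loss, next_word)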

https://www.astralcodexten.com/p/janus-simulators

How do we encode "helpful" in the loss function?

BEFORE RLHF

AFTER RLHF

RLHF: Reinforcement Learning from Human Feedback

Step 1

Human demonstrates the desired output

Prompt: "Explain RLHF"

After training the model...

Step 2

Human scores the model's outputs
+ a reward model is trained to reproduce those scores

Prompt: "Explain RLHF"
Output A: "it is the method by which ..."
Output B: "Explain means to tell someone..."

Step 3

Tune the Language Model to produce high rewards!
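A heavily simplified sketch of step 3: a toy policy over two canned responses, updated with REINFORCE to prefer the one the (assumed) reward model scores higher. Real RLHF fine-tunes all LM weights with PPO plus a KL penalty toward the pretrained model.

import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)             # toy policy over two candidate responses
reward = np.array([1.0, -1.0])   # reward-model scores (illustrative)

for _ in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    i = rng.choice(2, p=probs)           # sample a response from the policy
    grad = -probs
    grad[i] += 1.0                       # grad of log pi(i) w.r.t. logits
    logits += 0.1 * reward[i] * grad     # REINFORCE: reward-weighted ascent

print(np.exp(logits) / np.exp(logits).sum())  # mass shifts to the high-reward response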

"Sparks of Artificial General Intelligence: Early experiments with GPT-4" Bubeck et al

Produce Javascript code that creates a random graphical image that looks like a painting of Kandinsky

Draw a unicorn in TikZ

ImageBind: Multimodality

"ImageBind: One Embedding Space To Bind Them All" Girdhar et al

 

cuestalz@mit.edu

HerWill - Summer School 2024