From Zero to Generative
IAIFI Fellow, MIT
Carolina Cuesta-Lazaro
Art: "The art of painting" by Johannes Vermeer
Learning Generative Modelling from scratch
About Myself
Cuenca, Spain
Heidelberg, Germany
Tokyo, Japan
Durham, England
Boston, US
Medical Imaging
Epidemiology: Agent Based simulations
OBSERVED
SIMULATED
Cosmology
Simulations
HPC
Science question
Statistics ML
Natural Language
Carolina Cuesta-Lazaro IAIFI/MIT - From Zero to Generative
BEFORE
Artificial General Intelligence?
AFTER
https://parti.research.google
A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!
Scaling laws and emergent abilities
"Scaling Laws for Neural Language Models" Kaplan et al
"Sparks of Artificial General Intelligence: Early experiments with GPT-4" Bubeck et al
Produce Javascript code that creates a random graphical image that looks like a painting of Kandinsky
Draw a unicorn in TikZ
Today's Plan
1. Recap of the Machine Learning building blocks
2. Learning to classify
BREAK
3. Tutorial: Build your first classifier
4. Introduction to Generative Models
5. Tutorial: Build your first generative model (if time permits)
BREAK
The building blocks: 1. Data
Cosmic Cartography
(Pointclouds)
MNIST
(Images)
Wikipedia
(Text)
1024x1024
The curse of dimensionality
Inductive biases!
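To put a rough number on the curse of dimensionality, the sketch below counts how many distinct 1024x1024 images exist. It assumes three colour channels with 8 bits each, which the slide does not state.

```python
import math

# Assumption (not from the slide): RGB images with 8 bits per channel.
height, width, channels, levels = 1024, 1024, 3, 256

n_dims = height * width * channels  # number of pixel values per image
# The number of distinct images is levels ** n_dims; work in log10 to avoid overflow.
log10_n_images = n_dims * math.log10(levels)

print(f"dimensions per image: {n_dims}")
print(f"distinct images: about 10^{log10_n_images:.0f}")
```

No dataset can cover a space of that size, which is why architectures need inductive biases that exploit the structure of the data.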
The building blocks: 2. Architectures
Multilayer Perceptron
Image Credit: CS231n Convolutional Neural Networks for Visual Recognition
Pixel 1
Pixel 2
Pixel N
Convolutional Neural Networks
Inductive bias: Translation Invariance
Data Representation: Images
Image Credit: Irhum Shakfat "Intuitively Understanding Convolutions for Deep Learning"
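A minimal NumPy sketch of the translation bias, using a hypothetical 1D signal and kernel with wrap-around edges. Strictly speaking, a convolution layer is translation equivariant (shifting the input shifts the feature map), and invariance appears once the features are pooled over positions; the example checks both.

```python
import numpy as np

def circular_conv1d(signal, kernel):
    """Cross-correlate `signal` with `kernel`, wrapping around at the edges."""
    n = len(signal)
    return np.array([
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ])

rng = np.random.default_rng(0)
signal = rng.normal(size=16)   # hypothetical input "image" (1D for simplicity)
kernel = rng.normal(size=3)    # hypothetical filter weights
shift = 5

out = circular_conv1d(signal, kernel)
out_of_shifted = circular_conv1d(np.roll(signal, shift), kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
equivariant = np.allclose(out_of_shifted, np.roll(out, shift))
# Invariance: a global max-pool over positions ignores the shift entirely.
invariant = np.isclose(out.max(), out_of_shifted.max())
print(equivariant, invariant)
```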
Inductive bias: Permutation Invariance
Data Representation: Sets, Pointclouds
Deep sets
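A minimal sketch of the Deep Sets idea, assuming the common form rho(sum_i phi(x_i)). The weights are random placeholders rather than trained parameters, and the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical random weights standing in for trained parameters.
W_phi = rng.normal(size=(3, 8))  # per-point encoder: 3 coordinates -> 8 features
W_rho = rng.normal(size=(8, 2))  # set-level decoder: 8 features -> 2 outputs

def deep_set(points):
    """rho(sum_i phi(x_i)) for a point cloud of shape (n_points, 3)."""
    per_point = np.tanh(points @ W_phi)  # phi applied to every point independently
    pooled = per_point.sum(axis=0)       # order-independent aggregation
    return pooled @ W_rho                # rho applied to the pooled summary

cloud = rng.normal(size=(100, 3))
shuffled = cloud[rng.permutation(len(cloud))]
same_output = np.allclose(deep_set(cloud), deep_set(shuffled))
print(same_output)
```

Because the sum does not depend on the order of the points, any permutation of the point cloud gives the same output.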
Transformers might be the unifying architecture!
Text
Images
The building blocks: 3. Loss function
Image Credit: "Visualizing the loss landscape of neural networks" Hao Li et al
The building blocks: 4. The Optimizer
Image Credit: "Complete guide to Adam optimization" Hao Li et al
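As a minimal illustration of what an optimizer does, here is plain gradient descent (not Adam) on a hypothetical quadratic loss. The loss, its minimum, and the learning rate are all made up for the example.

```python
import numpy as np

target = np.array([3.0, -1.0])  # hypothetical location of the loss minimum

def loss(w):
    return np.sum((w - target) ** 2)

def grad(w):
    return 2.0 * (w - target)  # analytic gradient of the loss above

w = np.zeros(2)  # initial parameters
learning_rate = 0.1
for step in range(100):
    w = w - learning_rate * grad(w)  # step downhill along the negative gradient

print(w, loss(w))
```

Adam follows the same loop but rescales each step using running averages of the gradient and its square.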
Tutorial 1: Learning to classify
How do we output a probability?
Pixel 1
Pixel 2
Pixel N
p Class 1
p Class 2
p Class 10
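The usual way to turn the network's raw outputs into class probabilities, and presumably what this slide illustrates, is the softmax function. A minimal sketch with hypothetical logits for the 10 classes:

```python
import numpy as np

def softmax(logits):
    """Map real-valued scores to positive numbers that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical network outputs (logits), one per class.
logits = np.array([2.0, 0.5, -1.0, 0.0, 0.0, 1.0, -2.0, 0.3, 0.1, -0.5])
probs = softmax(logits)
print(probs.sum(), probs.argmax())
```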
Loss function: Cross entropy
How different are two probability distributions?
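Model prediction: $p_i$, the predicted probability of class $i$
Truth: $y_i = 1$ if the true class is $i$, $0$ otherwise
Cross entropy: $\mathcal{L} = -\sum_i y_i \log p_i$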
Truth: Class = 0
True class
Predicted probability
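A small sketch of how the cross entropy behaves when the true class is 0. With a one-hot truth the sum over classes reduces to minus the log of the probability assigned to the true class. The three predicted distributions are hypothetical.

```python
import numpy as np

def cross_entropy(probs, true_class):
    """-sum_i y_i log p_i with one-hot y, which reduces to -log p[true_class]."""
    return -np.log(probs[true_class])

# Hypothetical predictions over 10 classes; the true class is 0.
confident_right = np.full(10, 0.01)
confident_right[0] = 0.91
uniform = np.full(10, 0.1)
confident_wrong = np.full(10, 0.01)
confident_wrong[1] = 0.91

for name, p in [("right", confident_right), ("uniform", uniform), ("wrong", confident_wrong)]:
    print(name, cross_entropy(p, true_class=0))
```

The loss is close to zero for a confident correct prediction and grows as probability moves away from the true class.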
JAX Models
# Architecture
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    @nn.compact
    def __call__(self, x):
        # Linear
        x = nn.Dense(features=64)(x)
        # Non-linearity
        x = nn.silu(x)
        # Linear
        x = nn.Dense(features=64)(x)
        # Non-linearity
        x = nn.silu(x)
        # Linear
        x = nn.Dense(features=2)(x)
        return x

model = MLP()

# Parameters: initialise from a random key and an example input of shape (batch, features)
example_input = jnp.ones((1, 4))
params = model.init(jax.random.PRNGKey(0), example_input)

# Call: apply the model to an input using those parameters
y = model.apply(params, example_input)
A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon
Image credit: DALL·E 3
1024x1024
Generation vs Discrimination
Data
A PDF that we can optimize
Maximize the likelihood of the data
Generative Models
Maximize the likelihood of the training samples
Model
Training Samples
Generative Models 101
Trained Model
Evaluate probabilities
Low Probability
High Probability
Generate Novel Samples
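The three steps above (fit by maximum likelihood, evaluate probabilities, generate samples) can be sketched with the simplest possible generative model, a 1D Gaussian. The training data here is synthetic and hypothetical; a real model would replace the Gaussian with a flexible density.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical training samples drawn from an "unknown" distribution.
data = rng.normal(loc=2.0, scale=0.5, size=5000)

def log_prob(x, mu, sigma):
    """Log-density of a Gaussian N(mu, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# For a Gaussian the maximum-likelihood parameters have a closed form.
mu_hat, sigma_hat = data.mean(), data.std()

# Evaluate probabilities: a typical point scores higher than an outlier.
high = log_prob(2.0, mu_hat, sigma_hat)
low = log_prob(6.0, mu_hat, sigma_hat)

# Generate novel samples from the fitted model.
new_samples = rng.normal(mu_hat, sigma_hat, size=10)
print(mu_hat, sigma_hat, high, low)
```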
Change of variables
$z$ sampled from a Gaussian distribution with mean 0 and variance 1
How is $x = f(z)$ distributed?
Base distribution
Target distribution
Invertible transformation
Normalizing flows
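A minimal numerical check of the change-of-variables rule, p_x(x) = p_z(f^{-1}(x)) |d f^{-1}/dx|, for a base sample z from a standard Gaussian. The affine transformation x = a z + b is a hypothetical choice made only because its inverse and Jacobian are trivial.

```python
import numpy as np

def standard_normal_pdf(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

# Hypothetical invertible transformation x = f(z) = a * z + b.
a, b = 2.0, 1.0

def f_inverse(x):
    return (x - b) / a

def pdf_x(x):
    """Change of variables: p_x(x) = p_z(f^{-1}(x)) * |d f^{-1} / dx|."""
    return standard_normal_pdf(f_inverse(x)) * abs(1.0 / a)

# Compare the formula with a histogram of transformed base samples.
rng = np.random.default_rng(0)
x_samples = a * rng.normal(size=200_000) + b
hist, edges = np.histogram(x_samples, bins=40, range=(-7, 9), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_error = np.max(np.abs(hist - pdf_x(centers)))
print(max_error)
```

A normalizing flow uses the same rule, with f replaced by a learned invertible network whose Jacobian determinant is cheap to compute.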
Box-Muller transform
Normalizing flows in 1934
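The Box-Muller transform fits the normalizing-flow picture: a uniform base distribution is pushed through a closed-form invertible map to give a Gaussian target. A minimal NumPy sketch, with an arbitrary sample size and seed:

```python
import numpy as np

def box_muller(u1, u2):
    """Map two independent Uniform(0, 1) samples to two independent N(0, 1) samples."""
    radius = np.sqrt(-2.0 * np.log(u1))
    angle = 2.0 * np.pi * u2
    return radius * np.cos(angle), radius * np.sin(angle)

rng = np.random.default_rng(0)
# 1 - uniform keeps u1 in (0, 1], so log(u1) is finite.
u1 = 1.0 - rng.uniform(size=100_000)
u2 = rng.uniform(size=100_000)
z1, z2 = box_muller(u1, u2)
print(z1.mean(), z1.std(), np.corrcoef(z1, z2)[0, 1])
```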
(Image Credit: Phillip Lippe)
z: Latent variables
Invertible functions aren't that common!
Splines
Issues with NFs: lack of flexibility
- Invertible functions
- Tractable Jacobians
But ODE solutions are always invertible!
Continuous time
Flow ODE
Chen et al. (2018), Grathwohl et al. (2018)
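For reference, a continuous normalizing flow in the sense of Chen et al. (2018) is usually written as an ODE for the sample together with an integral for its log-density. The symbols $v_\theta$ (learned velocity field), $p_0$ (base density) and $p_1$ (model density) are notation assumed here.

```latex
\frac{\mathrm{d}x(t)}{\mathrm{d}t} = v_\theta\big(x(t), t\big), \qquad x(0) = z \sim p_0 ,
\qquad
\log p_1\big(x(1)\big) = \log p_0\big(x(0)\big) - \int_0^1 \nabla \cdot v_\theta\big(x(t), t\big)\,\mathrm{d}t .
```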
Generate
Evaluate Probability
Need to solve this expensive integral at each step during training!
Very slow
Can we avoid it?
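The cost can be seen in a toy sketch. It assumes a hypothetical 1D velocity field v(x, t) = A x, chosen because the exact density is known, and a simple Euler integrator in place of a proper ODE solver. Every probability evaluation needs a full pass through the ODE, with the velocity and its divergence evaluated at each step.

```python
import numpy as np

A = 0.5  # hypothetical velocity field v(x, t) = A * x, so its divergence is A

def velocity(x, t):
    return A * x

def divergence(x, t):
    return A * np.ones_like(x)

def log_standard_normal(z):
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def generate(z, n_steps=1000):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = z, 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x

def log_prob(x, n_steps=1000):
    """Integrate backwards to the base sample, accumulating the divergence."""
    dt, div_integral = 1.0 / n_steps, 0.0
    for k in range(n_steps, 0, -1):
        div_integral = div_integral + dt * divergence(x, k * dt)
        x = x - dt * velocity(x, k * dt)
    return log_standard_normal(x) - div_integral

x = np.array([-1.0, 0.0, 2.0])
numeric = log_prob(x)
# For this linear field the exact answer is a Gaussian with standard deviation exp(A).
exact = log_standard_normal(x / np.exp(A)) - A
print(numeric, exact)
```

With a neural network as the velocity field, each of those steps is a network evaluation plus a divergence estimate, and training by maximum likelihood repeats this for every batch.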
Flow matching
Regress the velocity field directly!
But we need to know u. If we know u, then why learn another one?
Image Credit: "An Introduction to flow matching" Tor Fjelde et al
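Written out, regressing the velocity field means minimising a squared error between the learned field and the target field $u$ along the probability path. The notation $v_\theta$ for the learned field and $p_t$ for the distribution at time $t$ is assumed here.

```latex
\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x \sim p_t}\,
\big\| v_\theta(x, t) - u_t(x) \big\|^2
```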
Conditional Flow matching
Learn a conditional vector field (known at training time)
Approximate it with an unconditional one
The gradients of the losses are the same!
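A minimal sketch of the conditional flow matching loss. It assumes one common choice of conditional path, a straight line from a Gaussian sample x0 to a data sample x1, whose conditional velocity is x1 - x0; the slides may use a different path. `velocity_model` is a hypothetical stand-in for the neural network.

```python
import numpy as np

def cfm_loss(velocity_model, x1, rng):
    """Monte Carlo estimate of the conditional flow matching loss.

    Assumes the straight-line path x_t = (1 - t) * x0 + t * x1, whose
    conditional velocity is u_t(x | x0, x1) = x1 - x0.
    """
    x0 = rng.normal(size=x1.shape)      # base (Gaussian) samples
    t = rng.uniform(size=(len(x1), 1))  # one time per sample
    x_t = (1.0 - t) * x0 + t * x1       # point on the conditional path
    target = x1 - x0                    # conditional velocity, known at training time
    prediction = velocity_model(x_t, t)
    return np.mean(np.sum((prediction - target) ** 2, axis=-1))

# Toy check: if every data point equals c, the exact field is (c - x) / (1 - t).
c = np.array([3.0, -2.0])
data = np.tile(c, (1000, 1))
rng = np.random.default_rng(0)
exact_loss = cfm_loss(lambda x, t: (c - x) / (1.0 - t), data, rng)
zero_field_loss = cfm_loss(lambda x, t: np.zeros_like(x), data, rng)
print(exact_loss, zero_field_loss)
```

No ODE is solved during training: each step only needs samples, a time, and one evaluation of the model. In practice the model would be a network trained by gradient descent on this loss.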
Tutorial 2
Gaussian
MNIST
Large Language Models
Pre-trained on next word prediction
Students at MIT are ...
OVER-CAFFEINATED
NERDS
SMART
ATHLETIC
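A toy sketch of next word prediction for the prompt "Students at MIT are". The logits and the observed word are hypothetical numbers chosen for illustration, not output from any real model.

```python
import numpy as np

rng = np.random.default_rng(0)
candidates = ["over-caffeinated", "nerds", "smart", "athletic"]
# Hypothetical logits a language model might assign to each candidate word.
logits = np.array([2.0, 1.0, 1.5, -1.0])

probs = np.exp(logits - logits.max())
probs = probs / probs.sum()  # softmax over the candidate words

next_word = rng.choice(candidates, p=probs)  # sample the continuation
# Pre-training loss for one example: negative log-probability of the observed next word.
observed = "smart"  # hypothetical word that follows the prompt in the training text
loss = -np.log(probs[candidates.index(observed)])
print(dict(zip(candidates, probs.round(3))), next_word, loss)
```

Pre-training averages this loss over every position in a large text corpus.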
https://www.astralcodexten.com/p/janus-simulators
How do we encode "helpful" in the loss function?
RLHF: Reinforcement Learning from Human Feedback
Step 1: Human teaches desired output
Prompt: "Explain RLHF"
Desired output: "After training the model..."
Step 2: Human scores outputs and teaches a Reward model to score
Prompt: "Explain RLHF"
Outputs to score: "it is the method by which ..." / "Explain means to tell someone..."
Step 3: Tune the Language Model to produce high rewards!
BEFORE RLHF
AFTER RLHF
References
- Books by Kevin P. Murphy
  - Machine Learning: A Probabilistic Perspective
  - Probabilistic Machine Learning: Advanced Topics
- ML4Astro workshop https://ml4astro.github.io/icml2023/
- ProbAI summer school https://github.com/probabilisticai/probai-2023
- IAIFI Summer school
- Blogposts
cuestalz@mit.edu
From zero to generative - SummerSchool
By carol cuesta