["Genie 2: A large-scale foundation model" Parker-Holder et al (2024)]
["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high-dimensional inference tractable
1024x1024xTime
["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]
https://parti.research.google
A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!
Data
A PDF that we can optimize
Maximize the likelihood of the data
Maximize the likelihood of the training samples
Parametric Model
Training Samples
Trained Model
Evaluate probabilities
Low Probability
High Probability
Generate Novel Samples
Simulator
Generative Model
Fast emulators
Testing Theories
Generative Model
Simulator
GANs
VAEs
Normalizing Flows
Diffusion Models
[Image Credit: https://lilianweng.github.io/posts/2018-10-13-flow-models/]
Base
Data
How is the bridge constrained?
Normalizing flows: Reverse = Forward inverse
Diffusion: Forward = Gaussian noising
Flow Matching: Forward = Interpolant
Is p(x0) restricted?
Diffusion: p(x0) is Gaussian
Normalizing flows: p(x0) can be evaluated
Is bridge stochastic (SDE) or deterministic (ODE)?
Diffusion: Stochastic (SDE)
Normalizing flows: Deterministic (ODE)
(Exact likelihood evaluation)
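The three bridge choices can be contrasted on 1-D toy samples. A minimal NumPy sketch; the affine map, the schedule value `alpha_bar`, and the toy bimodal "data" are illustrative assumptions, not any specific model:

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)                       # base (Gaussian) samples
x1 = np.sign(rng.normal(size=1000)) + 0.1 * rng.normal(size=1000)  # toy "data"

# Flow matching: forward bridge is a simple interpolant between base and data
t = 0.5
x_t_interp = (1 - t) * x0 + t * x1

# Diffusion: forward bridge is fixed Gaussian noising of the data
alpha_bar = 0.25                                 # noise-schedule value at this t (assumed)
x_t_diff = np.sqrt(alpha_bar) * x1 + np.sqrt(1 - alpha_bar) * rng.normal(size=1000)

# Normalizing flow: the bridge is an invertible map, here a toy affine one;
# the exact inverse is what enables exact likelihood evaluation
a, b = 2.0, 1.0
y = a * x0 + b                                   # forward pass
x0_rec = (y - b) / a                             # exact inverse
```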
"Creating noise from data is easy;
creating data from noise is generative modeling."
Yang Song
How is
distributed?
Transformation (flow):
Normalizing flows in 1934
Different people learn differently
Take the data generator and start from scratch
Fill in the gaps
Start from a working solution and play around
Base distribution
Target distribution
Invertible transformation
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
Bijective
Sample
Evaluate probabilities
Probability mass conserved locally
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
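"Probability mass conserved locally" is the change-of-variables formula behind normalizing flows. For a bijection $x = f(z)$ with base density $p_z$:

```latex
p_x(x) = p_z\!\left(f^{-1}(x)\right)\,
         \left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|
```

Sampling uses the forward map $x = f(z)$; evaluating probabilities uses the inverse and the Jacobian determinant, which is why flow layers must be bijective with tractable determinants.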
Splines
Issues with NFs: lack of flexibility
Neural Network
Sample
Evaluate probabilities
Forward Model
Observable
Dark matter
Dark energy
Inflation
Predict
Infer
Parameters
Inverse mapping
Normalizing flow
Continuity Equation
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
Chen et al. (2018), Grathwohl et al. (2018)
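For continuous (neural-ODE) flows, the continuity equation links the density $p_t$ to the vector field $v_t$, and the log-density evolves along each trajectory via the divergence:

```latex
\frac{\partial p_t(x)}{\partial t} + \nabla\!\cdot\!\big(p_t(x)\,v_t(x)\big) = 0,
\qquad
\frac{\mathrm{d}}{\mathrm{d}t}\log p_t\big(x(t)\big)
  = -\,\nabla\!\cdot\! v_t\big(x(t)\big)
\quad\text{with}\quad
\frac{\mathrm{d}x(t)}{\mathrm{d}t} = v_t\big(x(t)\big).
```

Integrating the divergence along the trajectory gives the exact likelihood, which is why a likelihood-based loss requires solving an ODE at training time.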
Generate
Evaluate Probability
Gaussian
Two Moons
Loss requires solving an ODE!
Diffusion, Flow matching, Interpolants... All ways to avoid this at training time
Assume a conditional vector field (known at training time)
The loss that we can compute
The gradients of the losses are the same!
["Flow Matching for Generative Modeling" Lipman et al]
["Stochastic Interpolants: A Unifying framework for Flows and Diffusions" Albergo et al]
Intractable
Continuity equation
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
Sample
Evaluate probabilities
Gaussian
Two Moons
Reverse diffusion: Denoise previous step
Forward diffusion: Add Gaussian noise (fixed)
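Because the forward process is fixed, any noised state x_t can be sampled from x_0 in closed form. A sketch assuming the standard variance-preserving parameterization, with an illustrative linear schedule:

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x0, (1 - abar_t) I),
    where abar_t is the cumulative product of (1 - beta_s) up to step t."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # linear schedule (illustrative)
x0 = rng.normal(size=(8, 2))
x_T, eps = forward_diffuse(x0, 999, betas, rng)
# At the final step abar_T is nearly 0, so x_T is close to a standard Gaussian
```

The reverse model is then trained by regression: predict eps (or x0) from x_t and t, which is the "denoising = regression" view.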
Prompt
A person half Yoda half Gandalf
Denoising = Regression
Fixed base distribution:
Gaussian
["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
International Conference on Machine Learning ICML AI4Astro 2023, Spotlight talk, arXiv:2311.17141]
Base Distribution
Target Distribution
Simulated Galaxy 3D Map
Prompt:
Prompt: A person half Yoda half Gandalf
Real or Fake?
["A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science" Bischoff et al 2024
arXiv:2403.12636]
Mean relative velocity
k Nearest neighbours
Pair separation
Varying cosmological parameters
Physics as a testing ground: Well-understood summary statistics enable rigorous validation of generative models
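One such sample-based distance is the (squared) maximum mean discrepancy. A small NumPy sketch with an RBF kernel; the bandwidth and the toy Gaussian samples are illustrative:

```python
import numpy as np

def mmd_rbf(x, y, bandwidth=1.0):
    """Squared maximum mean discrepancy between sample sets x and y,
    estimated with an RBF kernel: a sample-based distance that is near
    zero when the two sets come from the same distribution."""
    def k(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dists / (2.0 * bandwidth**2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))               # stand-in for simulation samples
fake_good = rng.normal(size=(200, 2))          # generated samples, matched
fake_bad = rng.normal(loc=3.0, size=(200, 2))  # generated samples, shifted
```

A mismatched generator shows up as a larger distance than a matched one, which is what makes such statistics useful for validation.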
["Generalization in diffusion models arises from geometry-adaptive harmonic representations" Kadkhodaie et al (2024)]
Split training set into non-overlapping subsets
1) Compress into low dimensional latent
2) Use generative model as decoder
Encoder
["CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching" Kannan, Qiu, Cuesta-Lazaro, Jeong (in prep)]
["CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching" Kannan, Qiu, Cuesta-Lazaro, Jeong (in prep)]
["CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching" Kannan, Qiu, Cuesta-Lazaro, Jeong (in prep)]
EHT posterior samples with different priors
["Event-horizon-scale Imaging of M87* under Different Assumptions via Deep Generative Image Priors" Feng et al]
CIFAR-10
GRMHD
RIAF
CelebA
(Sims)
(Sims)
(LR Natural Images)
(Human Faces)
Prior
["Learning Diffusion Priors from Observations by Expectation Maximization" Rozet et al]
Base
Data
How is the bridge constrained?
Normalizing flows: Reverse = Forward inverse
Diffusion: Forward = Gaussian noising
Flow Matching: Forward = Interpolant
Is p(x0) restricted?
Diffusion: p(x0) is Gaussian
Normalizing flows: p(x0) can be evaluated
Is bridge stochastic (SDE) or deterministic (ODE)?
Diffusion: Stochastic (SDE)
Normalizing flows: Deterministic (ODE)
(Exact likelihood evaluation)
Gaussian
Two Moons
Books by Kevin P. Murphy
Machine Learning: A Probabilistic Perspective
Probabilistic Machine Learning: Advanced Topics
ML4Astro workshop https://ml4astro.github.io/icml2023/
ProbAI Summer School https://github.com/probabilisticai/probai-2023
IAIFI Summer School
Blog posts
carolina.clzr@gmail.com