Generative Solutions for Cosmic Problems
[Video Credit: N-body simulation, Francisco Villaescusa-Navarro]
Carolina Cuesta-Lazaro
IAIFI Fellow, MIT / Center for Astrophysics
1-Dimensional
Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments
DESI, DESI-II, Spec-S5
Euclid / LSST
Simons Observatory
CMB-S4
LIGO
Einstein
The era of Big Data Cosmology
xAstrophysics
5-Dimensional
Carolina Cuesta-Lazaro IAIFI/MIT @ Princeton 2024
The cost of cosmological simulations
AbacusSummit
330 billion particles in 2 Gpc/h volume
60 trillion particles
~8 TB per simulation
15M CPU hours
(TNG50 ~100M CPU hours)
ML Requirements
Simulation-efficient methods
Can we store a simulation inside a neural network?
Prompting simulators
Fast Emulators + Likelihood Models
Continuous Fields and Compression
Controllable
Simulators
Model
Training Samples
Generative Models 101
Evaluate probabilities
Low p(x)
High p(x)
Generate Novel Samples
Probability mass conserved locally
Inference à la gradient descent
1) Tractable
2) f maximally expressive
Loss = Maximize likelihood of the training data
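The two requirements above can be read off the standard change-of-variables formula for a normalizing flow. A sketch, assuming an invertible map f from data x to a base variable z with base density p_Z (the symbols p_X, p_Z, theta and N are chosen here for illustration and are not taken from the slide):

```latex
\log p_X(x) = \log p_Z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|,
\qquad
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \log p_X\big(x_i;\theta\big)
```

"Tractable" then refers to the Jacobian determinant, which must be cheap to evaluate, while f should stay as expressive as possible under that constraint.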
In continuous time
Continuity Equation
Loss requires solving an ODE!
Diffusion, Flow matching, Interpolants... All ways to avoid this at training time
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
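For reference, the continuous-time statement can be written as follows. This is a sketch in notation assumed here (density p_t, velocity field v_t), not copied from the slide:

```latex
\frac{\partial p_t(x)}{\partial t} + \nabla \cdot \big(p_t(x)\, v_t(x)\big) = 0,
\qquad
\frac{d x_t}{dt} = v_t(x_t),
\qquad
\frac{d}{dt}\log p_t(x_t) = -\nabla \cdot v_t(x_t)
```

Evaluating the likelihood of a sample means integrating the last two equations along its trajectory, which is why a maximum-likelihood loss requires an ODE solve.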
Can we regress the velocity field directly?
Turned maximum likelihood into a regression problem!
Interpolant
Stochastic Interpolant
Expectation over all possible paths that go through x_t
["Stochastic Interpolants: A Unifying framework for flows and diffusion" Albergo et al arXiv:2303.08797]
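The regression view can be illustrated with a minimal sketch. It assumes the simplest linear interpolant x_t = (1 - t) x0 + t x1, whose time derivative x1 - x0 is the regression target; the function names and the toy Gaussian data are hypothetical and not taken from the cited paper.

```python
import numpy as np

def interpolant_batch(x0, x1, rng):
    """Draw t ~ U(0, 1) and build x_t plus the velocity target for a linear interpolant."""
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1   # point on the straight path from x0 to x1
    target = x1 - x0               # time derivative of that path
    return t, xt, target

def velocity_loss(predicted_velocity, target):
    """Mean squared error between a model's velocity and the path derivative."""
    return float(np.mean(np.sum((predicted_velocity - target) ** 2, axis=1)))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))            # base samples (Gaussian, as an assumption)
x1 = rng.normal(loc=3.0, size=(256, 2))   # stand-in for target samples
t, xt, target = interpolant_batch(x0, x1, rng)
```

A network v(x_t, t) trained with velocity_loss sees many different (x0, x1) pairs passing through similar x_t, so its minimizer is the average over those paths, and no ODE is solved during training.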
Diffusion Models
Reverse diffusion: Denoise previous step
Forward diffusion: Add Gaussian noise (fixed)
Prompt
A person half Yoda half Gandalf
Denoising = Regression
Fixed base distribution:
Gaussian
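A minimal sketch of "Denoising = Regression", assuming a DDPM-style variance-preserving noise schedule. The schedule values, array shapes and function names are illustrative assumptions, not the settings used in the talk.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Jump directly to step t of the fixed forward process by mixing x0 with Gaussian noise."""
    eps = rng.normal(size=x0.shape)
    # Reshape the per-sample schedule value so it broadcasts over the remaining axes.
    a = alpha_bar[t].reshape(-1, *([1] * (x0.ndim - 1)))
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, eps

def denoising_loss(eps_pred, eps):
    """Regression loss: the model is asked to predict the noise that was added."""
    return float(np.mean((eps_pred - eps) ** 2))

betas = np.linspace(1e-4, 0.02, 1000)   # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)     # fraction of signal variance left at each step

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 16, 16))       # stand-in for a batch of 2D fields
t = rng.integers(0, len(betas), size=x0.shape[0])
xt, eps = forward_diffuse(x0, t, alpha_bar, rng)
```

At the last step alpha_bar is close to zero, so x_t is almost pure Gaussian noise, which matches the fixed Gaussian base distribution.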
["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
ICML AI4Astro 2023, arXiv:2311.17141]
Base Distribution
Target Distribution
- Sample
- Evaluate
Long range correlations
Huge point clouds (20M)
Homogeneity and isotropy
Siddharth Mishra-Sharma
Fixed Initial Conditions / Varying Cosmology
Diffusion model
CNN
Diffusion
Increasing Noise
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]
Nayantara Mudur
CNN
Diffusion
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing"
Balla, Mishra-Sharma, Cuesta-Lazaro et al
NeurIPS 2024 NeurReps arXiv:2410.20516]
E(3) Equivariant architectures
Benchmark models
["Geometric and Physical Quantities Improve E(3) Equivariant Message Passing" Brandstetter et al arXiv:2110.02905]
Symmetry-preserving ML
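One way to see what symmetry preservation asks for: features such as pairwise distances do not change when a point cloud is rotated and translated, and invariant quantities of this kind are a common building block of E(3)-equivariant message passing. The check below is a sketch with hypothetical helper names; it is not code from the benchmark.

```python
import numpy as np

def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def random_rotation(rng):
    """Random 3x3 rotation matrix built from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]        # flip one axis to turn a reflection into a rotation
    return q

rng = np.random.default_rng(0)
points = rng.uniform(size=(100, 3))      # toy point cloud
rotation = random_rotation(rng)
shift = rng.normal(size=3)
moved = points @ rotation.T + shift      # rotate, then translate

distances_before = pairwise_distances(points)
distances_after = pairwise_distances(moved)
```

A model that only consumes such invariants cannot depend on the arbitrary orientation or position of the simulation box.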
The bitter lesson by Rich Sutton
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. [...]
methods that continue to scale with increased computation even as the available computation becomes very great. [...]
We want AI agents that can discover like we can, not which contain what we have discovered.
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing"
Balla, Mishra-Sharma, Cuesta-Lazaro et al
NeurIPS 2024 NeurReps arXiv:2410.20516]
Memory scaling point clouds and voxels
Graph
Nodes
Edges
3D Mesh
Voxels
Both data representations scale badly with increasing resolution
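A back-of-the-envelope sketch of the scaling, under assumed storage choices: 4-byte values, 8-byte edge indices, and a fixed number of neighbors per node. All of these are illustrative assumptions, and the function names are hypothetical.

```python
def voxel_memory_bytes(cells_per_side, channels=1, bytes_per_value=4):
    """Memory of a dense 3D grid: grows with the cube of the resolution."""
    return cells_per_side ** 3 * channels * bytes_per_value

def graph_memory_bytes(num_nodes, neighbors_per_node, node_features=3,
                       bytes_per_value=4, bytes_per_index=8):
    """Memory of node features plus an edge list of (sender, receiver) index pairs."""
    nodes = num_nodes * node_features * bytes_per_value
    edges = num_nodes * neighbors_per_node * 2 * bytes_per_index
    return nodes + edges

for cells in (128, 256, 512, 1024):
    print(cells, voxel_memory_bytes(cells) / 1e9, "GB (voxels)")

# 20M points as quoted earlier in the talk; 20 neighbors per node is an assumption.
print(graph_memory_bytes(20_000_000, 20) / 1e9, "GB (graph)")
```

Doubling the grid resolution multiplies the voxel memory by eight, while the graph cost is dominated by the edge list rather than by the nodes. These counts cover only the input representation, not the activations of a network applied to it.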
Representing Continuous Fields
Continuous in space and time
x500 Compression!
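The idea of a field that is continuous in space and time can be sketched as a small coordinate network that maps (x, y, z, t) to a field value. The architecture below is a hypothetical, untrained toy, not the model behind the quoted compression factor; it only shows how the storage cost becomes the parameter count instead of the grid size.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    return [(rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def field_value(params, coords):
    """Evaluate the field at arbitrary (x, y, z, t) coordinates, with no grid involved."""
    h = coords
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b

def num_parameters(params):
    """Total number of stored numbers: this is what replaces the voxel grid."""
    return sum(w.size + b.size for w, b in params)

rng = np.random.default_rng(0)
params = init_mlp([4, 64, 64, 1], rng)   # assumed layer sizes
coords = rng.uniform(size=(10, 4))       # any points in the space-time box
values = field_value(params, coords)
```

A compression factor is the number of stored grid values divided by num_parameters(params), and it is only meaningful once the network has been fitted to the field at the required accuracy.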
Reconstructing dark matter back in time
Stochastic Interpolants
NF
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al NeurIPS 2024 ML for the Physical Sciences]
Adrian Bayer
Mount Fuji?
?
["Probabilistic Forecasting with Stochastic Interpolants and Foellmer Processes" Chen et al arXiv:2403.10648 (Figure adapted from arXiv:2407.21097)]
Generative SDE
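Sampling from a generative SDE of the form dx = b(x, t) dt + sigma dW can be sketched with an Euler-Maruyama integrator. The drift used here is a hypothetical stand-in for a learned drift, and the constant noise level and step count are assumptions made for illustration.

```python
import numpy as np

def euler_maruyama(drift, x_start, sigma, num_steps, rng):
    """Integrate dx = drift(x, t) dt + sigma dW from t = 0 to t = 1."""
    dt = 1.0 / num_steps
    x = np.array(x_start, dtype=float)
    for i in range(num_steps):
        t = i * dt
        noise = rng.normal(size=x.shape)
        x = x + drift(x, t) * dt + sigma * np.sqrt(dt) * noise
    return x

def toy_drift(x, t):
    """Hypothetical drift pulling samples toward the origin; a trained model would go here."""
    return -x

rng = np.random.default_rng(0)
samples = euler_maruyama(toy_drift, np.ones((1000, 2)), sigma=0.5, num_steps=100, rng=rng)
```

Because fresh noise enters at every step, repeated runs from the same starting point give different outcomes, which is what makes the approach suited to sampling many reconstructions consistent with one observation.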
Guided simulations with fuzzy constraints
Simulate what you need
(and sometimes what you want)
Conclusions
1. Generative models are more than fast emulators: robust field-level likelihood models
2. Continuous fields can be used to represent cosmological fields
3. Cosmological field-level inference can be made efficient with generative models
Can we make them more simulation-efficient?
Compression + efficient data format
Can generally make simulators more controllable!