Lessons learned in ML and Cosmology
IAIFI Fellow, MIT
Carolina Cuesta-Lazaro
Art: "A Philosopher" by Salomon Koninck
Unicorns, rainbows and the real Universe
Carolina Cuesta-Lazaro IAIFI/MIT @ DL in Solar Physics 2024
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]
What role did Machine Learning play?
Dark Energy is constant over time
DESI's Dark Energy constraints
Astrophysics dominates Simulation-based Inference
Dataset Size = 1
Can't poke it in the lab
Simulations
Bayesian statistics
Cosmology is hard
1-Dimensional
Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments
DESI, DESI-II, Spec-S5
Euclid / LSST
Simons Observatory
CMB-S4
LIGO
Einstein
The era of Big Data Cosmology
Unicorn land: The promise of ML for Cosmology
Reality Check: Roadblocks & Bottlenecks
Outline of this talk
Mapping dark matter
Reverting gravitational evolution
Field Level Inference
Learning to represent baryonic feedback
Data-driven hybrid simulators
Unsupervised problems
[Image Credit: Claire Lamman (CfA/Harvard) / DESI Collaboration]
["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma arXiv:2311.17141]
Base Distribution
Target Distribution
- Sample
- Evaluate
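The two operations listed above, sampling and evaluating the density, can be sketched with a one-dimensional affine map from a base to a target distribution. This is only a toy illustration: the standard normal base and the parameters `shift` and `log_scale` are assumptions made here, and it is not the point cloud model of the cited paper.

```python
import numpy as np

def sample(n, shift, log_scale, rng):
    """Draw z from the base N(0, 1) and map it to x = shift + exp(log_scale) * z."""
    z = rng.standard_normal(n)
    return shift + np.exp(log_scale) * z

def log_prob(x, shift, log_scale):
    """Evaluate log p(x) with the change-of-variables formula."""
    z = (x - shift) * np.exp(-log_scale)           # invert the map
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))   # log N(z; 0, 1)
    return log_base - log_scale                    # subtract log |dx/dz|

rng = np.random.default_rng(0)
x = sample(100_000, shift=2.0, log_scale=np.log(0.5), rng=rng)
print(x.mean(), x.std(), log_prob(np.array([2.0]), 2.0, np.log(0.5)))
```

A learned flow replaces the fixed affine map with a trainable invertible network, but the same two calls remain the interface.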
Long range correlations
Huge pointclouds (20M)
Homogeneity and isotropy
Siddharth Mishra-Sharma
Lesson #1: leverage data representations + symmetries
Fixed Initial Conditions / Varying Cosmology
Diffusion model
CNN
Diffusion
Increasing Noise
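As a rough illustration of what "Increasing Noise" means for a diffusion model, the forward process can be sketched as below. The variance-preserving form and the linear schedule (`beta_min`, `beta_max`, `num_steps`) are assumptions borrowed from the standard DDPM setup, not details confirmed by the slides.

```python
import numpy as np

def alpha_bar(num_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Cumulative signal fraction for an assumed linear noise schedule."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, abar, rng):
    """Return the noised field x_t and the noise eps that was added."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
abar = alpha_bar()
x0 = np.ones((4, 4))                          # stand-in for a clean map
x_early, _ = add_noise(x0, 10, abar, rng)     # still close to x0
x_late, _ = add_noise(x0, 999, abar, rng)     # almost pure noise
```

The network is trained to predict `eps` from `x_t`, and running the learned denoiser at many noise levels is what the CNN versus Diffusion comparison contrasts with a single forward pass.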
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner]
Nayantara Mudur
["Your diffusion model is secretly a certifiably robust classifier" Chen et al arXiv:2402.02316]
CNN
Diffusion
Lesson #2: learning likelihoods can be more robust than posteriors
Do we actually need Density Estimation?
Just use binary classifiers!
Binary cross-entropy
Sample from simulator
Mix-up
Likelihood-to-evidence ratio
["Likelihood-free MCMC with Amortized Approximate Ratio Estimator" Hermans et al]
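The classifier trick can be sketched on a toy problem where the answer is known analytically. Everything specific here is an assumption made for illustration: the Gaussian simulator, the quadratic features, and the plain gradient descent loop stand in for a real simulator and a neural network.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical toy simulator: theta ~ N(0, 1), x | theta ~ N(theta, 1).
theta = rng.standard_normal(n)
x = theta + rng.standard_normal(n)

def features(theta, x):
    # Quadratic features can represent the exact log-ratio of this toy problem.
    return np.stack([theta * x, theta**2, x**2, np.ones_like(x)], axis=1)

# Class 1: dependent pairs (theta, x) from the simulator.
# Class 0: theta shuffled, so theta and x are independent.
X = np.concatenate([features(theta, x), features(rng.permutation(theta), x)])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Logistic regression trained on the binary cross-entropy by gradient descent.
w = np.zeros(X.shape[1])
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

def log_ratio(theta, x):
    """Classifier logit, an estimate of log p(x | theta) - log p(x)."""
    return features(theta, x) @ w
```

For this toy the exact log-ratio is `theta * x - theta**2 / 2 - x**2 / 4 + log(2) / 2`, so the fitted weights can be checked directly; with balanced classes the logit of the optimal classifier equals the log likelihood-to-evidence ratio.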
Lesson #3: Classifiers are awesome
Likelihood-to-evidence ratio
How good is my model?
["Do Deep Generative Models know what they don't know?" Nalisnick et al]
p(x)
Classifier
Simulations
Observation
Lesson #4: What should x be?
Observed
Simulated
1 to Many:
Distribution of Galaxies
Underlying Dark Matter
["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al arXiv:2403.10648]
Victoria Ono
Core Park
Lesson #5: Most problems are 1 to Many
Truth
Sampled
Observed
[Plots: Power Spectrum and Cross correlation vs. Scale (k), axis running Small to Large]
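The two summary statistics in these plots can be computed along the following lines. This is a minimal sketch assuming periodic fields on a regular grid, wavenumbers in grid units, and an arbitrary choice of `n_bins`; the overall normalisation of the spectra is omitted because it cancels in the cross-correlation coefficient.

```python
import numpy as np

def binned_spectra(a, b, n_bins=8):
    """Isotropically binned auto spectra of a and b, and their cross spectrum."""
    fa, fb = np.fft.fftn(a), np.fft.fftn(b)
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in a.shape], indexing="ij")
    k = np.sqrt(sum(f**2 for f in freqs)).ravel()
    edges = np.linspace(0.0, 0.5, n_bins + 1)      # up to the Nyquist frequency
    idx = np.digitize(k, edges) - 1
    keep = (k > 0) & (idx < n_bins)                # drop the mean and the corner modes

    def binned(power):
        sums = np.bincount(idx[keep], weights=power.ravel()[keep], minlength=n_bins)
        counts = np.bincount(idx[keep], minlength=n_bins)
        return sums / np.maximum(counts, 1)

    p_aa = binned(np.abs(fa) ** 2)
    p_bb = binned(np.abs(fb) ** 2)
    p_ab = binned((fa * np.conj(fb)).real)
    return p_aa, p_bb, p_ab

def cross_correlation(a, b, n_bins=8):
    """r(k) = P_ab / sqrt(P_aa * P_bb); equals 1 where the fields agree in phase."""
    p_aa, p_bb, p_ab = binned_spectra(a, b, n_bins)
    return p_ab / np.sqrt(p_aa * p_bb)
```

Comparing the power spectrum of a sampled field to the truth checks the amplitude of fluctuations at each scale, while the cross-correlation checks whether structures sit in the right place.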
TNG-300
True DM
Sample DM
["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys" Park, Mudur, Cuesta-Lazaro et al (in-prep)]
Posterior Sample
Posterior Mean
Debiasing Cosmic Flows
Reconstructing dark matter back in time
Stochastic Interpolants
NF
Lesson #6: Match two distributions that are already close!
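A stochastic interpolant connects a sample `x0` from one distribution to a paired sample `x1` from another, which is why starting from two distributions that are already close helps. The sketch below assumes a linear interpolation with a noise term `sigma * sqrt(t * (1 - t))`; this particular schedule and the value of `sigma` are illustrative choices, not taken from the slides.

```python
import numpy as np

def interpolant(x0, x1, t, z, sigma=0.1):
    """x_t that equals x0 at t=0 and x1 at t=1, with noise z switched on in between."""
    gamma = sigma * np.sqrt(t * (1.0 - t))
    return (1.0 - t) * x0 + t * x1 + gamma * z

def velocity_target(x0, x1, t, z, sigma=0.1):
    """Time derivative of the interpolant for 0 < t < 1.

    A learned velocity field is regressed onto this quantity.
    """
    dgamma = sigma * (1.0 - 2.0 * t) / (2.0 * np.sqrt(t * (1.0 - t)))
    return x1 - x0 + dgamma * z
```

When `x0` and `x1` are close, the `x1 - x0` term is small and the network only has to learn a modest correction rather than a map from pure noise.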
?
["Probabilistic Forecasting with Stochastic Interpolants and Foellmer Processes" Chen et al arXiv:2403.13724 (Figure adapted from arXiv:2407.21097)]
Simulating what you need (and sometimes what you want)
Guided simulations with fuzzy constraints
Simulating what you need (and sometimes what you want)
Can we run larger simulations? (DESI volumes)
At high resolution?
Faster?
All this work depends on simulations, but...
Thousands of them?
Gravitational evolution ODE
Particle-mesh
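The force computation at the heart of a particle-mesh solver can be sketched in one dimension as below. This is a simplified illustration under stated assumptions: a periodic 1D box, cloud-in-cell painting, units in which Poisson's equation reads `laplacian(phi) = delta`, and no time stepping. The function name `pm_acceleration` is hypothetical.

```python
import numpy as np

def pm_acceleration(pos, n_mesh, box=1.0):
    """Particle-mesh acceleration for particles at positions pos in [0, box)."""
    dx = box / n_mesh
    s = pos / dx
    i = np.floor(s).astype(int)
    w = s - i                                   # weight given to the right-hand cell
    left, right = i % n_mesh, (i + 1) % n_mesh

    # Paint particles onto the mesh (cloud-in-cell) and form the density contrast.
    rho = (np.bincount(left, weights=1.0 - w, minlength=n_mesh)
           + np.bincount(right, weights=w, minlength=n_mesh))
    delta = rho / rho.mean() - 1.0

    # Poisson equation in Fourier space: -k^2 phi_k = delta_k, and a_k = -i k phi_k.
    k = 2.0 * np.pi * np.fft.fftfreq(n_mesh, d=dx)
    delta_k = np.fft.fft(delta)
    acc_k = np.zeros_like(delta_k)
    nonzero = k != 0
    acc_k[nonzero] = 1j * delta_k[nonzero] / k[nonzero]
    acc_mesh = np.fft.ifft(acc_k).real

    # Interpolate the mesh acceleration back to the particles with the same weights.
    return (1.0 - w) * acc_mesh[left] + w * acc_mesh[right]
```

Because forces are only resolved down to the mesh spacing, small-scale structure is smoothed relative to a full N-body calculation, which is the gap a learned correction would need to close.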
Particle-mesh
N-body
Hybrid Simulator - on the fly
Gravitational evolution ODE
Trained to match particle velocities and positions: DIFFERENTIABLE
Particle-mesh
N-body
Hybrid ML-Simulator
["Nbodyify: Adaptive mesh corrections for PM simulations" Cuesta-Lazaro, Modi (in prep)]
Lesson #7: Substantial speed ups without accuracy loss are very hard to achieve
What is the space of plausible solutions and how do we search it?
Differentiable Galaxies ODEs
Our best bet
Neural Network corrections
Finding the missing pieces
Data-driven hybrid simulators
Are these models predictive?
Compressing cosmological simulations
~ 10 trillion particles per snapshot stored
x Discrete snapshots
Can we learn compressed continuous representations with Neural Fields?
How do we learn which information is robust?
Simulating dark matter is easy!
"Atoms" are hard :(
N-body Simulations
Hydrodynamics
Can we improve our simulators in a data-driven way?
How well can we simulate the Universe?
(if cold!)
~ Gpc
pc
kpc
Mpc
Gpc
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
[Figure panels, Small to Large: In-Distribution (x3), Out-of-Distribution (x6)]
["Multifield Cosmology with Artificial Intelligence" Villaescusa-Navarro et al arXiv:2109.09747]
Out-of-Distribution
In-Distribution
Simulator 1
Simulator 2
Dark Matter
Feedback
Learning to parametrise feedback
Contrastive
Lesson #8: Think carefully about the representations you care about
Parity violation cannot originate from gravity
["Measurements of parity-odd modes in the large-scale 4-point function of SDSS..." Hou, Slepian, Chan arXiv:2206.03625]
["Could sample variance be responsible for the parity-violating signal seen in the BOSS galaxy survey?" Philcox, Ereza arXiv:2401.09523]
Real or Fake?
x or Mirror x?
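The "x or Mirror x?" test can be sketched as a classification problem: if a classifier trained on one split can tell held-out samples from their mirror images better than chance, the data violate parity. The toy data, the hand-crafted parity-odd feature, and the one-parameter "classifier" below are all assumptions for illustration; in practice a neural network would take their place.

```python
import numpy as np

def make_pairs(n, handedness, rng):
    """Hypothetical toy data: pairs of 2D vectors (u, v); handedness != 0 adds chirality."""
    u = rng.standard_normal((n, 2))
    v = rng.standard_normal((n, 2))
    # Adding u rotated by +90 degrees to v favours one orientation of the pair.
    v = v + handedness * np.stack([-u[:, 1], u[:, 0]], axis=1)
    return np.concatenate([u, v], axis=1)

def mirror(x):
    """Flip the first coordinate of both vectors (a 2D parity transformation)."""
    return x * np.array([-1.0, 1.0, -1.0, 1.0])

def parity_odd_feature(x):
    """Cross product u x v, which changes sign under mirroring."""
    return x[:, 0] * x[:, 3] - x[:, 1] * x[:, 2]

def mirror_test_accuracy(x, rng, train_frac=0.5):
    """Learn the preferred sign on a training split, then score on held-out data."""
    idx = rng.permutation(len(x))
    n_train = int(train_frac * len(x))
    train, test = x[idx[:n_train]], x[idx[n_train:]]
    sign = np.sign(parity_odd_feature(train).mean())        # the "training" step
    correct_real = sign * parity_odd_feature(test) > 0
    correct_mirror = sign * parity_odd_feature(mirror(test)) < 0
    return 0.5 * (correct_real.mean() + correct_mirror.mean())
```

An accuracy near 0.5 on the test split is consistent with parity symmetry, while a value clearly above 0.5 signals a handed distribution; the train/test split matters because a flexible model can otherwise fit noise and report a spurious signal.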
Train
Test
Me: I can't wait to work with observations
Me working with observations:
Lesson #9: Low data regime + low signal to noise ratio = difficult to find data-efficient architectures
1. There is a lot of information in galaxy surveys that ML methods can access
2. We can tackle high dimensional inference problems that were so far unattainable
3. Our ability to simulate limits the amount of information we can robustly extract
Hybrid simulators, forward models, robustness
Unsupervised problems: parity violation
Mapping dark matter, constrained simulations... Let's get creative!
Field level inference
Conclusions