Big Data Cosmology meets AI
An Invitation to Backpropagate Through the Origins of the Universe
IAIFI Fellow
Carol Cuesta-Lazaro
Video credit: N-body simulation by Francisco Villaescusa-Navarro
The era of Big Data Astrophysics
1-Dimensional
Machine Learning
Cosmic Cartography
Galaxy Clustering
Galaxy Imaging
Lensing
Cosmic Microwave Background
Gravitational Waves
Time domain
Early Universe Inflation
Late Universe
What's the Universe made of?
Evolution
Dark matter
Dark energy
Non-Gaussianity
...
Multifield Inflation
Initial Conditions
The Universe's forward model
Observables
Why Astrophysics is hard 101
Dataset Size = 1
Can't poke it in the lab
Simulations
Bayesian statistics
How well can we simulate the Universe?
Very interested in ideas in the area of model misspecification!
How do we learn which information is robust?
Simulating dark matter is easy!
"Atoms" are hard :(
Hybrid ML - Physics Simulators
Unsupervised searches
Cosmological (field level) Inference for Galaxy Surveys
DESI
High dimensional data
Unknown
Simple summary statistic
estimated with Perturbation Theory
Probability of finding a galaxy pair
Pair separation
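The summary statistic on this slide, the excess probability of finding a galaxy pair at a given separation, can be estimated by direct pair counting. The sketch below is a toy under stated assumptions: a periodic cubic box, the natural estimator xi = DD/RR - 1 with the random-pair count RR computed analytically, and brute-force O(N^2) counting. The function names are illustrative and this is not the pipeline used for DESI.

```python
import math
import random


def pair_counts(points, box_size, edges):
    """Count unique pairs per separation bin in a periodic cubic box."""
    counts = [0] * (len(edges) - 1)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(points[i], points[j]):
                delta = abs(a - b)
                delta = min(delta, box_size - delta)  # minimum-image convention
                d2 += delta * delta
            r = math.sqrt(d2)
            for k in range(len(counts)):
                if edges[k] <= r < edges[k + 1]:
                    counts[k] += 1
                    break
    return counts


def correlation_function(points, box_size, edges):
    """Natural estimator xi = DD / RR - 1; RR is analytic for a periodic box
    (valid while the largest separation stays below half the box size)."""
    n = len(points)
    dd = pair_counts(points, box_size, edges)
    total_pairs = n * (n - 1) / 2
    xi = []
    for k, count in enumerate(dd):
        shell = 4.0 / 3.0 * math.pi * (edges[k + 1] ** 3 - edges[k] ** 3)
        rr = total_pairs * shell / box_size ** 3
        xi.append(count / rr - 1.0)
    return xi


rng = random.Random(0)
box = 1.0
edges = [0.05, 0.1, 0.2, 0.3]

# Unclustered sample: xi should scatter around zero.
uniform = [[rng.random() for _ in range(3)] for _ in range(400)]
xi_uniform = correlation_function(uniform, box, edges)

# Toy clustered sample: every parent gets a companion 0.07 away along x.
parents = [[rng.random() for _ in range(3)] for _ in range(200)]
companions = [[(p[0] + 0.07) % box, p[1], p[2]] for p in parents]
xi_clustered = correlation_function(parents + companions, box, edges)
```

The clustered sample shows an excess in the first bin only, which is where the companion separation falls. Real surveys need a random catalogue and an estimator that handles the survey mask, which this sketch leaves out.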
Forward Model
Parameters
Observable
Likelihood
Simulator
+ MCMC hammer
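The chain on this slide (parameters, forward model, observable, likelihood, then MCMC) can be illustrated with a one-parameter toy. The sketch assumes a hypothetical power-law `forward_model`, Gaussian noise with known amplitude, a flat prior, and a plain random-walk Metropolis sampler. None of these choices come from the talk.

```python
import math
import random


def forward_model(theta, x):
    """Hypothetical stand-in for a simulator: a power law with amplitude theta."""
    return theta * x ** -1.5


def log_likelihood(theta, xs, data, noise):
    """Gaussian log-likelihood of the data given the forward model (up to a constant)."""
    return -0.5 * sum(((d - forward_model(theta, x)) / noise) ** 2
                      for x, d in zip(xs, data))


def metropolis(log_prob, start, step, n_steps, rng):
    """Random-walk Metropolis sampler for a single parameter."""
    chain = [start]
    current = log_prob(start)
    for _ in range(n_steps):
        proposal = chain[-1] + rng.gauss(0.0, step)
        candidate = log_prob(proposal)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            chain.append(proposal)
            current = candidate
        else:
            chain.append(chain[-1])
    return chain


rng = random.Random(1)
true_theta, noise = 2.0, 0.05
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
data = [forward_model(true_theta, x) + rng.gauss(0.0, noise) for x in xs]

# Flat prior (an assumption), so the posterior is proportional to the likelihood.
chain = metropolis(lambda t: log_likelihood(t, xs, data, noise),
                   start=1.0, step=0.05, n_steps=5000, rng=rng)
samples = chain[1000:]  # discard burn-in
posterior_mean = sum(samples) / len(samples)
```

This only works because the likelihood can be written down. The "+ Density Estimation" route is for cases where the forward model is a simulator whose likelihood cannot be evaluated.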
Dark matter
Dark energy
Inflation
Perturbation Theory
Pen and paper
+ Density Estimation
+ Sampler
"A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
arXiv:2311.17141
Base Distribution
Target Distribution
- Sample
- Evaluate
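The two operations listed here, sampling from the target and evaluating its density, can be made concrete with the simplest invertible map from a base to a target distribution. The sketch below uses a one-dimensional affine transform of a standard normal and the change-of-variables formula. It is only an illustration of "sample" and "evaluate"; the cited paper uses a far richer generative model on point clouds, and the class name is hypothetical.

```python
import math
import random


class AffineFlow:
    """Toy invertible map x = shift + scale * z from a standard-normal base."""

    def __init__(self, shift, log_scale):
        self.shift = shift
        self.log_scale = log_scale

    def sample(self, rng):
        """Draw z from the base distribution and push it through the map."""
        z = rng.gauss(0.0, 1.0)
        return self.shift + math.exp(self.log_scale) * z

    def log_prob(self, x):
        """Evaluate the target log-density via the change of variables."""
        z = (x - self.shift) / math.exp(self.log_scale)
        base_log_prob = -0.5 * (z * z + math.log(2.0 * math.pi))
        return base_log_prob - self.log_scale  # adds log|dz/dx|


rng = random.Random(0)
flow = AffineFlow(shift=3.0, log_scale=math.log(2.0))
draws = [flow.sample(rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

# Check that the evaluated density integrates to one (Riemann sum).
dx = 0.01
total_probability = sum(math.exp(flow.log_prob(-17.0 + i * dx)) * dx
                        for i in range(4000))
```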
Siddharth Mishra-Sharma
Long range correlations
Huge pointclouds (20M)
Homogeneity and isotropy
Fixed Initial Conditions
Varying Cosmology
Trained on only 5000 positions!
Real observations 20 Million points :(
Learning in 5000 dimensions with only 2000 simulations
Symmetries?
Julia Balla
Loss
Step
Pair counting
MP GNN
Hierarchical
Symmetries
"GalaxyBench: A Long- and Short-Range Benchmark for Symmetry-Preserving Data Processing" Balla et al (in prep.)
1 to Many:
Can we run the Universe backwards?
Today
Initial Conditions
"Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes" Chen et al
arXiv:2403.13724
Sampling SDE
Interpolant
Drift
Regression loss
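The four ingredients on this slide fit together as follows: build an interpolant between the conditioning state x0 and the target x1, regress a drift onto the time derivative of that interpolant, then sample by integrating an SDE from x0. The sketch below assumes the linear form I_s = (1 - s) x0 + s x1 + (1 - s) W_s, with W_s a Wiener process, and a scalar toy problem in which x1 = x0 + SHIFT so that the optimal drift is known in closed form. The coefficient schedule, the toy data, and all names are illustrative assumptions, not the setup used in the talk.

```python
import math
import random

SHIFT = 2.0  # toy data (assumption): the target state is x1 = x0 + SHIFT


def interpolant_and_target(x0, x1, s, z):
    """Return the interpolant I_s and its regression target R_s for s in [0, 1)."""
    w = math.sqrt(s) * z                       # W_s ~ N(0, s)
    i_s = (1.0 - s) * x0 + s * x1 + (1.0 - s) * w
    r_s = -x0 + x1 - w                         # s-derivative of the three coefficients
    return i_s, r_s


def exact_drift(s, x, x0):
    """E[R_s | I_s = x, x0], available in closed form only for this toy data."""
    return SHIFT - (x - x0 - s * SHIFT) / (1.0 - s)


def regression_loss(drift, x0_samples, rng, n_times=32):
    """Monte Carlo estimate of E |drift(s, I_s, x0) - R_s|^2 with s ~ U(0, 1)."""
    total, count = 0.0, 0
    for x0 in x0_samples:
        for _ in range(n_times):
            s, z = rng.random(), rng.gauss(0.0, 1.0)
            i_s, r_s = interpolant_and_target(x0, x0 + SHIFT, s, z)
            total += (drift(s, i_s, x0) - r_s) ** 2
            count += 1
    return total / count


def sample_sde(drift, x0, rng, n_steps=200):
    """Euler-Maruyama for dX = drift ds + (1 - s) dW, started at X_0 = x0."""
    ds = 1.0 / n_steps
    x = x0
    for k in range(n_steps):
        s = k * ds
        x += drift(s, x, x0) * ds + (1.0 - s) * math.sqrt(ds) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
x0_samples = [rng.gauss(0.0, 1.0) for _ in range(200)]
loss_exact = regression_loss(exact_drift, x0_samples, rng)
loss_zero = regression_loss(lambda s, x, x0: 0.0, x0_samples, rng)
x_final = sample_sde(exact_drift, 0.3, rng)
```

In practice the drift is a neural network trained by minimising this loss. Here the closed-form drift drives the loss to zero and the SDE lands on x0 + SHIFT, which checks that the pieces are consistent.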
Current model is not very good when run forwards!
3D U-Nets are annoying :(
True
Initial
Final
Predicted
Can we run larger simulations? (Observable volumes)
At high resolution?
Faster?
All this work depends on simulations, but...
Thousands of them?
Hybrid Physical / ML simulators
Gravitational evolution ODE
Particle-mesh
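A particle-mesh force evaluation has three steps: deposit particles on a mesh, solve the Poisson equation in Fourier space, and interpolate the force back to the particles. The sketch below does this in one dimension with cloud-in-cell weights, in units where the constant in the Poisson equation is one. The reduction to 1D, the units, and the function names are simplifying assumptions for illustration.

```python
import numpy as np


def cic_weights(positions, n_cells, box_size):
    """Left cell index and fractional offset for cloud-in-cell assignment."""
    cell = positions / box_size * n_cells
    left = np.floor(cell).astype(int)
    return left, cell - left


def cic_deposit(positions, n_cells, box_size):
    """Cloud-in-cell mass assignment onto a periodic 1D mesh."""
    left, frac = cic_weights(positions, n_cells, box_size)
    density = np.zeros(n_cells)
    np.add.at(density, left % n_cells, 1.0 - frac)
    np.add.at(density, (left + 1) % n_cells, frac)
    return density


def pm_forces(positions, n_cells, box_size):
    """Solve laplacian(phi) = delta on the mesh and read F = -d(phi)/dx at the particles."""
    density = cic_deposit(positions, n_cells, box_size)
    delta = density / density.mean() - 1.0
    k = 2.0 * np.pi * np.fft.fftfreq(n_cells, d=box_size / n_cells)
    delta_k = np.fft.fft(delta)
    phi_k = np.zeros_like(delta_k)
    phi_k[1:] = -delta_k[1:] / k[1:] ** 2          # skip k = 0 (mean mode)
    force_mesh = np.real(np.fft.ifft(-1j * k * phi_k))
    left, frac = cic_weights(positions, n_cells, box_size)
    return ((1.0 - frac) * force_mesh[left % n_cells]
            + frac * force_mesh[(left + 1) % n_cells])


positions = np.array([0.4, 0.6])
forces = pm_forces(positions, n_cells=64, box_size=1.0)
```

The two particles attract each other and the total force vanishes. Forces on scales below the cell size are smoothed away, and that small-scale error is what separates a particle-mesh run from a full N-body run.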
"Nbodyify: Adaptive mesh corrections for PM simulations" Cuesta-Lazaro, Modi (in prep.)
Particle-mesh
Full N-body
Hybrid Simulator - on the fly
Gravitational evolution ODE
Trained to match particle velocities and positions: DIFFERENTIABLE
Density
Gravitational Potential
1. CNN
2. Read features at position using attention
3. Compute force correction
4. Run corrected simulation
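Steps 3 and 4 above amount to adding a correction term to the coarse force inside the time integrator and tuning that term so that trajectories match a reference run. The sketch below shows only that structure, for a single particle. The toy forces stand in for the full N-body and particle-mesh solvers, a one-coefficient cubic term stands in for the network output, and a scan over candidate coefficients stands in for gradient-based training through the simulator. All of these are assumptions, and steps 1 and 2 (CNN features, attention read-out) are not reproduced.

```python
def leapfrog(position, velocity, base_force, correction, dt, n_steps):
    """Kick-drift-kick integration of dx/dt = v, dv/dt = base_force(x) + correction(x)."""
    def total_force(x):
        return base_force(x) + correction(x)

    force = total_force(position)
    for _ in range(n_steps):
        velocity += 0.5 * dt * force
        position += dt * velocity
        force = total_force(position)
        velocity += 0.5 * dt * force
    return position, velocity


def full_force(x):
    """Toy stand-in for the full N-body force."""
    return -x - 0.5 * x ** 3


def coarse_force(x):
    """Toy stand-in for the particle-mesh force: misses the cubic term."""
    return -x


def trajectory_loss(coeff, dt=0.01, n_steps=500):
    """Mismatch in final position and velocity between corrected and reference runs."""
    x_ref, v_ref = leapfrog(1.0, 0.0, full_force, lambda x: 0.0, dt, n_steps)
    x, v = leapfrog(1.0, 0.0, coarse_force, lambda x: -coeff * x ** 3, dt, n_steps)
    return (x - x_ref) ** 2 + (v - v_ref) ** 2


# Gradient-free scan over the correction strength (a stand-in for training).
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
best_coeff = min(candidates, key=trajectory_loss)
```

The scan recovers the coefficient of the missing term. With a differentiable simulator the same loss can instead be minimised by backpropagating through the integrator steps.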
Learn features
Particle-mesh
Full N-body
Hybrid ML-Simulator
Video credit: Francisco Villaescusa-Navarro
Gas density
Gas temperature
Finding missing physics with differentiable simulators?
What is the space of plausible solutions and how do we search it?
Differentiable Galaxies ODEs
Humans' best bet
Neural Network correction
Are there problems in cosmology that bypass a forward model?
Parity violation cannot originate from gravity
"Measurements of parity-odd modes in the large-scale 4-point function of SDSS..." Hou, Slepian, Chan arXiv:2206.03625
"Could sample variance be responsible for the parity-violating signal seen in the BOSS galaxy survey?" Philcox, Ereza arXiv:2401.09523
Train
Test
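Because gravity cannot generate parity violation, a test for it needs no forward model: take any function f of a galaxy configuration, form the parity-odd combination f(x) - f(mirror(x)), and check on held-out data whether its mean differs from zero. The sketch below uses the signed volume of a tetrahedron as a fixed, hand-picked f in place of a trained network, and toy Gaussian point sets in place of survey data. Both are assumptions made for illustration.

```python
import random
import statistics


def signed_volume(tetra):
    """Signed volume of a tetrahedron: a pseudoscalar that flips sign under mirroring."""
    p0, p1, p2, p3 = tetra
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    cross = [b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0]]
    return sum(a[i] * cross[i] for i in range(3)) / 6.0


def mirror(tetra):
    """Parity transformation: reflect the x coordinate of every point."""
    return [[-p[0], p[1], p[2]] for p in tetra]


def parity_odd(f, tetra):
    """Antisymmetrise f so that it vanishes on average for parity-symmetric data."""
    return f(tetra) - f(mirror(tetra))


def detection_significance(f, samples):
    """Mean of the parity-odd statistic in units of its standard error."""
    values = [parity_odd(f, t) for t in samples]
    return statistics.mean(values) / (statistics.stdev(values) / len(values) ** 0.5)


rng = random.Random(2)


def random_tetra():
    return [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]


# Held-out "test" sets: one parity-symmetric, one with a handedness preference.
test_symmetric = [random_tetra() for _ in range(2000)]
test_violating = []
for _ in range(2000):
    t = random_tetra()
    if signed_volume(t) < 0 and rng.random() < 0.5:
        t = mirror(t)  # flip half of the left-handed tetrahedra
    test_violating.append(t)

z_symmetric = detection_significance(signed_volume, test_symmetric)
z_violating = detection_significance(signed_volume, test_violating)
```

Evaluating on data not used to choose f is what keeps the significance honest. A flexible f fitted and tested on the same sample would pick up noise as signal.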
Me: I can't wait to work with observations
Me working with observations:
Finding interesting objects:
Very small galaxies (dwarf galaxies)
Interesting in astrophysics: how do we define an anomaly, and how do we find it?
Background
Region of Interest
Conclusions
1. There is a lot of information in galaxy surveys that ML methods can access
2. We can tackle high-dimensional inference problems that were so far unattainable
3. Our ability to simulate will limit the amount of information we can extract
Hybrid simulators, forward models, robustness
Unsupervised problems: parity violation
Finding anomalies for new physics?
Finding the Initial Conditions of the Universe, let's get creative!
Field level inference
MIT-CTPSeminar2024
By Carol Cuesta