Big Data Cosmology meets AI
IAIFI Fellow
Carol Cuesta-Lazaro
The Ohio State University - 15 April 2024
Video Credit: N-body simulation Francisco Villaescusa-Navarro
The era of Big Data Cosmology
1-Dimensional
Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments
Dust
xAstrophysics
DESI, DESI-II, Spec-S5
Euclid
LSST
Simons Observatory
CMB-S4
LIGO
Einstein
LSST
Early Universe Inflation
Late Universe
Energy and matter content
Evolution
Dark matter
Dark energy
Hubble Constant
Baryons
Neutrino masses
Non-Gaussianity
Tilt power spectrum
Hubble tension
Beyond the Standard Model
Multifield Inflation
Dark Matter Reconstruction
Hybrid ML - Physics Simulators
Cosmological (field level) Inference for Galaxy Surveys
DESI
Fast Simulators
High dimensional inference
Modelling priors
Uncertainty quantification
arXiv:2403.10648
arXiv:2402.13310
arXiv:2309.09337
DESI: Dark Energy Spectroscopic Instrument
~40 Million spectra!
(Image Credit: Jinyi Yang, Steward Observatory/University of Arizona)
(Image Credit: D. Schlegel/Berkeley Lab using data from DESI)
High dimensional data
Unknown
Simple summary statistic
estimated with Perturbation Theory
Probability of finding a pair of galaxies
Pair separation
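The summary statistic on this slide, the probability of finding a galaxy pair as a function of pair separation, can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the estimator used for DESI: it assumes a periodic cubic box, uses the simple DD/RR - 1 estimator with analytic random counts, and the function name `two_point_correlation` is hypothetical.

```python
import numpy as np


def two_point_correlation(positions, box_size, bin_edges):
    """Natural estimator xi(r) = DD / RR - 1 in a periodic cubic box."""
    n = len(positions)
    # Pairwise separation vectors with the minimum-image convention
    delta = positions[:, None, :] - positions[None, :, :]
    delta -= box_size * np.round(delta / box_size)
    dist = np.sqrt((delta**2).sum(axis=-1))
    # Count each pair once (upper triangle, no self-pairs)
    pair_dist = dist[np.triu_indices(n, k=1)]
    dd, _ = np.histogram(pair_dist, bins=bin_edges)
    # Expected pair counts for an unclustered sample, computed analytically
    # (valid while the largest separation stays below half the box size)
    shell_volume = 4.0 / 3.0 * np.pi * np.diff(bin_edges**3)
    rr = 0.5 * n * (n - 1) * shell_volume / box_size**3
    return dd / rr - 1.0


rng = np.random.default_rng(0)
uniform_points = rng.uniform(0.0, 1.0, size=(500, 3))
bin_edges = np.array([0.05, 0.1, 0.2, 0.3])
xi = two_point_correlation(uniform_points, box_size=1.0, bin_edges=bin_edges)
print(xi)  # scatters around zero: uniform points have no excess pairs
```

For a clustered galaxy sample the same estimator returns positive values at small separations, which is the signal that perturbation theory is used to predict.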
Forward Model
Parameters
Observable
Likelihood
Simulator
+ MCMC hammer
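When the likelihood can be written down, the "MCMC hammer" amounts to drawing posterior samples by repeatedly proposing and accepting parameter values. The sketch below is a toy illustration, assuming a Gaussian likelihood with known width, a flat prior, and a single parameter; the function names and all numerical settings are hypothetical choices, not the pipeline used in the talk.

```python
import numpy as np


def log_likelihood(mu, data, sigma=1.0):
    """Gaussian log-likelihood of the data given the mean mu (up to a constant)."""
    return -0.5 * np.sum((data - mu) ** 2) / sigma**2


def metropolis(log_prob, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for a one-dimensional parameter."""
    chain = np.empty(n_steps)
    current, current_lp = start, log_prob(start)
    for i in range(n_steps):
        proposal = current + step_size * rng.normal()
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current))
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain


rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=100)  # mock "observable"
chain = metropolis(lambda mu: log_likelihood(mu, data), 0.0, 5000, 0.3, rng)
posterior = chain[1000:]  # discard burn-in
print(posterior.mean(), posterior.std())
```

With a flat prior the posterior here is centered on the sample mean of the data, which makes the toy easy to check against the exact answer.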
Dark matter
Dark energy
Inflation
Perturbation Theory
Pen and paper
+ Density Estimation
+ Sampler
DESI
Initial conditions
Dark matter
Dark energy
Inflation
A forward model samples the likelihood
Parameters
Observable
Observed galaxy pointcloud
Initial conditions
DESI
Forward Model
Example: How far will Justin Fields throw?
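import numpy as np

def simulate_trajectory(
    velocity,
    angle,
    time_step=0.1,
    g=9.8,
):
    # Latent variables z: random perturbations of the launch speed and angle (degrees)
    z_velocity = np.random.normal(scale=2.)
    z_angle = np.random.normal(scale=2.)
    velocity = velocity + z_velocity
    angle = angle + z_angle
    # Split the launch velocity into horizontal and vertical components
    angle_rad = np.radians(angle)
    v_x = velocity * np.cos(angle_rad)
    v_y = velocity * np.sin(angle_rad)
    # Time of flight until the ball returns to launch height
    total_time = 2 * v_y / g
    times = np.arange(0, total_time, time_step)
    x = v_x * times
    y = v_y * times - 0.5 * g * times**2
    # Convert the horizontal distance from meters to yards
    x_american = x * 1.09361
    return x_american, y, times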
Sample Prior
Simulator
Latent variables z
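The "sample prior, run simulator" loop can be sketched as follows. This is an illustration under stated assumptions: `throw_distance` is a hypothetical, simplified stand-in for the slide's simulator that returns only the landing distance from the closed-form projectile range, and the prior ranges are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)


def throw_distance(velocity, angle, g=9.8):
    """Hypothetical stand-in for the slide's simulator: landing distance in yards."""
    # Latent variables z: unobserved noise added to the inputs
    velocity = velocity + rng.normal(scale=2.0)
    angle_rad = np.radians(angle + rng.normal(scale=2.0))
    # Closed-form projectile range on flat ground (an assumption of this sketch)
    range_m = velocity**2 * np.sin(2.0 * angle_rad) / g
    return range_m * 1.09361  # meters to yards


# 1) Sample the prior over parameters (ranges are illustrative assumptions)
velocities = rng.uniform(20.0, 30.0, size=1000)
angles = rng.uniform(30.0, 60.0, size=1000)
# 2) Push each draw through the simulator: samples from the joint p(theta, x)
distances = np.array([throw_distance(v, a) for v, a in zip(velocities, angles)])

# Fixing theta and re-running the simulator samples the likelihood p(x | theta)
fixed = np.array([throw_distance(25.0, 45.0) for _ in range(1000)])
print(fixed.mean(), fixed.std())
```

The spread of `fixed` comes entirely from the latent variables, which is the sense in which a forward model samples the likelihood without ever writing it down.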
Maximize the likelihood of the training samples
Model
Training Samples
Generate Novel Samples
Evaluate probabilities
Trained Model
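The three operations on this slide (fit by maximum likelihood, generate novel samples, evaluate probabilities) can be shown with the simplest possible density model. The sketch below swaps in a diagonal Gaussian, whose maximum-likelihood fit is available in closed form, in place of the neural generative models discussed in the talk; the class name `GaussianDensity` is hypothetical.

```python
import numpy as np


class GaussianDensity:
    """Diagonal Gaussian fitted by maximum likelihood (closed form)."""

    def fit(self, samples):
        # The Gaussian MLE is the sample mean and the (biased) sample variance
        self.mean = samples.mean(axis=0)
        self.var = samples.var(axis=0)
        return self

    def log_prob(self, x):
        # Evaluate log p(x) under the fitted model
        z = (x - self.mean) ** 2 / self.var
        return -0.5 * np.sum(z + np.log(2.0 * np.pi * self.var), axis=-1)

    def sample(self, n, rng):
        # Generate novel samples from the fitted model
        return self.mean + np.sqrt(self.var) * rng.normal(size=(n, len(self.mean)))


rng = np.random.default_rng(0)
training = rng.normal(loc=[1.0, -2.0], scale=[0.5, 2.0], size=(5000, 2))
model = GaussianDensity().fit(training)
novel = model.sample(3, rng)
print(novel)
print(model.log_prob(training).mean())
```

A normalizing flow or diffusion model plays the same role with a far more flexible density, trained by gradient descent rather than a closed-form fit.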
A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon
Image credit: DALL·E 3
1024x1024
"A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
https://arxiv.org/abs/2311.17141
Base Distribution
Target Distribution
- Sample
- Evaluate
Fixed Initial Conditions
Varying Cosmology
Mean pairwise velocity
k Nearest neighbours
Pair separation
Trained on only 5000 positions!
https://arxiv.org/abs/2210.02747
https://arxiv.org/abs/2302.00482
Flow Matching
Flow ODE
Continuity Eq.
Random pairings (x0, x1)
Optimal Transport (x0, x1)
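The difference between random and optimal transport pairings can be illustrated in one dimension, where sorting both sample sets gives the optimal pairing for a squared-distance cost. The sketch below is a toy under stated assumptions: the straight-line interpolation path is one common choice in the flow matching literature, the target distribution is a made-up Gaussian, and no network is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x0 = rng.normal(size=n)                      # base distribution samples
x1 = rng.normal(loc=4.0, scale=0.5, size=n)  # stand-in target samples


def pairing_cost(a, b):
    """Mean squared displacement between paired samples."""
    return np.mean((b - a) ** 2)


# Random pairings: keep the samples in the order they were drawn
random_cost = pairing_cost(x0, x1)
# Optimal transport pairings: in 1D with squared cost, sorting both sets is optimal
x0_ot, x1_ot = np.sort(x0), np.sort(x1)
ot_cost = pairing_cost(x0_ot, x1_ot)

# Flow matching regression targets on straight paths between paired samples
t = rng.uniform(size=n)
x_t = (1.0 - t) * x0_ot + t * x1_ot  # point on the path at time t
target_velocity = x1_ot - x0_ot      # what a network v(x_t, t) would regress to
print(random_cost, ot_cost)
```

Lower pairing cost means the straight paths cross less, so the learned velocity field is smoother and the flow ODE is easier to integrate.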
1 to Many:
https://arxiv.org/abs/2303.08797
ODE
SDE
Power Spectrum
Cross correlation
Scale (k): Small, Large
1 to Many:
Stellar Mass distribution
Dark Matter
"Probabilistic Reconstruction of Dark Matter fields from galaxies"
Park, Ono, Mudur, Ni, Cuesta-Lazaro, NeurIPS Machine Learning and the Physical Sciences
Victoria Ono
Core Park
Truth
Sampled
Observed
Small
Large
Scale (k)
Power Spectrum
log Mass
~ Gpc
pc
kpc
Mpc
Gpc
Video credit: Francisco Villaescusa-Navarro
Small
Large
In-Distribution
Out-of-Distribution
CAMELS
DESI LRG ~ 20 (Gpc/h)^3
TNG-300
True DM
Sample DM
Small
Large
Scale (k)
Power Spectrum
log Mass
All this work depends on simulations, but...
Can we run larger simulations? (DESI volumes)
At high resolution?
Faster?
Thousands of them?
Gravitational evolution ODE
Particle-mesh
Full N-body
Hybrid Simulator - on the fly
Gravitational evolution ODE
Trained to match particle velocities and positions: DIFFERENTIABLE
Particle-mesh
Full N-body
Hybrid ML-Simulator
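For readers unfamiliar with the particle-mesh step that the hybrid simulator corrects, the force computation can be sketched in one dimension. This is a toy under stated assumptions, not the scheme from the talk: it uses a periodic 1D box, nearest-grid-point mass assignment, units in which 4πG = 1, and a hypothetical function name `pm_force_1d`. Production codes work in 3D and usually use higher-order assignment.

```python
import numpy as np


def pm_force_1d(positions, n_grid, box_size=1.0):
    """Particle-mesh force in a periodic 1D box (units with 4*pi*G = 1)."""
    cell = box_size / n_grid
    # Nearest-grid-point mass assignment
    index = np.floor(positions / cell).astype(int) % n_grid
    density = np.bincount(index, minlength=n_grid) / cell
    delta = density - density.mean()  # only the overdensity sources the force
    # Solve Poisson's equation in Fourier space: -k^2 phi_k = delta_k
    k = 2.0 * np.pi * np.fft.fftfreq(n_grid, d=cell)
    delta_k = np.fft.fft(delta)
    phi_k = np.zeros_like(delta_k)
    phi_k[1:] = -delta_k[1:] / k[1:] ** 2
    # Force is minus the gradient of the potential: F_k = -i k phi_k
    force_grid = np.fft.ifft(-1j * k * phi_k).real
    # Interpolate back to the particles with the same nearest-grid-point scheme
    return force_grid[index]


rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=2000)
force = pm_force_1d(positions, n_grid=64)
print(force.shape)
```

The mesh smooths the force below the cell size, which is the small-scale error a learned correction would target. Every operation here is differentiable with respect to the particle-independent grid quantities, which is what makes this kind of solver a natural host for trained components.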
"Nbodyify: Adaptive mesh corrections for PM simulations" Cuesta-Lazaro, Modi in preps
Hybrid subgrid models to bridge scales
High Res Sim
Springel and Hernquist 03
DESI
Building digital twins
Selections
Survey systematics
Robustness
Extract only information that is robust
across time
Conclusions
1. There is a lot of information in galaxy surveys that ML methods can access
2. We can tackle high dimensional inference problems that were so far unattainable
3. Our ability to simulate will limit the amount of information we can extract
Hybrid simulators, forward models, robustness
Dark matter, Initial Conditions, let's get creative!
Field level inference
The Ohio State-Seminar2024
By carol cuesta