Big Data Cosmology meets AI
IAIFI Fellow
Carol Cuesta-Lazaro

Boston University - 27 February 2024
Video Credit: N-body simulation Francisco Villaescusa-Navarro




The era of Big Data Cosmology



1-Dimensional





Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments
Dust
xAstrophysics



DESI, DESI-II, Spec-S5
Euclid
LSST
Simons Observatory
CMB-S4
LIGO
Einstein
LSST
Early Universe Inflation

Late Universe

Energy and matter content
Evolution
Dark matter
Dark energy
Hubble Constant
Baryons
Neutrino masses
Non-Gaussianity
Tilt of the primordial power spectrum
Hubble tension
Beyond the Standard Model
Multifield Inflation
Dark Matter Reconstruction


Hybrid ML - Physics Simulators
Cosmological (field-level) Inference for Galaxy Surveys
DESI

DESI: Dark Energy Spectroscopic Instrument

~40 Million spectra!
(Image Credit: Jinyi Yang, Steward Observatory/University of Arizona)
"Towards testing the theory of gravity with DESI: summary statistics, model predictions and future simulation requirements" Alam et al (including Cuesta-Lazaro) JCAP

(Image Credit: D. Schlegel/Berkeley Lab using data from DESI)
High-dimensional data
Unknown
Simple summary statistic
estimated with Perturbation Theory


Probability of finding a pair of galaxies
Pair separation
"Full-shape analysis with simulation-based priors: constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro, et al, submitted to PRD
"Towards a non-Gaussian model of redshift space distortions" Cuesta-Lazaro et al, MNRAS
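The statistic on this slide, the probability of finding a pair of galaxies as a function of their separation, is the two-point correlation function ξ(r). Below is a minimal sketch of how it can be measured, assuming the simple "natural" estimator ξ = DD/RR − 1 with brute-force pair counting in a periodic box. The function names, bins and toy catalogues are illustrative assumptions; survey analyses typically use the Landy-Szalay estimator with tree-based pair counters, and the cited works model this statistic with perturbation theory rather than measuring it this way.

```python
import numpy as np


def pair_counts(points, bins, box_size):
    """Histogram of pair separations (each pair counted once) in a periodic box."""
    diff = points[:, None, :] - points[None, :, :]
    diff -= box_size * np.round(diff / box_size)   # minimum-image convention
    dist = np.sqrt((diff**2).sum(axis=-1))
    iu = np.triu_indices(len(points), k=1)         # unique pairs i < j
    counts, _ = np.histogram(dist[iu], bins=bins)
    return counts


def natural_estimator(data, randoms, bins, box_size):
    """xi(r) = (DD / RR) * [N_R (N_R - 1)] / [N_D (N_D - 1)] - 1."""
    dd = pair_counts(data, bins, box_size)
    rr = pair_counts(randoms, bins, box_size)
    n_d, n_r = len(data), len(randoms)
    norm = (n_r * (n_r - 1)) / (n_d * (n_d - 1))
    return dd / rr * norm - 1.0


rng = np.random.default_rng(0)
box = 100.0
galaxies = rng.uniform(0, box, size=(500, 3))   # toy, unclustered "galaxies"
randoms = rng.uniform(0, box, size=(1000, 3))
r_bins = np.linspace(5.0, 30.0, 6)
xi = natural_estimator(galaxies, randoms, r_bins, box)
print(xi)  # scatters around zero because the toy catalogue is unclustered
```

A clustered catalogue would instead show ξ(r) > 0 at small separations.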
Forward Model
Parameters
Observable
Likelihood
Simulator
+ MCMC hammer
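In this classical pipeline, a model predicts the summary statistic for given parameters, a Gaussian likelihood compares the prediction with the measurement, and an MCMC sampler explores the posterior. The sketch below assumes a hypothetical one-parameter power-law model standing in for the perturbation-theory prediction, known diagonal errors and a flat prior, and uses a plain random-walk Metropolis sampler rather than any particular MCMC package.

```python
import numpy as np


def model(amplitude, r):
    """Hypothetical stand-in for a perturbation-theory prediction: A * r^-1.8."""
    return amplitude * r ** -1.8


def log_likelihood(amplitude, r, data, sigma):
    """Gaussian log-likelihood with a diagonal covariance."""
    resid = (data - model(amplitude, r)) / sigma
    return -0.5 * np.sum(resid**2)


def metropolis(log_prob, x0, step, n_steps, rng):
    """Random-walk Metropolis sampler for a single parameter (flat prior)."""
    chain = np.empty(n_steps)
    x, lp = x0, log_prob(x0)
    for i in range(n_steps):
        proposal = x + step * rng.normal()
        lp_new = log_prob(proposal)
        if np.log(rng.uniform()) < lp_new - lp:   # accept with prob min(1, ratio)
            x, lp = proposal, lp_new
        chain[i] = x
    return chain


rng = np.random.default_rng(1)
r = np.linspace(5.0, 50.0, 20)
true_amplitude = 20.0
sigma = 0.05 * model(true_amplitude, r)            # 5% errors, assumed known
data = model(true_amplitude, r) + sigma * rng.normal(size=r.size)

chain = metropolis(lambda a: log_likelihood(a, r, data, sigma),
                   x0=10.0, step=0.5, n_steps=20000, rng=rng)
posterior = chain[5000:]                            # discard burn-in
print(posterior.mean(), posterior.std())
```

The weak point is the model: the likelihood is only as good as the pen-and-paper prediction of the statistic.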

Dark matter
Dark energy
Inflation
Perturbation Theory
Pen and paper



+ Density Estimation
+ Sampler
High-dimensional data
Unknown
Write your favourite summary statistic here
Simulations + ML
"SUNBIRD: Neural-network-based models for galaxy clustering" Cuesta-Lazaro et al MNRAS
"LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology" Ho, Bartlett, Chartier, Cuesta-Lazaro et al
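In simulation-based inference, parameters drawn from the prior are passed through the simulator, a conditional density estimator q(θ | s) is fit to the resulting (θ, s) pairs, and it is then evaluated at the observed summary. In practice the estimator is a neural network. The sketch below deliberately swaps in a much simpler estimator, a joint Gaussian fit to (θ, s) conditioned analytically, and uses a hypothetical toy simulator, so it only illustrates the workflow.

```python
import numpy as np


def simulator(theta, rng):
    """Hypothetical toy simulator: a noisy 2D summary statistic of theta."""
    n = theta.shape[0]
    s1 = 2.0 * theta + 0.3 * rng.normal(size=n)
    s2 = -theta + 0.5 * rng.normal(size=n)
    return np.stack([s1, s2], axis=1)


def fit_gaussian_posterior(theta, summaries):
    """Fit a joint Gaussian to (theta, s); return a function giving the
    mean and variance of the conditional p(theta | s)."""
    joint = np.column_stack([theta, summaries])
    mu = joint.mean(axis=0)
    cov = np.cov(joint, rowvar=False)
    c_ts = cov[0, 1:]                      # Cov(theta, s)
    c_ss = cov[1:, 1:]                     # Cov(s, s)
    gain = np.linalg.solve(c_ss, c_ts)     # Cov(s, s)^-1 Cov(s, theta)

    def posterior(s_obs):
        mean = mu[0] + gain @ (s_obs - mu[1:])
        var = cov[0, 0] - c_ts @ gain
        return mean, var

    return posterior


rng = np.random.default_rng(2)
theta_train = rng.uniform(-1.0, 1.0, size=20000)   # samples from the prior
s_train = simulator(theta_train, rng)              # forward simulations
posterior = fit_gaussian_posterior(theta_train, s_train)

theta_true = 0.4
s_obs = simulator(np.array([theta_true]), rng)[0]  # mock "observation"
mean, var = posterior(s_obs)
print(mean, np.sqrt(var))
```

Because the estimator is learned from simulations, any summary statistic that can be computed on simulated catalogues can be plugged in, without an analytic likelihood.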
Density Split
Cluster
Void
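Density-split clustering classifies positions in the survey by their local galaxy density, from the emptiest regions (voids) to the densest (clusters), and then measures clustering separately for each class. A minimal sketch of the splitting step, assuming counts-in-spheres around random query points, a hypothetical radius and five quantiles; the published analyses may use a different smoothing kernel and setup.

```python
import numpy as np


def density_split(galaxies, queries, radius, box_size, n_quantiles=5):
    """Assign each query point to a density quantile (0 = emptiest, voids;
    n_quantiles - 1 = densest, clusters) using counts-in-spheres."""
    diff = queries[:, None, :] - galaxies[None, :, :]
    diff -= box_size * np.round(diff / box_size)       # periodic box
    dist2 = (diff**2).sum(axis=-1)
    counts = (dist2 < radius**2).sum(axis=1)           # galaxies per sphere
    # Rank queries by count and split the ranking into equal-size groups
    order = np.argsort(counts, kind="stable")
    labels = np.empty(len(queries), dtype=int)
    for q, idx in enumerate(np.array_split(order, n_quantiles)):
        labels[idx] = q
    return labels, counts


rng = np.random.default_rng(3)
box = 100.0
galaxies = rng.uniform(0, box, size=(2000, 3))
queries = rng.uniform(0, box, size=(500, 3))
labels, counts = density_split(galaxies, queries, radius=10.0, box_size=box)
for q in range(5):
    print(q, counts[labels == q].mean())   # mean count rises from voids to clusters
```

Each quantile's query points can then be cross-correlated with the galaxies, for example with a pair-counting estimator.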


Dark Matter
Tilt of primordial fluctuations
Clumpiness
Expansion rate
Neutrinos
"Baryons"
"Constraining νΛCDM with density-split clustering" Paillas, Cuesta-Lazaro et al MNRAS

"Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample" Paillas, Cuesta-Lazaro et al MNRAS




Dark matter density field
DESI: Alternative Clustering Methods

Initial conditions
Dark matter
Dark energy
Inflation

Image credit: Bullock & Boylan-Kolchin



[Figure: number of objects as a function of dark matter halo mass]
A forward model samples the likelihood
Parameters
Observable
Observed galaxy pointcloud
Initial conditions

DESI

Forward Model
Example: How far will Patrick Mahomes throw?

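import numpy as np


def simulate_trajectory(
    velocity,
    angle,
    time_step=0.1,
    g=9.8,
):
    """Forward model for a throw: launch speed (m/s) and angle (degrees) in,
    one sampled trajectory out. The random noise plays the role of latent variables."""
    # Latent variables: random perturbations to the launch speed and angle
    z_velocity = np.random.normal(scale=2.)
    z_angle = np.random.normal(scale=2.)
    velocity = velocity + z_velocity
    angle = angle + z_angle
    angle_rad = np.radians(angle)
    # Horizontal and vertical components of the launch velocity
    v_x = velocity * np.cos(angle_rad)
    v_y = velocity * np.sin(angle_rad)
    # Flight time until the ball returns to launch height (no air resistance)
    total_time = 2 * v_y / g
    times = np.arange(0, total_time, time_step)
    x = v_x * times
    y = v_y * times - 0.5 * g * times**2
    # Convert the horizontal distance from metres to yards
    x_american = x * 1.09361
    return x_american, y, times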
Sample Prior
Simulator
Latent variables z
(e.g. is Taylor Swift looking?)
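Running the simulator many times at fixed parameters produces samples from the likelihood p(x | θ), even though that likelihood is never written down. The sketch below is a vectorised variant of simulate_trajectory above that returns only the landing distance, using the flat-ground range formula v² sin(2θ)/g; the fixed parameter values and the uniform velocity prior are hypothetical.

```python
import numpy as np

G = 9.8                    # gravitational acceleration, m/s^2
METRES_TO_YARDS = 1.09361


def landing_distance(velocity, angle_deg, rng):
    """Vectorised forward model: distance (yards) at which the ball returns to
    launch height, with latent Gaussian noise on the launch speed and angle."""
    z_velocity = rng.normal(scale=2.0, size=np.shape(velocity))
    z_angle = rng.normal(scale=2.0, size=np.shape(angle_deg))
    v = velocity + z_velocity
    a = np.radians(angle_deg + z_angle)
    # Flat-ground range formula, no air resistance
    return v**2 * np.sin(2 * a) / G * METRES_TO_YARDS


rng = np.random.default_rng(4)
n_sims = 100_000

# Fixed parameters: repeated runs differ only through the latent variables,
# so the outputs are samples from p(distance | velocity, angle)
samples = landing_distance(np.full(n_sims, 25.0), np.full(n_sims, 45.0), rng)
likelihood, edges = np.histogram(samples, bins=50, density=True)

# Drawing the parameters from a prior first gives joint (theta, x) pairs,
# the training set used by simulation-based inference
velocity_prior = rng.uniform(20.0, 30.0, size=n_sims)   # hypothetical prior
x_joint = landing_distance(velocity_prior, np.full(n_sims, 45.0), rng)
print(samples.mean(), samples.std())
```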


Maximize the likelihood of the training samples
Model
Training Samples


Generate Novel Samples
Evaluate probabilities
Trained Model
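The recipe above (fit a model with a tractable density by maximising the likelihood of the training samples, then use it to generate samples and evaluate probabilities) can be shown with the simplest possible model. This sketch assumes a 1D Gaussian with learnable mean and log standard deviation, trained by gradient ascent on toy data; flows and diffusion models replace the Gaussian with neural networks but follow the same objective.

```python
import numpy as np


def log_prob(x, mu, log_sigma):
    """Log-density of a Gaussian model with parameters (mu, log_sigma)."""
    sigma = np.exp(log_sigma)
    return -0.5 * ((x - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)


rng = np.random.default_rng(5)
train = rng.normal(loc=3.0, scale=0.5, size=5000)   # toy training samples

mu, log_sigma, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    sigma = np.exp(log_sigma)
    u = (train - mu) / sigma
    # Gradients of the mean log-likelihood with respect to each parameter
    grad_mu = np.mean(u / sigma)
    grad_log_sigma = np.mean(u**2 - 1.0)
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

new_samples = mu + np.exp(log_sigma) * rng.normal(size=10)  # generate
density_at_3 = np.exp(log_prob(3.0, mu, log_sigma))          # evaluate
print(mu, np.exp(log_sigma), density_at_3)
```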





A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon
Image credit: DALL·E 3
1024x1024

"A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
ICML ML4Astro workshop (Spotlight talk)
Base Distribution
Target Distribution
- Sample
- Evaluate
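One family of models that supports both operations, sampling and evaluating, is the normalizing flow: an invertible map f turns base samples z into target samples x = f(z), and densities follow from the change-of-variables formula log p(x) = log p_z(f⁻¹(x)) + log |df⁻¹/dx|. The sketch below uses a fixed, hypothetical map x = exp(A + B z) so that the result can be checked against the closed-form lognormal density; in a trained flow the map is a neural network, and this is not meant to describe the architecture of the cited work.

```python
import numpy as np

A, B = 1.0, 0.5   # hypothetical flow parameters (learned in a real model)


def forward(z):
    """Map base samples z ~ N(0, 1) to the target space: x = exp(A + B z)."""
    return np.exp(A + B * z)


def inverse(x):
    return (np.log(x) - A) / B


def log_prob(x):
    """Change of variables: log p(x) = log N(f^-1(x)) + log |d f^-1 / dx|."""
    z = inverse(x)
    log_base = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    log_det = -np.log(B * x)          # d f^-1 / dx = 1 / (B x)
    return log_base + log_det


rng = np.random.default_rng(6)
samples = forward(rng.normal(size=100_000))     # sample
x_grid = np.linspace(0.5, 10.0, 5)
density = np.exp(log_prob(x_grid))              # evaluate
# This particular flow defines a lognormal; compare with its closed form
lognormal = np.exp(-(np.log(x_grid) - A) ** 2 / (2 * B**2)) / (
    x_grid * B * np.sqrt(2 * np.pi))
print(np.max(np.abs(density - lognormal)))
```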

Fixed Initial Conditions
Varying Cosmology





[Figure panels: mean pairwise velocity and k-nearest-neighbour statistics, both as a function of pair separation]

Trained on only 5000 positions!
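The k-nearest-neighbour statistics used in the comparison above are built from the distance between random query points and their k-th nearest galaxy, summarised by the empirical cumulative distribution of those distances. A brute-force sketch in a periodic box; the k values, grid and toy catalogue are illustrative assumptions.

```python
import numpy as np


def knn_distances(points, queries, ks, box_size):
    """Distance from each query to its k-th nearest point, for each k in ks."""
    diff = queries[:, None, :] - points[None, :, :]
    diff -= box_size * np.round(diff / box_size)     # periodic box
    dist = np.sqrt((diff**2).sum(axis=-1))
    dist.sort(axis=1)                                # nearest first
    return {k: dist[:, k - 1] for k in ks}


def empirical_cdf(values, grid):
    """Fraction of values less than or equal to each grid point."""
    return np.searchsorted(np.sort(values), grid, side="right") / len(values)


rng = np.random.default_rng(7)
box = 100.0
points = rng.uniform(0, box, size=(2000, 3))
queries = rng.uniform(0, box, size=(1000, 3))
dists = knn_distances(points, queries, ks=(1, 2, 4), box_size=box)
r_grid = np.linspace(0.0, 20.0, 41)
cdfs = {k: empirical_cdf(d, r_grid) for k, d in dists.items()}
```

For an unclustered (Poisson) point set the k = 1 curve approaches 1 − exp(−n·4πr³/3), with n the number density; departures from it carry information about clustering beyond the two-point function.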

1 to Many:
Galaxy distribution
Dark Matter

"Probabilistic Reconstruction of Dark Matter fields from galaxies"
Park, Ono, Mudur, Ni, Cuesta-Lazaro NeurIPS Machine Learning and the Physical Sciences
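Because one galaxy field is compatible with many dark matter fields, reconstruction is framed as sampling from p(dark matter | galaxies). The cited work learns this distribution with a generative model. As a swapped-in, analytically tractable stand-in, the sketch below assumes a per-pixel Gaussian model: prior δ ~ N(0, S) and a linearly biased, noisy tracer g = b δ + n with n ~ N(0, N). Its posterior is Gaussian (the Wiener filter), so many consistent maps can be drawn from one observation. The values of b, S and N are hypothetical, and real fields are correlated and non-Gaussian.

```python
import numpy as np

BIAS, S, N = 2.0, 1.0, 0.5   # hypothetical tracer bias, signal and noise variances


def posterior_params(g):
    """Per-pixel Gaussian posterior p(delta | g) for g = BIAS * delta + noise."""
    var = S * N / (BIAS**2 * S + N)
    mean = BIAS * S * g / (BIAS**2 * S + N)
    return mean, var


rng = np.random.default_rng(8)
shape = (64, 64)
delta_true = rng.normal(scale=np.sqrt(S), size=shape)            # "dark matter"
galaxies = BIAS * delta_true + rng.normal(scale=np.sqrt(N), size=shape)

mean, var = posterior_params(galaxies)
# 1 to many: several dark matter maps consistent with one galaxy map
samples = mean + np.sqrt(var) * rng.normal(size=(8,) + shape)

spread = samples.std(axis=0, ddof=1).mean()                      # scatter between samples
coverage = np.mean(np.abs(delta_true - mean) < np.sqrt(var))     # ~68% if calibrated
print(np.sqrt(var), spread, coverage)
```

The coverage check is the kind of calibration test a learned sampler should also pass: the truth should fall within the posterior's one-sigma band about 68% of the time.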


Victoria Ono
Core Park


[Figure: true vs. sampled dark matter fields given the observed galaxies; power spectrum as a function of scale (k); counts as a function of log mass]
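The sampled fields are compared with the truth through their power spectra as a function of scale k. A basic FFT estimator for a periodic 3D grid, averaged in spherical shells of |k|, is sketched below. It assumes the normalisation P(k) = V_cell² |FFT(δ)|² / V and omits the shot-noise subtraction and mass-assignment window corrections that production estimators apply.

```python
import numpy as np


def power_spectrum(delta, box_size, n_bins=10):
    """Spherically averaged P(k) of a real overdensity field on a periodic grid."""
    n = delta.shape[0]
    cell = box_size / n
    delta_k = np.fft.rfftn(delta) * cell**3              # approximate continuous FT
    power = np.abs(delta_k) ** 2 / box_size**3
    kf = 2 * np.pi * np.fft.fftfreq(n, d=cell)           # full axes
    kr = 2 * np.pi * np.fft.rfftfreq(n, d=cell)          # last (halved) axis
    kx, ky, kz = np.meshgrid(kf, kf, kr, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2)
    # Bins from the fundamental mode up to the Nyquist frequency
    edges = np.linspace(2 * np.pi / box_size, np.pi / cell, n_bins + 1)
    idx = np.digitize(k_mag.ravel(), edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    sums = np.bincount(idx[valid], weights=power.ravel()[valid], minlength=n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    k_centres = 0.5 * (edges[1:] + edges[:-1])
    return k_centres, sums / np.maximum(counts, 1)


rng = np.random.default_rng(9)
box, n = 100.0, 32
white_noise = rng.normal(size=(n, n, n))
k, pk = power_spectrum(white_noise, box)
print(pk / (box / n) ** 3)   # white noise has a flat spectrum, ratio close to 1
```

The lowest-k bins contain few modes, so they scatter more than the others (cosmic variance).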




[Scales: pc, kpc, Mpc, Gpc; simulation box ~ Gpc]
Video credit: Francisco Villaescusa-Navarro

[Figure: in-distribution vs. out-of-distribution tests, from small to large scales]

CAMELS
DESI LRGs: ~20 (Gpc/h)^3 volume


TNG-300
True DM

Sampled DM

All this work depends on simulations, but...
Can we run larger simulations (DESI volumes)?
At high resolution?
Faster?
Thousands of them?

Gravitational evolution ODE
Particle-mesh


Particle-mesh
Full N-body
Hybrid Simulator - on the fly
Gravitational evolution ODE
Trained to match particle velocities and positions: DIFFERENTIABLE
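Gravitational evolution is an ODE for particle positions and velocities. A particle-mesh (PM) code solves Poisson's equation on a grid with FFTs, which is fast but loses the small-scale forces that a full N-body code resolves; a hybrid simulator adds a learned correction on the fly. The sketch below is a 1D periodic PM integrator (cloud-in-cell assignment, FFT Poisson solve, kick-drift-kick leapfrog) in units where 4πGρ̄ = 1, with no cosmological expansion. The `correction` argument is a hypothetical hook standing in for a trained correction; it is not the method of the work cited below.

```python
import numpy as np


def cic_weights(x, n, box_size):
    """Cloud-in-cell: left grid index, right grid index and right-hand weight."""
    s = x / (box_size / n)
    left = np.floor(s).astype(int)
    w_right = s - left
    return left % n, (left + 1) % n, w_right


def cic_density(x, n, box_size):
    """Overdensity delta on the grid from particle positions."""
    left, right, w = cic_weights(x, n, box_size)
    counts = (np.bincount(left, weights=1 - w, minlength=n)
              + np.bincount(right, weights=w, minlength=n))
    return counts / counts.mean() - 1.0


def pm_acceleration(x, n, box_size, correction=None):
    """PM acceleration with periodic boundaries, solving phi'' = delta."""
    left, right, w = cic_weights(x, n, box_size)
    delta_k = np.fft.rfft(cic_density(x, n, box_size))
    k = 2 * np.pi * np.fft.rfftfreq(n, d=box_size / n)
    acc_k = np.zeros_like(delta_k)
    acc_k[1:] = 1j * delta_k[1:] / k[1:]   # a = -dphi/dx with phi_k = -delta_k / k^2
    acc_grid = np.fft.irfft(acc_k, n=n)
    acc = (1 - w) * acc_grid[left] + w * acc_grid[right]   # interpolate to particles
    if correction is not None:             # hypothetical learned correction
        acc = acc + correction(x)
    return acc


def leapfrog(x, v, dt, n_steps, n_grid, box_size, correction=None):
    """Kick-drift-kick integration of dx/dt = v, dv/dt = a(x)."""
    acc = pm_acceleration(x, n_grid, box_size, correction)
    for _ in range(n_steps):
        v = v + 0.5 * dt * acc
        x = (x + dt * v) % box_size
        acc = pm_acceleration(x, n_grid, box_size, correction)
        v = v + 0.5 * dt * acc
    return x, v


n_grid, box, n_part = 64, 1.0, 4096
q = (np.arange(n_part) + 0.5) / n_part * box          # unperturbed lattice
eps, k0 = 0.05, 2 * np.pi / box
x0 = (q - eps / k0 * np.sin(k0 * q)) % box            # small overdensity at x = 0
v0 = np.zeros(n_part)
delta_start = cic_density(x0, n_grid, box).max()
x1, v1 = leapfrog(x0, v0, dt=0.05, n_steps=20, n_grid=n_grid, box_size=box)
delta_end = cic_density(x1, n_grid, box).max()
print(delta_start, delta_end)   # the overdensity grows under self-gravity
```

Because every step is built from array operations, the same structure written in an autodiff framework would be differentiable end to end, which is what allows a correction to be trained against particle positions and velocities.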



Particle-mesh
Full N-body
Hybrid ML-Simulator
"Nbodyify: Adaptive mesh corrections for PM simulations" Cuesta-Lazaro, Modi, in prep.
Hybrid subgrid models to bridge scales




High Res Sim
Springel & Hernquist (2003)
"Learning a subgrid model for the ISM from high resolution simulations" Jeffreson, Cuesta-Lazaro, in prep.
DESI

Building digital twins
Selections
Survey systematics

Robustness
Extract only information that is robust

across time


AI4Science
Equivariance & Symmetries
Anomaly detection
Out-of-Distribution
Interpretability
Quantifying Uncertainties
Partial Observations
PDEs
Multimodal
Simulation-based inference
Foundation Models
Weather & Climate

Chemistry & Biology
Quantum Mechanics
Particle Physics
Astrophysics & Cosmology




Neuroscience

Hierarchical
Conclusions

1. There is a lot of information in galaxy surveys that ML methods can access
2. We can tackle high-dimensional inference problems that were previously unattainable
3. Our ability to simulate will limit the amount of information we can extract
Hybrid simulators, forward models, robustness
Dark matter, Initial Conditions, let's get creative!
Field level inference


BU - job talk - 2024
By carol cuesta