Big Data Cosmology meets AI
IAIFI Fellow
Carol Cuesta-Lazaro
Boston University - 27 February 2024
Video Credit: N-body simulation Francisco Villaescusa-Navarro
The era of Big Data Cosmology
1-Dimensional
Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments
Dust
x Astrophysics
DESI, DESI-II, Spec-S5
Euclid
LSST
Simons Observatory
CMB-S4
LIGO
Einstein
LSST
Early Universe Inflation
Late Universe
Energy and matter content
Evolution
Dark matter
Dark energy
Hubble Constant
Baryons
Neutrino masses
Non-Gaussianity
Tilt power spectrum
Hubble tension
Beyond the Standard Model
Multifield Inflation
Dark Matter Reconstruction
Hybrid ML - Physics Simulators
Cosmological (field level) Inference for Galaxy Surveys
DESI
DESI: Dark Energy Spectroscopic Instrument
~40 Million spectra!
(Image Credit: Jinyi Yang, Steward Observatory/University of Arizona)
"Towards testing the theory of gravity with DESI: summary statistics, model predictions and future simulation requirements" Alam et al (including Cuesta-Lazaro) JCAP
(Image Credit: D. Schlegel/Berkeley Lab using data from DESI)
High dimensional data
Unknown
Simple summary statistic
estimated with Perturbation Theory
Probability pair of galaxy
Pair separation
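The simple summary statistic above, the excess probability of finding a pair of galaxies at a given separation, can be sketched as follows. This is a minimal illustration and not the pipeline used in the cited papers: it assumes a periodic box, brute-force pair counting, and the natural estimator DD/RR - 1 with analytic random counts. The names `pair_separations` and `two_point_correlation` are hypothetical.

```python
import numpy as np

def pair_separations(positions, box_size):
    """All unique pair separations in a periodic box (brute force, O(N^2))."""
    diff = positions[:, None, :] - positions[None, :, :]
    # Minimum-image convention for periodic boundaries
    diff -= box_size * np.round(diff / box_size)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(positions), k=1)
    return dist[i, j]

def two_point_correlation(positions, box_size, r_edges):
    """Natural estimator xi(r) = DD / RR - 1, valid for r < box_size / 2."""
    n = len(positions)
    dd, _ = np.histogram(pair_separations(positions, box_size), bins=r_edges)
    shell_volume = 4.0 / 3.0 * np.pi * (r_edges[1:] ** 3 - r_edges[:-1] ** 3)
    # Expected pair counts for an unclustered (Poisson) distribution
    rr = 0.5 * n * (n - 1) * shell_volume / box_size ** 3
    return dd / rr - 1.0

rng = np.random.default_rng(0)
box_size = 100.0
# Toy catalogue: uniformly random points, so xi(r) should scatter around zero
galaxies = rng.uniform(0.0, box_size, size=(1000, 3))
r_edges = np.linspace(5.0, 30.0, 6)
xi = two_point_correlation(galaxies, box_size, r_edges)
```

A clustered catalogue would give xi(r) > 0 at small separations; the random catalogue here only checks the normalisation.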
"Full-shape analysis with simulation-based priors: constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro, et al, submitted to PRD
"Towards a non-Gaussian model of redshift space distortions" Cuesta-Lazaro et al, MNRAS
Forward Model
Parameters
Observable
Likelihood
Simulator
+ MCMC hammer
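When the likelihood can be written down, parameters are typically sampled with MCMC. The sketch below uses a plain Metropolis-Hastings sampler as a stand-in (not the ensemble sampler the slide alludes to), with a toy Gaussian likelihood and a flat prior. All names and numbers are illustrative assumptions.

```python
import numpy as np

def log_likelihood(theta, data, sigma=1.0):
    """Gaussian log-likelihood of the data given a mean parameter theta."""
    return -0.5 * np.sum((data - theta) ** 2) / sigma ** 2

def metropolis_hastings(log_prob, theta0, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings for a single scalar parameter."""
    chain = np.empty(n_steps)
    theta, current = theta0, log_prob(theta0)
    for i in range(n_steps):
        proposal = theta + step_size * rng.normal()
        candidate = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current))
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        chain[i] = theta
    return chain

rng = np.random.default_rng(1)
data = rng.normal(loc=0.3, scale=1.0, size=200)  # toy "observable"
chain = metropolis_hastings(lambda t: log_likelihood(t, data), 0.0, 5000, 0.2, rng)
posterior_mean = chain[1000:].mean()  # discard the first 1000 steps as burn-in
```

With a flat prior the posterior mean should sit close to the sample mean of the data.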
Dark matter
Dark energy
Inflation
Perturbation Theory
Pen and paper
+ Density Estimation
+ Sampler
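When the likelihood is unknown, the simulator plus a density estimator can replace it. The sketch below is a deliberately tiny stand-in: a linear-Gaussian conditional density fitted by least squares plays the role that a neural density estimator would play in practice. The toy simulator and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulator(theta, rng):
    """Toy forward model: the observable is the parameter plus noise."""
    return theta + 0.5 * rng.normal(size=np.shape(theta))

# 1. Sample parameters from the prior and push them through the simulator
theta_train = rng.normal(size=5000)
x_train = simulator(theta_train, rng)

# 2. Density estimation: fit p(theta | x) as a Gaussian whose mean is linear in x
slope, intercept = np.polyfit(x_train, theta_train, deg=1)
residual_std = np.std(theta_train - (slope * x_train + intercept))

# 3. Sampler: draw posterior samples for any observed x without new simulations
def sample_posterior(x_obs, n_samples, rng):
    return slope * x_obs + intercept + residual_std * rng.normal(size=n_samples)

posterior_samples = sample_posterior(x_obs=1.0, n_samples=2000, rng=rng)
```

For this toy problem the exact posterior is Gaussian with mean 0.8 x and variance 0.2, so the fit can be checked analytically.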
High dimensional data
Unknown
Write your favourite summary statistic here
Simulations + ML
"SUNBIRD: Neural-network-based models for galaxy clustering" Cuesta-Lazaro et al MNRAS
"LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology" Ho, Bartlett, Chartier, Cuesta-Lazaro et al
Density Split
Cluster
Void
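A rough sketch of the density-split idea: measure the local density around random query points, then rank the points into quantiles running from void-like to cluster-like environments. The top-hat sphere counts, the periodic box, and the choice of five quantiles are assumptions made for illustration, not necessarily the settings of the cited analyses.

```python
import numpy as np

def counts_in_spheres(queries, galaxies, radius, box_size):
    """Number of galaxies within `radius` of each query point (periodic box)."""
    diff = queries[:, None, :] - galaxies[None, :, :]
    diff -= box_size * np.round(diff / box_size)
    return ((diff ** 2).sum(axis=-1) < radius ** 2).sum(axis=1)

rng = np.random.default_rng(2)
box_size, radius, n_quantiles = 100.0, 10.0, 5
galaxies = rng.uniform(0.0, box_size, size=(2000, 3))  # toy catalogue
queries = rng.uniform(0.0, box_size, size=(500, 3))    # random query points

counts = counts_in_spheres(queries, galaxies, radius, box_size)
mean_count = len(galaxies) * 4.0 / 3.0 * np.pi * radius ** 3 / box_size ** 3
delta = counts / mean_count - 1.0  # local density contrast at each query point

# Rank-based split into equal-sized quantiles: 0 = emptiest, 4 = densest
order = np.argsort(delta, kind="stable")
labels = np.empty(len(delta), dtype=int)
labels[order] = np.arange(len(delta)) * n_quantiles // len(delta)
```

Clustering statistics would then be measured separately for the query points in each quantile.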
Dark Matter
Tilt primordial Fluctuations
Clumpiness
Expansion rate
Neutrinos
"Baryons"
"Constraining νΛCDM with density-split clustering" Paillas, Cuesta-Lazaro et al MNRAS
"Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample" Paillas, Cuesta-Lazaro et al MNRAS
Dark matter density field
DESI:
Alternative Clustering Methods
DESI
Initial conditions
Dark matter
Dark energy
Inflation
Image credit: Bullock & Boylan-Kolchin
Number of objects
Dark matter halo mass
A forward model samples the likelihood
Parameters
Observable
Observed galaxy pointcloud
Initial conditions
DESI
Forward Model
Example: How far will Patrick Mahomes throw?
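import numpy as np

def simulate_trajectory(
    velocity,
    angle,
    time_step=0.1,
    g=9.8,
):
    # Latent variables: random perturbations to the speed (m/s) and angle (degrees)
    z_velocity = np.random.normal(scale=2.)
    z_angle = np.random.normal(scale=2.)
    velocity = velocity + z_velocity
    angle = angle + z_angle
    # Decompose the launch velocity into horizontal and vertical components
    angle_rad = np.radians(angle)
    v_x = velocity * np.cos(angle_rad)
    v_y = velocity * np.sin(angle_rad)
    # Time of flight for a throw that starts and lands at ground level
    total_time = 2 * v_y / g
    times = np.arange(0, total_time, time_step)
    # Ballistic trajectory without air resistance
    x = v_x * times
    y = v_y * times - 0.5 * g * times**2
    # Convert the horizontal distance from metres to yards
    x_american = x * 1.09361
    return x_american, y, times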
Sample Prior
Simulator
Latent variables z
(e.g. is Taylor Swift looking?)
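The sketch below illustrates why a forward model "samples the likelihood": with the parameters held fixed, each simulator call draws new latent variables z and returns one sample from p(observable | parameters). It is a compact, self-contained variant of the throw example that returns only the landing distance. The function name, the prior ranges, and the fixed parameter values are illustrative assumptions.

```python
import numpy as np

def throw_distance(velocity, angle_deg, rng, g=9.8):
    """One stochastic simulator call: landing distance in yards."""
    # Latent variables z: unobserved perturbations to the intended throw
    z_velocity = rng.normal(scale=2.0)
    z_angle = rng.normal(scale=2.0)
    v = velocity + z_velocity
    theta = np.radians(angle_deg + z_angle)
    # Range of a projectile launched from ground level, metres to yards
    return v ** 2 * np.sin(2.0 * theta) / g * 1.09361

rng = np.random.default_rng(3)

# Fixed parameters: repeated calls sample the (implicit) likelihood
distances = np.array([throw_distance(25.0, 45.0, rng) for _ in range(2000)])

# Sampling the prior first gives joint (parameter, observable) pairs for training
prior_velocity = rng.uniform(20.0, 30.0, size=2000)
prior_angle = rng.uniform(30.0, 60.0, size=2000)
joint_distances = np.array(
    [throw_distance(v, a, rng) for v, a in zip(prior_velocity, prior_angle)]
)
```

The likelihood is never written down explicitly; its shape is only visible through the histogram of `distances`.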
Maximize the likelihood of the training samples
Model
Training Samples
Generate Novel Samples
Evaluate probabilities
Trained Model
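The train, sample, evaluate loop above can be sketched with the simplest possible generative model. A one-dimensional Gaussian is assumed here as a stand-in for a neural network, because its maximum-likelihood parameters have a closed form. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
training_samples = rng.normal(loc=2.0, scale=0.5, size=5000)

# Maximize the likelihood of the training samples (closed form for a Gaussian)
mu_hat = training_samples.mean()
sigma_hat = training_samples.std()  # ddof=0 is the maximum-likelihood estimate

def log_prob(x):
    """Evaluate the log-probability of x under the trained model."""
    return (
        -0.5 * ((x - mu_hat) / sigma_hat) ** 2
        - np.log(sigma_hat)
        - 0.5 * np.log(2.0 * np.pi)
    )

# Generate novel samples from the trained model
novel_samples = rng.normal(loc=mu_hat, scale=sigma_hat, size=10)
novel_log_probs = log_prob(novel_samples)
```

A deep generative model replaces the two fitted numbers with network weights and the closed form with gradient-based optimisation, but the three operations stay the same.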
A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon
Image credit: DALL·E 3
1024x1024
"A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
ICML ML4Astro workshop (Spotlight talk)
Base Distribution
Target Distribution
- Sample
- Evaluate
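One way to both sample and evaluate a target distribution is to push a simple base distribution through an invertible map and apply the change-of-variables formula. The sketch below assumes a fixed one-dimensional map x = exp(a z + b) purely for illustration; it is not the generative model used in the cited paper.

```python
import numpy as np

a, b = 0.5, 0.0  # parameters of the invertible map (fixed here, learned in practice)

def sample(n_samples, rng):
    """Sample: draw z from the base distribution and push it through the map."""
    z = rng.normal(size=n_samples)
    return np.exp(a * z + b)

def log_prob(x):
    """Evaluate: log p(x) = log N(z) - log |dx/dz|, with z = (log x - b) / a."""
    z = (np.log(x) - b) / a
    log_base = -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)
    log_det_jacobian = np.log(a * x)  # dx/dz = a * x
    return log_base - log_det_jacobian

rng = np.random.default_rng(6)
target_samples = sample(1000, rng)

# Sanity check: the evaluated density should integrate to one
grid = np.linspace(1e-4, 50.0, 200001)
integral = np.sum(np.exp(log_prob(grid))) * (grid[1] - grid[0])
```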
Fixed Initial Conditions
Varying Cosmology
Mean pairwise velocity
k Nearest neighbours
Pair separation
Trained on only 5000 positions!
1 to Many:
Galaxy distribution
Dark Matter
"Probabilistic Reconstruction of Dark Matter fields from galaxies"
Park, Ono, Mudur, Ni, Cuesta-Lazaro NeurIPS Machine Learning and the Physical Sciences
Victoria Ono
Core Park
Truth
Sampled
Observed
Small
Large
Scale (k)
Power Spectrum
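The power spectrum as a function of scale k, used above to compare true and sampled fields, can be estimated from a gridded density contrast with an FFT. This is a minimal sketch under stated assumptions: a periodic cubic box, no mass-assignment or shot-noise corrections, and a simple average over the modes stored by the real FFT. The function name and binning are hypothetical.

```python
import numpy as np

def power_spectrum(delta, box_size, n_bins=8):
    """Spherically averaged P(k) of a density contrast on an n^3 periodic grid."""
    n = delta.shape[0]
    # Scale the discrete FFT to approximate the continuous Fourier transform
    delta_k = np.fft.rfftn(delta) * (box_size / n) ** 3
    power = np.abs(delta_k) ** 2 / box_size ** 3
    k_full = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    k_half = 2.0 * np.pi * np.fft.rfftfreq(n, d=box_size / n)
    k_mag = np.sqrt(
        k_full[:, None, None] ** 2 + k_full[None, :, None] ** 2 + k_half[None, None, :] ** 2
    )
    # Linear bins from the fundamental mode up to the Nyquist frequency
    k_edges = np.linspace(2.0 * np.pi / box_size, np.pi * n / box_size, n_bins + 1)
    which = np.digitize(k_mag.ravel(), k_edges) - 1
    valid = (which >= 0) & (which < n_bins)
    counts = np.bincount(which[valid], minlength=n_bins)
    sums = np.bincount(which[valid], weights=power.ravel()[valid], minlength=n_bins)
    k_centres = 0.5 * (k_edges[1:] + k_edges[:-1])
    return k_centres, sums / np.maximum(counts, 1)

rng = np.random.default_rng(7)
n, box_size = 32, 100.0
white_noise = rng.normal(size=(n, n, n))  # toy field with a flat power spectrum
k, pk = power_spectrum(white_noise, box_size)
cell_volume = (box_size / n) ** 3  # expected P(k) for unit-variance white noise
```

Small k corresponds to large scales and large k to small scales, matching the axis labels on the slide.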
log Mass
log Mass
Counts
~ Gpc
pc
kpc
Mpc
Gpc
Video credit: Francisco Villaescusa-Navarro
Small
Large
In-Distribution
Out-of-Distribution
CAMELS
DESI LRG ~ 20 (Gpc/h)^3
TNG-300
True DM
Sample DM
Can we run larger simulations? (DESI volumes)
At high resolution?
Faster?
All this work depends on simulations, but...
Thousands of them?
Gravitational evolution ODE
Particle-mesh
Particle-mesh
Full Nbody
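The core of a particle-mesh step is solving the Poisson equation on a grid with FFTs, which is what makes it fast but also limits its resolution to the mesh scale. The sketch below shows that step in one dimension, assuming units in which the Poisson equation reads d²φ/dx² = δ and leaving out particle deposition and interpolation. The function name is hypothetical.

```python
import numpy as np

def pm_force_1d(delta, box_size):
    """Gravitational force -dphi/dx on a periodic 1D mesh, with d2phi/dx2 = delta."""
    n = delta.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    delta_k = np.fft.fft(delta)
    force_k = np.zeros_like(delta_k)
    nonzero = k != 0  # the k = 0 mode (mean density) exerts no force
    # phi_k = -delta_k / k^2 and F_k = -i k phi_k = i k delta_k / k^2
    force_k[nonzero] = 1j * k[nonzero] * delta_k[nonzero] / k[nonzero] ** 2
    return np.fft.ifft(force_k).real

box_size, n = 1.0, 64
x = np.arange(n) * box_size / n
k0 = 2.0 * np.pi / box_size
delta = np.cos(k0 * x)  # single-mode overdensity centred on x = 0
force = pm_force_1d(delta, box_size)
expected = -np.sin(k0 * x) / k0  # analytic solution, pointing towards the overdensity
```

Structure below the mesh scale is where a particle-mesh solver departs from a full N-body code.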
Hybrid Simulator - on the fly
Gravitational evolution ODE
Trained to match particle velocities and positions: DIFFERENTIABLE
Particle-mesh
Full Nbody
Hybrid ML-Simulator
"Nbodyify: Adaptive mesh corrections for PM simulations" Cuesta-Lazaro, Modi in prep.
Hybrid subgrid models to bridge scales
High Res Sim
Springel and Hernquist 03
"Learning a subgrid model for the ISM from high resolution simulations" Jeffreson, Cuesta-Lazaro in prep.
DESI
Building digital twins
Selections
Survey systematics
Robustness
Extract only information that is robust
across time
AI4Science
Equivariance & Symmetries
Anomaly detection
Out-of-Distribution
Interpretability
Quantifying Uncertainties
Partial Observations
PDEs
Multimodal
Simulation-based-Inference
Foundation Models
Weather & Climate
Chemistry & Biology
Quantum Mechanics
Particle Physics
Astrophysics & Cosmology
Neuroscience
Hierarchical
Conclusions
1. There is a lot of information in galaxy surveys that ML methods can access
2. We can tackle high dimensional inference problems that were previously unattainable
3. Our ability to simulate will limit the amount of information we can extract
Hybrid simulators, forward models, robustness
Dark matter, Initial Conditions, let's get creative!
Field level inference
BU - job talk - 2024
By Carol Cuesta