Big Data Cosmology meets AI

IAIFI Fellow

Carol Cuesta-Lazaro

 

Boston University - 27 February 2024

Video Credit: N-body simulation Francisco Villaescusa-Navarro

The era of Big Data Cosmology

1-Dimensional

Machine Learning

Secondary anisotropies

Galaxy formation

Intrinsic alignments

Dust

xAstrophysics

DESI, DESI-II, Spec-S5

Euclid

LSST

Simons Observatory

CMB-S4

LIGO

Einstein

LSST

Early Universe Inflation

{\delta_\mathrm{Initial}}

Late Universe

Energy and matter content

Evolution

{\delta_\mathrm{Final}}
\color{darkgray}{\Omega_m}

Dark matter

Dark energy

\color{darkgreen}{w_0, w_a}
\color{darkolive}{H_0}

Hubble Constant

\color{darkred}{\Omega_b}
\color{darkblue}{\sum m_\nu}

Baryons

Neutrino masses

\color{purple}{f_\mathrm{NL}}
\color{darkorange}{n_s}

Non-Gaussianity

Tilt of the power spectrum

Hubble tension

Beyond the Standard Model

Multifield Inflation

1. Cosmological (field level) Inference for Galaxy Surveys

2. Dark Matter Reconstruction

3. Hybrid ML - Physics Simulators

DESI

DESI: Dark Energy Spectroscopic Instrument

~40 Million spectra!

(Image Credit: Jinyi Yang, Steward Observatory/University of Arizona)
"Towards testing the theory of gravity with DESI: summary statistics, model predictions and future simulation requirements" Alam et al (including Cuesta-Lazaro) JCAP

(Image Credit: D. Schlegel/Berkeley Lab using data from DESI)

High dimensional data 

x

Unknown

p(x|\mathcal{C})

Simple summary statistic 

s
p(s|\mathcal{C})

estimated with Perturbation Theory

Probability of a galaxy pair

Pair separation
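The summary statistic sketched on this slide is the excess probability of finding a galaxy pair at a given separation. A minimal sketch, assuming the simple "natural" estimator DD/RR - 1 on a small non-periodic toy box; the function names, binning and toy catalogue are illustrative assumptions, not the pipeline used in the cited papers.

```python
import numpy as np

def pair_counts(points, edges):
    """Histogram of pairwise separations, counting each pair once."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)
    counts, _ = np.histogram(dist[upper], bins=edges)
    return counts.astype(float)

def two_point_correlation(data, randoms, edges):
    """Natural estimator xi(r) = DD/RR - 1 with normalised pair counts."""
    n_d, n_r = len(data), len(randoms)
    dd = pair_counts(data, edges) / (n_d * (n_d - 1) / 2)
    rr = pair_counts(randoms, edges) / (n_r * (n_r - 1) / 2)
    return dd / rr - 1.0

rng = np.random.default_rng(0)
edges = np.linspace(0.05, 0.5, 10)

# Toy clustered catalogue: points scattered around a few random centres.
centres = rng.uniform(0.0, 1.0, size=(20, 3))
data = centres[rng.integers(0, 20, size=400)] + rng.normal(scale=0.03, size=(400, 3))
randoms = rng.uniform(0.0, 1.0, size=(800, 3))

xi = two_point_correlation(data, randoms, edges)
```

The toy catalogue is clustered on small scales, so `xi` is largest in the first separation bin and falls off towards larger separations.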

"Full-shape analysis with simulation-based priors: constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro, et al Submitted PRD
"Towards a non-Gaussian model of redshift space distortions" Cuesta-Lazaro et al, MNRAS
\theta

Forward Model

Parameters

Observable

x

Likelihood

p(\mathcal{\theta}|x)

Simulator

+ MCMC hammer

\color{darkgray}{\Omega_m}, \color{darkgreen}{w_0, w_a},\color{purple}{f_\mathrm{NL}}\, ...

Dark matter

Dark energy

Inflation

Perturbation Theory

Pen and paper

p(x|\mathcal{\theta})

+ Density Estimation

+ Sampler

p(\theta|x) = \frac{p(x|\theta)\, p(\theta)}{p(x)}
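Bayes' theorem turns a likelihood and a prior into a posterior over parameters. A minimal sketch, assuming a one-parameter toy problem (Gaussian data with unknown mean and known scatter) evaluated on a grid; the function names, the flat prior range and the data values are illustrative assumptions.

```python
import numpy as np

def gaussian_log_likelihood(x_obs, theta, sigma=1.0):
    """log p(x | theta) for independent Gaussian data with mean theta."""
    z = (x_obs - theta) / sigma
    return np.sum(-0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi))

def flat_log_prior(theta, low=-5.0, high=5.0):
    """log p(theta) for a uniform prior on [low, high]."""
    return -np.log(high - low) if low <= theta <= high else -np.inf

def grid_posterior(x_obs, theta_grid):
    """Posterior density on a regular grid; the normalisation plays the role of p(x)."""
    log_post = np.array(
        [gaussian_log_likelihood(x_obs, t) + flat_log_prior(t) for t in theta_grid]
    )
    post = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
    dtheta = theta_grid[1] - theta_grid[0]
    return post / (post.sum() * dtheta)

x_obs = np.array([0.8, 1.1, 1.3])
theta_grid = np.linspace(-5.0, 5.0, 2001)
posterior = grid_posterior(x_obs, theta_grid)
theta_map = theta_grid[np.argmax(posterior)]
```

A grid only works for a handful of parameters, which is why the slides pair the likelihood with a sampler instead.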

High dimensional data 

Unknown

p(x|\mathcal{C})

Write your favourite summary statistic here

Simulations + ML

p(s|\mathcal{C})
"SUNBIRD: Neural-network-based models for galaxy clustering" Cuesta-Lazaro et al MNRAS

Density Split

\delta

Cluster

Void
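Density split groups regions of the survey by their local density, from void-like to cluster-like. A minimal sketch, assuming a periodic toy box, top-hat spheres around random query points and a split into five quantiles; the function name, smoothing radius and catalogue are illustrative assumptions rather than the settings of the cited analyses.

```python
import numpy as np

def density_split(galaxies, queries, radius, box, n_quantiles=5):
    """Density contrast in spheres around each query point, plus its quantile label."""
    diff = queries[:, None, :] - galaxies[None, :, :]
    diff -= box * np.round(diff / box)  # periodic minimum-image separation
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    counts = (dist < radius).sum(axis=1)
    mean_count = len(galaxies) * (4.0 / 3.0) * np.pi * radius ** 3 / box ** 3
    delta = counts / mean_count - 1.0
    # Rank-based labels: 0 is the emptiest quantile, n_quantiles - 1 the densest.
    order = np.argsort(delta)
    labels = np.empty(len(queries), dtype=int)
    labels[order] = np.arange(len(queries)) * n_quantiles // len(queries)
    return delta, labels

rng = np.random.default_rng(1)
box = 1.0
galaxies = rng.uniform(0.0, box, size=(2000, 3))
queries = rng.uniform(0.0, box, size=(1000, 3))
delta, labels = density_split(galaxies, queries, radius=0.1, box=box)
```

Clustering statistics can then be measured separately for each label, for example by cross-correlating the query points of one quantile with the galaxies.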

Dark Matter

Tilt of primordial fluctuations

Clumpiness

Expansion rate

Neutrinos

"Baryons"

"Constraining νΛCDM with density-split clustering" Paillas, Cuesta-Lazaro et al MNRAS

\mathrm{C} \cap \mathrm{A}
\mathrm{Cosmology}
\mathrm{Astrophysics}
"Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample" Paillas, Cuesta-Lazaro et al MNRAS
\delta_\mathrm{low}
\delta_\mathrm{high}
B > 0
B_\mathrm{cen} = -0.3^{+0.04}_{-0.18}
B < 0
P(N_\mathrm{galaxies} | M_h)
P(N_\mathrm{galaxies} | M_h, \delta)
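The two expressions above contrast a galaxy occupation that depends on halo mass alone with one that also depends on environment. A minimal sketch, assuming the widely used error-function form for centrals and a power law for satellites, with the environment entering as a shift of the minimum mass proportional to B_cen; this parameterisation, the default values and the `env_rank` variable (environment rank in [0, 1]) are assumptions for illustration and are not confirmed by the slides.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def mean_centrals(log_mh, log_mmin=13.0, sigma=0.3, b_cen=0.0, env_rank=0.5):
    """Mean number of centrals; b_cen shifts log_mmin with environment rank."""
    shifted = log_mmin + b_cen * (env_rank - 0.5)
    return 0.5 * (1.0 + _erf((log_mh - shifted) / sigma))

def mean_satellites(log_mh, log_m0=13.0, log_m1=14.0, alpha=1.0,
                    b_cen=0.0, env_rank=0.5):
    """Mean number of satellites, modulated by the central occupation."""
    excess = np.clip(10.0 ** log_mh - 10.0 ** log_m0, 0.0, None) / 10.0 ** log_m1
    return mean_centrals(log_mh, b_cen=b_cen, env_rank=env_rank) * excess ** alpha

def sample_galaxies(log_mh, rng, b_cen=0.0, env_rank=0.5):
    """Draw Bernoulli centrals and Poisson satellites for each halo."""
    p_cen = mean_centrals(log_mh, b_cen=b_cen, env_rank=env_rank)
    n_cen = (rng.random(log_mh.shape) < p_cen).astype(int)
    n_sat = rng.poisson(mean_satellites(log_mh, b_cen=b_cen, env_rank=env_rank))
    return n_cen, n_sat

rng = np.random.default_rng(2)
log_mh = rng.uniform(12.0, 15.0, size=1000)
n_cen, n_sat = sample_galaxies(log_mh, rng, b_cen=-0.3, env_rank=0.9)
```

In this sketch a negative `b_cen` lowers the mass threshold in dense environments, so haloes of a given mass are more likely to host a central there.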

Dark matter density field

DESI:

Alternative Clustering Methods

DESI

Initial conditions

\color{darkgray}{\Omega_m}, \color{darkgreen}{w_0, w_a},\color{purple}{f_\mathrm{NL}}\, ...

Dark matter

Dark energy

Inflation

\theta
+
Image credit: Bullock & Boylan-Kolchin

Dark matter halo mass

Number of objects

A forward model samples the likelihood

\theta

Parameters

Observable

x

Observed galaxy pointcloud

Initial conditions

+
\color{darkgray}{\Omega_m}, \color{darkgreen}{w_0, w_a},\color{purple}{f_\mathrm{NL}}\, ...

DESI

Forward Model

Example: How far will Patrick Mahomes throw?

import numpy as np

def simulate_trajectory(
    velocity,
    angle,
    time_step=0.1,
    g=9.8,
):
    # Latent variables z: random perturbations of the throw
    z_velocity = np.random.normal(scale=2.)
    z_angle = np.random.normal(scale=2.)
    velocity = velocity + z_velocity  # m/s
    angle = angle + z_angle  # degrees
    angle_rad = np.radians(angle)
    v_x = velocity * np.cos(angle_rad)
    v_y = velocity * np.sin(angle_rad)
    # Time of flight until the ball returns to launch height
    total_time = 2 * v_y / g
    times = np.arange(0, total_time, time_step)
    x = v_x * times
    y = v_y * times - 0.5 * g * times**2
    # Convert the horizontal distance from metres to yards
    x_american = x * 1.09361
    return x_american, y, times
v \sim \mathcal{N} (23,2)
\phi \sim \mathcal{U} (30,45)

Sample Prior

Simulator

Latent variables z

(e.g. is Taylor Swift looking?)

x
p(x|\theta) = \int dz p(x,z|\theta)
x_\mathrm{simulator} \sim p(x|\theta)
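Running the simulator at fixed parameters draws samples from p(x|\theta), with the latent variables z marginalised implicitly; drawing the parameters from the prior first gives samples from the joint. A minimal sketch, assuming the observable x is the landing distance in yards (the closed-form range of the trajectory) and that the second argument of the Gaussian prior is a standard deviation; the helper name `landing_distance` is hypothetical.

```python
import numpy as np

def landing_distance(velocity, angle, rng, g=9.8):
    """Range of one noisy throw in yards; the two noise draws are the latent z."""
    v = velocity + rng.normal(scale=2.0)
    phi = np.radians(angle + rng.normal(scale=2.0))
    return v ** 2 * np.sin(2.0 * phi) / g * 1.09361

rng = np.random.default_rng(0)
n_sims = 5000

# Sample the prior, then the simulator: pairs (theta, x) from the joint p(x, theta).
velocities = rng.normal(23.0, 2.0, size=n_sims)
angles = rng.uniform(30.0, 45.0, size=n_sims)
x_joint = np.array(
    [landing_distance(v, phi, rng) for v, phi in zip(velocities, angles)]
)

# Fixed parameters: repeated runs are samples from the likelihood p(x | theta).
x_fixed = np.array([landing_distance(23.0, 40.0, rng) for _ in range(n_sims)])
```

The spread of `x_fixed` comes from the latent variables alone, while `x_joint` also carries the prior uncertainty on the parameters.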

Maximize the likelihood of the training samples

\hat \phi = \argmax \left[ \log p_\phi (x_\mathrm{train}) \right]

Model

p_\phi(x)

Training Samples

x_\mathrm{train}

Generate Novel Samples

Evaluate probabilities

Trained Model

p_\phi(x|\theta)
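Training a density model means choosing \phi to maximise the log-likelihood of the training samples, after which the model can both generate samples and evaluate probabilities. A minimal sketch, assuming the simplest possible conditional model, a Gaussian whose mean is linear in \theta, for which the maximum-likelihood solution is available in closed form; real applications replace it with a neural density estimator, and all names here are illustrative.

```python
import numpy as np

def fit_conditional_gaussian(theta, x):
    """Maximum-likelihood fit of p(x | theta) = N(slope * theta + intercept, sigma^2)."""
    design = np.stack([theta, np.ones_like(theta)], axis=1)
    (slope, intercept), *_ = np.linalg.lstsq(design, x, rcond=None)
    residuals = x - (slope * theta + intercept)
    sigma = np.sqrt(np.mean(residuals ** 2))  # maximum-likelihood scatter
    return slope, intercept, sigma

def log_prob(x, theta, params):
    """Evaluate log p_phi(x | theta)."""
    slope, intercept, sigma = params
    z = (x - (slope * theta + intercept)) / sigma
    return -0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def sample(theta, params, rng):
    """Generate new samples x ~ p_phi(x | theta)."""
    slope, intercept, sigma = params
    return rng.normal(slope * theta + intercept, sigma)

rng = np.random.default_rng(3)
theta_train = rng.uniform(-3.0, 3.0, size=2000)
x_train = 2.0 * theta_train + 1.0 + rng.normal(scale=0.5, size=2000)

params = fit_conditional_gaussian(theta_train, x_train)
x_new = sample(theta_train[:5], params, rng)
log_p = log_prob(x_train, theta_train, params)
```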

A 2D animation of a folk music band composed of anthropomorphic autumn leaves, each playing traditional bluegrass instruments, amidst a rustic forest setting dappled with the soft light of a harvest moon

Image credit: DALL·E 3 

 

1024x1024

"A point cloud approach to generative modeling for galaxy surveys at the field level" 
Cuesta-Lazaro and Mishra-Sharma 
ICML ML4Astro workshop (Spotlight talk) 

Base Distribution

Target Distribution

  • Sample
  • Evaluate
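A generative model that maps a base distribution to a target distribution supports both operations listed above. A minimal sketch, assuming a one-dimensional affine map of a standard normal and the change-of-variables formula; it illustrates the sample and evaluate interface only and is not the model of the cited point-cloud paper.

```python
import numpy as np

class AffineFlow:
    """x = shift + exp(log_scale) * z, with z drawn from a standard normal base."""

    def __init__(self, shift, log_scale):
        self.shift = shift
        self.log_scale = log_scale

    def sample(self, n, rng):
        """Push base samples through the map."""
        z = rng.standard_normal(n)
        return self.shift + np.exp(self.log_scale) * z

    def log_prob(self, x):
        """Change of variables: log p(x) = log p_base(z) - log |dx/dz|."""
        z = (x - self.shift) * np.exp(-self.log_scale)
        log_base = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
        return log_base - self.log_scale

rng = np.random.default_rng(4)
flow = AffineFlow(shift=1.0, log_scale=np.log(2.0))
samples = flow.sample(20000, rng)

grid = np.linspace(-20.0, 22.0, 4001)
density = np.exp(flow.log_prob(grid))
```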

Fixed Initial Conditions

 Varying Cosmology

Mean pairwise velocity

k Nearest neighbours

Pair separation

Trained on only 5000 positions!

p_\phi(\rho_\mathrm{DM}|\rho_\mathrm{Galaxies})

1 to Many:

Galaxy distribution 

Dark Matter

 "Probabilistic Reconstruction of Dark Matter fields from galaxies"
Park, Ono, Mudur, Ni, Cuesta-Lazaro NeurIPS Machine Learning and the Physical Sciences

Victoria Ono

Core Park

Truth

Sampled

Observed

Small

Large

Scale (k)

Power Spectrum

PDF

log Mass

log Mass

Counts

 ~ Gpc

pc

kpc

Mpc

Gpc

Video credit: Francisco Villaescusa-Navarro

Small

Large

\langle\mathrm{True}\,\,\mathrm{Pred}\rangle

In-Distribution

Out-of-Distribution

CAMELS

DESI LRG ~ 20 (Gpc/h)^3 

TNG-300

True DM

Sample DM

Can we run larger simulations? (DESI volumes)

At high resolution?

Faster?

All this work depends on simulations, but...

Thousands of them?

\frac{\mathrm{d} \mathbf{x}}{\mathrm{d} a } = \frac{1}{a^3 E(a)}\mathbf{v}
\frac{\mathrm{d} \mathbf{v}}{\mathrm{d} a } = \frac{1}{a^2 E(a)}\mathbf{F}(\mathbf{x},a)
\mathbf{F}(\mathbf{x},a) = \frac{3 \Omega_m}{2} \nabla \phi^\mathrm{PM}(\mathbf{x})
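The three equations above can be integrated with a particle-mesh scheme. A minimal sketch, assuming a one-dimensional periodic box, nearest-grid-point mass assignment, a first-order kick-drift update, E(a) for flat ΛCDM, and a sign convention in which \phi^\mathrm{PM} solves \nabla^2 \phi = -\delta so that the force written above is attractive; all of these are assumptions for illustration, and production codes use 3D meshes and higher-order schemes.

```python
import numpy as np

def expansion_rate(a, omega_m):
    """E(a) = H(a)/H_0, assuming flat LCDM."""
    return np.sqrt(omega_m / a ** 3 + 1.0 - omega_m)

def pm_force(x, n_mesh, box, omega_m):
    """F = (3 Omega_m / 2) * grad(phi), with laplacian(phi) = -delta on the mesh."""
    cell = box / n_mesh
    idx = np.floor(x / cell).astype(int) % n_mesh  # nearest-grid-point cell
    counts = np.bincount(idx, minlength=n_mesh)
    delta = counts / counts.mean() - 1.0
    k = 2.0 * np.pi * np.fft.rfftfreq(n_mesh, d=cell)
    delta_k = np.fft.rfft(delta)
    grad_phi_k = np.zeros_like(delta_k)
    grad_phi_k[1:] = 1j * delta_k[1:] / k[1:]  # i k * (delta_k / k^2)
    grad_phi = np.fft.irfft(grad_phi_k, n=n_mesh)
    return 1.5 * omega_m * grad_phi[idx]  # read the mesh force back at the particles

def pm_step(x, v, a, da, n_mesh, box, omega_m):
    """One kick-drift step of dx/da and dv/da."""
    e_a = expansion_rate(a, omega_m)
    v = v + da * pm_force(x, n_mesh, box, omega_m) / (a ** 2 * e_a)
    x = (x + da * v / (a ** 3 * e_a)) % box
    return x, v

rng = np.random.default_rng(5)
box, n_mesh, omega_m = 1.0, 32, 0.3
x = rng.uniform(0.0, box, size=1000)
v = np.zeros_like(x)
a, da = 0.1, 0.01
for _ in range(10):
    x, v = pm_step(x, v, a, da, n_mesh, box, omega_m)
    a += da
```

The mesh resolution sets the smallest scale on which the force is accurate, which is the limitation relative to a full N-body solver.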

Gravitational evolution ODE

Particle-mesh

Particle-mesh

Full Nbody

\mathbf{F}_\theta(\mathbf{x},a) = \frac{3 \Omega_m}{2} \nabla \left[\phi^\mathrm{PM}(\mathbf{x}) + \phi^\mathrm{corr}_\theta(\mathbf{x}, a, \phi^\mathrm{PM}, \delta^\mathrm{PM}) \right]
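The corrected force adds a learned potential to the particle-mesh one before taking the gradient. A minimal sketch on a one-dimensional periodic mesh, assuming the same sign convention as the formula above with \nabla^2 \phi^\mathrm{PM} = -\delta^\mathrm{PM}; the function `toy_correction` is a hypothetical two-parameter stand-in for \phi^\mathrm{corr}_\theta, which in practice would be a trained neural network.

```python
import numpy as np

def wavenumbers(n_mesh, box):
    return 2.0 * np.pi * np.fft.rfftfreq(n_mesh, d=box / n_mesh)

def pm_potential(delta, box):
    """Solve laplacian(phi) = -delta in Fourier space (zero mode set to 0)."""
    k = wavenumbers(delta.size, box)
    phi_k = np.zeros(k.size, dtype=complex)
    phi_k[1:] = np.fft.rfft(delta)[1:] / k[1:] ** 2
    return np.fft.irfft(phi_k, n=delta.size)

def spectral_gradient(field, box):
    """Gradient of a periodic field via multiplication by i k."""
    k = wavenumbers(field.size, box)
    return np.fft.irfft(1j * k * np.fft.rfft(field), n=field.size)

def toy_correction(phi_pm, delta_pm, a, theta):
    """Hypothetical phi_corr(a, phi_PM, delta_PM); a placeholder for a neural network."""
    return theta[0] * a * phi_pm + theta[1] * delta_pm

def hybrid_force(delta_pm, a, theta, box, omega_m):
    """F_theta = (3 Omega_m / 2) * grad(phi_PM + phi_corr)."""
    phi_pm = pm_potential(delta_pm, box)
    phi_total = phi_pm + toy_correction(phi_pm, delta_pm, a, theta)
    return 1.5 * omega_m * spectral_gradient(phi_total, box)

rng = np.random.default_rng(6)
box, omega_m = 1.0, 0.3
delta_pm = rng.normal(size=64)
delta_pm -= delta_pm.mean()

force_pm = hybrid_force(delta_pm, a=0.5, theta=np.zeros(2), box=box, omega_m=omega_m)
force_hybrid = hybrid_force(delta_pm, a=0.5, theta=np.array([0.2, 0.01]),
                            box=box, omega_m=omega_m)
```

With all parameters set to zero the hybrid force reduces to the plain particle-mesh force, so the correction only has to model the residual.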

Hybrid Simulator - on the fly

\frac{\mathrm{d} \mathbf{x}}{\mathrm{d} a } = \frac{1}{a^3 E(a)}\mathbf{v}
\frac{\mathrm{d} \mathbf{v}}{\mathrm{d} a } = \frac{1}{a^2 E(a)}\mathbf{F}(\mathbf{x},a)

Gravitational evolution ODE

Trained to match particle velocities and positions: DIFFERENTIABLE

Particle-mesh

Full Nbody

Hybrid ML-Simulator

"Nbodyify: Adaptive mesh corrections for PM simulations" Cuesta-Lazaro, Modi in prep.

Hybrid subgrid models to bridge scales

High Res Sim

Springel and Hernquist 03

"Learning a subgrid model for the ISM from high resolution simulations" Jeffreson, Cuesta-Lazaro in prep.

DESI

Building digital twins

Selections

Survey systematics

=

Robustness

Extract only information that is robust

across time

\mathrm{C} \cap \mathrm{A}
\mathrm{Cosmology}
\mathrm{Astrophysics}

AI4Science

Equivariance & Symmetries

Anomaly detection

Out-of-Distribution

Interpretability

Quantifying Uncertainties

Partial Observations

PDEs

Multimodal

Simulation-based-Inference

Foundation Models

Weather & Climate

Chemistry & Biology

Quantum Mechanics

Particle Physics

Astrophysics & Cosmology

Neuroscience

Hierarchical

Conclusions

1. There is a lot of information in galaxy surveys that ML methods can access

2. We can tackle high dimensional inference problems that were so far unattainable

3. Our ability to simulate will limit the amount of information we can extract

Hybrid simulators, forward models, robustness

Dark matter, Initial Conditions, let's get creative!

Field level inference
