Simulations meet Observations

IAIFI Fellow, MIT

Carolina Cuesta-Lazaro

Art: "Drawing Hands" by M.C. Escher

A Machine Learning perspective on modern Cosmology

1-Dimensional

Machine Learning

Secondary anisotropies

Galaxy formation

Intrinsic alignments

DESI, DESI-II, Spec-S5

Euclid, LSST

SO, CMB-S4

LIGO, Einstein

The era of Big Data Cosmology

xAstrophysics

5-Dimensional

w_0, w_a, f\sigma_8, \Omega_m, \sum m_\nu

HERA, CHIME

SAGA, MaNGA

Galaxy formation

Emitters Census

Reionization

Carolina Cuesta-Lazaro IAIFI/MIT @ CCA 2025

Astrophysics dominates Simulation-based Inference

on Simulations

i) Generative Models:

Beyond Simulation Emulation

["A point cloud approach to generative modeling for galaxy surveys at the field level", Cuesta-Lazaro and Mishra-Sharma]
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo", Mudur, Cuesta-Lazaro and Finkbeiner]

 

ii) Reconstructing latent features:

Dark matter, ICs...

iii) Learning to represent astrophysics

Baryonic feedback

iv) Anomaly Detection for new physics searches

Hybrid Simulators

Learning to represent feedback

Anomaly maps of the LSS

This talk:

Past Research

This talk:

Future Research

Cosmological Parameters

theory

\mathcal{O}(2048^3)
p(z, \theta|\delta_{\mathrm{Obs}})

?

Observed Density Field

today

\mathcal{O}(10)
\mathcal{O}(2048^3) + \Omega_m, \sigma_8

z

Dark matter density

back in time: ICs

\theta
\delta_{\mathrm{Obs}}

TNG-300

True DM

Sample DM

Size of training simulation

1) Generalising to larger volumes

Model trained on Astrid subgrid model

2) Generalising to subgrid models

["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies", Ono et al. (including Cuesta-Lazaro), arXiv:2403.10648]

 

["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys", Park, Mudur, Cuesta-Lazaro et al., ICML 2024 AI for Science]

 

Posterior Sample

Posterior Mean

Debiasing Cosmic Flows

True

Reconstructed

\delta_\mathrm{Obs}
\delta_\mathrm{ICs}

p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs}) = p(\delta_\mathrm{ICs}|\delta_\mathrm{Obs}) \, p(\theta|\delta_\mathrm{ICs},\delta_\mathrm{Obs})

(Marginalizing over parameters)
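
The factorization above suggests a two-step (ancestral) sampling scheme: first draw initial conditions from the parameter-marginalized posterior, then draw parameters conditioned on those initial conditions. The sketch below illustrates this on a scalar Gaussian toy model that is purely an assumption for illustration (the names `ics`, `obs`, `theta`, and `noise_var` are hypothetical and unrelated to the models used in the talk). It checks that the two-step samples reproduce the analytic posterior of theta given the observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumed for illustration only):
#   theta ~ N(0, 1),  ics | theta ~ N(theta, 1),  obs | ics ~ N(ics, noise_var)
noise_var = 0.25
obs = 1.5
n_samples = 200_000

# Step 1: sample ics ~ p(ics | obs), with theta marginalized out.
# Marginally ics ~ N(0, 2), so the posterior over ics is Gaussian.
ics_precision = 1.0 / 2.0 + 1.0 / noise_var
ics_mean = (obs / noise_var) / ics_precision
ics = rng.normal(ics_mean, np.sqrt(1.0 / ics_precision), size=n_samples)

# Step 2: sample theta ~ p(theta | ics, obs).
# In this toy model obs carries no extra information once ics is known,
# so theta | ics ~ N(ics / 2, 1 / 2).
theta = rng.normal(ics / 2.0, np.sqrt(0.5))

# Analytic check: obs | theta ~ N(theta, 1 + noise_var) with prior theta ~ N(0, 1).
theta_precision = 1.0 + 1.0 / (1.0 + noise_var)
theta_mean_exact = (obs / (1.0 + noise_var)) / theta_precision
theta_var_exact = 1.0 / theta_precision

print(theta.mean(), theta_mean_exact)
print(theta.var(), theta_var_exact)
```

In a field-level setting both conditionals would be learned models rather than closed-form Gaussians, but the order of the two sampling steps is the same.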

["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants", Cuesta-Lazaro, Bayer, Albergo et al.]

 

What we talk about when we talk about baryons

Learning subgrid models from simulations that resolve the relevant physics

Bridging scales with ML subgrid models

Data-driven simulators

Learning subgrid models from observations directly

How do we incorporate theoretical uncertainties related to baryonic feedback?

Robust Inference

Disentangling baryonic effects from new physics

\Omega_m, \sigma_8

Simulator 1

Simulator 2

z
p(
, z)

Dark Matter

Feedback

\Omega_m, \sigma_8

Approach 1:

Contrastive

Baryonic fields

Approach 2:

Generative

z = Cosmological Mutual Information

z = Baryonic feedback

Learning the feedback manifold

p(\mathcal{C},z|
)
z

Capture theoretical uncertainty on feedback

SZ, FRBs, Galaxy properties, X-Ray...

Hybrid Simulators

\frac{\partial \mathbf{u}}{\partial t} = \mathcal{H}[\mathbf{x},\mathbf{R}_{\theta}(\mathbf{x})]

Hydro simulator

Subgrid model

Coupling of scales requires online training:

Simulator must be differentiable
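
To make "online training" concrete, here is a minimal sketch, assuming a scalar toy problem rather than a hydro code. A coarse solver integrates du/dt = -k u + R_theta(u) with a one-parameter correction R_theta(u) = theta * u, and theta is fitted by differentiating the final state through every solver step (done by hand with forward sensitivities, since the toy is simple enough). The function `rollout`, the reference value 0.3, and all numerical settings are hypothetical choices for illustration.

```python
def rollout(theta, u0=1.0, k=1.0, dt=0.1, n_steps=20):
    """Euler rollout of du/dt = -k*u + theta*u.

    Returns the final state and its derivative with respect to theta,
    propagated through every solver step (forward sensitivity).
    """
    u, du_dtheta = u0, 0.0
    for _ in range(n_steps):
        rhs = (-k + theta) * u                     # solver term plus learned correction
        drhs_dtheta = (-k + theta) * du_dtheta + u  # derivative of rhs w.r.t. theta
        u, du_dtheta = u + dt * rhs, du_dtheta + dt * drhs_dtheta
    return u, du_dtheta


# Stand-in for a higher-resolution reference run (assumed effective correction 0.3).
target, _ = rollout(0.3)

# Online training: the loss is evaluated on the rollout output, so the gradient
# has to flow back through the solver itself.
theta, learning_rate = 0.0, 1.0
for _ in range(500):
    u_final, du_dtheta = rollout(theta)
    grad = 2.0 * (u_final - target) * du_dtheta
    theta -= learning_rate * grad

print(theta)
```

With a real simulator the hand-written sensitivities would be replaced by automatic differentiation, which is why the simulator needs to be differentiable.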

A Hybrid Matrioshka

Match outputs from higher resolution simulations

Data driven subgrid models

Match outputs from high dimensional observations directly

["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing", Balla, Mishra-Sharma, Cuesta-Lazaro et al.]

 

Foundation models?

Symmetry preserving architectures?

What is the space of plausible solutions?

High resolution simulations too expensive for large training sets

Could absorb potential missing physics

 

\mathrm{R} \sim \mathrm{Observations}
\mathrm{F} \sim \mathrm{Simulations}

Real or Fake?

Detecting anomalies in the cosmic web

\mathcal{L} = p(\theta|f(x_\mathrm{sim})) + \lambda \mathcal{L}_\text{classifier}(f(x_\mathrm{obs}), f(x_\mathrm{sim}))
\lambda > 0

Summarizer (NN)

Summaries informative of parameters

Anomalous SBI

highlight differences

Adversarial loss

\lambda < 0

Robust SBI

remove discrepancies
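
The role of the sign of lambda can be sketched numerically. The snippet below is a minimal illustration under two assumptions not stated on the slide: the inference term is treated as a negative log posterior to be minimized, and the classifier term is a binary cross-entropy between summaries of observations (label 1) and simulations (label 0). The function names and logit values are hypothetical.

```python
import numpy as np


def bce_real_vs_fake(logits_obs, logits_sim):
    """Binary cross-entropy from classifier logits: obs labelled 1, sim labelled 0."""
    loss_obs = np.logaddexp(0.0, -logits_obs)  # -log sigmoid(logit)
    loss_sim = np.logaddexp(0.0, logits_sim)   # -log (1 - sigmoid(logit))
    return 0.5 * (loss_obs.mean() + loss_sim.mean())


def summarizer_loss(neg_log_posterior, logits_obs, logits_sim, lam):
    """Total loss seen by the summarizer: inference term plus lam times classifier term."""
    return neg_log_posterior + lam * bce_real_vs_fake(logits_obs, logits_sim)


# Summaries the classifier cannot tell apart (logits 0) versus clearly separated ones.
confused = bce_real_vs_fake(np.zeros(4), np.zeros(4))
separated = bce_real_vs_fake(np.full(4, 5.0), np.full(4, -5.0))

# lam > 0: the summarizer is rewarded when observations and simulations are separable.
anomalous = summarizer_loss(1.0, np.full(4, 5.0), np.full(4, -5.0), lam=1.0)
# lam < 0: the summarizer is rewarded when the classifier is confused.
robust = summarizer_loss(1.0, np.zeros(4), np.zeros(4), lam=-1.0)

print(confused, separated, anomalous, robust)
```

Under these assumptions, minimizing with lambda > 0 pushes the summaries toward features that expose simulation and observation mismatches, while lambda < 0 turns the classifier term into an adversarial penalty that suppresses them.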

Conclusions

1. Cosmological field-level inference can be made efficient with generative models

From DM reconstruction to ICs, robust to differences in hydro implementations

Can generally make simulators more controllable!

2. Machine Learning can help disentangle new physics from baryonic feedback

Can we leverage multi-wavelength observations?

3. Anomaly detection for Cosmology: Finding the missing pieces in simulations through adversarial classifiers

CCA-Interview

By carol cuesta
