Simulations meet Observations
A Machine Learning perspective on modern Cosmology

Carolina Cuesta-Lazaro
IAIFI Fellow, MIT
Art: "Drawing Hands" by M.C. Escher





1-Dimensional



Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments



DESI, DESI-II, Spec-S5
Euclid, LSST
SO, CMB-S4
LIGO, Einstein


The era of Big Data Cosmology
xAstrophysics
5-Dimensional
HERA, CHIME
SAGA, MaNGA




Galaxy formation
Emitters Census
Reionization


Carolina Cuesta-Lazaro IAIFI/MIT @ CCA 2025










Astrophysics dominates Simulation-based Inference
on Simulations

i) Generative Models:
Beyond Simulation Emulation
["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma]
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner]
ii) Reconstructing latent features:
Dark matter, ICs...
iii) Learning to represent astrophysics
Baryonic feedback
iv) Anomaly Detection for new physics searches
Hybrid Simulators
Learning to represent feedback
Anomaly maps of the LSS
This talk:
Past Research
This talk:
Future Research
Cosmological Parameters
theory
?
Observed Density Field
today

Dark matter density
back in time: ICs




TNG-300
True DM
Sample DM




Size of training simulation
1) Generalising to larger volumes
Model trained on Astrid subgrid model
2) Generalising to subgrid models
["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al (including Cuesta-Lazaro), arXiv:2403.10648]



["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys" Park, Mudur, Cuesta-Lazaro et al ICML 2024 AI for Science]
Posterior Sample
Posterior Mean
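The difference between a posterior sample and the posterior mean can be illustrated with a toy sketch. It assumes independent pixels with a Gaussian prior and Gaussian noise, so the posterior is analytic; the variances and variable names are illustrative assumptions, not the diffusion model from the cited papers. The posterior mean is smoother (lower variance) than the truth, while an individual posterior sample recovers the variance of the true field.

```python
import random
import statistics

random.seed(0)

PRIOR_VAR = 1.0   # assumed variance of the true field values
NOISE_VAR = 1.0   # assumed variance of the observational noise

# Toy "field": independent pixels, each observed with Gaussian noise.
truth = [random.gauss(0.0, PRIOR_VAR ** 0.5) for _ in range(5000)]
observed = [x + random.gauss(0.0, NOISE_VAR ** 0.5) for x in truth]

# For a Gaussian prior and Gaussian noise the posterior is analytic.
shrink = PRIOR_VAR / (PRIOR_VAR + NOISE_VAR)
post_var = PRIOR_VAR * NOISE_VAR / (PRIOR_VAR + NOISE_VAR)

posterior_mean = [shrink * y for y in observed]
posterior_sample = [m + random.gauss(0.0, post_var ** 0.5) for m in posterior_mean]

var_truth = statistics.pvariance(truth)
var_mean = statistics.pvariance(posterior_mean)
var_sample = statistics.pvariance(posterior_sample)
print(var_truth, var_mean, var_sample)
```

In this toy setting the mean map loses roughly half of the field variance, which is why summary statistics measured on a posterior mean are biased while those measured on samples are not.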
Debiasing Cosmic Flows

True
Reconstructed

(Marginalizing over parameters)
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al]
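For reference, the general stochastic interpolant construction can be written as below. This is the generic definition from the stochastic interpolants literature, stated here as an assumption about the framework; which endpoint plays the role of the initial conditions, and the specific choice of the schedules, is not given on the slide.

```latex
\begin{align}
x_t &= \alpha(t)\, x_0 + \beta(t)\, x_1 + \gamma(t)\, z,
  \qquad z \sim \mathcal{N}(0, I), \quad t \in [0, 1], \\
\alpha(0) &= \beta(1) = 1, \qquad \alpha(1) = \beta(0) = 0, \qquad \gamma(0) = \gamma(1) = 0, \\
b(t, x) &= \mathbb{E}\!\left[ \dot{\alpha}(t)\, x_0 + \dot{\beta}(t)\, x_1 + \dot{\gamma}(t)\, z \,\middle|\, x_t = x \right].
\end{align}
```

A network trained to regress the velocity field $b(t, x)$ can then transport samples from one endpoint distribution to the other by integrating $\dot{X}_t = b(t, X_t)$.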
What we talk about when we talk about baryons
Learning subgrid models from simulations that resolve the relevant physics
Bridging scales with ML subgrid models
Data-driven simulators
Learning subgrid models from observations directly
How do we incorporate theoretical uncertainties related to baryonic feedback?
Robust Inference
Disentangling baryonic effects from new physics


Simulator 1
Simulator 2


Dark Matter
Feedback
Approach 1:
Contrastive
Baryonic fields
Approach 2:
Generative
z = Cosmological Mutual Information
z = Baryonic feedback
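A minimal sketch of the contrastive approach, assuming an InfoNCE objective (a standard lower bound on mutual information; the slide does not name the loss actually used). The embeddings are hypothetical: each cosmology has a shared latent vector, and the two simulators add independent perturbations standing in for different feedback implementations. Correctly paired embeddings give a much lower loss than shuffled ones.

```python
import math
import random

random.seed(1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: row i of `positives` is the positive for anchor i,
    every other row acts as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_norm = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_norm - logits[i]
    return total / len(anchors)

# Hypothetical embeddings of the same cosmologies run through two simulators.
n, dim = 64, 8
shared = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
sim1 = [normalize([s + 0.1 * random.gauss(0, 1) for s in row]) for row in shared]
sim2 = [normalize([s + 0.1 * random.gauss(0, 1) for s in row]) for row in shared]

loss_aligned = info_nce(sim1, sim2)
shuffled = sim2[1:] + sim2[:1]  # break the pairing between simulators
loss_shuffled = info_nce(sim1, shuffled)
print(loss_aligned, loss_shuffled)
```

Minimizing this loss with respect to an encoder would push it to keep what the two simulators share (cosmology) and discard what differs between them (feedback).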
Learning the feedback manifold


Capture theoretical uncertainty on feedback
SZ, FRBs, Galaxy properties, X-Ray...
Hybrid Simulators
Hydro simulator
Subgrid model
Coupling of scales requires online training:
Simulator must be differentiable
A Hybrid Matrioshka
Match outputs from higher resolution simulations
Data driven subgrid models
Match outputs from high dimensional observations directly
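Why online training needs a differentiable simulator can be sketched with a toy problem. Everything here is an assumption for illustration: the "simulator" is explicit Euler for du/dt = -u + theta, theta stands in for a subgrid model, and the target is the final state of a reference run. The gradient of the final state with respect to theta is propagated through every time step by hand; in practice an autodiff framework would do this.

```python
def simulate(theta, u0=1.0, dt=0.1, steps=50):
    """Explicit Euler for du/dt = -u + theta, also propagating du/dtheta.

    Returns the final state and its derivative with respect to theta,
    i.e. the gradient obtained by differentiating through every step.
    """
    u, du_dtheta = u0, 0.0
    for _ in range(steps):
        du_dtheta = du_dtheta + dt * (-du_dtheta + 1.0)
        u = u + dt * (-u + theta)
    return u, du_dtheta

THETA_TRUE = 0.7  # hypothetical subgrid parameter of the reference run
target, _ = simulate(THETA_TRUE)

# Fit theta by gradient descent on the mismatch of the final state.
theta = 0.0
for _ in range(200):
    u_final, du_dtheta = simulate(theta)
    grad = 2.0 * (u_final - target) * du_dtheta
    theta -= 0.5 * grad
print(theta)
```

The subgrid parameter is only ever judged through the simulator's output, so the loss gradient has to pass through the whole rollout; a non-differentiable solver would break that chain.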
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing" Balla, Mishra-Sharma, Cuesta-Lazaro et al]
Foundation models?
Symmetry preserving architectures?
What is the space of plausible solutions?
High resolution simulations too expensive for large training sets
Could absorb potential missing physics

Real or Fake?
Detecting anomalies in the cosmic web
Summarizer (NN)
Summaries informative of parameters
Anomalous SBI
highlight differences
Adversarial loss
Robust SBI
remove discrepancies
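The "Real or Fake?" idea can be sketched as a classifier two-sample test. This is a toy stand-in, not the neural summarizer with adversarial loss from the slide: the summaries are single numbers, the classifier is a hand-written logistic regression, and the shifted "observations" are a hypothetical proxy for physics missing from the simulator. Held-out accuracy near 0.5 means simulations and observations are indistinguishable; accuracy well above 0.5 flags a discrepancy.

```python
import math
import random

random.seed(2)

def fit_logistic(xs, ys, lr=0.5, steps=200):
    """Fit p(y=1|x) = sigmoid(w*x + b) by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def classifier_accuracy(simulated, observed):
    """Train on the first half of each set, report accuracy on the second half."""
    half = len(simulated) // 2
    train_x = simulated[:half] + observed[:half]
    train_y = [0] * half + [1] * half
    w, b = fit_logistic(train_x, train_y)
    test = [(x, 0) for x in simulated[half:]] + [(x, 1) for x in observed[half:]]
    correct = sum((w * x + b > 0) == (y == 1) for x, y in test)
    return correct / len(test)

n = 400
simulated = [random.gauss(0.0, 1.0) for _ in range(n)]
observed_matched = [random.gauss(0.0, 1.0) for _ in range(n)]
observed_shifted = [random.gauss(1.5, 1.0) for _ in range(n)]  # proxy for missing physics

acc_matched = classifier_accuracy(simulated, observed_matched)
acc_shifted = classifier_accuracy(simulated, observed_shifted)
print(acc_matched, acc_shifted)
```

The same classifier signal can be used in two directions: kept and inspected to highlight differences (anomalous SBI), or used as an adversarial penalty on the summarizer so that the summaries carry no simulator-specific information (robust SBI).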


Conclusions

1. Cosmological field level inference can be made efficient with generative models
From DM reconstruction to ICs, robust to differences in hydro implementations

2. Machine Learning can help disentangle new physics from baryonic feedback
Can we leverage multi-wavelength observations?

3. Anomaly detection for Cosmology: Finding the missing pieces in simulations through adversarial classifiers
Can generally make simulators more controllable!

