Beyond the Observable
IAIFI Fellow, MIT

Carolina Cuesta-Lazaro
Art: "Drawing Hands" by M.C. Escher
A Machine Learning perspective on modern Cosmology





1-Dimensional



Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments



DESI / SPHEREx / HETDEX
Euclid / LSST
SO / CMB-S4
LIGO / Einstein Telescope


The era of Big Data Cosmology
xAstrophysics
HERA / CHIME
SAGA / MaNGA




Galaxy formation
Emitters Census
Reionization


Cosmic Microwave Background
Galaxies / Dwarfs
21 cm
Galaxy Surveys
Gravitational Lensing
Gravitational Waves
Carolina Cuesta-Lazaro IAIFI/MIT @ NYU 2025
AGN Feedback/Supernovae
We have increasingly precise observations of the cosmos, but our biggest questions remain about what we can't directly observe


Dark Matter
Gas
Bullet Cluster
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

Dark Energy is constant over time
Inflation
~5 times more collisionless matter than we can see
Dark Matter
Exponential expansion in the very early universe
Expansion is accelerating,
dynamical?
Dark Energy
Why Now?
Beyond tools
Optimisation
Neural representations
Baryonification

Inflation

Symmetry-preserving ML

Early Universe - JWST

Simulation Based Inference
Epidemiological simulations


Medical Imaging
Natural Language Processing

Exoplanets
Compute
Simulations
Data
ML
Statistics
Physics
What is dark matter made of?
What is driving the accelerated expansion?
How did the Universe begin?
A new way of thinking
about
physical systems

Prompting simulators
Running the clock backwards

Probabilistic Debiasing

Field-level inference for galaxy surveys
Fast Emulators + Likelihood Models

Uncertainty Quantification
Machine Learning enables new science in Cosmology

GANs

Deep Belief Networks
2006

VAEs

Normalising Flows

BigGAN

Diffusion Models

2014
2017
2019
2022
A folk music band of anthropomorphic autumn leaves playing bluegrass instruments
Contrastive Learning
2023
Meanwhile, on Earth...
["Genie 2: A large-scale foundation world model" Parker-Holder et al]

["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high dimensional inference tractable
1024x1024xTime

Dataset Size = 1
Can't poke it in the lab

Simulations
Bayesian statistics
But inference in Cosmology is hard
Simulations: Testing Ground and Theoretical Models










Simulation-based Inference on simulations proliferates in Astrophysics
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
Subgrid model 1
Subgrid model 2
Subgrid model 3
Subgrid model 4
New physics or pesky baryons?
We need to understand the baryons!

Simulations
Observations
Guided by observational constraints
Robust Inference
A Two-Way Road
Generative Models:
Beyond Simulation Emulation
Part 1
What is driving the accelerated expansion?
Reconstructing latent features:
Dark matter, ICs...
Part 2
How did the Universe begin?
What is dark matter made of?
Anomaly Detection for new physics searches
Baryonic feedback
Hybrid simulators
Part 3
Future Directions
Breaking LCDM
Predictive hydro sims

1. Extracting information from Galaxy Clustering
[Image Credit: Claire Lamman (CfA/Harvard) / DESI Collaboration]
Forward Model
Observable
Predict
Infer
Theory Parameters
Inverse mapping


+ MCMC hammer

Dark matter
Dark energy
Inflation
Initial conditions


["Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample" Paillas, Cuesta-Lazaro et al arXiv:2309.16541 MNRAS]
["Cosmological constraints from the Minkowski functionals of the BOSS CMASS galaxy sample" Liu, Paillas, Cuesta-Lazaro et al arXiv:2501.01698]
["SUNBIRD: A simulation-based model for full-shape density-split clustering" Cuesta-Lazaro et al arXiv:2309.16539 MNRAS]

Carol's optimistic forecast
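The Predict / Infer loop above can be illustrated with a toy problem. This is a minimal sketch, not the pipeline from the cited papers: it assumes a hypothetical one-parameter forward model (a 3-bin summary statistic linear in the parameter, with Gaussian noise) and uses a plain random-walk Metropolis-Hastings sampler in place of the affine-invariant ensemble sampler in emcee (the "MCMC hammer").

```python
import numpy as np

# Hypothetical toy emulator: maps one parameter to a 3-bin summary statistic.
DESIGN = np.array([1.0, 2.0, 3.0])

def forward_model(theta):
    return DESIGN * theta

def log_likelihood(theta, data, sigma):
    """Gaussian log-likelihood of the observed summary given the model prediction."""
    residual = data - forward_model(theta)
    return -0.5 * np.sum((residual / sigma) ** 2)

def metropolis_hastings(data, sigma, theta0, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings with a flat prior on theta."""
    chain = np.empty(n_steps)
    theta, logl = theta0, log_likelihood(theta0, data, sigma)
    for i in range(n_steps):
        proposal = theta + step_size * rng.normal()
        logl_prop = log_likelihood(proposal, data, sigma)
        # Accept with probability min(1, L(proposal) / L(current)).
        if np.log(rng.uniform()) < logl_prop - logl:
            theta, logl = proposal, logl_prop
        chain[i] = theta
    return chain

rng = np.random.default_rng(0)
true_theta, sigma = 0.8, 0.1
observed = forward_model(true_theta) + sigma * rng.normal(size=3)  # mock observation
chain = metropolis_hastings(observed, sigma, theta0=0.0, n_steps=20_000, step_size=0.05, rng=rng)
posterior = chain[2_000:]  # discard burn-in
print(f"theta = {posterior.mean():.3f} +/- {posterior.std():.3f}")
```

Because the toy model is linear with a flat prior, the sampled posterior can be checked against the analytic Gaussian answer; with a real emulator no such closed form exists, which is why samplers are needed.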
Generative Models 101
Maximize the likelihood of the training samples
Parametric Model


Training Samples
Trained Model

Evaluate probabilities


Low Probability
High Probability

Generate Novel Samples
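In its simplest form, "maximize the likelihood of the training samples" has a closed-form answer. A minimal sketch, assuming the parametric model is a single 1D Gaussian (real generative models such as flows or diffusion models replace it with a neural density trained by gradient methods), showing the three operations on the slide: fit, evaluate probabilities, generate novel samples.

```python
import numpy as np

class GaussianModel:
    """Toy parametric generative model: a 1D Gaussian fitted by maximum likelihood."""

    def fit(self, samples):
        # The Gaussian MLE is the sample mean and the (biased, 1/N) standard deviation.
        self.mu = samples.mean()
        self.sigma = samples.std()  # numpy's default ddof=0 gives the MLE
        return self

    def log_prob(self, x):
        # Evaluate probabilities: log N(x | mu, sigma^2)
        z = (x - self.mu) / self.sigma
        return -0.5 * z**2 - np.log(self.sigma) - 0.5 * np.log(2 * np.pi)

    def sample(self, n, rng):
        # Generate novel samples from the fitted distribution
        return self.mu + self.sigma * rng.normal(size=n)

rng = np.random.default_rng(1)
training = rng.normal(loc=3.0, scale=2.0, size=10_000)
model = GaussianModel().fit(training)
print(model.log_prob(np.array([3.0, 12.0])))  # high vs low probability
new_samples = model.sample(5, rng)
```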


Simulator
Generative Model
Fast emulators
Testing Theories
Generative Model
Simulator
Generative Models: Simulate and Analyze




Reverse diffusion: Denoise previous step
Forward diffusion: Add Gaussian noise (fixed)
Prompt: A person half Yoda half Gandalf
Diffusion model
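The forward/reverse diffusion picture can be made concrete in one dimension. A minimal sketch, assuming Gaussian "data" N(MU, S^2), a linear noise schedule (an assumption, as in the original DDPM formulation), and ancestral DDPM sampling; the trained denoising network is replaced by the exact noise prediction, which has a closed form only because the data are Gaussian.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # fixed forward noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

MU, S = 2.0, 0.5  # toy data distribution N(MU, S^2)

def forward_diffuse(x0, t, rng):
    """Forward diffusion: sample x_t ~ q(x_t | x_0) in closed form (t is 0-indexed)."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps

def predict_noise(x_t, t):
    """Stand-in for the denoising network: exact E[eps | x_t] for Gaussian data."""
    mean_t = np.sqrt(alpha_bars[t]) * MU
    var_t = alpha_bars[t] * S**2 + 1 - alpha_bars[t]
    return np.sqrt(1 - alpha_bars[t]) * (x_t - mean_t) / var_t

def reverse_diffuse(n, rng):
    """Reverse diffusion: start from pure noise and denoise one step at a time."""
    x = rng.normal(size=n)
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added on the final step
            x = x + np.sqrt(betas[t]) * rng.normal(size=n)
    return x

rng = np.random.default_rng(0)
samples = reverse_diffuse(20_000, rng)
print(samples.mean(), samples.std())  # close to MU and S
```

Conditioning on a prompt (text, or cosmological parameters) would enter as an extra input to the noise predictor; it is omitted here.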

["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma, International Conference on Machine Learning ICML AI4Astro 2023, Spotlight talk, arXiv:2311.17141]
Base Distribution
Target Distribution
Simulated Galaxy 3D Map
Prompt:




Prompt: A person half Yoda half Gandalf

Fixed Initial Conditions / Varying Cosmology





Reproducing Summary Statistics
Varying cosmological parameters
[Figure: mean relative velocity and k-nearest-neighbour statistics vs. pair separation]
Physics as a testing ground: Well-understood summary statistics enable rigorous validation of generative models
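Pair counting, one of the well-understood statistics used for this validation, reduces to histogramming pairwise separations. A minimal brute-force sketch, assuming a periodic cubic box (minimum-image convention) and the natural estimator xi(r) = DD/RR - 1 with an analytic RR; production codes use tree- or grid-based pair counters, and survey analyses use estimators such as Landy-Szalay with random catalogues.

```python
import numpy as np

def pair_counts(positions, box_size, r_edges):
    """Count unique pairs in separation bins, using periodic (minimum-image) distances."""
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box_size * np.round(diff / box_size)          # wrap into [-L/2, L/2]
    dist = np.sqrt((diff**2).sum(axis=-1))
    i_upper = np.triu_indices(len(positions), k=1)        # each pair once, no self-pairs
    counts, _ = np.histogram(dist[i_upper], bins=r_edges)
    return counts

def correlation_function(positions, box_size, r_edges):
    """Natural estimator xi(r) = DD / RR - 1, with RR computed analytically for a periodic box."""
    n = len(positions)
    dd = pair_counts(positions, box_size, r_edges)
    shell_volumes = 4.0 / 3.0 * np.pi * (r_edges[1:]**3 - r_edges[:-1]**3)
    rr = 0.5 * n * (n - 1) * shell_volumes / box_size**3
    return dd / rr - 1.0

rng = np.random.default_rng(0)
box = 1.0
r_edges = np.linspace(0.05, 0.3, 6)  # keep r_max < box / 2 for the minimum-image convention
uniform = rng.uniform(0, box, size=(1000, 3))
xi = correlation_function(uniform, box, r_edges)
print(xi)  # ~0 for an unclustered (Poisson) sample
```

Applied to generated and simulated galaxy catalogues at the same cosmology, the two xi(r) curves should agree within sample variance.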
Diffusion model

Diffusion
Pair Counting
Initial Clumpiness
Matter Energy Density
With only O(10^4) galaxies!
More than doubling survey volume

CNN
Diffusion
Increasing Noise
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]

Nayantara Mudur


CNN
Diffusion
Generative model constraints are more robust
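Diffusion-HMC plugs a diffusion-model likelihood into Hamiltonian Monte Carlo. The sketch below shows only the HMC half, under loud assumptions: the learned log-likelihood is replaced by a hypothetical correlated 2D Gaussian over two parameters with made-up values, and gradients are written by hand. In the actual method the log-density and its gradient would come from the trained network via automatic differentiation.

```python
import numpy as np

# Hypothetical stand-in for a learned log-likelihood: a correlated 2D Gaussian.
TARGET_MEAN = np.array([0.3, 0.8])
TARGET_COV = np.array([[0.010, 0.006], [0.006, 0.010]])
PRECISION = np.linalg.inv(TARGET_COV)

def log_density(theta):
    d = theta - TARGET_MEAN
    return -0.5 * d @ PRECISION @ d

def grad_log_density(theta):
    return -PRECISION @ (theta - TARGET_MEAN)

def hmc(theta0, n_samples, step_size, n_leapfrog, rng):
    """Hamiltonian Monte Carlo with an identity mass matrix and leapfrog integration."""
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        p = rng.normal(size=theta.size)                          # resample momentum
        new_theta, new_p = theta.copy(), p.copy()
        new_p += 0.5 * step_size * grad_log_density(new_theta)   # half step in momentum
        for step in range(n_leapfrog):
            new_theta += step_size * new_p                       # full step in position
            if step < n_leapfrog - 1:
                new_p += step_size * grad_log_density(new_theta)
        new_p += 0.5 * step_size * grad_log_density(new_theta)   # final half step
        # Metropolis correction with Hamiltonian H = -log p(theta) + |p|^2 / 2
        current_h = -log_density(theta) + 0.5 * p @ p
        proposed_h = -log_density(new_theta) + 0.5 * new_p @ new_p
        if np.log(rng.uniform()) < current_h - proposed_h:
            theta = new_theta
        samples[i] = theta
    return samples

rng = np.random.default_rng(0)
samples = hmc([0.0, 0.0], n_samples=5000, step_size=0.03, n_leapfrog=10, rng=rng)
print(samples[500:].mean(axis=0))  # close to TARGET_MEAN after burn-in
```

Gradient information lets HMC take long, high-acceptance moves, which matters when the likelihood is an expensive neural network.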
6 seconds / sim vs 40 million CPU hours
Fast Emulation:
Parameter constraints:
Generative Models: Simulate and Analyze





Diffusion
Pair Counting

Carol's optimistic forecast
High dimensional inference

Alternative Clustering Methods

1 to Many:
Galaxies (Observable)
Dark Matter (Unobservable)

["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al (including Cuesta-Lazaro) NeurIPS 2023 ML for the Physical Sciences / ApJ / arXiv:2403.10648]

Victoria Ono
Core F. Park

Probabilistic Reconstruction

[Figure: truth, sampled and observed fields. Power spectrum vs. scale k (small to large) for galaxies, truth CDM and sample CDM; cross-correlation vs. scale k for galaxies × sample CDM, galaxies × truth CDM and truth CDM × sample CDM]
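The power-spectrum and cross-correlation comparison is done scale by scale. A minimal sketch, assuming both fields sit on the same periodic cubic grid and using a simple spherically averaged FFT estimator with linear |k| bins (no shot-noise, window or mode-weighting corrections). The coefficient r(k) = P_ab / sqrt(P_aa P_bb) equals 1 where two fields agree up to an amplitude and drops toward 0 where they decorrelate.

```python
import numpy as np

def binned_power(field_a, field_b, box_size, n_bins):
    """Spherically averaged (cross-)power spectrum of two fields on the same periodic grid."""
    n = field_a.shape[0]
    fa, fb = np.fft.rfftn(field_a), np.fft.rfftn(field_b)
    cross = (fa * np.conj(fb)).real * box_size**3 / n**6
    kx = np.fft.fftfreq(n, d=box_size / n) * 2 * np.pi
    kz = np.fft.rfftfreq(n, d=box_size / n) * 2 * np.pi
    kmag = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2 + kz[None, None, :]**2)
    # Bins from the fundamental mode up to the Nyquist frequency
    edges = np.linspace(2 * np.pi / box_size, np.pi * n / box_size, n_bins + 1)
    idx = np.digitize(kmag.ravel(), edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    power = np.bincount(idx[valid], weights=cross.ravel()[valid], minlength=n_bins)
    modes = np.bincount(idx[valid], minlength=n_bins)
    return 0.5 * (edges[1:] + edges[:-1]), power / np.maximum(modes, 1)

def cross_correlation(field_a, field_b, box_size, n_bins=8):
    """r(k) = P_ab / sqrt(P_aa * P_bb)."""
    k, p_ab = binned_power(field_a, field_b, box_size, n_bins)
    _, p_aa = binned_power(field_a, field_a, box_size, n_bins)
    _, p_bb = binned_power(field_b, field_b, box_size, n_bins)
    return k, p_ab / np.sqrt(p_aa * p_bb)

rng = np.random.default_rng(0)
truth = rng.normal(size=(32, 32, 32))           # toy "true" field (white noise)
noisy = truth + rng.normal(size=truth.shape)    # toy tracer with equal-variance noise
k, r = cross_correlation(truth, noisy, box_size=1.0)
print(np.round(r, 2))  # roughly 1/sqrt(2) in every bin
```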

CAMELS
DESI LRG ~ 20 (Gpc/h)^3
Galaxies
True DM
Sample DM




Size of training simulation
1) Generalising to larger volumes
Model trained on Astrid subgrid model, tested on TNG
2) Generalising across galaxy formation models
["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al (including Cuesta-Lazaro) NeurIPS 2023 ML for the Physical Sciences / ApJ / arXiv:2403.10648]


Void
Galaxy Cluster

["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys" Park, Mudur, Cuesta-Lazaro et al International Conference on Machine Learning ICML 2024 AI for Science]
Posterior Sample
Posterior Mean
What's next?
Application to Galaxy Surveys
Robust to galaxy formation uncertainties
Probabilistic Reconstruction of Dark Matter
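Why show posterior samples and not only the posterior mean? A minimal sketch with a linear-Gaussian toy model, not the diffusion model from the cited papers: assume each pixel of the unobservable field has prior N(0, S) and is observed with independent Gaussian noise of variance N. The posterior mean (the Wiener filter) is suppressed in amplitude, while posterior samples recover the prior variance, which is why samples reproduce the correct power spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
S, N = 1.0, 1.0                   # prior (signal) and noise variances per pixel (assumed)
n_pixels = 100_000

truth = rng.normal(0.0, np.sqrt(S), n_pixels)               # unobservable field
observed = truth + rng.normal(0.0, np.sqrt(N), n_pixels)    # noisy tracer of it

# Per-pixel Gaussian posterior p(truth | observed)
post_mean = S / (S + N) * observed                          # Wiener-filtered estimate
post_var = S * N / (S + N)
post_sample = post_mean + np.sqrt(post_var) * rng.normal(size=n_pixels)

print(np.var(truth), np.var(post_mean), np.var(post_sample))
# Mean: variance ~ S^2 / (S + N) = 0.5 (over-smoothed); sample: variance ~ S = 1.0
```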


Inverse
Sampling over 20M dimensions
Each sample would cost 6k CPU hours
Observed Light
Dark Matter
Running the clock backwards


Initial Conditions
early Universe
Cosmological Parameters
theory
?

Observed Density Field
today





True
Reconstructed

["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al NeurIPS ML4PS 2024 Spotlight talk]
Generative Model
Normalising Flow (NF)




?
["Stochastic Interpolants: A Unifying Framework for Flows and Diffusions" Albergo, Boffi, Vanden-Eijnden arXiv:2303.08797]

Generative SDE
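Stochastic interpolants connect a base distribution to a target along a prescribed path and learn the velocity that transports one into the other. A minimal sketch, assuming the deterministic special case x_t = (1 - t) x_0 + t x_1 (no latent noise term, so generation is an ODE rather than the SDE on the slide) between independent 1D Gaussians, where the velocity b(t, x) = E[x_1 - x_0 | x_t = x] has a closed form that stands in for the trained network.

```python
import numpy as np

M, S = 3.0, 0.5  # target N(M, S^2); base N(0, 1); x0 and x1 drawn independently

def velocity(t, x):
    """Exact E[x1 - x0 | x_t = x] for the linear interpolant x_t = (1 - t) x0 + t x1."""
    mean_t = t * M
    var_t = (1 - t) ** 2 + t**2 * S**2
    cov = t * S**2 - (1 - t)        # Cov(x1 - x0, x_t)
    return M + cov / var_t * (x - mean_t)

def transport(x0, n_steps=1000):
    """Integrate dx/dt = b(t, x) from t = 0 (base) to t = 1 (target) with Euler steps."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(i * dt, x)
    return x

rng = np.random.default_rng(0)
x1 = transport(rng.normal(size=50_000))
print(x1.mean(), x1.std())  # close to M and S
```

In the cosmology setting, x_0 and x_1 would be high-dimensional fields (e.g. observed density and initial conditions) and the velocity would be a learned network conditioned on the observation.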




Sampling over jointly with theory parameters






Constrained Simulations for Galaxy Surveys
100M dimensions
Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Interpretability:
Cross-Correlation with other probes

[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-gaussianity

Simulations
Observations
Guided by observational constraints
Robust Inference
Generative Models:
Beyond Simulation Emulation
Part 1
What is driving the accelerated expansion?
Reconstructing latent features:
Dark matter, ICs...
Part 2
How did the Universe begin?
What is dark matter made of?
Anomaly Detection for new physics searches
Baryonic feedback
Hybrid simulators
Part 3
Future Directions
Breaking LCDM
Predictive hydro sims

Late Universe
Early Universe
Tension
2025
From Tensions to Discoveries: Anomalies in Cosmology

["Full-shape analysis with simulation-based priors: cosmological parameters and the structure growth anomaly" Ivanov, Obuljen, Cuesta-Lazaro, Toomey arXiv:2409.10609]
Carol's optimistic forecast
Early vs Late
Parametric Extensions
Challenge: Robust error bars against distribution shifts
Clumpiness

BOSS (Late Universe)
BOSS (Late Universe) + ML
BOSS (Late Universe) + Bk + ML
Early Universe
Tension

[Image Credit: Prof. Wendy Freedman]
["Full-shape analysis with simulation-based priors: cosmological parameters and the structure growth anomaly" Ivanov, Obuljen, Cuesta-Lazaro, Toomey arXiv:2409.10609]
["A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering statistics" Krause et al (including Cuesta-Lazaro) arXiv:2405.02252]
["Neural Posterior Estimation with Adversarial Regularization: From Robust Statistics to Missing Physics" Cuesta-Lazaro et al in-prep]
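Robust error bars against distribution shifts can be checked with a coverage test: over many mock datasets, a 68% credible interval should contain the true parameter about 68% of the time. A minimal sketch, assuming a toy Gaussian measurement model with a flat prior and an unmodelled additive bias standing in for the distribution shift (for instance a baryonic effect missing from the training simulations); it illustrates the failure mode only, not any particular mitigation method.

```python
import numpy as np

def empirical_coverage(sigma, bias, n_trials, rng, true_theta=0.0):
    """Fraction of trials whose 68% interval [d - sigma, d + sigma] contains the truth."""
    data = true_theta + bias + sigma * rng.normal(size=n_trials)  # bias = unmodelled shift
    return np.mean(np.abs(data - true_theta) <= sigma)

rng = np.random.default_rng(0)
calibrated = empirical_coverage(sigma=1.0, bias=0.0, n_trials=100_000, rng=rng)
shifted = empirical_coverage(sigma=1.0, bias=1.0, n_trials=100_000, rng=rng)
print(calibrated, shifted)  # ~0.68 when the model is right; ~0.48 under the shift
```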
Looking for what we don't know to look for
The missing pieces: Beyond parametric searches
Axion Dark Matter

Dark Matter - Baryon Interactions
Primordial Non-Gaussianity

Early Dark Energy
Dark Radiation
[Credit: Sandbox Studio]


[Credit: Sandbox Studio]

What makes an anomaly in Astrophysics interesting?
A representation learning problem:
finding representations where meaningful anomalies become apparent
Text-enhanced representations of X-Ray data
Rafael Martinez Galarza
["Augmenting X-Ray Astronomical Representations with Scientific Knowledge Through Contrastive Learning" Martinez-Galarza et al (including Cuesta-Lazaro) in-prep]



Alex Gagliano
Simulated Energy Sources
Physical parameters
(Temperature, ejecta velocity ...)
Physical Anomalies in Supernovae Lightcurves
Learned Shared Representation
Edgar Vidal
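A shared representation of this kind is typically trained contrastively: paired views of the same object are pulled together and other objects pushed apart. A minimal sketch of a symmetric InfoNCE (CLIP-style) loss, assuming two batches of embeddings where row i of each batch describes the same object (for example an X-ray observation and its text description) and a temperature of 0.1; the encoders that would produce the embeddings are omitted and the data are synthetic.

```python
import numpy as np

def log_softmax(logits, axis):
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce(emb_a, emb_b, temperature=0.1):
    """Symmetric InfoNCE loss; matching pairs sit on the diagonal of the similarity matrix."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # scaled cosine similarities
    loss_a_to_b = -np.diag(log_softmax(logits, axis=1)).mean()
    loss_b_to_a = -np.diag(log_softmax(logits, axis=0)).mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)

rng = np.random.default_rng(0)
shared = rng.normal(size=(64, 16))                     # synthetic latent properties of 64 objects
view_a = shared + 0.1 * rng.normal(size=shared.shape)  # e.g. X-ray embedding
view_b = shared + 0.1 * rng.normal(size=shared.shape)  # e.g. text embedding
aligned = info_nce(view_a, view_b)
shuffled = info_nce(view_a, rng.permutation(view_b))   # break the pairing
print(aligned, shuffled)  # aligned pairs give a much lower loss
```

Once trained, anomalies can be searched for in the learned embedding space, where physically meaningful differences are more likely to stand out than in raw pixels or light curves.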
















Hybrid Simulators
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing" Balla, Mishra-Sharma, Cuesta-Lazaro et al LoG 2024 & NeurReps at NeurIPS 2024]
Foundation models?
Symmetry preserving architectures?
High resolution simulations too expensive for large training sets
New physics or pesky baryons?
The complex physics of galaxy formation can mimic signals we expect from new fundamental physics, making these effects difficult to disentangle
Hydro simulator
Subgrid model

Learning the Universe - Simons Collaboration

Simulations
Observations
Guided by observational constraints
Robust Inference
Anomaly Detection for new physics searches
Baryonic feedback
Hybrid simulators

Can LLMs close the loop?
Theory
Observations
Reconstructing latent features:
Dark matter, ICs...
Generative Models:
Beyond Simulation Emulation
Part 1
Part 2
Part 3
Future Directions
What is driving the accelerated expansion?
How did the Universe begin?
Breaking LCDM
Predictive hydro sims
What is dark matter made of?

Artificial General Intelligence?
Pre 2022
Post 2022




Meaningful benchmarks for AI scientists will lead to better AI scientists
LLMs have saturated current benchmarks
Six months later...
[Figure: score relative to human performance vs. years since benchmark introduction, with human performance marked]

Evaluating LLMs for Scientific Discovery
Goal: Resolving the Hubble tension
1. Hypothesis Generation
Self-interacting dark radiation
Early Dark Energy
2. Implementing beyond LCDM models
Compute modified Pk with CLASS
3. Data analysis
Gathering data and postprocessing
4. Solving inverse problems
Obtaining posteriors and computing metrics for comparison to LCDM
Beyond tools
Compute
Simulations
Data
ML
Statistics
Physics
Use-inspired AI developments
The future of Astrophysics
A new way of thinking
about
physical systems

Eric Vanden-Eijnden
Kyunghyun Cho
Mehryar Mohri

Yann Lecun
Rob Fergus
CCPP
Denis Zorin
Roman Scoccimarro
Jeremy Tinker
Mike O'Neil
Anthony Pullen
Leslie Greengard
David W. Hogg
Georg Stadler
Ken Van Tilburg
Neal Weiner
Olivier Pauluis
Glennys R. Farrar
Edwin P. Berger
Yacine Ali-Haïmoud
