Beyond the Observable

IAIFI Fellow, MIT

Carolina Cuesta-Lazaro

Art: "Drawing Hands" by M.C. Escher

A Machine Learning perspective on modern Cosmology

1-Dimensional

Machine Learning

Secondary anisotropies

Galaxy formation

Intrinsic alignments

DESI / SPHEREx / HETDEX

Euclid / LSST

SO / CMB-S4

LIGO / Einstein Telescope

The era of Big Data Cosmology

xAstrophysics

5-Dimensional

w_0, w_a, f\sigma_8, \Omega_m, \sum m_\nu

HERA / CHIME

SAGA / MaNGA

Galaxy formation

Emitters Census

Reionization

Carolina Cuesta-Lazaro IAIFI/MIT @ UTSA 2025

Cosmic Microwave Background

Galaxies / Dwarfs

21 cm

Galaxy Surveys

Gravitational Lensing

Gravitational Waves

We have increasingly precise observations of the cosmos, but our biggest questions remain about what we can't directly observe

Dark Matter

Gas

Bullet Cluster
\Lambda \mathrm{CDM}
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

Dark Energy is constant over time

Inflation

5 times more collisionless matter than we can see

Dark Matter

Exponential expansion in the very early universe

Expansion is accelerating

Dark Energy

GANs

Deep Belief Networks

2006

VAEs

Normalising Flows

BigGAN

Diffusion Models

2014

2017

2019

2022

A folk music band of anthropomorphic autumn leaves playing bluegrass instruments

CLIP

2023

Meanwhile, on Earth...

p(\mathrm{World}|\mathrm{Prompt})
["Genie 2: A large-scale foundation world model" Parker-Holder et al]
p(\mathrm{Drug}|\mathrm{Properties})
["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]

Probabilistic ML has made high-dimensional inference tractable

1024x1024xTime

Prompting simulators

1. Field-level inference for galaxy surveys

2. Probabilistic Debiasing

3. Running the clock backwards

Fast Emulators + Likelihood Models

Uncertainty Quantification

p(\mathrm{Dark Energy}|\mathrm{Galaxies})
p(\mathrm{Dark Matter}|\mathrm{Galaxies})
p(\mathrm{ICs}|\mathrm{Today})

High-dimensional Inference enables new science in Cosmology

Dataset Size = 1 

Can't poke it in the lab 

Simulations

Bayesian statistics

But inference in Cosmology is hard

Simulations: Theory and Testing Ground

Generative Models 101

Maximize the likelihood of the training samples

\hat \phi = \operatorname*{arg\,max}_\phi \left[ \log p_\phi (x_\mathrm{train}) \right]
x_1
x_2

Parametric Model

p_\phi(x)

Training Samples

x_\mathrm{train}
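As a minimal sketch of the maximum-likelihood objective above, the example below fits the simplest possible parametric model, a 2D Gaussian, to toy training samples. The Gaussian family, the toy data, and all function names are assumptions made for illustration; the models discussed in the talk are neural networks trained by gradient descent rather than in closed form.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, cov):
    """Total log-likelihood of samples x (N, D) under a Gaussian p_phi(x)."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return float(np.sum(-0.5 * (mahalanobis + logdet + d * np.log(2.0 * np.pi))))

def fit_gaussian_mle(x):
    """Closed-form maximum-likelihood estimate of phi = (mean, covariance)."""
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False, bias=True)  # bias=True divides by N (the MLE)
    return mean, cov

rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
x_train = rng.multivariate_normal(true_mean, true_cov, size=5000)

mean_hat, cov_hat = fit_gaussian_mle(x_train)
logl_hat = gaussian_log_likelihood(x_train, mean_hat, cov_hat)
# Any other mean gives a lower likelihood for the same covariance.
logl_shifted = gaussian_log_likelihood(x_train, mean_hat + 0.1, cov_hat)

# Once trained, the model can score points and generate novel samples.
x_new = rng.multivariate_normal(mean_hat, cov_hat, size=5)
```

The same two operations, evaluating probabilities and drawing new samples, are what a trained generative model offers regardless of its architecture.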

x_1
x_2

Trained Model

p_\phi(x)

Evaluate probabilities

Low Probability

High Probability

Generate Novel Samples

Simulator

Simulator

Astrophysics proliferates Simulation-based Inference

on Simulations

[Video credit: Francisco Villaescusa-Navarro]

Gas density

Gas temperature

Subgrid model 1

Subgrid model 2

Subgrid model 3

Subgrid model 4

New physics or pesky baryons?

Simulations

Observations

Guided by observational constraints

Robust Inference

Reconstructing latent features:

Dark matter, ICs...

Generative Models:

Beyond Simulation Emulation

A Two-Way Road

Anomaly Detection for new physics searches

Baryonic feedback

Hybrid simulators

Part 1

Part 2

Part 3

Future Directions

1. Extracting information from Galaxy Clustering

[Image Credit: Claire Lamman (CfA/Harvard) / DESI Collaboration]

["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

DESI's Dark Energy constraints

10 million galaxies -> tens of data points

Very lossy compression:

Pair separation

Counts
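To make the compression concrete, here is a hedged sketch of pair counting on a toy catalogue: a few hundred 3D positions are reduced to ten numbers, one per separation bin. The toy clustered catalogue, the bin choice, and the simple DD/RR - 1 estimator are assumptions for illustration and not the estimator used in the DESI analysis.

```python
import numpy as np

def pair_counts(positions, bin_edges):
    """Histogram of pairwise separations, counting each pair once."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    i, j = np.triu_indices(len(positions), k=1)
    counts, _ = np.histogram(dist[i, j], bins=bin_edges)
    return counts

rng = np.random.default_rng(1)
n_gal = 400
# Toy "clustered" catalogue: galaxies scattered around a few random centres.
centres = rng.uniform(0.0, 100.0, size=(20, 3))
data = centres[rng.integers(0, 20, size=n_gal)] + rng.normal(0.0, 2.0, size=(n_gal, 3))
# Unclustered reference catalogue with the same number of points.
randoms = rng.uniform(0.0, 100.0, size=(n_gal, 3))

bin_edges = np.linspace(0.0, 50.0, 11)  # 10 separation bins
dd = pair_counts(data, bin_edges)
rr = pair_counts(randoms, bin_edges)
# Excess probability of finding a pair relative to random: DD/RR - 1.
xi = dd / np.maximum(rr, 1) - 1.0
```

Everything the catalogue contains beyond these ten numbers, such as the shapes of filaments and voids, is discarded by the two-point summary.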

\theta

Forward Model

Observable

x

Predict

Infer

Parameters

Inverse mapping

Perturbation Theory

Pen and paper

Simulations

p(\theta|x)

+ MCMC hammer
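The predict/infer loop can be sketched with a toy forward model and a random-walk Metropolis sampler standing in for the "MCMC hammer". The power-law forward model, the flat prior box, the noise level, and the step size are all hypothetical choices for illustration; a real analysis would use perturbation theory or a simulation-based emulator as the forward model.

```python
import numpy as np

def forward_model(theta, x_grid):
    """Hypothetical stand-in for a simulator: predicts an observable from theta."""
    amplitude, slope = theta
    return amplitude * x_grid**slope

def log_posterior(theta, data, x_grid, sigma):
    """Gaussian log-likelihood plus a flat prior on a box."""
    amplitude, slope = theta
    if not (0.0 < amplitude < 5.0 and -3.0 < slope < 0.0):
        return -np.inf
    residual = (data - forward_model(theta, x_grid)) / sigma
    return -0.5 * np.sum(residual**2)

def metropolis_hastings(log_prob, theta0, n_steps, step_size, rng):
    """Random-walk Metropolis sampler; returns the chain of visited parameters."""
    chain = np.empty((n_steps, len(theta0)))
    theta = np.asarray(theta0, dtype=float)
    current = log_prob(theta)
    for i in range(n_steps):
        proposal = theta + step_size * rng.normal(size=theta.shape)
        candidate = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        chain[i] = theta
    return chain

rng = np.random.default_rng(2)
x_grid = np.linspace(1.0, 10.0, 20)
theta_true = np.array([2.0, -1.5])
sigma = 0.02
data = forward_model(theta_true, x_grid) + sigma * rng.normal(size=x_grid.size)

chain = metropolis_hastings(
    lambda t: log_posterior(t, data, x_grid, sigma),
    theta0=[1.5, -1.0], n_steps=20000, step_size=0.02, rng=rng,
)
posterior_mean = chain[5000:].mean(axis=0)  # discard burn-in
```

The sampler only ever calls the forward model in the "predict" direction; the inverse mapping emerges from the accepted samples.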

\color{darkgray}{\Omega_m}, \color{darkgray}{w_0, w_a},\color{darkgray}{f_\mathrm{NL}}\, ...

Dark matter

Dark energy

Inflation

Initial conditions

+
["Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample" Paillas, Cuesta-Lazaro et al arXiv:2309.16541 MNRAS]

 

["Cosmological constraints from the Minkowski functionals of the BOSS CMASS galaxy sample" Liu, Paillas, Cuesta-Lazaro et al arXiv:2501.01698]

 

["SUNBIRD: A simulation-based model for full-shape density-split clustering" Cuesta-Lazaro et al arXiv:2309.16539 MNRAS]

 

p(z_1|z_0)
p(z_0)
p(z_T)
p(z_2)
p(z_1)
p(z_2|z_1)
p(z_T|z_2)

Reverse diffusion: Denoise previous step

Forward diffusion: Add Gaussian noise (fixed)

Prompt: A person half Yoda half Gandalf

Diffusion model
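The fixed forward process can be sketched in a few lines. The example assumes a DDPM-style variance-preserving linear noise schedule, which the slides do not specify, and uses a toy bimodal 1D dataset; the schedule values and function names are illustrative. The learned reverse process (a network that denoises each step) is not shown.

```python
import numpy as np

def make_alpha_bar(n_steps, beta_min=1e-4, beta_max=0.02):
    """Cumulative signal fraction alpha_bar_t for a linear beta schedule."""
    betas = np.linspace(beta_min, beta_max, n_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(z0, t, alpha_bar, rng):
    """Sample z_t ~ p(z_t | z_0) in closed form: scaled data plus Gaussian noise."""
    noise = rng.normal(size=z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return zt, noise

rng = np.random.default_rng(3)
alpha_bar = make_alpha_bar(n_steps=1000)
# Toy "data": a strongly bimodal 1D distribution.
z0 = np.where(rng.uniform(size=20000) < 0.5, -2.0, 2.0) + 0.1 * rng.normal(size=20000)

z_mid, _ = forward_diffuse(z0, 200, alpha_bar, rng)  # partially noised
z_T, _ = forward_diffuse(z0, 999, alpha_bar, rng)    # close to pure noise
```

By the final step the two modes of the data have been erased and the samples are close to a unit Gaussian, which is the base distribution the reverse process starts from.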

["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma, ICML AI4Astro 2023 (Spotlight talk), arXiv:2311.17141]

Base Distribution

Target Distribution

  • Sample
  • Evaluate

Simulated Galaxy 3d Map

Prompt:

\Omega_m, \sigma_8

Fixed Initial Conditions / Varying Cosmology

Reproducing Summary Statistics

[Figure: mean relative velocity and k nearest neighbours statistics versus pair separation, for varying cosmological parameters]

["How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds" Nguyen et al (including Cuesta-Lazaro) arXiv:2409.02980]

p(\theta|x) = \frac{p(x|\theta)p(\theta)}{p(x)}

Diffusion model

With only 5000 galaxies!

Diffusion

Pair Counting
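Bayes' theorem above turns any model that can evaluate p(x|\theta) into a posterior over parameters. The sketch below shows the mechanics on a one-parameter grid, including the division by the evidence p(x). The Gaussian `log_likelihood` is a hypothetical stand-in for the likelihood a trained diffusion model would provide, and the flat prior range is an arbitrary choice.

```python
import numpy as np

def log_likelihood(x, theta, sigma=0.1):
    """Hypothetical stand-in for a learned log p(x | theta): Gaussian around theta."""
    return -0.5 * np.sum((x - theta) ** 2) / sigma**2 - x.size * np.log(
        sigma * np.sqrt(2.0 * np.pi)
    )

rng = np.random.default_rng(4)
theta_true = 0.8
x_obs = theta_true + 0.1 * rng.normal(size=50)

theta_grid = np.linspace(0.5, 1.1, 601)
dtheta = theta_grid[1] - theta_grid[0]
# Flat prior p(theta) over the grid range.
log_prior = np.full_like(theta_grid, -np.log(theta_grid[-1] - theta_grid[0]))
log_like = np.array([log_likelihood(x_obs, t) for t in theta_grid])

# Bayes' theorem in log space; subtract the max before exponentiating for stability.
log_unnorm = log_like + log_prior
unnorm = np.exp(log_unnorm - log_unnorm.max())
posterior = unnorm / (unnorm.sum() * dtheta)  # division by the evidence p(x)

posterior_mean = np.sum(theta_grid * posterior) * dtheta
```

A grid is only feasible for a handful of parameters; in higher dimensions the same unnormalized posterior would be explored with a sampler instead.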

CNN

Diffusion

Increasing Noise

p(\sigma_8|\delta_m)
p(\sigma_8|\delta_m + 0.01 \epsilon)
p(\sigma_8|\delta_m + 0.02 \epsilon)
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner, NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]

 

Nayantara Mudur

CNN

Diffusion

p_\phi(\rho_\mathrm{DM}|\rho_\mathrm{Galaxies})

1 to Many:

Galaxies

Dark Matter

["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al (including Cuesta-Lazaro), NeurIPS 2023 ML for the Physical Sciences / ApJ, arXiv:2403.10648]

 

Victoria Ono

Core F. Park

2. Probabilistic Reconstruction

[Figure: truth, sampled, and observed fields, with their power spectra (galaxies, truth CDM, sample CDM) and cross correlations (galaxies x sample CDM, galaxies x truth CDM, truth CDM x sample CDM) as a function of scale (k), across small and large scales]
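The cross correlation used to compare a sampled field with the truth can be sketched as the coefficient r(k) = P_ab / sqrt(P_aa P_bb). The example below assumes 1D periodic fields for brevity (the fields in the talk are 2D or 3D), and the toy "reconstruction" is simply the truth plus white noise; both are illustrative choices.

```python
import numpy as np

def cross_correlation_coefficient(field_a, field_b, n_bins=8):
    """r(k) = P_ab / sqrt(P_aa * P_bb) for two periodic 1D fields, binned in k."""
    fa, fb = np.fft.rfft(field_a), np.fft.rfft(field_b)
    k = np.arange(fa.size)
    p_aa, p_bb = np.abs(fa) ** 2, np.abs(fb) ** 2
    p_ab = np.real(fa * np.conj(fb))
    edges = np.linspace(1, k[-1] + 1, n_bins + 1)  # skip the k = 0 mean mode
    which = np.digitize(k, edges) - 1
    r = np.empty(n_bins)
    for b in range(n_bins):
        sel = which == b
        r[b] = p_ab[sel].sum() / np.sqrt(p_aa[sel].sum() * p_bb[sel].sum())
    k_centres = 0.5 * (edges[:-1] + edges[1:])
    return k_centres, r

rng = np.random.default_rng(5)
n = 4096
k_modes = np.arange(n // 2 + 1)
# Toy truth: white noise filtered so that power falls off towards small scales.
white = np.fft.rfft(rng.normal(size=n))
truth = np.fft.irfft(white / (1.0 + (k_modes / 50.0) ** 2), n=n)
# Toy reconstruction: truth plus white noise, which dominates at high k.
reconstruction = truth + 0.05 * rng.normal(size=n)

k_centres, r = cross_correlation_coefficient(truth, reconstruction)
_, r_self = cross_correlation_coefficient(truth, truth)
```

A coefficient near one means the two fields share phases at that scale; it falls towards zero where the reconstruction carries no information about the truth.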

Galaxies

True DM

Sample DM

Size of training simulation

1) Generalising to larger volumes

Model trained on Astrid subgrid model

2) Generalising to subgrid models

["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al (including Cuesta-Lazaro), NeurIPS 2023 ML for the Physical Sciences / ApJ, arXiv:2403.10648]

 

["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys" Park, Mudur, Cuesta-Lazaro et al, ICML 2024 AI for Science]

 

Posterior Sample

Posterior Mean
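The difference between a posterior sample and the posterior mean can be illustrated with a linear-Gaussian toy problem where the posterior is known exactly. This is an assumption-laden sketch: the unit-variance prior, the additive noise model, and the closed-form posterior replace the diffusion model used in the talk, but the qualitative behaviour is the same.

```python
import numpy as np

rng = np.random.default_rng(6)
n_pix = 50_000
noise_sigma = 1.0

truth = rng.normal(size=n_pix)                           # latent field, prior N(0, 1)
observed = truth + noise_sigma * rng.normal(size=n_pix)  # noisy tracer of the field

# For this toy the posterior p(truth | observed) is Gaussian with known moments.
post_mean = observed / (1.0 + noise_sigma**2)
post_std = np.sqrt(noise_sigma**2 / (1.0 + noise_sigma**2))

def draw_posterior_sample(rng):
    """One plausible latent field consistent with the observation."""
    return post_mean + post_std * rng.normal(size=n_pix)

samples = np.stack([draw_posterior_sample(rng) for _ in range(20)])

var_truth = truth.var()
var_sample = samples[0].var()                    # matches the truth
var_mean = post_mean.var()                       # suppressed: the mean is too smooth
var_sample_average = samples.mean(axis=0).var()  # averaging samples approaches the mean
```

Each individual sample has the statistical properties of the true field, while the posterior mean averages over the uncertainty and loses variance.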

Debiasing Cosmic Flows

Initial Conditions

early Universe

Cosmological Parameters

theory

\mathit{O}(2048^3)
p(\delta_{\mathrm{ICs}}, \theta|\delta_{\mathrm{Obs}})

?

Observed Density Field

today

\mathit{O}(10)
\mathit{O}(2048^3)
+
\Omega_m,
\sigma_8

Hamiltonian Monte Carlo

1) Likelihood is intractable for realistic scenarios, but we can draw samples from the simulator

2) Forward model has to be differentiable

(and relatively fast)

3) Not amortized

\log P(\delta_\mathrm{Obs}|\delta_\mathrm{ICs}, \mathcal{C}) = -\frac{1}{2} \sum_i \left(\frac{\mathcal{S}^i[\delta_{\mathrm{ICs}}, \mathcal{C}] - \delta_{\mathrm{Obs}}^i}{\sigma} \right)^2
["Bayesian physical reconstruction of initial conditions from large scale structure surveys" Jasche, Wandelt arXiv:1203.3639]
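A minimal sketch of Hamiltonian Monte Carlo with the Gaussian likelihood above is given below. To stay self-contained it assumes a hypothetical linear "simulator" (a three-point smoothing kernel) with an analytic gradient and a unit Gaussian prior on the initial conditions; in a realistic setting the forward model is a differentiable gravity solver and the gradient comes from automatic differentiation.

```python
import numpy as np

def smooth(field):
    """Hypothetical linear 'simulator' S: a periodic 3-point smoothing kernel."""
    return 0.25 * np.roll(field, 1) + 0.5 * field + 0.25 * np.roll(field, -1)

def log_posterior(ics, obs, sigma):
    """Gaussian likelihood around S[ics] plus a unit Gaussian prior on the ICs."""
    residual = (smooth(ics) - obs) / sigma
    return -0.5 * np.sum(residual**2) - 0.5 * np.sum(ics**2)

def grad_log_posterior(ics, obs, sigma):
    """Analytic gradient; the kernel is symmetric, so S^T = S."""
    return -smooth(smooth(ics) - obs) / sigma**2 - ics

def hmc_step(ics, obs, sigma, step_size, n_leapfrog, rng):
    """One HMC update: leapfrog trajectory followed by a Metropolis accept/reject."""
    momentum = rng.normal(size=ics.shape)
    q, p = ics.copy(), momentum.copy()
    p += 0.5 * step_size * grad_log_posterior(q, obs, sigma)
    for _ in range(n_leapfrog - 1):
        q += step_size * p
        p += step_size * grad_log_posterior(q, obs, sigma)
    q += step_size * p
    p += 0.5 * step_size * grad_log_posterior(q, obs, sigma)
    h_old = -log_posterior(ics, obs, sigma) + 0.5 * np.sum(momentum**2)
    h_new = -log_posterior(q, obs, sigma) + 0.5 * np.sum(p**2)
    accept = np.log(rng.uniform()) < h_old - h_new
    return (q, True) if accept else (ics, False)

rng = np.random.default_rng(7)
n_pix, sigma = 64, 0.1
true_ics = rng.normal(size=n_pix)
obs = smooth(true_ics) + sigma * rng.normal(size=n_pix)

ics = np.zeros(n_pix)
kept, n_accept = [], 0
for i in range(1500):
    ics, accepted = hmc_step(ics, obs, sigma, step_size=0.05, n_leapfrog=20, rng=rng)
    n_accept += accepted
    if i >= 500:  # discard burn-in
        kept.append(ics)
posterior_mean = np.mean(kept, axis=0)
acceptance_rate = n_accept / 1500
```

Every leapfrog step needs a gradient of the forward model, which is why the method requires a differentiable and relatively fast simulator, and the chain has to be rerun from scratch for each new observation.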

True

Reconstructed

\delta_\mathrm{Obs}
\delta_\mathrm{ICs}
p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs}) =
p(\delta_\mathrm{ICs}|\delta_\mathrm{Obs})
p(\theta|\delta_\mathrm{ICs},\delta_\mathrm{Obs})

(Marginalizing over parameters)
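The factorization above is the product rule applied to the joint posterior, and marginalizing over parameters corresponds to integrating out \theta. Written out with the symbols already used on this slide:

```latex
\begin{aligned}
p(\delta_\mathrm{ICs}, \theta \mid \delta_\mathrm{Obs})
  &= p(\delta_\mathrm{ICs} \mid \delta_\mathrm{Obs})\,
     p(\theta \mid \delta_\mathrm{ICs}, \delta_\mathrm{Obs}), \\
p(\delta_\mathrm{ICs} \mid \delta_\mathrm{Obs})
  &= \int p(\delta_\mathrm{ICs}, \theta \mid \delta_\mathrm{Obs})\, \mathrm{d}\theta .
\end{aligned}
```

Joint samples can then be drawn in two stages: first draw \delta_\mathrm{ICs} from the marginal, then draw \theta conditioned on that draw and on the observation.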

["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al, NeurIPS ML4PS 2024 Spotlight talk]

 

[Figure: initial conditions and final fields; cross correlation as a function of scale (k), across small and large scales]

Perfect reconstruction (impossible due to information loss)

Generative Model

NF

p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs}) =
p(\delta_\mathrm{ICs}|\delta_\mathrm{Obs})
p(\theta|\delta_\mathrm{ICs},\delta_\mathrm{Obs})

(Marginalizing over parameters)

1) Likelihood not necessarily Gaussian

2) Forward model does not need to be differentiable

3) Amortized

["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro et al arXiv:2405.02252 Physical Review D]

 

\mathrm{SI}:
\mathrm{SI}:
\mathrm{HMC}:

Simulations

Observations

Guided by observational constraints

Robust Inference

Anomaly Detection for new physics searches

Baryonic feedback

Hybrid simulators

Can LLMs close the loop?

Theory

Observations

Reconstructing latent features:

Dark matter, ICs...

Generative Models:

Beyond Simulation Emulation

Part 1

Part 2

Part 3

Future Directions

Late Universe

Early Universe

Tension

2025

Increasing significance:

Robust error bars against distribution shifts

The missing pieces: Searching for what we don't know to look for

STRESS TESTING LCDM

From Tensions to Discoveries: Anomalies in Cosmology

["A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering statistics" Krause et al (including Cuesta-Lazaro) arXiv:2409.10609]

 

["Full-shape analysis with simulation-based priors: cosmological parameters and the structure growth anomaly" Ivanov et al (including Cuesta-Lazaro) arXiv:2405.02252]

 

Hybrid Simulators

Match outputs from higher resolution simulations

["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing" Balla, Mishra-Sharma, Cuesta-Lazaro et al, LoG 2024 & NeurReps at NeurIPS 2024]

 

Foundation models?

Symmetry preserving architectures?

High resolution simulations too expensive for large training sets

New physics or pesky baryons?

The complex physics of galaxy formation can mimic signals we expect from new fundamental physics, making these effects difficult to disentangle 

 

\frac{\partial \mathbf{u}}{\partial t} = \mathcal{H}[\mathbf{x},\mathbf{R}_{\theta}(\mathbf{x})]

Hydro simulator

Subgrid model
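The hybrid-simulator idea can be sketched with a scalar toy problem: a solver integrates the resolved physics, and a parametric subgrid term R_\theta is tuned so that the output matches a reference run. Everything here is hypothetical and simplified: the linear resolved term, the cubic one-parameter closure, the additive coupling, and the use of the same solver with a known closure as a stand-in for a higher-resolution simulation.

```python
import numpy as np

def resolved_physics(u):
    """Known part of the dynamics, solved by the 'hydro simulator'."""
    return -u

def subgrid_model(u, theta):
    """Hypothetical learned closure R_theta: a single-parameter cubic term."""
    return theta * u**3

def integrate(u0, theta, dt, n_steps):
    """Forward-Euler integration of du/dt = resolved_physics(u) + R_theta(u)."""
    u = np.empty(n_steps + 1)
    u[0] = u0
    for i in range(n_steps):
        u[i + 1] = u[i] + dt * (resolved_physics(u[i]) + subgrid_model(u[i], theta))
    return u

dt, n_steps = 0.01, 300
# Stand-in for an expensive high-resolution run: same solver with the "true" closure.
reference = integrate(u0=2.0, theta=-0.5, dt=dt, n_steps=n_steps)

# Fit theta so the hybrid simulator matches the reference time derivative
# (one-parameter linear least squares).
dudt = np.diff(reference) / dt
target = dudt - resolved_physics(reference[:-1])
basis = reference[:-1] ** 3
theta_fit = np.sum(basis * target) / np.sum(basis**2)

hybrid = integrate(2.0, theta_fit, dt, n_steps)
uncorrected = integrate(2.0, 0.0, dt, n_steps)
```

In practice the closure would be a neural network and the fit would require differentiating through the solver, but the structure of a fixed solver wrapped around a trainable subgrid term is the same.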

Artificial General Intelligence?

Pre 2022

Post 2022

Meaningful benchmarks for AI Scientists will lead to better AI Scientists

LLMs have saturated current benchmarks

Six months later...

1. Hypothesis Generation

2. Implementing

3. Testing

4. Data analysis and solving inverse problems

Self interacting dark radiation

Early Dark Energy

Modifying the Boltzmann equation in CLASS

Running and analysing CMB simulations to test implementation

Manipulating data and obtaining posteriors

Evaluating LLMs for Cosmology

Simulations

Observations

Guided by observational constraints

Robust Inference

Anomaly Detection for new physics searches

Baryonic feedback

Hybrid simulators

Can LLMs close the loop?

Theory

Observations

Reconstructing latent features:

Dark matter, ICs...

Generative Models:

Beyond Simulation Emulation

Part 1

Part 2

Part 3

Future Directions

Extra Slides

 

[Figure: cross correlation \langle\mathrm{True}\,\,\mathrm{Pred}\rangle as a function of scale, across small and large scales, for in-distribution and out-of-distribution test cases]

Generative SDE

dX_t = b_t(X_t, x_0) dt + \sigma_t dW_t

3D U-Net
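Sampling from the generative SDE above amounts to integrating it numerically, for instance with the Euler-Maruyama scheme. In the sketch below the drift is a hypothetical hand-written function that relaxes towards the conditioning x_0, standing in for the learned drift b_t (the 3D U-Net), and the noise amplitude is held constant; both are assumptions for illustration.

```python
import numpy as np

def euler_maruyama(drift, sigma, x_init, x0, t_grid, rng):
    """Integrate dX_t = drift(t, X_t, x0) dt + sigma(t) dW_t on a fixed time grid."""
    x = np.array(x_init, dtype=float)
    for t, t_next in zip(t_grid[:-1], t_grid[1:]):
        dt = t_next - t
        dw = np.sqrt(dt) * rng.normal(size=x.shape)  # Brownian increment
        x = x + drift(t, x, x0) * dt + sigma(t) * dw
    return x

def toy_drift(t, x, x0):
    """Hypothetical stand-in for the learned drift b_t: relax towards x0."""
    return -(x - x0)

def toy_sigma(t):
    """Constant noise amplitude sigma_t (an arbitrary choice)."""
    return 0.5

rng = np.random.default_rng(8)
x0 = 1.5
t_grid = np.linspace(0.0, 5.0, 501)
samples = euler_maruyama(toy_drift, toy_sigma, rng.normal(size=20000), x0, t_grid, rng)
```

For this toy drift the samples settle around the conditioning value with variance \sigma^2/2; with a learned drift the same loop would instead transport noise into samples of the target distribution.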

z~1000

z~10

z~5

z~1

Primordial Non-Gaussianity

Dark Energy

Early-Late Tensions

Adversarial SBI

f

Summarizer (NN)

\mathcal{L} = p_\phi(\theta|f(x))

Optimal summary statistic to constrain:

\color{darkgray}{\Omega_m}, \color{darkgray}{w_0, w_a},\color{darkgray}{f_\mathrm{NL}}\, ...

Dark matter

Dark energy

Inflation

x
\theta
f(x)

Optimal summary statistic for a Gaussian Random Field

\mathrm{R} \sim \mathrm{Observations}
\mathrm{F} \sim \mathrm{Simulations}

Real or Fake?

\lambda > 0

Summarizer (NN)

Summaries informative of parameters

Anomalous SBI

highlight differences

Adversarial loss

\lambda < 0

Robust SBI

remove discrepancies

\mathcal{L} = p(\theta|f(x_\mathrm{sim})) + \lambda \mathcal{L}_\text{classifier}(f(x_\mathrm{obs}), f(x_\mathrm{sim}))

Classifier:

Adversarial SBI
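The role of the sign of \lambda can be sketched numerically. The example assumes that the loss is minimized, that the inference term is a negative log posterior density, and that the classifier term is a binary cross-entropy; none of these details are stated on the slide. The logistic classifier is hand-picked rather than trained, and the 1D summaries are toy numbers.

```python
import numpy as np

def gaussian_nll(theta, mean, std):
    """Negative log density of theta under a Gaussian p(theta | f(x))."""
    return float(
        np.mean(0.5 * ((theta - mean) / std) ** 2 + np.log(std) + 0.5 * np.log(2.0 * np.pi))
    )

def classifier_bce(summaries_obs, summaries_sim, weight, bias):
    """Binary cross-entropy of a logistic 'real or fake?' classifier on summaries."""
    def prob_real(s):
        return 1.0 / (1.0 + np.exp(-(weight * s + bias)))
    eps = 1e-12
    loss_real = -np.log(prob_real(summaries_obs) + eps)
    loss_fake = -np.log(1.0 - prob_real(summaries_sim) + eps)
    return float(np.mean(np.concatenate([loss_real, loss_fake])))

def summarizer_loss(nll, bce, lam):
    """Inference term plus lambda times the classifier term."""
    return nll + lam * bce

rng = np.random.default_rng(9)
sim = rng.normal(0.0, 1.0, size=2000)
obs_shifted = rng.normal(3.0, 1.0, size=2000)  # summaries that expose a sim/obs mismatch
obs_matched = rng.normal(0.0, 1.0, size=2000)  # summaries that hide it

# Fixed, hand-picked classifier with its decision boundary at s = 1.5.
bce_shifted = classifier_bce(obs_shifted, sim, weight=2.0, bias=-3.0)
bce_matched = classifier_bce(obs_matched, sim, weight=2.0, bias=-3.0)

nll = gaussian_nll(theta=np.array([0.3]), mean=0.25, std=0.1)  # same for both cases
loss_anomalous = {k: summarizer_loss(nll, b, lam=+1.0)
                  for k, b in [("shifted", bce_shifted), ("matched", bce_matched)]}
loss_robust = {k: summarizer_loss(nll, b, lam=-1.0)
               for k, b in [("shifted", bce_shifted), ("matched", bce_matched)]}
```

Under these assumptions, \lambda > 0 favours summaries that the classifier separates easily (highlighting differences), while \lambda < 0 favours summaries on which simulations and observations are indistinguishable (removing discrepancies).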

A Toy Model

Scale-dependent noise + baryonic effects

Simulations:

Robust Summaries

Anomalous Summaries

Noiseless Power Spectrum

Observations:
