Beyond the Observable

IAIFI Fellow, MIT

Carolina Cuesta-Lazaro

Art: "Drawing Hands" by M.C. Escher

A Machine Learning perspective on modern Cosmology

1-Dimensional

Machine Learning

Secondary anisotropies

Galaxy formation

Intrinsic alignments

DESI / SPHEREx / HETDEX

Euclid / LSST

SO / CMB-S4

LIGO / Einstein

The era of Big Data Cosmology

xAstrophysics

HERA / CHIME

SAGA / MaNGA

Galaxy formation

Emitters Census

Reionization

Cosmic Microwave Background

Galaxies / Dwarfs

21 cm

Galaxy Surveys

Gravitational Lensing

Gravitational Waves

Carolina Cuesta-Lazaro IAIFI/MIT @ UT Austin 2025

AGN Feedback/Supernovae

We have increasingly precise observations of the cosmos, but our biggest questions remain about what we can't directly observe

Dark Matter

Gas

Bullet Cluster
\Lambda \mathrm{CDM}
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

Dark Energy is constant over time

Inflation

5 times more collisionless matter than we can see

Dark Matter

Exponential expansion in the very early universe

Expansion is accelerating,

dynamical?

Dark Energy

Why Now?

Beyond tools 

Optimisation

Neural representations

Baryonification

Inflation

Symmetry-preserving ML

Early Universe - JWST

Simulation Based Inference

Epidemiological simulations

Medical Imaging

Natural Language Processing

Exoplanets

Compute

Simulations

Data

ML

Statistics

Physics

What is dark matter made of?

What is driving the accelerated expansion?

How did the Universe begin?

A new way of thinking

 about

physical systems

Prompting simulators

Running the clock backwards

3

Probabilistic Debiasing

2

Field-level inference for galaxy surveys

1

Fast Emulators + Likelihood Models

Uncertainty Quantification

p(\mathrm{Dark Energy}|\mathrm{Galaxies})
p(\mathrm{Dark Matter}|\mathrm{Galaxies})
p(\mathrm{ICs}|\mathrm{Today})

Machine Learning enables new science in Cosmology

GANs

Deep Belief Networks

2006

VAEs

Normalising Flows

BigGAN

Diffusion Models

2014

2017

2019

2022

A folk music band of anthropomorphic autumn leaves playing bluegrass instruments

Contrastive Learning

2023

Meanwhile, on Earth...

p(\mathrm{World}|\mathrm{Prompt})
["Genie 2: A large-scale foundation model" Parker-Holder et al]
p(\mathrm{Drug}|\mathrm{Properties})
["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]

Probabilistic ML has made high dimensional inference tractable

1024x1024xTime

Dataset Size = 1 

Can't poke it in the lab 

Simulations

Bayesian statistics

But inference in Cosmology is hard

Simulations: Testing Ground and Theoretical Models

Astrophysics proliferates Simulation-based Inference

on Simulations

[Video credit: Francisco Villaescusa-Navarro]

Gas density

Gas temperature

Subgrid model 1

Subgrid model 2

Subgrid model 3

Subgrid model 4

New physics or pesky baryons?

We need to understand the baryons!

Simulations

Observations

Guided by observational constraints

Robust Inference

A Two-Way Road

Generative Models:

Beyond Simulation Emulation

Part 1

What is driving the accelerated expansion?

Reconstructing latent features:

Dark matter, ICs...

Part 2

How did the Universe begin?

What is dark matter made of?

Anomaly Detection for new physics searches

Baryonic feedback

Hybrid simulators

Part 3

Future Directions

Breaking LCDM

Predictive hydro sims

1. Extracting information from Galaxy Clustering

[Image Credit: Claire Lamman (CfA/Harvard) / DESI Collaboration]

\theta

Forward Model

Observable

x

Predict

Infer

Theory Parameters

Inverse mapping

p(\theta|x)

+ MCMC hammer
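A minimal sketch of the "forward model + MCMC" workflow, assuming a hypothetical one-parameter toy forward model, a flat prior and a Gaussian likelihood. None of the names below come from the talk; a real analysis would replace forward_model with a simulator or emulator.

```python
import numpy as np

def forward_model(theta):
    # Hypothetical toy forward model: the observable equals the parameter.
    return theta

def log_posterior(theta, x_obs, sigma=0.1):
    # Flat prior on [0, 1] plus a Gaussian likelihood around the prediction.
    if not 0.0 <= theta <= 1.0:
        return -np.inf
    return -0.5 * ((x_obs - forward_model(theta)) / sigma) ** 2

def metropolis(x_obs, n_steps=20000, step=0.05, seed=0):
    # Random-walk Metropolis sampler for p(theta | x_obs).
    rng = np.random.default_rng(seed)
    theta = 0.5
    logp = log_posterior(theta, x_obs)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step * rng.normal()
        logp_new = log_posterior(proposal, x_obs)
        # Accept with probability min(1, p_new / p_old).
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
        chain[i] = theta
    return chain

chain = metropolis(x_obs=0.3)
posterior_mean = chain[2000:].mean()  # discard the first steps as burn-in
```

The sampler only ever calls the forward model through the likelihood, which is why an expensive simulator makes this approach costly when there are many parameters.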

\color{darkgray}{\Omega_m}, \color{darkgray}{w_0, w_a},\color{darkgray}{f_\mathrm{NL}}\, ...

Dark matter

Dark energy

Inflation

Initial conditions

+
["Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample" Paillas, Cuesta-Lazaro et al arXiv:2309.16541 MNRAS]

["Cosmological constraints from the Minkowski functionals of the BOSS CMASS galaxy sample" Liu, Paillas, Cuesta-Lazaro et al arXiv:2501.01698]

["SUNBIRD: A simulation-based model for full-shape density-split clustering" Cuesta-Lazaro et al arXiv:2309.16539 MNRAS]

\mathcal{O}(10)
\mathcal{O}(10M)

Carol's optimistic forecast

Generative Models 101

Maximize the likelihood of the training samples

\hat \phi = \operatorname*{arg\,max}_\phi \left[ \log p_\phi (x_\mathrm{train}) \right]
x_1
x_2

Parametric Model

p_\phi(x)

Training Samples

x_\mathrm{train}
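As an illustration of maximum-likelihood training, here is a sketch that assumes the parametric model p_\phi(x) is a one-dimensional Gaussian with \phi = (mean, std). The data and all names are stand-ins chosen for this example; a deep generative model would find the argmax by gradient descent rather than in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(loc=2.0, scale=0.5, size=5000)  # stand-in training samples

def log_likelihood(phi, x):
    # phi = (mean, std) of a Gaussian parametric model p_phi(x).
    mu, sigma = phi
    return np.sum(
        -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    )

# For a Gaussian the argmax over phi is available in closed form:
# the sample mean and the (biased) sample standard deviation.
phi_hat = (x_train.mean(), x_train.std())

# Any other parameter choice gives a lower log-likelihood on the training set.
ll_best = log_likelihood(phi_hat, x_train)
ll_other = log_likelihood((2.1, 0.5), x_train)
```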

x_1
x_2

Trained Model

p_\phi(x)

Evaluate probabilities

Low Probability

High Probability

Generate Novel Samples

Simulator

Generative Model

Fast emulators

Testing Theories

Generative Model

Simulator

Generative Models: Simulate and Analyze

p(z_0)
p(z_T)
p(z_2)
p(z_1)

Reverse diffusion: Denoise previous step

Forward diffusion: Add Gaussian noise (fixed)

Prompt: A person half Yoda half Gandalf

Diffusion model
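The fixed forward process can be sketched in a few lines. This assumes a variance-preserving formulation with a linear noise schedule, which is one common choice and not necessarily the one used in the talk. The reverse process is the learned part: a network is typically trained to predict the noise eps from the noisy sample.

```python
import numpy as np

def forward_diffuse(z0, t, betas, rng):
    # Closed-form sample of z_t given z_0:
    # z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.normal(size=z0.shape)
    return np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)    # assumed linear noise schedule
z0 = rng.uniform(-1.0, 1.0, size=10000)  # stand-in "data" samples
zT, _ = forward_diffuse(z0, t=999, betas=betas, rng=rng)
```

After the final step the samples are close to a unit Gaussian regardless of the data distribution, which is what makes p(z_T) easy to sample from when generating.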

["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma, International Conference on Machine Learning ICML AI4Astro 2023, Spotlight talk, arXiv:2311.17141]

Base Distribution

Target Distribution

Simulated Galaxy 3d Map

Prompt:

\Omega_m, \sigma_8

Prompt: A person half Yoda half Gandalf

Physics as a basis for use-inspired methods development

Fixed Initial Conditions / Varying Cosmology

Mean relative velocity

k Nearest neighbours

Pair separation

Pair separation

Reproducing Summary Statistics

Varying cosmological parameters
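Pair-separation statistics of a point cloud can be sketched with a brute-force pair count. This example assumes non-periodic boundaries and uniformly random stand-in positions; the function name and binning are hypothetical, and production codes use tree- or grid-based pair counters instead.

```python
import numpy as np

def pair_separation_counts(positions, bin_edges):
    # Count unique pairs of points in each separation bin (brute force, O(N^2)).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(positions), k=1)  # each pair counted once
    counts, _ = np.histogram(dist[i, j], bins=bin_edges)
    return counts

rng = np.random.default_rng(0)
galaxies = rng.uniform(0.0, 100.0, size=(500, 3))  # hypothetical positions
edges = np.linspace(0.0, 50.0, 11)
dd = pair_separation_counts(galaxies, edges)
```

Comparing such counts between generated and simulated catalogues is one way to validate a generative model against a well-understood summary statistic.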

Physics as a testing ground: Well-understood summary statistics enable rigorous validation of generative models

p(\theta|x) = \frac{p(x|\theta)p(\theta)}{p(x)}
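For a single parameter, Bayes' theorem can be evaluated directly on a grid. The sketch below assumes a flat prior on [0, 1], a Gaussian likelihood and a made-up observation; all values are illustrative.

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)  # parameter grid
prior = np.ones_like(theta)          # flat prior p(theta) on [0, 1]
x_obs, sigma = 0.3, 0.05             # hypothetical observation and noise level

# Gaussian likelihood p(x | theta), evaluated at the observed x.
likelihood = np.exp(-0.5 * ((x_obs - theta) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

unnormalised = likelihood * prior
dtheta = theta[1] - theta[0]
evidence = unnormalised.sum() * dtheta  # p(x), a Riemann sum over theta
posterior = unnormalised / evidence     # p(theta | x)

theta_map = theta[np.argmax(posterior)]
```

The evidence integral is what becomes intractable in high dimensions, which motivates sampling or learned density estimators.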

Diffusion model

With only O(10^4) galaxies!

Diffusion

Pair Counting

Initial Clumpiness

Matter Energy Density

CNN

Diffusion

Increasing Noise

p(\sigma_8|\delta_m)
p(\sigma_8|\delta_m + 0.01 \epsilon)
p(\sigma_8|\delta_m + 0.02 \epsilon)
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner, NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]


Nayantara Mudur

CNN

Diffusion

Generative model constraints are more robust

6 seconds / sim  vs 40 million CPU hours

Fast Emulation:

Parameter constraints:

Generative Models: Simulate and Analyze

Diffusion

Pair Counting

Carol's optimistic forecast

\mathcal{O}(10^{4-7})
\mathcal{O}(10)

High dimensional inference

Alternative Clustering Methods

p_\phi(\rho_\mathrm{DM}|\rho_\mathrm{Galaxies})

1 to Many:

Galaxies (Observable)

Dark Matter (Unobservable)

["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al (including Cuesta-Lazaro), NeurIPS 2023 ML for the Physical Sciences / ApJ, arXiv:2403.10648]

 

Victoria Ono

Core F. Park

Probabilistic Reconstruction

Truth

Sampled

Observed

[Figure: power spectrum and cross correlation as a function of scale (k), with small and large scales marked. Power spectrum curves: Galaxies, Truth CDM, Sample CDM. Cross-correlation curves: Galaxies x Truth CDM, Galaxies x Sample CDM, Truth CDM x Sample CDM.]
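One way to quantify how well a sampled field tracks the truth is the cross-correlation coefficient r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)). The sketch below assumes 2D periodic fields on a regular grid and uses white-noise stand-ins for the fields. The function name and binning are hypothetical, and power-spectrum normalisation constants are omitted because they cancel in the ratio.

```python
import numpy as np

def cross_correlation_coefficient(field_a, field_b, n_bins=8):
    # r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)) in bins of |k| (cycles per cell).
    fa, fb = np.fft.fftn(field_a), np.fft.fftn(field_b)
    freqs = [np.fft.fftfreq(n) for n in field_a.shape]
    kmag = np.sqrt(sum(k ** 2 for k in np.meshgrid(*freqs, indexing="ij")))
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    which = np.digitize(kmag.ravel(), edges) - 1  # modes beyond 0.5 are dropped

    def binned(power):
        p = power.ravel()
        return np.array([p[which == b].mean() for b in range(n_bins)])

    p_ab = binned((fa * np.conj(fb)).real)
    p_aa = binned(np.abs(fa) ** 2)
    p_bb = binned(np.abs(fb) ** 2)
    return p_ab / np.sqrt(p_aa * p_bb)

rng = np.random.default_rng(0)
truth = rng.normal(size=(32, 32))                  # stand-in "true" field
sample = truth + 0.5 * rng.normal(size=(32, 32))   # stand-in reconstruction
r = cross_correlation_coefficient(truth, sample)
```

A value of r(k) near 1 means the two fields agree in phase on that scale; it drops towards 0 where the reconstruction carries no information.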

CAMELS

DESI LRG ~ 20 (Gpc/h)^3

Galaxies

True DM

Sample DM

Size of training simulation

1) Generalising to larger volumes

Model trained on Astrid subgrid model, tested on TNG

2) Generalising across galaxy formation models

["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al (including Cuesta-Lazaro), NeurIPS 2023 ML for the Physical Sciences / ApJ / arXiv:2403.10648]

 

Void

Galaxy Cluster

["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys" Park, Mudur, Cuesta-Lazaro et al, International Conference on Machine Learning ICML 2024 AI for Science]

 

Posterior Sample

Posterior Mean

What's next?

Application to Galaxy Surveys

Robust to galaxy formation uncertainties

Probabilistic Reconstruction of Dark Matter

Inverse

Sampling over

20M dimensions

Each sample would cost 6k CPU hours

Observed Light

Dark Matter

Running the clock backwards

Initial Conditions

early Universe

Cosmological Parameters

theory

\mathcal{O}(2048^3)
p(\delta_{\mathrm{ICs}}, \theta|\delta_{\mathrm{Obs}})

?

Observed Density Field

today

\mathcal{O}(10)
\mathcal{O}(2048^3)
+
\Omega_m,
\sigma_8

True

Reconstructed

\delta_\mathrm{Obs}
\delta_\mathrm{ICs}
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al, NeurIPS ML4PS 2024 Spotlight talk]

 

Generative Model

NF

p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs}) = p(\delta_\mathrm{ICs}|\delta_\mathrm{Obs})\, p(\theta|\delta_\mathrm{ICs},\delta_\mathrm{Obs})
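The factorisation is the product rule of probability, so joint samples can be drawn in two stages. The slide labels suggest (an interpretation, not stated explicitly) that the first factor is handled by the generative model and the second by a normalising flow (NF):

```latex
\begin{align}
p(\delta_\mathrm{ICs}, \theta \mid \delta_\mathrm{Obs})
  &= p(\delta_\mathrm{ICs} \mid \delta_\mathrm{Obs})\,
     p(\theta \mid \delta_\mathrm{ICs}, \delta_\mathrm{Obs}), \\
\delta_\mathrm{ICs}^{(i)} &\sim p(\delta_\mathrm{ICs} \mid \delta_\mathrm{Obs}),
  \qquad
\theta^{(i)} \sim p(\theta \mid \delta_\mathrm{ICs}^{(i)}, \delta_\mathrm{Obs}).
\end{align}
```

Each pair (\delta_\mathrm{ICs}^{(i)}, \theta^{(i)}) drawn this way is a sample from the joint posterior.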

Sampling over                                             jointly with theory parameters

+

Constrained Simulations for Galaxy Surveys

100M dimensions

Reconstructing ALL latent variables:

Dark Matter distribution

Entire formation history

Peculiar velocities

Interpretability:

Cross-Correlation with other probes

[Image Credit: Yuuki Omori]

 

Constraining Inflation

Inferring primordial non-gaussianity

Simulations

Observations

Guided by observational constraints

Robust Inference

Generative Models:

Beyond Simulation Emulation

Part 1

What is driving the accelerated expansion?

Reconstructing latent features:

Dark matter, ICs...

Part 2

How did the Universe begin?

What is dark matter made of?

Anomaly Detection for new physics searches

Baryonic feedback

Hybrid simulators

Part 3

Future Directions

Breaking LCDM

Predictive hydro sims

Late Universe

Early Universe

Tension

2025

From Tensions to Discoveries: Anomalies in Cosmology

["Full-shape analysis with simulation-based priors: cosmological parameters and the structure growth anomaly" Ivanov, Obuljen, Cuesta-Lazaro, Toomey  arXiv:2409.10609]

 

Carol's optimistic forecast

Early vs Late

Parametric Extensions

Challenge: Robust error bars against distribution shifts

Clumpiness

BOSS (Late Universe)

BOSS (Late Universe) + ML

BOSS (Late Universe) + Bk + ML

Early Universe

Tension

\approx 5 \sigma
[Image Credit: Prof. Wendy Freedman]

 

["Full-shape analysis with simulation-based priors: cosmological parameters and the structure growth anomaly" Ivanov, Obuljen, Cuesta-Lazaro, Toomey  arXiv:2409.10609]

 

["A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering statistics" Krause et al (including Cuesta-Lazaro) arXiv:2405.02252]

 

["Neural Posterior Estimation with Adversarial Regularization: From Robust Statistics to Missing Physics" Cuesta-Lazaro et al in-prep]

 

Looking for what we don't know to look for

The missing pieces: Beyond parametric searches

Axion Dark Matter

Dark Matter - Baryon Interactions

Primordial Non-Gaussianity

Early Dark Energy

Dark Radiation

[Credit: Sandbox Studio]

 

[Credit: Sandbox Studio]

 

What makes an anomaly in Astrophysics interesting?

A representation learning problem:

where meaningful anomalies become apparent

Text-enhanced representations of X-Ray data

Rafael Martinez Galarza

["Augmenting X-Ray Astronomical Representations with Scientific Knowledge Through Contrastive Learning" Martinez-Galarza et al (including Cuesta-Lazaro) in-prep]

 

Alex Gagliano

Simulated Energy Sources

Physical parameters

(Temperature, ejecta velocity ...)

Physical Anomalies in Supernovae Lightcurves

Learned Shared Representation

Edgar Vidal

Hybrid Simulators

["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing" Balla, Mishra-Sharma, Cuesta-Lazaro et al, LOG 2024 & NeurReps at NeurIPS 2024]

 

Foundation models?

Symmetry preserving architectures?

High resolution simulations too expensive for large training sets

New physics or pesky baryons?

The complex physics of galaxy formation can mimic signals we expect from new fundamental physics, making these effects difficult to disentangle 

 

\frac{\partial \mathbf{u}}{\partial t} = \mathcal{H}[\mathbf{x},\mathbf{R}_{\theta}(\mathbf{x})]

Hydro simulator

Subgrid model

Learning the Universe - Simons Collaboration

Simulations

Observations

Guided by observational constraints

Robust Inference

Anomaly Detection for new physics searches

Baryonic feedback

Hybrid simulators

Can LLMs close the loop?

Theory

Observations

Reconstructing latent features:

Dark matter, ICs...

Generative Models:

Beyond Simulation Emulation

Part 1

Part 2

Part 3

Future Directions

What is driving the accelerated expansion?

How did the Universe begin?

Breaking LCDM

Predictive hydro sims

What is dark matter made of?

Artificial General Intelligence?

Pre 2022

Post 2022

Meaningful benchmarks for AI Scientists will lead to better AI Scientists

LLMs have saturated current benchmarks

Six months later....

Years since benchmark introduction

Score relative to human performance

Human Performance

Evaluating LLMs for Scientific Discovery

Goal: Resolving the Hubble tension

1. Hypothesis Generation: Self interacting dark radiation, Early Dark Energy

2. Implementing beyond LCDM models: Compute modified Pk with CLASS

3. Data analysis: Gathering data and postprocessing

4. Solving inverse problems: Obtaining posteriors and computing metrics for comparison to LCDM

Beyond tools 

Compute

Simulations

Data

ML

Statistics

Physics

Use-inspired AI developments

The future of Astrophysics

A new way of thinking

 about

physical systems

Adam Klivans (CS)

Sanjay Shakkottai (CS)

Matthew Lease (iSchool)

Arya Farahi (Data Science)

Qiang Liu (CS)

 

Chris Sneden

Paul Shapiro

Julian Muñoz

Mike Boylan-Kolchin

Karl Gebhardt

Keith Hawkins

Stella Offner

Volker Bromm

George Biros (Oden)

Extra Slides

 

[Figure: cross correlation \langle\mathrm{True}\,\mathrm{Pred}\rangle as a function of scale, with small and large scales marked, shown for three in-distribution and six out-of-distribution panels.]

["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro et al arXiv:2402.13310 Physical Review D]

 

Single field inflation

Power Spectrum + Bispectrum

[Figure axes: 10^{-3} f^{\mathrm{equil}}_\mathrm{NL} and 10^{-3} f^{\mathrm{ortho}}_\mathrm{NL}]

Generative SDE

dX_t = b_t(X_t, x_0) dt + \sigma_t dW_t

3D U-Net
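The generative SDE can be integrated numerically with the Euler-Maruyama scheme. In the sketch below the drift b_t and noise level sigma_t are hypothetical toy functions; in the setting described on the slide the drift would be a trained network (the 3D U-Net), which is not reproduced here.

```python
import numpy as np

def euler_maruyama(x0, drift, sigma, n_steps=500, seed=0):
    # Integrate dX_t = b_t(X_t, x_0) dt + sigma_t dW_t from t = 0 to t = 1.
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dw = np.sqrt(dt) * rng.normal(size=x.shape)  # Brownian increment
        x = x + drift(t, x, x0) * dt + sigma(t) * dw
    return x

def toy_drift(t, x, x0):
    # Hypothetical drift pulling samples towards 2 * x0.
    return 5.0 * (2.0 * x0 - x)

def toy_sigma(t):
    # Hypothetical noise level that vanishes at t = 1.
    return 0.1 * (1.0 - t)

x0 = np.ones(1000)
x1 = euler_maruyama(x0, toy_drift, toy_sigma)
```

Because the drift is conditioned on x_0, each run maps the conditioning input to one stochastic sample of the target, and repeated runs with different seeds give different samples.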

Generative Model

NF

p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs}) = p(\delta_\mathrm{ICs}|\delta_\mathrm{Obs})\, p(\theta|\delta_\mathrm{ICs},\delta_\mathrm{Obs})
["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro et al arXiv:2402.13310 Physical Review D]

 

\mathrm{SI}:
\mathrm{SI}:
\mathrm{HMC}:

Exact Posterior

Generative Model

Generative Model + Known ICs

z~1000

z~10

z~5

z~1

Primordial Non-Gaussianity

Dark Energy

Early-Late Tensions

Adversarial SBI

f

Summarizer (NN)

\mathcal{L} = p_\phi(\theta|f(x))

Optimal summary statistic to constrain:

\color{darkgray}{\Omega_m}, \color{darkgray}{w_0, w_a},\color{darkgray}{f_\mathrm{NL}}\, ...

Dark matter

Dark energy

Inflation

x
\theta
f(x)

Optimal summary statistic for a Gaussian Random Field

\mathrm{R} \sim \mathrm{Observations}
\mathrm{F} \sim \mathrm{Simulations}

Real or Fake?

\lambda > 0

Summarizer (NN)

Summaries informative of parameters

Anomalous SBI

highlight differences

Adversarial loss

\lambda < 0

Robust SBI

remove discrepancies

\mathcal{L} = p_\phi(\theta|f(x_\mathrm{sim})) + \lambda \mathcal{L}_\text{classifier}(f(x_\mathrm{obs}), f(x_\mathrm{sim}))

Classifier:

Adversarial SBI

A Toy Model

Scale-dependent noise + baryonic effects

Simulations:

Robust Summaries

Anomalous Summaries

Noiseless Power Spectrum

Observations:

Lyman Alpha Emitters in MTNG

Halo mass high z
