Beyond the Observable
IAIFI Fellow, MIT

Carolina Cuesta-Lazaro
Art: "Drawing Hands" by M.C. Escher
A Machine Learning perspective on modern Cosmology





1-Dimensional



Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments



DESI / SPHEREx / HETDEX
Euclid / LSST
SO / CMB-S4
LIGO / Einstein Telescope


The era of Big Data Cosmology
xAstrophysics
5-Dimensional
HERA / CHIME
SAGA / MaNGA




Galaxy formation
Emitters Census
Reionization


Carolina Cuesta-Lazaro IAIFI/MIT @ UTSA 2025
Cosmic Microwave Background
Galaxies / Dwarfs
21 cm
Galaxy Surveys
Gravitational Lensing
Gravitational Waves
We have increasingly precise observations of the cosmos, but our biggest questions remain about what we can't directly observe


Dark Matter
Gas
Bullet Cluster
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

Is Dark Energy constant over time?
Inflation
~5 times more collisionless matter than visible matter
Dark Matter
Exponential expansion in the very early universe
Expansion is accelerating
Dark Energy

GANs

Deep Belief Networks
2006

VAEs

Normalising Flows

BigGAN

Diffusion Models

2014
2017
2019
2022
A folk music band of anthropomorphic autumn leaves playing bluegrass instruments
CLIP
2023
Meanwhile, on Earth...
["Genie 2: A large-scale foundation model" Parker-Holder et al]

["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high-dimensional inference tractable
1024x1024xTime
Prompting simulators
Running the clock backwards

Probabilistic Debiasing

Field-level inference for galaxy surveys
Fast Emulators + Likelihood Models

Uncertainty Quantification
High-dimensional Inference enables new science in Cosmology

Dataset Size = 1
Can't poke it in the lab

Simulations
Bayesian statistics
But inference in Cosmology is hard
Simulations: Theory and Testing Ground
Generative Models 101
Maximize the likelihood of the training samples
Parametric Model


Training Samples
Trained Model

Evaluate probabilities


Low Probability
High Probability

Generate Novel Samples
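As a minimal illustration of the three steps on this slide (fit by maximum likelihood, evaluate probabilities, generate novel samples), the sketch below uses a multivariate Gaussian as the parametric model, assuming NumPy. The Gaussian is an illustrative stand-in chosen because its maximum-likelihood fit is closed-form; the models discussed in this talk (normalising flows, diffusion models) are far more flexible and are trained by gradient descent instead.

```python
import numpy as np

def fit_gaussian_mle(samples):
    """Maximum-likelihood fit of a multivariate Gaussian: closed-form mean and covariance."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / len(samples)  # the MLE uses 1/N, not 1/(N-1)
    return mean, cov

def log_prob(x, mean, cov):
    """Evaluate log N(x | mean, cov) for each row of x (high vs low probability)."""
    d = mean.shape[0]
    diff = np.atleast_2d(x) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

def sample(n, mean, cov, rng):
    """Generate novel samples from the fitted model."""
    return rng.multivariate_normal(mean, cov, size=n)

rng = np.random.default_rng(0)
train = rng.normal(loc=[1.0, -2.0], scale=[0.5, 2.0], size=(10_000, 2))  # training samples
mean, cov = fit_gaussian_mle(train)
lp_center = log_prob(mean, mean, cov)[0]      # a typical point
lp_far = log_prob(mean + 10.0, mean, cov)[0]  # an atypical point
new_samples = sample(5, mean, cov, rng)
```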


Simulator
Simulator










Astrophysics thrives on simulations: Simulation-based Inference
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
Subgrid model 1
Subgrid model 2
Subgrid model 3
Subgrid model 4
New physics or pesky baryons?

Simulations
Observations
Guided by observational constraints
Robust Inference
Reconstructing latent features:
Dark matter, ICs...
Generative Models:
Beyond Simulation Emulation
A Two-Way Road
Anomaly Detection for new physics searches
Baryonic feedback
Hybrid simulators
Part 1
Part 2
Part 3
Future Directions

1. Extracting information from Galaxy Clustering
[Image Credit: Claire Lamman (CfA/Harvard) / DESI Collaboration]
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

DESI's Dark Energy constraints
10 million galaxies -> tens of data points
Very lossy compression:

Pair separation
Counts
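The compression from millions of galaxies to tens of numbers can be sketched with pair counts. The example below, assuming NumPy and a unit box, bins all pair separations and forms the simple "natural" estimator of the two-point correlation function. Real analyses use the Landy-Szalay estimator, survey masks and tree-based pair counters, so treat this as a brute-force toy with hypothetical catalogue sizes.

```python
import numpy as np

def pair_counts(points, bins):
    """Histogram of all unique pair separations (brute force, fine for ~1000 points)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)  # count each pair once
    counts, _ = np.histogram(dist[i, j], bins=bins)
    return counts

def xi_natural(data, randoms, bins):
    """Natural estimator: xi(r) = DD(r) / RR(r) * [N_R (N_R - 1)] / [N_D (N_D - 1)] - 1."""
    dd = pair_counts(data, bins).astype(float)
    rr = pair_counts(randoms, bins).astype(float)
    nd, nr = len(data), len(randoms)
    return dd / rr * (nr * (nr - 1)) / (nd * (nd - 1)) - 1.0

rng = np.random.default_rng(42)
bins = np.linspace(0.01, 0.5, 11)        # 10 separation bins: the compressed data vector
randoms = rng.uniform(size=(1000, 3))    # unclustered reference catalogue
uniform = rng.uniform(size=(1000, 3))    # unclustered "galaxies"
centres = rng.uniform(size=(50, 3))      # clustered "galaxies": 50 clumps of 20 points
clustered = (centres[:, None, :] + 0.02 * rng.standard_normal((50, 20, 3))).reshape(-1, 3)

xi_uniform = xi_natural(uniform, randoms, bins)
xi_clustered = xi_natural(clustered, randoms, bins)
```

Ten numbers per catalogue survive; everything else about the galaxy positions is discarded, which is the lossy step the slide points to.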
Forward Model
Observable
Predict
Infer
Parameters
Inverse mapping
Perturbation Theory
Pen and paper

Simulations

+ MCMC hammer

Dark matter
Dark energy
Inflation
Initial conditions


["Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample" Paillas, Cuesta-Lazaro et al arXiv:2309.16541 MNRAS]
["Cosmological constraints from the Minkowski functionals of the BOSS CMASS galaxy sample" Liu, Paillas, Cuesta-Lazaro et al arXiv:2501.01698]
["SUNBIRD: A simulation-based model for full-shape density-split clustering" Cuesta-Lazaro et al arXiv:2309.16539 MNRAS]
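The predict/infer loop above can be sketched with a hypothetical power-law forward model and a random-walk Metropolis-Hastings sampler, a simpler relative of the affine-invariant ensemble sampler in emcee ("The MCMC Hammer"). Everything here (the model, the flat prior box, the noise level) is an illustrative assumption, not the pipeline of the cited papers, where the forward model is perturbation theory or a simulation-trained emulator.

```python
import numpy as np

R = np.linspace(1.0, 10.0, 10)  # hypothetical separations where the observable is measured

def forward_model(theta):
    """Predict: hypothetical power-law observable with amplitude and slope."""
    amp, gamma = theta
    return amp * R ** (-gamma)

def log_posterior(theta, data, sigma):
    """Gaussian likelihood plus a flat prior box (both illustrative assumptions)."""
    amp, gamma = theta
    if not (0.0 < amp < 10.0 and 0.0 < gamma < 5.0):
        return -np.inf
    resid = (data - forward_model(theta)) / sigma
    return -0.5 * np.sum(resid ** 2)

def metropolis(log_post, theta0, step, n_steps, rng, *args):
    """Infer: random-walk Metropolis-Hastings chain over the parameters."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta, *args)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal, *args)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept with probability min(1, ratio)
            theta, lp = proposal, lp_prop
        chain[i] = theta
    return chain

rng = np.random.default_rng(1)
true_theta = np.array([2.0, 1.5])
sigma = 0.05
data = forward_model(true_theta) + sigma * rng.standard_normal(R.size)
chain = metropolis(log_posterior, [1.0, 1.0], 0.02, 20_000, rng, data, sigma)
posterior_mean = chain[5_000:].mean(axis=0)  # discard burn-in
```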




Reverse diffusion: Denoise step by step (learned)
Forward diffusion: Add Gaussian noise (fixed)
Prompt: A person half Yoda half Gandalf
Diffusion model
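The two halves of this diagram can be made concrete in a case where the ideal denoiser is known in closed form. The sketch below, assuming NumPy, uses the linear noise schedule of DDPM (Ho et al. 2020), adds Gaussian noise with the closed-form forward process, and runs ancestral sampling backwards from pure noise. In place of a trained network it uses the exact noise predictor for 1D Gaussian data (via Tweedie's formula), so only the sampling mechanics are illustrated; the toy data values are hypothetical.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # fixed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_noise(x0, t, rng):
    """Forward diffusion in closed form: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps, eps

MU, S = 2.0, 0.5  # toy data distribution N(MU, S^2)

def ideal_eps(x_t, t):
    """Exact E[eps | x_t] for Gaussian data; a trained network would approximate this."""
    ab = alpha_bars[t]
    var_t = ab * S**2 + (1 - ab)
    score = -(x_t - np.sqrt(ab) * MU) / var_t   # gradient of log p_t(x_t)
    return -np.sqrt(1 - ab) * score

def reverse_sample(n, rng):
    """Reverse diffusion: start from N(0, 1) and denoise one step at a time."""
    x = rng.standard_normal(n)
    for t in range(T - 1, -1, -1):
        eps_hat = ideal_eps(x, t)
        mean = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(n) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

rng = np.random.default_rng(0)
x_T, _ = forward_noise(np.full(20_000, MU), T - 1, rng)  # data destroyed into noise
samples = reverse_sample(20_000, rng)                    # noise turned back into data
```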

["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
ICML AI4Astro 2023 (Spotlight talk), arXiv:2311.17141]
Base Distribution
Target Distribution
- Sample
- Evaluate
Simulated Galaxy 3D Map
Prompt:
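"Sample" and "Evaluate" are the two operations a generative model offers for mapping a base distribution to a target distribution. A normalising flow provides both through the change-of-variables formula. The sketch below, assuming NumPy, uses a single fixed sinh transform as a hypothetical one-layer flow; a trained model would learn its parameters and stack many layers, and the cited point-cloud model is not this flow.

```python
import numpy as np

A, B = 1.0, 0.7  # hypothetical fixed flow parameters (a trained flow would learn these)

def flow_forward(z):
    """Map base samples z ~ N(0, 1) to the target space: x = A + B * sinh(z)."""
    return A + B * np.sinh(z)

def flow_inverse(x):
    return np.arcsinh((x - A) / B)

def log_prob(x):
    """Change of variables: log p(x) = log N(f^-1(x); 0, 1) + log |d f^-1 / dx|."""
    z = flow_inverse(x)
    log_base = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    log_det = -np.log(B) - 0.5 * np.log1p(((x - A) / B) ** 2)
    return log_base + log_det

rng = np.random.default_rng(0)
samples = flow_forward(rng.standard_normal(50_000))  # Sample
grid = np.linspace(-40.0, 40.0, 200_001)
density = np.exp(log_prob(grid))                     # Evaluate
integral = density.sum() * (grid[1] - grid[0])       # should be ~1 for a valid density
```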

Fixed Initial Conditions / Varying Cosmology





Panels: Mean relative velocity vs. pair separation; k Nearest neighbours vs. pair separation
Reproducing Summary Statistics
Varying cosmological parameters
["How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds" Nguyen et al (including Cuesta-Lazaro) arXiv:2409.02980]
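k-nearest-neighbour statistics summarise a point cloud by the distribution of distances from random locations to their k-th nearest galaxy. The brute-force sketch below assumes NumPy, a periodic unit box and about a thousand points; the catalogue sizes and radii are hypothetical. For an unclustered (Poisson) sample the 1NN CDF is 1 - exp(-n V(r)) with V(r) = 4/3 pi r^3, which gives a quick sanity check.

```python
import numpy as np

def knn_distances(queries, data, k, box=1.0):
    """Distance from each query point to its k-th nearest data point in a periodic box."""
    diff = queries[:, None, :] - data[None, :, :]
    diff -= box * np.round(diff / box)                  # minimum-image convention
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.partition(dist, k - 1, axis=1)[:, k - 1]  # k-th smallest per row

def knn_cdf(queries, data, k, r_grid, box=1.0):
    """Empirical CDF of k-th nearest-neighbour distances, evaluated at r_grid."""
    d = np.sort(knn_distances(queries, data, k, box))
    return np.searchsorted(d, r_grid, side="right") / len(d)

rng = np.random.default_rng(0)
n_data = 1000
data = rng.uniform(size=(n_data, 3))
queries = rng.uniform(size=(1000, 3))  # volume-filling random points
r_grid = np.array([0.03, 0.05, 0.07])
cdf_1nn = knn_cdf(queries, data, 1, r_grid)
cdf_2nn = knn_cdf(queries, data, 2, r_grid)
poisson_1nn = 1 - np.exp(-n_data * 4 / 3 * np.pi * r_grid**3)  # analytic, unclustered case
```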
Diffusion model
With only 5000 galaxies!

Diffusion
Pair Counting

CNN
Diffusion
Increasing Noise
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]

Nayantara Mudur


CNN
Diffusion

1 to Many:
Galaxies
Dark Matter

["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies"
Ono et al (including Cuesta-Lazaro)
NeurIPS 2023 ML for the Physical Sciences / ApJ
arXiv:2403.10648]

Victoria Ono
Core F. Park

2. Probabilistic Reconstruction
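The "1 to many" idea (one galaxy field is compatible with many dark matter fields) can be seen in a toy where the posterior is exact. The sketch below, assuming NumPy, replaces the diffusion model with a linear Gaussian model, galaxies = bias x dark matter + noise, with hypothetical parameter values. It shows why posterior samples and the posterior mean differ: the mean is smoothed (its variance is too low), while samples recover the correct variance at the cost of a lower correlation with the truth.

```python
import numpy as np

SIGMA_DM, BIAS, SIGMA_NOISE = 1.0, 1.5, 1.0  # hypothetical toy values

def posterior(galaxies):
    """Exact Gaussian posterior p(dm | galaxies) for galaxies = BIAS * dm + noise."""
    precision = 1 / SIGMA_DM**2 + BIAS**2 / SIGMA_NOISE**2
    var = 1 / precision
    mean = var * BIAS / SIGMA_NOISE**2 * galaxies
    return mean, var

rng = np.random.default_rng(0)
dm_true = SIGMA_DM * rng.standard_normal(100_000)  # one "pixel" per entry
galaxies = BIAS * dm_true + SIGMA_NOISE * rng.standard_normal(dm_true.shape)
post_mean, post_var = posterior(galaxies)
post_sample = post_mean + np.sqrt(post_var) * rng.standard_normal(dm_true.shape)
```

With these values the posterior mean has variance 9/13 (about 0.69) instead of 1, so its power is biased low, while the posterior sample has variance 1, matching the truth.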

Truth / Sampled / Observed
Power Spectrum vs. scale (k), small to large: Galaxies, Truth CDM, Sample CDM
Cross correlation vs. scale (k), small to large: Galaxies × Truth CDM, Galaxies × Sample CDM, Truth CDM × Sample CDM
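The two diagnostics in this figure can be sketched with FFTs. The functions below, assuming NumPy, a cubic periodic grid and arbitrary units, compute an isotropically binned (cross) power spectrum and the cross-correlation coefficient r(k) = P_12(k) / sqrt(P_11(k) P_22(k)); r(k) = 1 means the sampled field matches the truth mode by mode at that scale. The normalisation and binning choices here are illustrative only.

```python
import numpy as np

def cross_power(f1, f2, n_bins=16):
    """Isotropically binned (cross) power spectrum of two fields on a periodic grid."""
    n = f1.shape[0]
    F1, F2 = np.fft.rfftn(f1), np.fft.rfftn(f2)
    kx = np.fft.fftfreq(n) * n   # integer wavenumbers along the full axes
    kz = np.fft.rfftfreq(n) * n  # non-negative wavenumbers along the last axis
    kmag = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2 + kz[None, None, :]**2)
    p = (F1 * np.conj(F2)).real.ravel()
    edges = np.linspace(0.5, n // 2, n_bins + 1)
    idx = np.digitize(kmag.ravel(), edges)  # k = 0 and modes beyond n/2 are excluded
    pk = np.array([p[idx == i].mean() for i in range(1, n_bins + 1)])
    return 0.5 * (edges[1:] + edges[:-1]), pk

def cross_correlation(f1, f2, n_bins=16):
    """r(k) = P_12 / sqrt(P_11 P_22): 1 means perfect reconstruction at that scale."""
    k, p12 = cross_power(f1, f2, n_bins)
    _, p11 = cross_power(f1, f1, n_bins)
    _, p22 = cross_power(f2, f2, n_bins)
    return k, p12 / np.sqrt(p11 * p22)

rng = np.random.default_rng(0)
truth = rng.standard_normal((32, 32, 32))
sample = truth + rng.standard_normal(truth.shape)  # truth plus equal-variance noise
independent = rng.standard_normal(truth.shape)
k, r_self = cross_correlation(truth, truth)
_, r_noisy = cross_correlation(truth, sample)
_, r_indep = cross_correlation(truth, independent)
```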
Galaxies
True DM
Sample DM




Size of training simulation
1) Generalising to larger volumes
Model trained on Astrid subgrid model
2) Generalising to unseen subgrid models
["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies"
Ono et al (including Cuesta-Lazaro)
NeurIPS 2023 ML for the Physical Sciences / ApJ / arXiv:2403.10648]



["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys" Park, Mudur, Cuesta-Lazaro et al ICML 2024 AI for Science]
Posterior Sample
Posterior Mean
Debiasing Cosmic Flows


Initial Conditions
early Universe
Cosmological Parameters
theory
?

Observed Density Field
today




Hamiltonian Monte Carlo
1) Needs an explicit likelihood, which is intractable in realistic scenarios (even though we can draw samples from the simulator)
2) Forward model has to be differentiable
(and relatively fast)
3) Not amortized: every new observation requires a new chain
["Bayesian physical reconstruction of initial conditions from large scale structure surveys" Jasche, Wandelt arXiv:1203.3639]
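For reference, a minimal Hamiltonian Monte Carlo step, assuming NumPy and a hypothetical 2D correlated Gaussian target. It makes limitation 2) concrete: every leapfrog step calls the gradient of the log-posterior, so in field-level inference the forward model must be differentiable and cheap enough to be called many times per sample. The step size and trajectory length are illustrative, not tuned values from the cited work.

```python
import numpy as np

COV = np.array([[1.0, 0.8], [0.8, 1.0]])  # hypothetical target covariance
PREC = np.linalg.inv(COV)

def log_prob(x):
    return -0.5 * x @ PREC @ x

def grad_log_prob(x):
    return -PREC @ x  # in field-level inference this gradient flows through the simulator

def hmc_step(x, rng, step_size=0.1, n_leapfrog=20):
    """One HMC transition: resample momentum, leapfrog-integrate, Metropolis accept/reject."""
    p = rng.standard_normal(x.shape)
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * step_size * grad_log_prob(x_new)  # initial half step in momentum
    for i in range(n_leapfrog):
        x_new += step_size * p_new
        if i < n_leapfrog - 1:
            p_new += step_size * grad_log_prob(x_new)
    p_new += 0.5 * step_size * grad_log_prob(x_new)  # final half step in momentum
    h_old = -log_prob(x) + 0.5 * p @ p
    h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return x_new
    return x

rng = np.random.default_rng(0)
x = np.zeros(2)
chain = []
for _ in range(5000):
    x = hmc_step(x, rng)
    chain.append(x)
chain = np.array(chain)
```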

True
Reconstructed

(Marginalizing over parameters)
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al NeurIPS ML4PS 2024 Spotlight talk]

Cross correlation between Initial Conditions and Final field vs. scale (k), small to large
Perfect Reconstruction
(impossible due to information loss)
Generative Model
NF
(Marginalizing over parameters)
1) Likelihood need not be Gaussian
2) Forward model need not be differentiable
3) Amortized
["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro et al arXiv:2405.02252 Physical Review D]
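"Amortized" means the simulation cost is paid once: a conditional density q(theta | x) is trained on simulated (theta, x) pairs and then evaluated on any new observation without rerunning a sampler. The sketch below, assuming NumPy, uses a hypothetical linear toy simulator and a conditional Gaussian fitted by least squares as a stand-in for a normalising flow. For this linear-Gaussian toy the fitted q matches the exact posterior (mean weights 4/21 and 8/21, variance 1/21).

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.5  # hypothetical simulator noise level

def simulator(theta, rng):
    """Hypothetical toy simulator: two noisy linear summaries of theta."""
    noise = SIGMA * rng.standard_normal((len(theta), 2))
    return np.stack([theta, 2 * theta], axis=1) + noise

# 1) Spend the simulation budget once: draw (theta, x) pairs from prior x simulator.
theta_train = rng.standard_normal(50_000)  # prior N(0, 1)
x_train = simulator(theta_train, rng)

# 2) Fit q(theta | x) = N(w . x + b, s^2); least squares is the maximum-likelihood
#    solution within this conditional-Gaussian family.
design = np.column_stack([x_train, np.ones(len(x_train))])
coef, *_ = np.linalg.lstsq(design, theta_train, rcond=None)
resid_std = np.std(theta_train - design @ coef)

def amortized_posterior(x_obs):
    """Posterior for any new observation: one cheap evaluation, no new MCMC run."""
    return np.append(x_obs, 1.0) @ coef, resid_std

post_mean, post_std = amortized_posterior(np.array([1.0, 2.0]))
```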


Simulations
Observations
Guided by observational constraints
Robust Inference
Anomaly Detection for new physics searches
Baryonic feedback
Hybrid simulators

Can LLMs close the loop?
Theory
Observations
Reconstructing latent features:
Dark matter, ICs...
Generative Models:
Beyond Simulation Emulation
Part 1
Part 2
Part 3
Future Directions

Late Universe
Early Universe
Tension
2025
Increasing significance:
Robust error bars against distribution shifts
The missing pieces: Searching for what we don't know to look for
STRESS TESTING LCDM
From Tensions to Discoveries: Anomalies in Cosmology

["A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering statistics" Krause et al (including Cuesta-Lazaro) arXiv:2409.10609]
["Full-shape analysis with simulation-based priors: cosmological parameters and the structure growth anomaly" Ivanov et al (including Cuesta-Lazaro) arXiv:2405.02252]
Hybrid Simulators
Match outputs from higher resolution simulations
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing"
Balla, Mishra-Sharma, Cuesta-Lazaro et al
LoG 2024 & NeurReps at NeurIPS 2024]
Foundation models?
Symmetry preserving architectures?
High resolution simulations too expensive for large training sets
New physics or pesky baryons?
The complex physics of galaxy formation can mimic signals we expect from new fundamental physics, making these effects difficult to disentangle
Hydro simulator
Subgrid model

Artificial General Intelligence?
Pre 2022
Post 2022




Meaningful benchmarks for AI scientists will lead to better AI scientists
LLMs have saturated current benchmarks
Six months later...
1. Hypothesis Generation
2. Implementing
3. Testing
4. Data analysis and solving inverse problems
Self-interacting dark radiation
Early Dark Energy
Modifying the Boltzmann equation in CLASS
Running and analysing CMB simulations to test implementation
Manipulating data and obtaining posteriors
Evaluating LLMs for Cosmology


Simulations
Observations
Guided by observational constraints
Robust Inference
Anomaly Detection for new physics searches
Baryonic feedback
Hybrid simulators

Can LLMs close the loop?
Theory
Observations
Reconstructing latent features:
Dark matter, ICs...
Generative Models:
Beyond Simulation Emulation
Part 1
Part 2
Part 3
Future Directions
Extra Slides

Panels: In-Distribution vs. Out-of-Distribution examples, from Small to Large
Generative SDE
3D U-Net

z~1000
z~10




z~5
z~1
Primordial Non-Gaussianity

Dark Energy




Early-Late Tensions

Adversarial SBI


Summarizer (NN)


Optimal summary statistic to constrain:
Dark matter
Dark energy
Inflation
Optimal summary statistic for a Gaussian Random Field

Real or Fake?



Summarizer (NN)
Summaries informative of parameters
Anomalous SBI
highlight differences
Adversarial loss
Robust SBI
remove discrepancies
Classifier:
Adversarial SBI


A Toy Model
Scale-dependent noise + baryonic effects
Simulations:

Robust Summaries

Anomalous Summaries
Noiseless Power Spectrum
Observations:
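A toy version of this setup, assuming NumPy and hypothetical functional forms for the baryonic suppression and the scale-dependent noise (none are taken from the talk): simulations provide a noiseless power-law spectrum, while observations add a suppression that grows towards small scales plus noise rising with k. The last lines pick the scales where the two still agree to 5%, the naive scale-cut alternative to learning robust summaries.

```python
import numpy as np

k = np.logspace(-2, 0, 50)  # wavenumbers, arbitrary units

def noiseless_power(k, amplitude=1.0, slope=-1.5):
    """Simulation side: a hypothetical power-law 'theory' spectrum."""
    return amplitude * k ** slope

def observed_power(k, f_baryon=0.2, k_baryon=0.3, noise_amp=5.0, k_noise=0.5):
    """Observation side: baryonic suppression plus scale-dependent noise (toy forms)."""
    suppression = 1 - f_baryon * k**2 / (k**2 + k_baryon**2)  # -> 1 on large scales
    noise = noise_amp * (k / k_noise) ** 2                     # grows on small scales
    return noiseless_power(k) * suppression + noise

ratio = observed_power(k) / noiseless_power(k)
robust = np.abs(ratio - 1) < 0.05  # scales where simulations still describe the data
k_max_robust = k[robust].max()
```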