Generative Solutions for Cosmic Problems
Flatiron Institute
Institute for Advanced Study
Carol(ina) Cuesta-Lazaro

What is field-level inference?
A digital twin of our Universe

Observed Galaxy Distribution
Simulated Galaxy Distribution

Field Level Inference
Forward Model
(= no Cosmic Variance)




Carolina Cuesta-Lazaro Flatiron/IAS @ Perimeter 2026
Why field-level inference?
Optimal constraints
N-point functions
Counts-in-cell
Wavelets
Marked tpcfs
Voids
Do we really need to infer the ICs?

["Simulation-Based Emulators for Galaxy Clustering in the Era of Stage-IV Surveys: I. Two-Point Statistics and Beyond" Paillas et al (including CCL) 2026]

Marginal Inference - SBI


Neural Compression
Initial Conditions
Marginal Likelihood
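The difference between the two routes can be written out explicitly. The notation is an assumption made for this sketch and does not come from the slides: θ are the cosmological parameters, z the initial conditions, x the observed field and s(x) a neural summary of it.

```latex
\begin{align}
  \text{Field-level:} \quad
    & p(\theta, z \mid x) \;\propto\; p(x \mid \theta, z)\, p(z \mid \theta)\, p(\theta), \\
  \text{Marginal (SBI):} \quad
    & p(\theta \mid s(x)) \;\propto\; p(\theta) \int \mathrm{d}z \; p(s(x) \mid \theta, z)\, p(z \mid \theta).
\end{align}
```

In the marginal case the integral over z is never evaluated explicitly; it is carried out implicitly by drawing simulations.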
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]






Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Predictive Cross Validation:
Cross-Correlation with other probes without Cosmic Variance

[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-gaussianity
Why field-level inference?
Data-driven Subgrid models / Data-driven Systematics
The forward model
Scaling up to survey volumes
Modelling small scale clustering
Survey realism
Model misspecification
The FLI Challenges
Sampling high-dimensional posteriors
The Forward Model
Galaxy Formation

Adapted from arXiv:1804.03097
Symmetries
Connected to Underlying Physics
Hydro sims
Empirical
Halo Occupation Distribution (HOD)
EFT bias expansion
Matter Density
Galaxy Distribution
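As an illustration of the empirical end of this spectrum, here is a minimal sketch of a Halo Occupation Distribution. It uses a commonly adopted five-parameter form (an error-function step for centrals, a power law for satellites). The functional form and all default parameter values are assumptions chosen for illustration, not the model used in the talk.

```python
from math import erf, log10

def mean_centrals(mass, log_m_min=13.0, sigma_log_m=0.5):
    """Mean number of central galaxies in a halo of the given mass.

    Smooth step from 0 to 1 centred on log10(mass) = log_m_min.
    Default values are illustrative assumptions.
    """
    return 0.5 * (1.0 + erf((log10(mass) - log_m_min) / sigma_log_m))

def mean_satellites(mass, log_m_min=13.0, sigma_log_m=0.5,
                    m_0=1e13, m_1=1e14, alpha=1.0):
    """Mean number of satellites: power law above the cutoff mass m_0,
    modulated by the central occupation."""
    if mass <= m_0:
        return 0.0
    n_cen = mean_centrals(mass, log_m_min, sigma_log_m)
    return n_cen * ((mass - m_0) / m_1) ** alpha

# Mean occupation for a few halo masses (arbitrary mass units).
for mass in (1e12, 1e13, 1e14, 1e15):
    print(f"{mass:.0e}", mean_centrals(mass), mean_satellites(mass))
```

A full HOD would then draw centrals from a Bernoulli distribution and satellites from a Poisson distribution with these means, and place them inside the halo.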
(Slide credit: Matthew Ho)
Scaling Up to Survey Volumes

Learning The Universe
Simon Ding
Xiaosheng Zhao
Lucas Makinen
Axel Lapel
Adrian Bayer
Guilhem Lavaux
Benjamin Wandelt
Ce Sui
Matthew Ho
Leander Thiele
Rosa Malandrino
Greg Bryan
Nicolas Chartier
Lucia Perez
Chirag Modi
Deaglan Bartlett
Shivam Pandey
Sammy Sharief
Ana Maria Delgado
Anirban Bairagi
Christopher Lovell
Carolina Cuesta-Lazaro
Shy Genel
Francisco Villaescusa-Navarro
Laurence Perreault Levasseur
...

Particle Mesh for Gravity
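The core of a particle-mesh gravity step is a Poisson solve in Fourier space. The sketch below shows only that step, on a periodic grid and with all physical constants (4πG, mean density, scale factor) set to one as a simplifying assumption. Mass assignment to the grid and the particle update are omitted, and the function name is hypothetical.

```python
import numpy as np

def pm_potential(delta, box_size):
    """Solve laplacian(phi) = delta on a periodic cubic grid via FFT.

    delta    : (n, n, n) overdensity field
    box_size : side length of the box (same units as 1/k)
    Constants such as 4*pi*G are set to one (assumption of this sketch).
    """
    n = delta.shape[0]
    # Angular wavenumbers along one axis.
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # avoid dividing by zero at k = 0
    phi_k = -np.fft.fftn(delta) / k2       # phi_k = -delta_k / k^2
    phi_k[0, 0, 0] = 0.0                   # the mean of phi is arbitrary
    return np.fft.ifftn(phi_k).real

# Single plane wave: the potential is the same wave scaled by -1/k^2.
n = 16
x = np.arange(n) / n
delta = np.cos(2.0 * np.pi * x)[:, None, None] * np.ones((n, n, n))
phi = pm_potential(delta, box_size=1.0)
```

Every operation here is an FFT or an elementwise product, which is what makes particle-mesh solvers cheap and easy to differentiate through.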

Gas Properties
Density
Temperature

Galaxy Distribution
[See Amanda's talk]
["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]
Scalable Field Level Emulators
Probabilistic
Local
["CHARM: Creating Halos with Auto-Regressive Multi-stage networks" Pandey et al 2024]

(Slide credit: Matthew Ho)
(Slide credit: Matthew Ho)

Posterior resimulations in minutes!
- Gravity Solver (Gadget-4)
- Halo finder (SUBFIND)
- Semi-analytic galaxy formation model (L-Galaxies)
OOD Tests



Sampling High Dimensional Posteriors

GANs

Deep Belief Networks
2006

VAEs

Normalising Flows

BigGAN

Diffusion Models

2014
2017
2019
2022
A folk music band of anthropomorphic autumn leaves playing bluegrass instruments
Contrastive Learning
2023
Meanwhile, on Earth...
2026
"Write a C compiler"
AGI?
["Genie 2: A large-scale foundation model" Parker-Holder et al (2024)]
Probabilistic ML has made high dimensional inference tractable
1024x1024xTime
["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]
Goal: Estimate unknown p(x1) from samples
Base
Target
Transport Map



Base
Data
"Creating noise from data is easy; creating data from noise is generative modeling."
(Yang Song)
Neural Network





Transport Map
Continuity Equation
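For reference, the transport map and the continuity equation can be stated together. The symbols are assumptions of this sketch: v_θ is the velocity field parameterised by the neural network, p_0 the base density and p_t the density of samples at time t.

```latex
\begin{align}
  \frac{\mathrm{d}x_t}{\mathrm{d}t} &= v_\theta(x_t, t), \qquad x_0 \sim p_0, \\
  \frac{\partial p_t(x)}{\partial t} &= -\nabla \cdot \big( p_t(x)\, v_\theta(x, t) \big).
\end{align}
```

Integrating the first equation from t = 0 to t = 1 moves base samples to the target; the second equation describes how the density evolves under that flow.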
Interpolant
Base
Data
Neural Network
1) Training
2) Inference
Estimated from samples
(Implicit Likelihood)
["Stochastic Interpolants: A Unifying framework for flows and diffusion" Albergo et al arXiv:2303.08797]
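The two stages, training and inference, can be sketched in one dimension. This toy example assumes a linear interpolant x_t = (1 - t) x_0 + t x_1 and replaces the neural network with a per-time linear least-squares fit, which is exact only because base and data are both Gaussian here. All names and numbers are illustrative assumptions, not the setup used in the talk.

```python
import numpy as np

def fit_velocity(x0, x1, t):
    """1) Training: regress the interpolant's time derivative on x_t.

    Fits v(x, t) = a + b * x at a single time t by least squares.
    A neural network would play this role in practice.
    """
    xt = (1.0 - t) * x0 + t * x1            # linear interpolant
    target = x1 - x0                         # d x_t / d t
    design = np.stack([np.ones_like(xt), xt], axis=1)
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs                            # (a, b)

def sample(n_samples, x0_train, x1_train, n_steps=200, seed=0):
    """2) Inference: Euler-integrate dx/dt = v(x, t) from base to data."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)       # draw from the base
    dt = 1.0 / n_steps
    for step in range(n_steps):
        a, b = fit_velocity(x0_train, x1_train, step * dt)
        x = x + dt * (a + b * x)
    return x

rng = np.random.default_rng(1)
x0_train = rng.standard_normal(20_000)               # base: N(0, 1)
x1_train = 2.0 + 0.5 * rng.standard_normal(20_000)   # data: N(2, 0.5^2)
generated = sample(5_000, x0_train, x1_train)
print(generated.mean(), generated.std())
```

The velocity is estimated purely from paired samples, so no explicit expression for the target density is ever needed.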

Particle Mesh
Dark Matter Only
Gaussian Likelihood
Explicit Sampling vs SBI
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al NeurIPS 2024 ML for the Physical Sciences]

Adrian Bayer
Mount Fuji?
Chirag Modi

1) Likelihood not necessarily Gaussian
2) Forward model need not be differentiable
3) Amortized
Generative Model: Marginalizing over ICs
Generative Model: Fixing ICs
HMC: Marginalizing over ICs

True
Reconstructed


SBI
HMC
Cross Correlation Coefficient
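The cross-correlation coefficient between a true and a reconstructed field is r(k) = P_AB(k) / sqrt(P_AA(k) P_BB(k)). A minimal sketch on a periodic grid follows. The function name and binning are assumptions, wavenumbers are in grid units, and the double counting of modes in the half-complex FFT layout is not corrected, which mostly cancels in the ratio.

```python
import numpy as np

def cross_correlation_coefficient(field_a, field_b, n_bins=8):
    """r(k) = P_ab / sqrt(P_aa * P_bb) in spherical shells of |k|.

    Both fields are (n, n, n) arrays on the same periodic grid.
    """
    n = field_a.shape[0]
    fa = np.fft.rfftn(field_a)
    fb = np.fft.rfftn(field_b)
    # Integer wavenumbers matching the rfftn output layout (n, n, n//2 + 1).
    k_full = np.fft.fftfreq(n) * n
    k_half = np.fft.rfftfreq(n) * n
    kmag = np.sqrt(k_full[:, None, None] ** 2
                   + k_full[None, :, None] ** 2
                   + k_half[None, None, :] ** 2)
    # Shells from just above k = 0 up to the Nyquist wavenumber.
    edges = np.linspace(0.5, n // 2 + 0.5, n_bins + 1)
    shell = np.digitize(kmag.ravel(), edges)

    def binned(values):
        flat = values.ravel()
        return np.array([flat[shell == i].sum() for i in range(1, n_bins + 1)])

    p_ab = binned((fa * np.conj(fb)).real)
    p_aa = binned(np.abs(fa) ** 2)
    p_bb = binned(np.abs(fb) ** 2)
    return p_ab / np.sqrt(p_aa * p_bb)

rng = np.random.default_rng(0)
truth = rng.standard_normal((16, 16, 16))
recon = truth + rng.standard_normal((16, 16, 16))   # synthetic noisy "reconstruction"
print(cross_correlation_coefficient(truth, recon))
```

A value of r(k) = 1 means the phases are recovered perfectly at that scale; values near 0 mean the reconstruction carries no information there.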
Scaling up in Volume
Large Scale Reconstruction

True

Reconstructed


["Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks" Shallue, Eisenstein 2022]
["Initial conditions from galaxies: machine-learning subgrid correction to standard reconstruction" Parker, Bayer, Seljak 2025]

Power Spectrum
Cross Correlation

Peculiar Velocities
True
Reconstructed
Scaling up in volume
Implicit FLI for DESI
DESI Y1 LRG effective volumes are already larger than our sims!
Small Scale Galaxy Bias

Selection
Fibre collisions
Forward Modelling the Survey Systematics



EFT
Galaxy Formation

Adapted from arXiv:1804.03097
Symmetries
Connected to Underlying Physics
Hydro sims
Empirical
Halo Occupation Distribution (HOD)
EFT bias expansion
Matter Density
Galaxy Distribution
Self-Consistent Predictions across observables
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
Subgrid model 1
Subgrid model 2
Subgrid model 3
Subgrid model 4
Can we learn a general and continuous representation of Baryonic feedback?

Gas
Galaxies




Dark Matter
Baryonic fields
Marginalize over a broader set of subgrid physics
Interpolate between simulators
Mingshau Liu
(Ming)

Constrain z via multi-wavelength observations
["Continuous Representations of Baryonic Feedback for Robust Inference from Multiple Simulation Suites" Liu, Cuesta-Lazaro NeurIPS ML4PS 2025]
Trained on:
TNG, SIMBA, Astrid, EAGLE
Encoder



1) Encoder

Gas
Galaxies




Dark Matter
Baryonic fields
2) Probabilistic Decoder



Dark Matter
Baryonic fields
(Test suite)

Gas Density
Temperature
Astrid
EAGLE
Interpolating over Simulations
Generalizing to unseen simulations: Magneticum



BEFORE
Artificial General Intelligence?
AFTER


DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Matt Wiemann

Lindsay Smith
Hypothesis
Simulate World

Invisible particles
Extra dimensions
Multi Species ...
Propose Experiment


Simulate
Text: Conceptual Understanding
Trajectories (.csv)

Science Agent (LLM)

Science Agent (LLM)

Outputs
Python Code: Trajectory MSE
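A trajectory mean squared error could be computed as below. This is a minimal sketch: the array layout (time steps by coordinates) and the function name are assumptions, and the benchmark's actual scoring code may differ.

```python
import numpy as np

def trajectory_mse(predicted, reference):
    """Mean squared error between two trajectories of identical shape,
    e.g. (n_steps, n_coordinates). The layout is an assumption."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("trajectories must have the same shape")
    return float(np.mean((predicted - reference) ** 2))

# Two-step, two-coordinate toy trajectories that differ in one entry.
print(trajectory_mse([[0, 0], [1, 1]], [[0, 0], [1, 3]]))
```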
["DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking" Wiemann, Smith et al (including CCL)]
You are an expert physicist tasked with discovering scientific laws in a simulated universe. Your goal is to propose experiments, analyse the data they return, and ultimately deduce the underlying scientific law. Note that the laws of physics in this universe may differ from those in our own. You can perform experiments to gather data but must follow the protocol strictly.

If F = G/r²: G = 80. If F = G/r: G = 8. Linear in 1/r!
m=1: a = 0.664 · m=2: a = 0.614 · m=4: a = 0.664 · m=8: a = 0.714 … That's strange.
Consistent pattern: small +y 'extra' force (~0.05) regardless of position … maybe the anchor itself moves.
a_y_extra ≈ +0.05 everywhere on probes. Spot on! So anchor too just experiences g_y = 0.05 … g = (0, 0.05) is a uniform external acceleration on every particle.
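The model's check of 1/r against 1/r² amounts to fitting a power-law exponent. One way to do that is a straight-line fit in log-log space, sketched here on synthetic data generated with a = 8 / r. The data and function name are illustrative assumptions, not the benchmark's measurements or tooling.

```python
import numpy as np

def fit_power_law(r, a):
    """Fit |a| = amplitude * r**exponent by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(r), np.log(np.abs(a)), deg=1)
    return float(slope), float(np.exp(intercept))

# Synthetic accelerations following a 1/r law with amplitude 8 (assumed values).
r = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
a = 8.0 / r
exponent, amplitude = fit_power_law(r, a)
print(exponent, amplitude)
```

An exponent near -1 favours the 1/r law and one near -2 the inverse-square law; a value in between, or one that drifts with r, signals that a single power law is the wrong model.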

Confirmed: a ∝ 1/r.
observed −5.18. WAY too small. Something is very wrong with my understanding!
artifact of integration accuracy, NOT the true law. Probably because of close encounters causing chaos.
I'll just submit my law. The functional form is solid.


Conceptual Understanding
Trajectory Prediction
Random Seeds
a × r seems roughly constant … not quite 1/r.
Let me try a × r²: … Not consistent.
So at large r, a ~ 1/r. At small r, a ~ 1/r².
Actually note the file path in the error message: 'extra_dimensions.csv'! This is a hint! The world might have an extra compactified dimension…


[Model starts fitting Yukawa potential, Bessel functions... At some point fitting tool errors.]










Astrophysics proliferates in Simulation-based Inference on Simulations
Simulated Data
Observed Data
Alignment Loss
Reconstruction
Statistical Alignment
(OT / Adversarial)


Encoder
Obs
Encoder
Sims
Private Domain Information
Shared Information


Observed Reconstructed
Simulated Reconstructed
Shared Decoder
Shared Decoder
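One possible optimal-transport choice for the alignment loss is a sliced Wasserstein distance between the shared latents of simulated and observed batches. The sketch below is an illustration under that assumption and is not necessarily the loss used in the talk. It also assumes both batches contain the same number of samples.

```python
import numpy as np

def sliced_wasserstein(z_sim, z_obs, n_projections=64, seed=0):
    """Sliced squared 2-Wasserstein distance between two latent batches.

    z_sim, z_obs : (n_samples, latent_dim) arrays with equal n_samples.
    Projects both batches onto random unit directions, sorts each
    projection, and averages the squared differences.
    """
    rng = np.random.default_rng(seed)
    dim = z_sim.shape[1]
    directions = rng.standard_normal((n_projections, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    proj_sim = np.sort(z_sim @ directions.T, axis=0)
    proj_obs = np.sort(z_obs @ directions.T, axis=0)
    return float(np.mean((proj_sim - proj_obs) ** 2))

rng = np.random.default_rng(0)
latents = rng.standard_normal((256, 4))
print(sliced_wasserstein(latents, latents))         # identical batches
print(sliced_wasserstein(latents, latents + 3.0))   # shifted batch
```

In training, a term like this would be added to the two reconstruction losses so that only the shared latent is pushed to match across domains, while the private latent is left free to absorb domain-specific effects.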

A Toy Model Example


Idealized Simulations
Observations
+ Scale Dependent Noise
+ Bump
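A toy setup of this kind could look as follows. Everything here is an assumption made for illustration: the idealized simulation is taken to be a power law in k with an amplitude and a tilt, and the observation adds a Gaussian bump plus noise that grows with k. The actual toy model in the talk may be defined differently.

```python
import numpy as np

def idealized_spectrum(k, amplitude, tilt):
    """Idealized simulation: a pure power law, amplitude * k**tilt (assumed form)."""
    return amplitude * k ** tilt

def observed_spectrum(k, amplitude, tilt, rng, noise_slope=0.1,
                      bump_height=0.5, bump_center=1.0, bump_width=0.2):
    """Observation: power law times (1 + Gaussian bump), plus noise whose
    standard deviation scales with k. All defaults are illustrative."""
    clean = idealized_spectrum(k, amplitude, tilt)
    bump = bump_height * np.exp(-0.5 * ((k - bump_center) / bump_width) ** 2)
    noise_sigma = noise_slope * k * clean
    return clean * (1.0 + bump) + noise_sigma * rng.standard_normal(k.shape)

k = np.linspace(0.1, 2.0, 50)
rng = np.random.default_rng(0)
sim = idealized_spectrum(k, amplitude=1.0, tilt=-1.5)
obs = observed_spectrum(k, amplitude=1.0, tilt=-1.5, rng=rng)
```

The bump and the noise are present only in the observed domain, so they are the kind of features a private latent should capture, while amplitude and tilt are shared between domains.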

Amplitude
Tilt
Tilt
Robust SBI from Shared

Visualizing Information Split
Perimeter-2026
By carol cuesta