Generative Solutions for Cosmic Problems

Flatiron Institute

Institute for Advanced Study

Carol(ina) Cuesta-Lazaro

What is field-level inference?

A digital twin of our Universe

Observed Galaxy Distribution

Simulated Galaxy Distribution

Field Level Inference

Forward Model

(= no Cosmic Variance)

+ \Omega_m, \sigma_8, \ldots
p(\delta_{\mathrm{ICs}}, \theta|\delta_{\mathrm{Obs}})

Carolina Cuesta-Lazaro Flatiron/IAS @ Perimeter 2026

Why field-level inference?

Optimal constraints

p(\text{[image]} \,|\, \mathrm{Cosmology})

N-point functions

Counts-in-cell

Wavelets

Marked tpcfs

Voids

Do we really need to infer the ICs?

p(\text{[image]} \,|\, \mathrm{Cosmology})


["Simulation-Based Emulators for Galaxy Clustering in the Era of Stage-IV Surveys: I. Two-Point Statistics and Beyond" Paillas et al (including CCL) 2026]


\mathcal{O}(100) \,\, \mathrm{simulations}

Marginal Inference - SBI

p(\text{[image]} \,|\, \mathrm{Cosmology})
p(S_\theta(\text{[image]}) \,|\, \mathrm{Cosmology})

Neural Compression

p(x|\theta) = \int p(x|z, \theta) p(z|\theta) \, dz

Initial Conditions

Marginal Likelihood


["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" 
Mudur, Cuesta-Lazaro and Finkbeiner
NeurIPS 2023 ML for the physical sciences, arXiv:2405.05255]
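The marginal likelihood above can be checked on a toy problem. The sketch below assumes a one-dimensional linear-Gaussian model that is not from the talk: initial conditions z | θ ~ N(0, θ) and data x | z ~ N(z, σ²), so that p(x|θ) = N(0, θ + σ²) is known in closed form. A naive Monte Carlo average of p(x|z, θ) over prior draws of z recovers it here; with field-level initial conditions of millions of dimensions, this brute-force average quickly becomes impractical. All function names are illustrative.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def marginal_likelihood_mc(x, theta, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of p(x|theta) = E_{z ~ p(z|theta)}[p(x|z, theta)].

    Toy assumptions: z | theta ~ N(0, theta) plays the role of the initial
    conditions, and x | z ~ N(z, sigma^2) is the "forward model + noise".
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, np.sqrt(theta), size=n_samples)  # draws from the prior over ICs
    return gaussian_pdf(x, z, sigma ** 2).mean()          # average likelihood over ICs

def marginal_likelihood_exact(x, theta, sigma):
    """Closed form for this linear-Gaussian toy: x | theta ~ N(0, theta + sigma^2)."""
    return gaussian_pdf(x, 0.0, theta + sigma ** 2)

x_obs, theta, sigma = 1.3, 2.0, 0.5
print(marginal_likelihood_mc(x_obs, theta, sigma), marginal_likelihood_exact(x_obs, theta, sigma))
```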

 

+

Reconstructing ALL latent variables:

Dark Matter distribution

Entire formation history

Peculiar velocities

Predictive Cross Validation:

Cross-Correlation with other probes without Cosmic Variance

[Image Credit: Yuuki Omori]

 

Constraining Inflation:

Inferring primordial non-Gaussianity

Why field-level inference?

Data-driven Subgrid models / Data-driven Systematics


  The forward model

Scaling up to survey volumes

Modelling small scale clustering

Survey realism

Model misspecification


The FLI Challenges

 Sampling high-dimensional posteriors

The Forward Model

Galaxy Formation

Adapted from arXiv:1804.03097

Symmetries

Connected to Underlying Physics

Hydro sims

Empirical

Halo Occupation Distribution (HOD)

EFT bias expansion

Matter Density

Galaxy Distribution


(Slide credit: Matthew Ho)

Scaling Up to Survey Volumes


Learning The Universe

Simon Ding

Xiaosheng Zhao

Lucas Makinen

Axel Lapel

Adrian Bayer

Guilhem Lavaux

Benjamin Wandelt

Ce Sui

Matthew Ho

Leander Thiele

Rosa Malandrino

Greg Bryan

Nicolas Chartier

Lucia Perez

Chirag Modi

Deaglan Bartlett

Shivam Pandey

Sammy Sharief

Ana Maria Delgado

 Anirban Bairagi

Christopher Lovell

Carolina Cuesta-Lazaro

Shy Genel

Francisco Villaescusa-Navarro

Laurence Perreault Levasseur

...

Particle Mesh for Gravity

p(\mathrm{Baryons}|\mathrm{DM}, \mathcal{C}, \mathcal{A})

Gas Properties

Density

Temperature

Galaxy Distribution

[See Amanda's talk]

["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]


Scalable Field Level Emulators

Probabilistic

Local

["CHARM: Creating Halos with Auto-Regressive Multi-stage networks" Pandey et al 2024]


(Slide credit: Matthew Ho)


\Omega_m \, \, \mathrm{Uncertainty}
\sigma_8 \, \, \mathrm{Uncertainty}
(Slide credit: Matthew Ho)

Posterior resimulations in minutes!

  • Gravity Solver (Gadget-4)
  • Halo finder (SUBFIND)
  • Semi-analytic galaxy formation model (L-Galaxies)

OOD Tests


Sampling High Dimensional Posteriors

GANs

Deep Belief Networks

2006

VAEs

Normalising Flows

BigGAN

Diffusion Models

2014

2017

2019

2022

A folk music band of anthropomorphic autumn leaves playing bluegrass instruments

Contrastive Learning

2023

Meanwhile, on Earth...

2026

"Write a C compiler"

AGI?


p(\mathrm{World}|\mathrm{Prompt})
["Genie 2: A large-scale foundation model" Parker-Holder et al (2024)]

Probabilistic ML has made high-dimensional inference tractable

1024x1024xTime

["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]


Goal: Estimate the unknown p(x_1) from samples

x_0 \sim \rho_0

Base

Target

T: \Omega \to \Omega

Transport Map

x_1 \sim \rho_1 \quad \text{via} \quad T(x_0) = x_1
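In one dimension a transport map between ρ_0 and ρ_1 can be estimated directly from samples. The sketch below is an illustration, not the method used in the talk: it matches ranks, T = F_1^{-1} ∘ F_0 with empirical CDFs, which gives the monotone map pushing ρ_0 onto ρ_1. The bimodal target and all names are assumptions. In high dimensions no such closed-form construction is available, which is where neural transport comes in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Base rho_0: standard normal. Target rho_1: known only through samples
# (a bimodal mixture, chosen purely for illustration).
x0 = rng.standard_normal(10_000)
x1 = np.concatenate([rng.normal(-2.0, 0.5, 5_000), rng.normal(2.0, 0.5, 5_000)])

def empirical_monotone_map(base_samples, target_samples):
    """1D monotone transport map estimated from samples by matching ranks.

    T = F_1^{-1} o F_0 pushes rho_0 onto rho_1; in 1D it is also the optimal
    transport map for quadratic cost.
    """
    base_sorted = np.sort(base_samples)
    target_sorted = np.sort(target_samples)

    def T(x):
        # Empirical CDF of the base, then empirical quantile of the target.
        u = np.searchsorted(base_sorted, x, side="right") / len(base_sorted)
        return np.quantile(target_sorted, np.clip(u, 0.0, 1.0))

    return T

T = empirical_monotone_map(x0, x1)
print(T(rng.standard_normal(5)))  # fresh base draws mapped into the bimodal target
```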


x_1
x_0

Base

Data

"Creating noise from data is easy; creating data from noise is generative modeling."

 (Yang Song)

Neural Network

\frac{dx_t}{dt} = b_t(x_t)
\frac{\partial \rho_t(x)}{\partial t} = - \nabla \cdot \left( b_t(x) \rho_t(x) \right)

Transport Map

Continuity Equation
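For intuition, here is a case where the velocity field is known exactly. Assuming a base N(0, 1) and a hypothetical target N(m, s²), the flow x_t = (1 + t(s - 1)) x_0 + t m has velocity b_t(x) = m + (s - 1)(x - t m)/(1 + t(s - 1)) and density ρ_t = N(t m, (1 + t(s - 1))²). The sketch integrates the ODE with RK4 and checks the continuity equation by finite differences; in practice b_t is a neural network and has no closed form.

```python
import numpy as np

m, s = 3.0, 0.5  # hypothetical target N(m, s^2); base is N(0, 1)

def velocity(t, x):
    """Velocity b_t(x) whose flow is x_t = (1 + t(s-1)) x_0 + t m."""
    scale = 1.0 + t * (s - 1.0)
    return m + (s - 1.0) * (x - t * m) / scale

def density(t, x):
    """rho_t = N(t m, (1 + t(s-1))^2), the law of x_t under this flow."""
    sd = 1.0 + t * (s - 1.0)
    return np.exp(-0.5 * ((x - t * m) / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

def integrate(x0, n_steps=100):
    """Push base samples through dx/dt = b_t(x) with classic RK4 from t=0 to t=1."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        k1 = velocity(t, x)
        k2 = velocity(t + dt / 2, x + dt * k1 / 2)
        k3 = velocity(t + dt / 2, x + dt * k2 / 2)
        k4 = velocity(t + dt, x + dt * k3)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

x1 = integrate(np.random.default_rng(0).standard_normal(50_000))
print(x1.mean(), x1.std())  # close to m and s

# Continuity equation check at t = 0.4 with central finite differences.
t, x, h = 0.4, np.linspace(-2.0, 4.0, 7), 1e-4
d_rho_dt = (density(t + h, x) - density(t - h, x)) / (2 * h)
d_flux_dx = (velocity(t, x + h) * density(t, x + h)
             - velocity(t, x - h) * density(t, x - h)) / (2 * h)
residual = np.max(np.abs(d_rho_dt + d_flux_dx))
print(residual)  # ~0 up to finite-difference error
```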


I_t(x_0, x_1) = \alpha_t x_0 + \beta_t x_1, \quad (x_0, x_1) \sim \rho(x_0, x_1)
\alpha_t = 1 - t, \quad \beta_t = t
b_t(x) = \mathbb{E}_{\rho(x_0, x_1)}\!\left[\dot{I}_t \,\middle|\, I_t = x\right]
\mathcal{L}_b[\hat{b}] = \int_0^1 \mathbb{E}_{\rho(x_0, x_1)}\left[\left\|\hat{b}_t(I_t) - \dot{I}_t\right\|^2\right] dt

Interpolant

Base

Data

Neural Network

\rho_t = \mathrm{Law}(I_t)
\frac{dx_t}{dt} = \hat{b}_t(x_t)

1) Training

2) Inference

Estimated from samples

(Implicit Likelihood)
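The training and inference steps can be sketched end to end in one dimension. Assumptions, not from the talk: a Gaussian base, a hypothetical Gaussian target N(3, 0.5²), independent coupling ρ(x_0, x_1) = ρ_0(x_0) ρ_1(x_1), and the linear interpolant with α_t = 1 - t, β_t = t. In place of a neural network, b̂ is a per-time-bin linear least-squares fit of İ_t on I_t, which minimises an empirical version of L_b; this stand-in is adequate here only because the exact velocity is affine in x when both endpoints are Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_bins = 200_000, 50

# 1) Training pairs: base x0 ~ N(0, 1) and "data" x1 ~ N(3, 0.5^2), drawn independently.
x0 = rng.standard_normal(n)
x1 = rng.normal(3.0, 0.5, n)
t = rng.uniform(0.0, 1.0, n)

# Linear interpolant I_t = (1 - t) x0 + t x1, so its time derivative is x1 - x0.
I_t = (1.0 - t) * x0 + t * x1
I_dot = x1 - x0

# Stand-in for the network: in each time bin, fit b_hat(x) = c0 + c1 x by least squares,
# i.e. minimise the squared error ||b_hat(I_t) - I_dot||^2 over samples in that bin.
edges = np.linspace(0.0, 1.0, n_bins + 1)
coeffs = np.zeros((n_bins, 2))
for k in range(n_bins):
    mask = (t >= edges[k]) & (t < edges[k + 1])
    A = np.stack([np.ones(mask.sum()), I_t[mask]], axis=1)
    coeffs[k], *_ = np.linalg.lstsq(A, I_dot[mask], rcond=None)

def b_hat(time, x):
    k = min(int(time * n_bins), n_bins - 1)
    return coeffs[k, 0] + coeffs[k, 1] * x

# 2) Inference: integrate dx/dt = b_hat_t(x) from fresh base samples with forward Euler.
x = rng.standard_normal(20_000)
n_steps = 200
for i in range(n_steps):
    x = x + b_hat(i / n_steps, x) / n_steps
print(x.mean(), x.std())  # should approach the target's 3.0 and 0.5
```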


["Stochastic Interpolants: A Unifying Framework for Flows and Diffusions" 
Albergo et al arXiv:2303.08797]
I_t(x_0, x_1) = \alpha_t x_0 + \beta_t x_1 + \gamma_t W_t
dX_t = b(t, X_t, x_0) dt + \sigma_t dW_t

Particle Mesh

Dark Matter Only

Gaussian Likelihood

Explicit Sampling vs SBI


p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs}) = p(\delta_\mathrm{ICs} |\delta_\mathrm{Obs}, \theta) p(\theta |\delta_\mathrm{Obs})
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al 
NeurIPS 2024 ML for the Physical Sciences]
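The factorization above suggests sampling in two stages: first θ from p(θ|δ_Obs), then δ_ICs from p(δ_ICs|δ_Obs, θ). The sketch below does this for a hypothetical linear-Gaussian toy (δ_ICs ~ N(0, θ I), δ_Obs = δ_ICs + white noise) in which both factors are tractable: the first on a grid with a flat prior, the second as a Wiener filter plus Gaussian fluctuations. In the cited work both factors are learned generative models instead; this toy only illustrates the bookkeeping, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, sigma, theta_true = 4_000, 1.0, 2.0

# Hypothetical stand-in forward model: delta_Obs = delta_ICs + N(0, sigma^2 I).
delta_ics = rng.normal(0.0, np.sqrt(theta_true), n_pix)
delta_obs = delta_ics + rng.normal(0.0, sigma, n_pix)

# Stage 1: p(theta | delta_Obs) on a grid with a flat prior. The ICs are
# marginalised analytically, since delta_Obs | theta ~ N(0, (theta + sigma^2) I).
theta_grid = np.linspace(0.5, 4.0, 701)
var = theta_grid + sigma ** 2
log_like = -0.5 * (np.sum(delta_obs ** 2) / var + n_pix * np.log(2 * np.pi * var))
post = np.exp(log_like - log_like.max())
post /= post.sum()

def sample_joint(n_samples):
    """Draw (theta, delta_ICs) via p(theta | obs), then p(delta_ICs | obs, theta)."""
    thetas = rng.choice(theta_grid, size=n_samples, p=post)
    samples = []
    for th in thetas:
        gain = th / (th + sigma ** 2)            # Wiener-filter weight
        mean, cond_var = gain * delta_obs, gain * sigma ** 2
        samples.append(mean + np.sqrt(cond_var) * rng.standard_normal(n_pix))
    return thetas, np.array(samples)

thetas, ics_samples = sample_joint(100)
corr = np.corrcoef(ics_samples[0], delta_ics)[0, 1]
print(thetas.mean(), corr)
```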

 

Adrian Bayer

Mount Fuji?

Chirag Modi

1) Likelihood not necessarily Gaussian

2) Forward model need not be differentiable

3) Amortized

Generative Model: Marginalizing over ICs

Generative Model: Fixing ICs

HMC: Marginalizing over ICs


True

Reconstructed

\delta_\mathrm{Obs}
\delta_\mathrm{ICs}
p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs})


SBI

HMC

Cross Correlation Coefficient
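A common way to compare reconstructed and true fields is the cross-correlation coefficient r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)), which equals 1 on scales where the reconstruction is perfect. Below is a minimal numpy estimator for periodic 3D fields on the same grid; the binning, box size and function name are assumptions, and the usual V/N² power-spectrum normalisation is omitted because it cancels in r.

```python
import numpy as np

def cross_correlation_coefficient(a, b, box_size, n_bins=8):
    """Binned r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)) for two real 3D fields."""
    n = a.shape[0]
    fa, fb = np.fft.rfftn(a), np.fft.rfftn(b)
    kf = 2.0 * np.pi / box_size                    # fundamental frequency of the box
    kx = np.fft.fftfreq(n, d=1.0 / n) * kf         # integer mode numbers times kf
    kz = np.fft.rfftfreq(n, d=1.0 / n) * kf
    kmag = np.sqrt(kx[:, None, None] ** 2 + kx[None, :, None] ** 2 + kz[None, None, :] ** 2)

    # Shell-average over |k| up to the Nyquist frequency (the k = 0 mode is excluded).
    edges = np.linspace(0.5 * kf, 0.5 * n * kf, n_bins + 1)
    idx = np.digitize(kmag.ravel(), edges) - 1
    ok = (idx >= 0) & (idx < n_bins)
    counts = np.bincount(idx[ok], minlength=n_bins)

    def shell_mean(values):
        return np.bincount(idx[ok], weights=values.ravel()[ok], minlength=n_bins) / counts

    p_aa = shell_mean(np.abs(fa) ** 2)
    p_bb = shell_mean(np.abs(fb) ** 2)
    p_ab = shell_mean((fa * np.conj(fb)).real)
    k_centres = 0.5 * (edges[1:] + edges[:-1])
    return k_centres, p_ab / np.sqrt(p_aa * p_bb)

rng = np.random.default_rng(0)
true_field = rng.standard_normal((32, 32, 32))
reconstructed = true_field + rng.standard_normal((32, 32, 32))  # toy "reconstruction"
k, r = cross_correlation_coefficient(true_field, reconstructed, box_size=1000.0)
print(np.round(r, 2))  # roughly 1/sqrt(2) in each bin for equal-variance white noise
```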


Scaling up in Volume

p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs})
p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs}, \delta_\mathrm{L})

Large Scale Reconstruction

1024^3
1 \, (\mathrm{Gpc}/h)^3

True

\delta_\mathrm{Galaxies}
\delta_\mathrm{ICs}

Reconstructed


["Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks"
Shallue, Eisenstein 2022]
["Initial conditions from galaxies: machine-learning subgrid correction to standard reconstruction"
Parker, Bayer, Seljak 2025]

Power Spectrum

Cross Correlation

Peculiar Velocities

True

Reconstructed


Scaling up in volume

Implicit FLI for DESI

DESI Y1 LRG effective volumes are already larger than our sims!

Small Scale Galaxy Bias

Selection

Fibre collisions

Forward Modelling the Survey Systematics

EFT


Galaxy Formation

Adapted from arXiv:1804.03097

Symmetries

Connected to Underlying Physics

Hydro sims

Empirical

Halo Occupation Distribution (HOD)

EFT bias expansion

Matter Density

Galaxy Distribution

Self-Consistent Predictions across observables


[Video credit: Francisco Villaescusa-Navarro]

Gas density

Gas temperature

Subgrid model 1

Subgrid model 2

Subgrid model 3

Subgrid model 4


Can we learn a general and continuous representation of Baryonic feedback?

 

Gas

Galaxies

p(\mathrm{Baryonic\ fields} \,|\, \mathrm{Dark\ Matter}, z_\mathrm{baryons})

Marginalize over a broader set of subgrid physics

Interpolate between simulators

Mingshau Liu

(Ming)

Constrain z via multi-wavelength observations


["Continuous Representations of Baryonic Feedback for Robust Inference from Multiple Simulation Suites"
Liu, Cuesta-Lazaro
NeurIPS ML4PS 2025]

Trained on:

TNG, SIMBA, Astrid, EAGLE

z = f(x)

Encoder

z_\mathrm{baryons}

1) Encoder

Gas

Galaxies

p(\mathrm{Baryonic\ fields} \,|\, \mathrm{Dark\ Matter}, z_\mathrm{baryons})

2) Probabilistic Decoder


p(\mathrm{Baryonic\ fields} \,|\, \mathrm{Dark\ Matter}, z_\mathrm{baryons})

\mathcal{O}(10)

(Test suite)


Gas Density

Temperature

Astrid

EAGLE

\alpha = 0
\alpha = 0.25
\alpha = 0.5
\alpha = 0.75
\alpha = 1

Interpolating over Simulations


Generalizing to unseen simulations: Magneticum


BEFORE

Artificial General Intelligence?

AFTER

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Matt Wiemann

Lindsay Smith

Hypothesis

Simulate World

Invisible particles

Extra dimensions

Multi Species ...

Simulate World

Invisible particles

Extra dimensions

Multi Species ...

Propose Experiment

Simulate

Text: Conceptual Understanding

Trajectories (.csv)

Science Agent (LLM)

Science Agent (LLM)

Outputs

Python Code: Trajectory MSE

["DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking" Wiemann, Smith et al (including CCL)]

You are an expert physicist tasked with discovering scientific laws in a simulated universe. Your goal is to propose experiments, analyse the data they return, and ultimately deduce the underlying scientific law. Note that the laws of physics in this universe may differ from those in our own. You can perform experiments to gather data but must follow the protocol strictly.

If F = G/r²: G = 80. If F = G/r: G = 8. Linear in 1/r!

 

m=1: a = 0.664 · m=2: a = 0.614 · m=4: a = 0.664 · m=8: a = 0.714 … That's strange.

 

Consistent pattern: small +y 'extra' force (~0.05) regardless of position … maybe the anchor itself moves.

 

a_y_extra ≈ +0.05 everywhere on probes. Spot on! So anchor too just experiences g_y = 0.05 … g = (0, 0.05) is a uniform external acceleration on every particle.


You are an expert physicist tasked with discovering scientific laws in a simulated universe. Your goal is to propose experiments, analyse the data they return, and ultimately deduce the underlying scientific law. Note that the laws of physics in this universe may differ from those in our own. You can perform experiments to gather data but must follow the protocol strictly.

Confirmed: a ∝ 1/r.

 

 

observed −5.18. WAY too small. Something is very wrong with my understanding!

 

 

artifact of integration accuracy, NOT the true law. Probably because of close encounters causing chaos.

 

I'll just submit my law. The functional form is solid.

 


Conceptual Understanding

Trajectory Prediction

Random Seeds

You are an expert physicist tasked with discovering scientific laws in a simulated universe. Your goal is to propose experiments, analyse the data they return, and ultimately deduce the underlying scientific law. Note that the laws of physics in this universe may differ from those in our own. You can perform experiments to gather data but must follow the protocol strictly.

a × r seems roughly constant … not quite 1/r.

 

Let me try a × r²: … Not consistent.

 

 

So at large r, a ~ 1/r. At small r, a ~ 1/r².

   

Actually note the file path in the error message: 'extra_dimensions.csv'! This is a hint! The world might have an extra compactified dimension…

 

[Model starts fitting Yukawa potentials, Bessel functions... At some point the fitting tool errors out.]
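The agent's checks (is a × r constant? is a × r² constant?) amount to estimating the local power-law index n(r) = -d ln a / d ln r. A sketch, using a hypothetical crossover law a(r) = G (1 + r_c/r) / r that behaves like 1/r² for r ≪ r_c and like 1/r for r ≫ r_c; this law is chosen only to mimic the transition the agent describes and is not the benchmark's hidden law.

```python
import numpy as np

def local_power_law_index(r, a):
    """n(r) = -d ln a / d ln r; n = 2 for a ~ 1/r^2 and n = 1 for a ~ 1/r."""
    return -np.gradient(np.log(a), np.log(r))

# Hypothetical crossover law (illustrative only).
G, r_c = 1.0, 1.0
r = np.logspace(-2, 2, 200)
a = G / r * (1.0 + r_c / r)
n = local_power_law_index(r, a)
print(n[0], n[-1])  # close to 2 at small r and close to 1 at large r
```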


Astrophysics proliferates in Simulation-based Inference

on Simulations


x^\mathcal{O}
x^\mathcal{S}

Simulated Data

Observed Data

z^\mathcal{O}_p
z^\mathcal{O}_s
z^\mathcal{S}_s
z^\mathcal{S}_p

Alignment Loss

\mathcal{L} = -\sum_{\mathcal{D} \in \{\mathcal{S}, \mathcal{O}\}} \log p(x^\mathcal{D}|z^\mathcal{D}_s, z^\mathcal{D}_p) + \lambda\, d(z^\mathcal{O}_s,z^\mathcal{S}_s)

Reconstruction

Statistical Alignment

50\%

(OT / Adversarial)

Encoder

Obs

Encoder

Sims

Private Domain Information

Shared Information

\hat{x}^\mathcal{O}
\hat{x}^\mathcal{S}

Observed Reconstructed

Simulated Reconstructed

Shared Decoder

Shared Decoder


A Toy Model Example

Idealized Simulations

Observations

+ Scale Dependent Noise

+ Bump

x^\mathcal{O}
x^\mathcal{S}
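One way to set up such a toy pair, with every specific choice here hypothetical: a periodic 1D Gaussian field with power spectrum P(q) = A q^n (amplitude A, tilt n, q the integer mode number) as the idealized simulation x^S, and the same field plus scale-dependent noise and a Gaussian bump in power, neither of which the simulator models, as the observation x^O.

```python
import numpy as np

def field_from_spectrum(pk, rng, n):
    """Periodic 1D Gaussian field whose rfft modes have mean power proportional to pk."""
    modes = (rng.standard_normal(pk.size) + 1j * rng.standard_normal(pk.size)) * np.sqrt(pk / 2.0)
    return np.fft.irfft(modes, n=n)

def toy_pair(amplitude, tilt, rng, n=256, noise_level=1e-4, bump_height=5.0, bump_mode=20.0):
    """Return (x_S, x_O): an idealised simulation and an 'observation' of the same field.

    Signal: P(q) = amplitude * q^tilt. Observation adds noise with power
    noise_level * q^2 (scale dependent) plus a Gaussian bump centred on bump_mode.
    """
    q = np.arange(n // 2 + 1, dtype=float)       # integer Fourier mode numbers
    signal_pk = np.zeros_like(q)
    signal_pk[1:] = amplitude * q[1:] ** tilt     # no power in the mean (q = 0)
    noise_pk = noise_level * q ** 2 + bump_height * np.exp(-0.5 * ((q - bump_mode) / 2.0) ** 2)
    noise_pk[0] = 0.0
    x_sim = field_from_spectrum(signal_pk, rng, n)
    x_obs = x_sim + field_from_spectrum(noise_pk, rng, n)
    return x_sim, x_obs

rng = np.random.default_rng(0)
x_sim, x_obs = toy_pair(amplitude=1.0, tilt=-1.0, rng=rng)
print(x_sim.std(), x_obs.std())  # observations carry extra, mostly small-scale, power
```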


Amplitude

Tilt

p(\theta|z^\mathcal{O}_s)
p(\theta|z^\mathcal{O}_p)
p(\theta|z^\mathcal{O}_p,z^\mathcal{O}_s)

Robust SBI from Shared Information

p(x^\mathcal{O}|z^\mathcal{O}_p,z^\mathcal{O}_s)
p(x^\mathcal{O}|z^\mathcal{O}_s)

Visualizing Information Split
