Generative Solutions for Cosmic Problems

Flatiron Institute

 

Carol(ina) Cuesta-Lazaro

1-Dimensional

Machine Learning

Secondary anisotropies

Galaxy formation

Intrinsic alignments

DESI / SPHEREx

Euclid / LSST

SO / CMB-S4

LIGO / Einstein

The era of Big Data Cosmology

xAstrophysics

HERA / CHIME

SAGA / MaNGA

Galaxy formation

Hosts

Reionization

Cosmic Microwave Background

Galaxies / Dwarfs

21 cm

Galaxy Surveys

Gravitational Lensing

Gravitational Waves

AGN Feedback/Supernovae

Carolina Cuesta-Lazaro - IAS

"Better inference methods = Better Data"

"Cosmology needs Astrophysics

Astrophysics needs Cosmology"

GANs

Deep Belief Networks

2006

VAEs

Normalising Flows

BigGAN

Diffusion Models

2014

2017

2019

2022

A folk music band of anthropomorphic autumn leaves playing bluegrass instruments

Contrastive Learning

2023

Meanwhile, on Earth...

2026

"Write a C compiler"

AGI?

p(\mathrm{World}|\mathrm{Prompt})
["Genie 2: A large-scale foundation model" Parker-Holder et al (2024)]

Probabilistic ML has made high dimensional inference tractable

1024x1024xTime

["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]

Goal: Estimate unknown p(x_1) from samples

x_0 \sim \rho_0

Base

Target

T: \Omega \to \Omega

Transport Map

x_1 \sim \rho_1 \quad \text{via} \quad T(x_0) = x_1
x_1 \sim \rho_1
x_1
x_0

Base

Data

"Creating noise from data is easy;  creating data from noise is generative modeling."

 (Yang Song)

Neural Network

\frac{dx_t}{dt} = b_t(x_t)
\frac{\partial \rho_t(x)}{\partial t} = - \nabla \cdot \left( b_t(x) \rho_t(x) \right)

Transport Map

Continuity Equation
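
A minimal sketch of the transport picture above, assuming a hand-written velocity field in place of the neural network. The 1D field `velocity`, the target parameters `m` and `s`, and the forward Euler integrator are illustrative choices, not taken from the talk. Samples from a standard normal base are pushed along dx_t/dt = b_t(x_t) and should end up distributed as N(m, s^2).

```python
import random
import statistics


def velocity(t, x, m=2.0, s=0.5):
    """Toy velocity field b_t(x) whose flow maps N(0, 1) at t=0 to N(m, s^2) at t=1.

    It is the time derivative of the path x_t = (1 - t + t s) x_0 + t m,
    rewritten as a function of the current position x (hypothetical example).
    """
    scale = 1.0 - t + t * s
    return (s - 1.0) * (x - t * m) / scale + m


def transport(x0, n_steps=200):
    """Integrate dx_t/dt = b_t(x_t) from t=0 to t=1 with forward Euler."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x += dt * velocity(i * dt, x)
    return x


rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # x_0 ~ rho_0
target = [transport(x) for x in base]               # x_1 = T(x_0)
mean, std = statistics.fmean(target), statistics.pstdev(target)
```

In a real model `velocity` would be the trained network and a higher-order ODE solver would usually replace Euler.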

I_t(x_0, x_1) = \alpha_t x_0 + \beta_t x_1, \quad (x_0, x_1) \sim \rho(x_0, x_1)
\alpha_t = 1 - t, \quad \beta_t = t
b_t(x) = \mathbb{E}_{\rho(x_0, x_1)}\!\left[\dot{I}_t \,\middle|\, I_t = x\right]
\mathcal{L}_b[\hat{b}] = \int_0^1 \mathbb{E}_{\rho(x_0, x_1)}\left[\left\|\hat{b}_t(I_t) - \dot{I}_t\right\|^2\right] dt

Interpolant

Base

Data

Neural Network

\rho_t = \mathrm{Law}(I_t)
\frac{dx_t}{dt} = \hat{b}_t(x_t)

1) Training

2) Inference

Estimated from samples

(Implicit Likelihood)
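
A toy illustration of the training objective above, assuming a 1D Gaussian base and target with independent coupling so that the exact velocity is known in closed form. Instead of a neural network, the sketch fits a linear model b_hat_t(x) = a x + c at a single time t by least squares; the names `fit_velocity_at`, `m` and `s` are hypothetical.

```python
import random


def interpolant(t, x0, x1):
    """Linear interpolant I_t = (1 - t) x0 + t x1 and its time derivative."""
    return (1.0 - t) * x0 + t * x1, x1 - x0


def fit_velocity_at(t, pairs):
    """Least-squares fit of b_hat_t(x) = a * x + c to the targets dI_t/dt.

    Minimising the sample mean of (b_hat_t(I_t) - dI_t/dt)^2 over (a, c) is the
    fixed-t slice of the loss L_b, restricted to linear models.
    """
    xs, ys = zip(*(interpolant(t, x0, x1) for x0, x1 in pairs))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var = sum((x - mx) ** 2 for x in xs) / n
    a = cov / var
    return a, my - a * mx


rng = random.Random(0)
m, s = 2.0, 0.5  # hypothetical target x1 ~ N(m, s^2); base x0 ~ N(0, 1)
pairs = [(rng.gauss(0.0, 1.0), rng.gauss(m, s)) for _ in range(20000)]

t = 0.5
a_hat, c_hat = fit_velocity_at(t, pairs)

# Closed-form E[dI_t/dt | I_t = x] = a_true * x + c_true for this Gaussian toy
a_true = (t * s**2 - (1.0 - t)) / ((1.0 - t) ** 2 + t**2 * s**2)
c_true = m - a_true * t * m
```

The fitted coefficients should approach the closed-form ones as the number of sample pairs grows, which is the sense in which the velocity is "estimated from samples".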

What is field-level inference?

A digital twin of our Universe

Observed Galaxy Distribution

Simulated Galaxy Distribution

Field Level Inference

Forward Model

(= no Cosmic Variance)

+ \Omega_m, \sigma_8, \ldots
p(\delta_{\mathrm{ICs}}, \theta|\delta_{\mathrm{Obs}})

Why field-level inference?

Optimal constraints

p(\text{[image]} \mid \mathrm{Cosmology})

Counts-in-cell

Do we really need to infer 10^9 parameters to constrain ~10?

p(\text{[image]} \mid \mathrm{Cosmology})

Compression

Marginal Likelihood

p(x|\theta) = \int p(x|z, \theta) p(z|\theta) \, dz

Initial Conditions
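
A one-dimensional sketch of the marginal likelihood integral above, assuming a toy model in which the latent z (standing in for the initial conditions) is drawn from N(0, theta^2) and the observation is x = z + Gaussian noise. The function names and the noise level are hypothetical. The Monte Carlo average over z can be compared with the closed-form Gaussian marginal.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def marginal_likelihood_mc(x, theta, noise_std=0.5, n_samples=50000, seed=0):
    """Estimate p(x|theta) = int p(x|z, theta) p(z|theta) dz by sampling z ~ p(z|theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, theta)             # latent draw, z ~ N(0, theta^2)
        total += normal_pdf(x, z, noise_std)  # p(x | z, theta)
    return total / n_samples


def marginal_likelihood_exact(x, theta, noise_std=0.5):
    """Closed form for the toy model: x ~ N(0, theta^2 + noise_std^2)."""
    return normal_pdf(x, 0.0, math.sqrt(theta**2 + noise_std**2))


estimate = marginal_likelihood_mc(1.0, theta=1.5)
exact = marginal_likelihood_exact(1.0, theta=1.5)
```

With a single latent this brute-force average is cheap; with ~10^9 latents it is not, which is what motivates either sampling the latents explicitly or learning the marginal implicitly.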

p(\theta|F(x))

["Simulation-Based Emulators for Galaxy Clustering in the Era of Stage-IV Surveys: I. Two-Point Statistics and Beyond" Paillas et al (incl. Cuesta-Lazaro) 2026]
+

Reconstructing ALL latent variables:

Dark Matter distribution

Entire formation history

Peculiar velocities

Predictive Cross Validation:

Cross-Correlation with other probes without Cosmic Variance

[Image Credit: Yuuki Omori]

 

Constraining Inflation:

Inferring primordial non-gaussianity

Why field-level inference?

Data-driven Subgrid models / Data-driven Systematics

["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al, NeurIPS ML4PS 2024]

Particle Mesh

Dark Matter Only

Gaussian Likelihood

Explicit Sampling vs SBI

1) Likelihood not necessarily Gaussian

2) Forward model need not be differentiable

3) Amortized

Generative Model: Marginalizing over ICs

Generative Model: Fixing ICs

HMC: Marginalizing over ICs

True

Reconstructed

\delta_\mathrm{Obs}
\delta_\mathrm{ICs}
p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs})

SBI

HMC

Cross Correlation Coefficient
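
For reference, a sketch of how a cross-correlation coefficient r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)) between a true and a reconstructed field could be measured on a periodic grid. The binning, the grid size and the white-noise test fields are assumptions for illustration, not the talk's pipeline.

```python
import numpy as np


def cross_correlation_coefficient(field_a, field_b, n_bins=8):
    """r(k) in spherical |k| bins for two periodic 3D fields on the same cubic grid."""
    n = field_a.shape[0]
    fa, fb = np.fft.rfftn(field_a), np.fft.rfftn(field_b)
    # |k| in units of the fundamental frequency of the box
    kx = np.fft.fftfreq(n, d=1.0 / n)
    kz = np.fft.rfftfreq(n, d=1.0 / n)
    kmag = np.sqrt(kx[:, None, None] ** 2 + kx[None, :, None] ** 2 + kz[None, None, :] ** 2)
    edges = np.linspace(0.5, n // 2 + 0.5, n_bins + 1)
    which = np.digitize(kmag.ravel(), edges)  # 0 and n_bins + 1 mark out-of-range modes

    def binned(power):
        # Sum the power in each |k| bin; the rfft half-space is used without
        # mode-multiplicity weights, which is adequate for a sketch.
        sums = np.bincount(which, weights=power.ravel(), minlength=n_bins + 2)
        return sums[1 : n_bins + 1]

    p_ab = binned((fa * np.conj(fb)).real)
    p_aa = binned(np.abs(fa) ** 2)
    p_bb = binned(np.abs(fb) ** 2)
    k_centres = 0.5 * (edges[1:] + edges[:-1])
    return k_centres, p_ab / np.sqrt(p_aa * p_bb)


rng = np.random.default_rng(0)
truth = rng.standard_normal((32, 32, 32))
recon = truth + 0.5 * rng.standard_normal((32, 32, 32))  # noisy stand-in for a reconstruction
k, r = cross_correlation_coefficient(truth, recon)
```

r = 1 means the phases of the two fields agree perfectly at that scale; uncorrelated fields give r close to 0.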

Scaling up in volume

Implicit FLI for DESI

DESI Y1 LRG effective volumes are already larger than our sims!

Small Scale Galaxy Bias

Selection

Fibre collisions

Forward Modelling the Survey Systematics

EFT

Galaxy Formation

Adapted from arXiv:1804.03097

Symmetries

Connected to Underlying Physics

Hydro sims

Empirical

Halo Occupation Distribution (HOD)

EFT bias expansion

Matter Density

Galaxy Distribution

Scaling up in Volume

p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs})
p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs}, \delta_\mathrm{BAO})

Large Scale

1024^3
1 \, (\mathrm{Gpc}/h)^3

True

\delta_\mathrm{Galaxies}
\delta_\mathrm{ICs}

Reconstructed

Power Spectrum

Cross Correlation

Peculiar Velocities

True

Reconstructed

Matter Density

Galaxy Distribution

Effective Field Theory

Dimensions + Symmetries

\delta_{\rm g} = b_1 \delta + b_2 \delta^2 + b_{\mathcal{G}_2} \left( \nabla_{\langle i} \nabla_{j \rangle} \Phi \right)^2 + \ldots
+ v^z
+ \phi

Rotational invariance

(+ Galilean inv)

Equivalence Principle
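
A sketch of how the first terms of the bias expansion above could be evaluated on a grid, assuming a periodic cubic box, \nabla^2 \Phi = \delta, and FFT-based derivatives. The bias values, the mean subtraction of the quadratic operators and the function names are hypothetical choices for illustration.

```python
import numpy as np


def tidal_squared(delta):
    """K^2 = sum_ij K_ij K_ij with K_ij = (d_i d_j / nabla^2 - delta_ij / 3) delta.

    This is the squared traceless tidal tensor written on the slide; the
    Galileon operator is related to it by G_2 = K^2 - (2/3) delta^2.
    """
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n)  # wavenumbers in grid units
    kvec = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    ksq = kvec[0] ** 2 + kvec[1] ** 2 + kvec[2] ** 2
    ksq[0, 0, 0] = 1.0                     # avoid 0/0; the k=0 mode is zeroed below
    delta_k = np.fft.fftn(delta)
    delta_k[0, 0, 0] = 0.0
    out = np.zeros_like(delta)
    for i in range(3):
        for j in range(3):
            kernel = kvec[i] * kvec[j] / ksq - (1.0 / 3.0 if i == j else 0.0)
            out += np.fft.ifftn(kernel * delta_k).real ** 2
    return out


def biased_field(delta, b1, b2, bG2):
    """delta_g = b1 delta + b2 delta^2 + bG2 K^2, with quadratic terms mean-subtracted."""
    d2 = delta**2
    t2 = tidal_squared(delta)
    return b1 * delta + b2 * (d2 - d2.mean()) + bG2 * (t2 - t2.mean())


rng = np.random.default_rng(0)
delta = 0.1 * rng.standard_normal((16, 16, 16))           # stand-in matter field
delta_g = biased_field(delta, b1=1.8, b2=0.3, bG2=-0.2)   # illustrative bias values
```

Every operator is built from \delta and second derivatives of \Phi only, which is how the symmetry constraints listed above restrict the expansion.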

["Large-Scale Galaxy Bias" Desjacques, Jeong, Schmidt]

Simulation Based Priors

p( b_1, b_2, b_{\mathcal{G}_2}... )

Simulated Galaxies

EFT Field Level Fit

\{ b_1, b_2, b_{\mathcal{G}_2}... \}

Fit:

?

["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro et al 2025]

40% Improvement!

x2 survey volume

BOSS + Conservative Priors

BOSS + Simulation Based Priors

Simulation Based Priors

Galaxy Formation

Adapted from arXiv:1804.03097

Symmetries

Connected to Underlying Physics

Hydro sims

Empirical

Halo Occupation Distribution (HOD)

EFT bias expansion

Matter Density

Galaxy Distribution

Self-Consistent Predictions across observables

X-Ray

 

 

Cluster gas mass fractions

Cluster gas density profiles

Sunyaev-Zeldovich

Galaxy Properties

Thermal: Integrated electron pressure (hot electrons / big objects)

Star formation + histories

Stellar mass / halo mass relation

FRBs

Integrated electron density

Kinetic: Integrated electron density x peculiar velocity

Multi-wavelength Observables

["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]

Particle Mesh for Gravity

CAMELS Volumes

25 h^{-1} \mathrm{Mpc}

1000 boxes with varying cosmology and feedback models

Gas Properties

Current model optimised for Lyman Alpha forest

7 GPU minutes for a 50 Mpc simulation

130 million CPU core hours for TNG50

Density

Temperature

Galaxy Distribution

+ \mathcal{C}, \mathcal{A}
p(\mathrm{Baryons}|\mathrm{DM}, \mathcal{C}, \mathcal{A})

Field Level Emulators: Hydro At Scale

["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]

Variations in Subgrid Physics

Volume Upscaling

[Video credit: Francisco Villaescusa-Navarro]

Gas density

Gas temperature

Subgrid model 1

Subgrid model 2

Subgrid model 3

Subgrid model 4

Can we learn a general and continuous representation of Baryonic feedback?

 

Gas

Galaxies

p(\mathrm{Baryonic\ fields} \mid \mathrm{Dark\ Matter}, z_\mathrm{baryons})

Marginalize over a broader set of subgrid physics

Interpolate between simulators

Mingshau Liu

(Ming)

Constrain z via multi-wavelength observations

Trained on:

TNG, SIMBA, Astrid, EAGLE

z = f(x)

Encoder

z_\mathrm{baryons}

1) Encoder

Gas

Galaxies

p(\mathrm{Baryonic\ fields} \mid \mathrm{Dark\ Matter}, z_\mathrm{baryons})

2) Probabilistic Decoder

p(\mathrm{Baryonic\ fields} \mid \mathrm{Dark\ Matter}, z_\mathrm{baryons})

\mathcal{O}(10)

(Test suite)

Gas Density

Temperature

Astrid

EAGLE

\alpha = 0
\alpha = 0.25
\alpha = 0.5
\alpha = 0.75
\alpha = 1

Interpolating over Simulations

Generalizing to unseen simulations: Magneticum

BEFORE

Artificial General Intelligence?

AFTER

https://parti.research.google

A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!

Reinforcement Learning

Bag of verifiable tasks

\Pi_\theta

Policy (LLM)

Verifiable Reward

\theta_{t+1} = \theta_t + \alpha \nabla_\theta J(\Pi_\theta)

Expected Returns
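
A toy version of the update rule above, assuming a softmax policy over four discrete "answers" and a verifier that rewards exactly one of them. This is plain REINFORCE with a running-mean baseline, used here as a stand-in; it is not the algorithm of the cited paper, and all names and hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def verifiable_reward(action, correct_action=2):
    """Toy verifier: reward 1 if the sampled answer passes the check, else 0."""
    return 1.0 if action == correct_action else 0.0


def reinforce(n_actions=4, lr=0.5, n_steps=2000, seed=0):
    """Gradient ascent on expected reward: theta <- theta + lr * advantage * grad log pi."""
    rng = random.Random(seed)
    theta = [0.0] * n_actions  # policy parameters (logits)
    baseline = 0.0             # running mean of the reward, reduces variance
    for _ in range(n_steps):
        probs = softmax(theta)
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = verifiable_reward(action)
        advantage = reward - baseline
        for a in range(n_actions):
            # d log pi(action) / d theta_a = one_hot(action)_a - probs_a
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * advantage * grad_log_pi
        baseline += 0.05 * (reward - baseline)
    return softmax(theta)


final_probs = reinforce()
```

The policy only ever sees a pass/fail signal, yet the probability mass should concentrate on the rewarded answer.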

["DeepSeek-R1" Guo et al 2025 arXiv:2501.12948]

Coding competitions

R

What should we be thinking about?

Should Academia give up on training LLMs?

Should we design our own RL environments?

Should we think about the most ambitious projects we could tackle with a "country of geniuses in a data center"?

A radical change to how we work or just highlighting what was obviously wrong?

\Lambda \mathrm{CDM}
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

Dark Energy is constant over time

w(z) = \frac{p_\mathrm{DE}}{\rho_\mathrm{DE}} = w_0 + \frac{z}{1+z}w_a
2 - 4 \sigma
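
A short sketch of the w_0 w_a parametrisation above and the dark energy density it implies, \rho_\mathrm{DE}(z) / \rho_\mathrm{DE}(0) = (1+z)^{3(1 + w_0 + w_a)} \exp(-3 w_a z / (1+z)), which follows from integrating 3 (1 + w) d\ln(1+z). The parameter values in the usage lines are illustrative only, not fitted values.

```python
import math


def w_cpl(z, w0, wa):
    """Equation of state w(z) = w0 + wa * z / (1 + z)."""
    return w0 + wa * z / (1.0 + z)


def de_density_ratio(z, w0, wa):
    """rho_DE(z) / rho_DE(0) for the w0-wa parametrisation (closed form)."""
    return (1.0 + z) ** (3.0 * (1.0 + w0 + wa)) * math.exp(-3.0 * wa * z / (1.0 + z))


# Lambda-CDM limit: w0 = -1, wa = 0 gives a constant dark energy density
lcdm_ratio = de_density_ratio(1.0, w0=-1.0, wa=0.0)

# Illustrative evolving model (hypothetical numbers)
evolving_ratio = de_density_ratio(1.0, w0=-0.8, wa=-0.6)
```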

["An LLM-driven framework for cosmological model-building and exploration" Mudur, Cuesta-Lazaro, Toomey]

Can LLMs help us explore the space of hypotheses?

Propose a model for Dark Energy

Implement it in a Cosmology simulation code: CLASS

Test fit to DESI Observations

Iterate to improve fit

Quintessence, DE/DM interactions....

Must pass a set of general tests for "reasonable" models

Ideally, compare evidence to LCDM.

For now, Bayesian Information Criterion (BIC)
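
A minimal sketch of a BIC comparison against a reference model, assuming a Gaussian likelihood so that -2 ln L_max equals the best-fit chi-squared up to a model-independent constant. The function names and the numbers in the usage lines are hypothetical, not results from the paper.

```python
import math


def bic(chi2_min, n_params, n_data):
    """BIC = k ln(n) - 2 ln(L_max), with -2 ln(L_max) taken as the best-fit chi^2."""
    return chi2_min + n_params * math.log(n_data)


def delta_bic(chi2_model, k_model, chi2_ref, k_ref, n_data):
    """BIC(model) - BIC(reference); negative values favour the new model."""
    return bic(chi2_model, k_model, n_data) - bic(chi2_ref, k_ref, n_data)


# Hypothetical numbers: two extra parameters must lower chi^2 by more than
# 2 ln(n_data) before the new model is preferred.
penalty_only = delta_bic(chi2_model=10.0, k_model=8, chi2_ref=10.0, k_ref=6, n_data=100)
```

Unlike the Bayesian evidence, the BIC needs only the best-fit point, which is why it is a cheap first filter when many candidate models are generated automatically.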

1

2

Nayantara Mudur (Harvard)

Can LLMs implement new physics models?

Thawing Quintessence

Axion-like Early Dark Energy

Ultra-light scalar field that temporarily acts as dark energy in the early universe 

Implementation Challenge:

Dynamic dark energy model: scalar field transitions from "frozen"  (cosmological constant-like) to evolving as the universe expands.

Oscillatory behaviour

Can take advantage of existing scalar field implementations in CLASS

+ 43,000 lines of C code

+ 10,000 lines of numerical files

CLASS Challenge:

1) Code compiles + passes unit tests (reasonable observables, numerical convergence...)

2) Implementation agrees with target repository

3) Goodness of fit for DESI + Supernovae

4) H0 tension metrics

Curated

One-page description of the model to be implemented, CLASS tips + very explicit units

Paper

Directly from a full paper

If it fails, get feedback from another LLM

IAS-2026

By carol cuesta
