Field Level Inference

A biased Perspective

[Video Credit: N-body simulation Francisco Villaescusa-Navarro]

Carolina Cuesta-Lazaro

 

Flatiron Institute

Institute for Advanced Study

What is field-level inference?

A digital twin of our Universe

Observed Galaxy Distribution

Simulated Galaxy Distribution

Field Level Inference

Forward Model

(= no Cosmic Variance)

+ \Omega_m, \sigma_8, \ldots
p(\delta_{\mathrm{ICs}}, \theta \mid \delta_{\mathrm{Obs}})

Carolina Cuesta-Lazaro Flatiron/IAS - FLI

Why field-level inference?

Optimal constraints

p(\text{[field image]} \mid \mathrm{Cosmology})

Counts-in-cell
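
As a concrete picture of a counts-in-cells summary, here is a minimal sketch that bins a point cloud on a regular grid and builds the probability P(N) of finding N points in a cell. The box size, grid resolution, and the uniform toy catalogue are all assumptions made for illustration, not values from the talk.

```python
import numpy as np


def counts_in_cells(positions, box_size, n_cells):
    """Count points in each cell of a regular n_cells^3 grid."""
    edges = np.linspace(0.0, box_size, n_cells + 1)
    counts, _ = np.histogramdd(positions, bins=(edges, edges, edges))
    return counts.astype(int)


def count_pdf(counts):
    """Probability P(N) that a cell holds exactly N points."""
    return np.bincount(counts.ravel()) / counts.size


rng = np.random.default_rng(0)
box_size = 100.0  # hypothetical box side, arbitrary units
# Unclustered toy "galaxies"; a real catalogue would be clustered.
positions = rng.uniform(0.0, box_size, size=(5000, 3))

counts = counts_in_cells(positions, box_size, n_cells=8)
pdf = count_pdf(counts)
print(counts.shape, counts.sum(), pdf.sum())
```

The summary depends on the chosen cell size, which is one reason a single compressed statistic can miss information that the full field retains.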

Do we really need to infer 10^9 parameters to constrain 10?

p(\text{[field image]} \mid \mathrm{Cosmology})


p(\mathrm{World}|\mathrm{Prompt})
["Genie 2: A large-scale foundation model" Parker-Holder et al (2024)]

Probabilistic ML has made high dimensional inference tractable

1024x1024xTime

["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]


What field level inference isn't: Marginalisation

p(\text{[field image]} \mid \mathrm{Cosmology})
p(S(\text{[field image]}) \mid \mathrm{Cosmology})


Compression

Marginal Likelihood

p(x|\theta) = \int p(x|z, \theta) p(z|\theta) \, dz
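
The integral above can be illustrated with a simple Monte Carlo estimate: draw latents z from p(z|θ) and average p(x|z, θ). The sketch below assumes a toy model chosen only because its marginal is known in closed form (z ~ N(θ, 1), x|z ~ N(z, noise_std²), so x|θ ~ N(θ, 1 + noise_std²)); none of these choices come from the talk.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def marginal_likelihood_mc(x, theta, noise_std, n_samples, rng):
    """Monte Carlo estimate of p(x|theta) = E_{z ~ p(z|theta)}[p(x|z, theta)]."""
    z = rng.normal(loc=theta, scale=1.0, size=n_samples)  # z ~ p(z | theta)
    return gaussian_pdf(x, z, noise_std).mean()


rng = np.random.default_rng(1)
x_obs, theta, noise_std = 0.7, 0.2, 0.5  # hypothetical toy values
estimate = marginal_likelihood_mc(x_obs, theta, noise_std, 200_000, rng)
exact = gaussian_pdf(x_obs, theta, np.sqrt(1.0 + noise_std**2))
print(estimate, exact)
```

With one latent dimension this converges quickly; with of order 10^9 latent variables, naive sampling of z is hopeless, which is the motivation for learning the marginal likelihood directly.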

Explicit Likelihood

Implicit Likelihood

Bridging two distributions

x_1
x_0

Base

Data

"Creating noise from data is easy; creating data from noise is generative modeling." (Yang Song)

Neural Network

\frac{dx_t}{dt} = v^\phi_t(x_t)
\frac{\partial p_t(x_t)}{\partial t} = - \nabla \cdot \left( v^\phi_t(x_t) \, p_t(x_t) \right)
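
To make the flow ODE concrete, the sketch below integrates dx_t/dt = v_t(x_t) with forward Euler steps and checks that base samples are transported to the target distribution. It assumes a 1D Gaussian target and uses a closed-form velocity in place of the neural network v^\phi_t; it also assumes t = 0 is the base and t = 1 is the data, which may be the opposite of the labelling on the slide.

```python
import numpy as np

MU, SIGMA = 2.0, 0.5  # hypothetical target: data ~ N(MU, SIGMA^2)


def velocity(t, x):
    """Exact velocity for the straight-line path
    x_t = (1 - t) * x_base + t * x_data with x_data = MU + SIGMA * x_base.
    This closed form stands in for a learned network v^phi_t."""
    x_base = (x - t * MU) / (1.0 + t * (SIGMA - 1.0))
    return MU + (SIGMA - 1.0) * x_base


def integrate_flow(x_base, n_steps=100):
    """Forward Euler integration of dx/dt = v_t(x) from t = 0 to t = 1."""
    x = x_base.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(i * dt, x)
    return x


rng = np.random.default_rng(2)
x_base = rng.normal(size=20_000)  # samples from the base N(0, 1)
x_data = integrate_flow(x_base)
print(x_data.mean(), x_data.std())
```

The sample density follows the continuity equation along the way, which is what lets a trained flow evaluate likelihoods as well as draw samples.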

Learning likelihoods at the field-level

["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma, International Conference on Machine Learning (ICML) AI4Astro 2023, Spotlight talk, arXiv:2311.17141]

Target Distribution

Simulated Galaxy 3d Map

Base Distribution

Prompt:

\Omega_m, \sigma_8


p(\text{[field image]} \mid \mathrm{Cosmology})
x
s = F_\eta(x)

High-Dimensional

Low-Dimensional

s is sufficient iff
p(\theta|x) = p(\theta|s)
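
A small worked case of sufficiency, as a sketch: for n independent Gaussian draws with unknown mean θ and known scatter, the sample mean is a sufficient summary, so the posterior computed from all n numbers matches the one computed from the single number s. The Gaussian model, the flat prior on a grid, and all numerical values are assumptions for illustration.

```python
import numpy as np


def normalise(log_post, d_theta):
    """Turn unnormalised log posterior values on a grid into a density."""
    post = np.exp(log_post - log_post.max())
    return post / (post.sum() * d_theta)


rng = np.random.default_rng(3)
theta_true, noise_std, n = 0.3, 1.0, 50      # hypothetical toy values
x = rng.normal(theta_true, noise_std, size=n)  # "high-dimensional" data
s = x.mean()                                   # one-number summary

theta_grid = np.linspace(-2.0, 2.0, 2001)
d_theta = theta_grid[1] - theta_grid[0]

# Flat prior on the grid, so log posterior = log likelihood + constant.
log_post_x = -0.5 * ((x[:, None] - theta_grid[None, :]) ** 2).sum(axis=0) / noise_std**2
# The sample mean is distributed as N(theta, noise_std^2 / n).
log_post_s = -0.5 * n * (s - theta_grid) ** 2 / noise_std**2

post_x = normalise(log_post_x, d_theta)
post_s = normalise(log_post_s, d_theta)
print(np.abs(post_x - post_s).max())
```

For a non-Gaussian field no such closed-form summary is known, which is why the compression F_\eta is learned.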

Neural Compression


p(\text{[field image]} \mid \mathrm{Cosmology})
S(\text{[field image]})

Maximise Mutual Information
I(s(x), \theta)

Neural Posterior Estimation -> Optimal Summaries

["Optimal Neural Summarisation for Full-Field Weak Lensing Cosmological Implicit Inference" Lanzieri et al]
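
One way to see why neural posterior estimation yields summaries that maximise mutual information is the standard variational lower bound sketched below. Here s = F_\eta(x) is the compressor from the earlier slide, while q_\varphi is a hypothetical name for the neural posterior estimator and H(\theta) is the entropy of the prior; both symbols are introduced here for illustration.

```latex
\begin{aligned}
I(s, \theta)
  &= \mathbb{E}_{p(s, \theta)}\!\left[ \log \frac{p(\theta \mid s)}{p(\theta)} \right]
   = H(\theta) + \mathbb{E}_{p(s, \theta)}\!\left[ \log p(\theta \mid s) \right] \\
  &\geq H(\theta) + \mathbb{E}_{p(x, \theta)}\!\left[ \log q_\varphi\!\left(\theta \mid F_\eta(x)\right) \right]
\end{aligned}
```

The gap is the expected KL divergence between p(\theta \mid s) and q_\varphi(\theta \mid s), which is non-negative. Since H(\theta) does not depend on \eta or \varphi, minimising the usual NPE loss, the negative expected log posterior, jointly over both pushes this lower bound on I(s, \theta) up.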


CNN

Diffusion

Increasing Noise

p(\sigma_8|\delta_m)
p(\sigma_8|\delta_m + 0.01 \epsilon)
p(\sigma_8|\delta_m + 0.02 \epsilon)
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner, NeurIPS 2023 ML for the physical sciences, arXiv:2405.05255]

 

Nayantara Mudur

NPE-Compression

Diffusion

Learning the marginal likelihood is more robust


p(\theta|x) = \frac{p(x|\theta)p(\theta)}{p(x)}
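
Once a likelihood p(x|θ) can be evaluated, Bayes' rule turns it into a posterior. The sketch below does this on a parameter grid for a single parameter, with a toy Gaussian likelihood standing in for a learned one (for example a diffusion model's likelihood); the prior, the observation, and the function names are assumptions for illustration.

```python
import numpy as np


def log_likelihood(x, theta, noise_std):
    """Toy Gaussian log p(x | theta); stands in for a learned likelihood."""
    return -0.5 * ((x - theta) / noise_std) ** 2 - np.log(noise_std * np.sqrt(2.0 * np.pi))


def log_prior(theta):
    """Standard normal prior (an assumption for this sketch)."""
    return -0.5 * theta**2 - 0.5 * np.log(2.0 * np.pi)


x_obs, noise_std = 1.2, 0.5  # hypothetical observation and noise level
theta_grid = np.linspace(-6.0, 6.0, 4001)
d_theta = theta_grid[1] - theta_grid[0]

joint = np.exp(log_likelihood(x_obs, theta_grid, noise_std) + log_prior(theta_grid))
evidence = joint.sum() * d_theta   # p(x): integral of p(x|theta) p(theta) over theta
posterior = joint / evidence       # Bayes' rule
posterior_mean = (theta_grid * posterior).sum() * d_theta
print(evidence, posterior_mean)
```

A grid only works for a handful of parameters; in higher dimensions the same likelihood would be explored with a sampler such as Hamiltonian Monte Carlo.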

Diffusion model

Robustness?

Is Field-Level Inference worth it?

p(\delta_{\mathrm{ICs}}, \theta \mid \delta_{\mathrm{Obs}})
p(\theta \mid S(\delta_{\mathrm{Obs}}))

Optimal Summaries

FLI

\mathcal{O}(10)
\mathcal{O}(10-100)
\mathcal{O}(10^9)
\mathcal{O}(10^9)

Same pixel-level fidelity required

Number of simulations needed?

Training simulations are IID

Very high dimensional inference!

Low dimensional inference

p(\delta_{\mathrm{Obs}} \mid \theta)

Marginal Likelihood

\mathcal{O}(10^9)
\mathcal{O}(10)

Amortized


+

Reconstructing ALL latent variables:

Dark Matter distribution

Entire formation history

Peculiar velocities

Predictive Cross Validation:

Cross-Correlation with other probes without Cosmic Variance

[Image Credit: Yuuki Omori]

 

Constraining Inflation:

Inferring primordial non-gaussianity

Why field-level inference?


Data-driven Subgrid models

True

Reconstructed

\delta_\mathrm{Obs}
\delta_\mathrm{ICs}
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al, NeurIPS ML4PS 2024, Spotlight talk]

 

p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs})

["Detecting model misspecification in cosmology with scale-dependent normalizing flows" Akhmetzhanova, Cuesta-Lazaro, Mishra-Sharma 2025]

arXiv:2508.05744

Aizhan Akhmetzhanova 


Use optimal summaries instead of field

How well does the model fit the data?


[Figure: Base, OOD Mock 1 and OOD Mock 2 compared at Large Scales and Small Scales. Panels: Parameter Inference Bias (Supervised) and OOD Metric (Unsupervised).]


Galaxy Bias

Self consistent predictions 

Directly? linked to physical processes

Large Volumes

Large Volumes

MTNG ~ 500 Mpc/h

Robust

Clear assumptions

Large Scales

Galaxy formation?

["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro, Mishra-Sharma, Obuljen, Toomey arXiv:2402.13310]

 

Effective Field Theories

Empirical

HOD/SHAM

Fast

Accurate?

Hydrodynamics

Fast

Clear assumptions

Galaxy formation?


["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]

Particle Mesh for Gravity

CAMELS Volumes

25 h^{-1} \mathrm{Mpc}

1000 boxes with varying cosmology and feedback models

Gas Properties

Current model optimised for Lyman Alpha forest

7 GPU minutes for a 50 Mpc simulation

130 million CPU core hours for TNG50

Density

Temperature

Galaxy Distribution

+ \mathcal{C}, \mathcal{A}
p(\mathrm{Baryons}|\mathrm{DM}, \mathcal{C}, \mathcal{A})

Hydro Simulations at scale


Learn a continuous representation for feedback

p(\mathrm{Baryonic\ fields} \mid \mathrm{Dark\ Matter}, \mathrm{Cosmology}, z_\mathrm{baryons})


Mingshau Liu

(Ming)

The Roadmap

Field-level analysis is too complex for one group to develop a robust framework!

1) Need to develop better validation metrics (requires better validation suites)

2) Assess the robustness of field-level inference via parameter-masked mock challenges in realistic OOD scenarios (for example, Beyond2pt)

3) Develop open-source ecosystems for more plug-and-play models


Looking for PhD students and postdocs on AIxAstro

carolina.clzr@gmail.com