A biased Perspective
[Video Credit: N-body simulation Francisco Villaescusa-Navarro]
Carolina Cuesta-Lazaro
A digital twin of our Universe
Observed Galaxy Distribution
Simulated Galaxy Distribution
Field Level Inference
Forward Model
(= no Cosmic Variance)
Optimal constraints
Counts-in-cell
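For reference, a counts-in-cells summary can be sketched in a few lines. This is a minimal illustration, assuming galaxy positions in a periodic cubic box; the helper name `counts_in_cells`, the box size, and the number of cells are hypothetical choices, not taken from the talk.

```python
import numpy as np

def counts_in_cells(positions, box_size, n_cells):
    """Galaxy counts per cubic cell and the resulting count distribution.

    positions: (N, 3) array of coordinates in [0, box_size).
    Returns (counts_grid, pdf), where pdf[k] is the fraction of cells
    that contain exactly k galaxies.
    """
    edges = np.linspace(0.0, box_size, n_cells + 1)
    counts_grid, _ = np.histogramdd(positions, bins=(edges, edges, edges))
    counts_grid = counts_grid.astype(int)
    pdf = np.bincount(counts_grid.ravel()) / counts_grid.size
    return counts_grid, pdf

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 100.0, size=(5000, 3))  # assumed toy catalogue
counts_grid, pdf = counts_in_cells(positions, box_size=100.0, n_cells=8)
```

The count distribution compresses the field into one histogram, whereas field-level inference keeps the full 3D map.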
Do we really need to infer 10^9 parameters to constrain 10?
["Genie 2: A large-scale foundation world model" Parker-Holder et al (2024)]
Probabilistic ML has made high dimensional inference tractable
1024x1024xTime
["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]
Compression
Marginal Likelihood
Explicit Likelihood
Implicit Likelihood
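The three labels can be written out under assumed notation: x is the observed field, θ the cosmological parameters, and z the latent variables (for example the initial conditions). The symbols are illustrative and not taken from the slides.

```latex
\begin{align}
  p(x \mid \theta) &= \int p(x \mid \theta, z)\, p(z \mid \theta)\, \mathrm{d}z
  && \text{(marginal likelihood)} \\
  p(\theta, z \mid x) &\propto p(x \mid \theta, z)\, p(z \mid \theta)\, p(\theta)
  && \text{(explicit likelihood: sample } \theta \text{ and } z \text{ jointly)} \\
  x &\sim p(x \mid \theta)
  && \text{(implicit likelihood: only simulated samples are available)}
\end{align}
```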
Base
Data
"Creating noise from data is easy;
creating data from noise is generative modeling."
Yang Song
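The "easy" direction of the quote can be sketched as follows. This is a minimal illustration, assuming a variance-preserving diffusion with a linear noise schedule; the function name `forward_noise`, the schedule values, and the toy field are assumptions, not details from the talk.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t ~ q(x_t | x_0) for a variance-preserving diffusion.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64)) * 0.1 + 2.0  # assumed toy "data" field
betas = np.linspace(1e-4, 0.02, 1000)           # assumed linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)

x_early, _ = forward_noise(x0, alpha_bars[0], rng)   # almost unchanged data
x_late, _ = forward_noise(x0, alpha_bars[-1], rng)   # close to pure noise
```

Noising needs no learning at all. The generative model is the reverse map, which a neural network has to learn.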
Neural Network
["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
International Conference on Machine Learning ICML AI4Astro 2023, Spotlight talk, arXiv:2311.17141]
Target Distribution
Simulated Galaxy 3d Map
Base Distribution
Prompt:
High-Dimensional
Low-Dimensional
s is sufficient iff
Maximise
Mutual Information
["Optimal Neural Summarisation for Full-Field Weak Lensing Cosmological Implicit Inference" Lanzieri et al]
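The sufficiency condition is not recoverable from the slide text, so what follows is the standard textbook statement rather than the speaker's exact equation. It assumes x is the data, θ the parameters, s(x) the summary, and q_φ a learned posterior estimator (assumed notation).

```latex
\begin{align}
  s \text{ is sufficient}
  &\iff p(\theta \mid x) = p(\theta \mid s(x))
  \iff I(s(x); \theta) = I(x; \theta), \\
  I(s(x); \theta) &\le I(x; \theta)
  \quad \text{(data processing inequality)}, \\
  I(s(x); \theta) &\ge H(\theta)
  + \mathbb{E}_{p(\theta, x)}\!\left[ \log q_\phi(\theta \mid s(x)) \right].
\end{align}
```

The last line is a variational lower bound. Since H(θ) is fixed by the prior, maximising the expected log posterior over both the summary network and q_φ pushes the mutual information up towards its ceiling I(x; θ).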
CNN
Diffusion
Increasing Noise
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner NeurIPS 2023 ML for the physical sciences, arXiv:2405.05255]
Nayantara Mudur
NPE-Compression
Diffusion
Learning the marginal likelihood is more robust
Diffusion model
Robustness?
Optimal Summaries
FLI
Same pixel-level fidelity required
Number of simulations needed?
Training simulations are IID
Very high dimensional inference!
Low dimensional inference
Marginal Likelihood
Amortized
Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Predictive Cross Validation:
Cross-Correlation with other probes without Cosmic Variance
[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-gaussianity
Data-driven Subgrid models
True
Reconstructed
"Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants"
Cuesta-Lazaro, Bayer, Albergo et al
NeurIPS ML4PS 2024 Spotlight talk
"Detecting model misspecification in cosmology with scale-dependent normalizing flows"
Akhmetzhanova, Cuesta-Lazaro, Mishra-Sharma 2025
arXiv:2508.05744
Aizhan Akhmetzhanova
Use optimal summaries instead of field
[Figure: Base vs OOD Mock 1 vs OOD Mock 2, shown at large scales and small scales. Panels: Parameter Inference Bias (Supervised); OOD Metric (Unsupervised).]
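An unsupervised OOD metric on summaries can be sketched as below. This is a deliberately simplified stand-in: it fits a single Gaussian to in-distribution summaries and scores new summaries by Mahalanobis distance, whereas the cited work uses scale-dependent normalizing flows. The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_gaussian(summaries):
    """Fit a Gaussian density to in-distribution summaries (stand-in for a flow)."""
    mean = summaries.mean(axis=0)
    cov = np.cov(summaries, rowvar=False)
    return mean, np.linalg.inv(cov)

def ood_score(summary, mean, cov_inv):
    """Squared Mahalanobis distance; larger means less likely under the training set."""
    diff = summary - mean
    return float(diff @ cov_inv @ diff)

rng = np.random.default_rng(0)
train = rng.standard_normal((2000, 4))   # assumed in-distribution summaries
mean, cov_inv = fit_gaussian(train)

score_in = ood_score(rng.standard_normal(4), mean, cov_inv)  # drawn like the training set
score_ood = ood_score(np.full(4, 5.0), mean, cov_inv)        # shifted far from the training set
```

No parameter labels enter the score, which is what makes the metric unsupervised. Repeating the fit on summaries restricted to large or small scales would give a scale-dependent version of the same check.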
Self consistent predictions
Directly? linked to physical processes
Large Volumes
MTNG ~ 500 Mpc/h
Robust
Clear assumptions
Large Scales
Galaxy formation?
["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro, Mishra-Sharma, Obuljen, Toomey arXiv:2402.13310]
Effective Field Theories
Empirical
HOD/SHAM
Fast
Accurate?
Hydrodynamics
Fast
Clear assumptions
Galaxy formation?
["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]
Particle Mesh for Gravity
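The core of a particle-mesh gravity step can be sketched as cloud-in-cell mass assignment followed by an FFT Poisson solve. This is a generic textbook sketch, not the BaryonBridge implementation: unit-mass particles, a periodic box, and all physical constants set to 1 are assumptions, and the function names are hypothetical.

```python
import numpy as np

def cic_deposit(positions, n_grid, box_size):
    """Cloud-in-cell mass assignment on a periodic grid (unit-mass particles)."""
    density = np.zeros((n_grid,) * 3)
    scaled = positions / box_size * n_grid
    base = np.floor(scaled).astype(int)
    frac = scaled - base
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                offset = np.array([dx, dy, dz])
                # Weight is frac along axes that step to the next cell, 1 - frac otherwise.
                weight = np.prod(np.where(offset == 1, frac, 1.0 - frac), axis=1)
                idx = (base + offset) % n_grid  # periodic wrap
                np.add.at(density, (idx[:, 0], idx[:, 1], idx[:, 2]), weight)
    return density

def potential_from_density(density, box_size):
    """Solve nabla^2 phi = delta in Fourier space (constants set to 1)."""
    n_grid = density.shape[0]
    delta = density / density.mean() - 1.0
    k = 2.0 * np.pi * np.fft.fftfreq(n_grid, d=box_size / n_grid)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0  # avoid division by zero; the mean mode is zeroed below
    phi_k = -np.fft.fftn(delta) / k2
    phi_k[0, 0, 0] = 0.0
    return np.real(np.fft.ifftn(phi_k))

rng = np.random.default_rng(0)
box_size, n_grid = 100.0, 16
positions = rng.uniform(0.0, box_size, size=(1000, 3))  # assumed toy particles
density = cic_deposit(positions, n_grid, box_size)
phi = potential_from_density(density, box_size)
```

Forces follow from the gradient of phi interpolated back to the particles. The cost is dominated by the FFTs, which is what keeps the gravity part cheap relative to full hydrodynamics.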
CAMELS Volumes
1000 boxes with varying cosmology and feedback models
Gas Properties
Current model optimised for Lyman Alpha forest
7 GPU minutes for a 50 Mpc simulation
130 million CPU core hours for TNG50
Density
Temperature
Galaxy Distribution
Learn a continuous representation for feedback
Dark Matter
Baryonic fields
Mingshau Liu
(Ming)
Field level analysis too complex for one group to develop a robust framework!
1) Need to develop better validation metrics (requires better validation suites)
2) Assess the robustness of field-level inference via parameter-masked mock challenges in realistic OOD scenarios (example Beyond2pt)
3) Development of open source ecosystems for more plug and play models
Looking for PhD students and Postdocs on AIxAstro
carolina.clzr@gmail.com