A digital twin of our Universe
Observed Galaxy Distribution
Simulated Galaxy Distribution
Field Level Inference
Forward Model
(= no Cosmic Variance)
Optimal constraints
N-point functions
Counts-in-cell
Wavelets
Marked tpcfs
Voids
Do we really need to infer the ICs?
["Simulation-Based Emulators for Galaxy Clustering in the Era of Stage-IV Surveys: I. Two-Point Statistics and Beyond" Paillas et al (including CCL) 2026]
Neural Compression
Initial Conditions
Marginal Likelihood
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner, NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]
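One way to write the marginal likelihood named above, as a sketch: here $d$ is the observed galaxy field, $\theta$ the cosmological parameters, and $\delta_{\rm IC}$ the initial conditions. All three symbols are notation assumed for this note, not taken from the slides.

```latex
p(d \mid \theta) = \int p(d \mid \delta_{\rm IC}, \theta)\, p(\delta_{\rm IC} \mid \theta)\, \mathrm{d}\delta_{\rm IC}
```

Field-level inference samples the joint posterior $p(\theta, \delta_{\rm IC} \mid d)$ rather than evaluating this integral explicitly.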
Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Predictive Cross Validation:
Cross-Correlation with other probes without Cosmic Variance
[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-gaussianity
Data-driven Subgrid models / Data-driven Systematics
The forward model
Scaling up to survey volumes
Modelling small scale clustering
Survey realism
Model misspecification
Sampling high-dimensional posteriors
Adapted from arXiv:1804.03097
Symmetries
Connected to Underlying Physics
Hydro sims
Empirical
Halo Occupation Distribution (HOD)
EFT bias expansion
Matter Density
Galaxy Distribution
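As an illustration of the empirical end of this spectrum, a minimal sketch of a Halo Occupation Distribution follows. It assumes a commonly used five-parameter form (an error-function step for centrals, a power law for satellites). The function names and all parameter values are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function without SciPy

def mean_occupation(halo_mass, log_m_min=13.0, sigma_log_m=0.5,
                    m0=1e13, m1=1e14, alpha=1.0):
    """Mean number of central and satellite galaxies per halo.

    Parameter values are illustrative assumptions, not fitted numbers.
    """
    halo_mass = np.asarray(halo_mass, dtype=float)
    # Centrals: smooth step from 0 to 1 around the mass 10**log_m_min.
    n_cen = 0.5 * (1.0 + _erf((np.log10(halo_mass) - log_m_min) / sigma_log_m))
    # Satellites: power law in the mass above the cut-off m0, modulated by n_cen.
    excess = np.clip(halo_mass - m0, 0.0, None)
    n_sat = n_cen * (excess / m1) ** alpha
    return n_cen, n_sat

def populate(halo_mass, rng, **hod_params):
    """Draw galaxy counts: Bernoulli centrals, Poisson satellites."""
    n_cen, n_sat = mean_occupation(halo_mass, **hod_params)
    centrals = (rng.random(n_cen.shape) < n_cen).astype(int)
    satellites = rng.poisson(n_sat)
    return centrals, satellites

rng = np.random.default_rng(0)
masses = 10 ** rng.uniform(12.0, 15.0, size=1000)  # toy halo catalogue
centrals, satellites = populate(masses, rng)
```

The HOD maps each halo to a galaxy count using halo mass alone, which is what makes it cheap and also what limits how much underlying physics it encodes.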
(Slide credit: Matthew Ho)
Simon Ding
Xiaosheng Zhao
Lucas Makinen
Axel Lapel
Adrian Bayer
Guilhem Lavaux
Benjamin Wandelt
Ce Sui
Matthew Ho
Leander Thiele
Rosa Malandrino
Greg Bryan
Nicolas Chartier
Lucia Perez
Chirag Modi
Deaglan Bartlett
Shivam Pandey
Sammy Sharief
Ana Maria Delgado
Anirban Bairagi
Christopher Lovell
Carolina Cuesta-Lazaro
Shy Genel
Francisco Villaescusa-Navarro
Laurence Perreault Levasseur
...
Particle Mesh for Gravity
Gas Properties
Density
Temperature
Galaxy Distribution
[See Amanda's talk]
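The gravity step of a particle-mesh solver can be sketched as an FFT-based Poisson solve on a periodic grid. The block below is a minimal sketch, assuming a density contrast already painted onto the mesh and units in which the Poisson equation reads ∇²φ = δ. The name `pm_potential` and the grid and box sizes are hypothetical.

```python
import numpy as np

def pm_potential(delta, box_size):
    """Solve laplacian(phi) = delta on a periodic cubic grid via FFT."""
    n = delta.shape[0]
    # Angular wavenumbers along one axis.
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0              # placeholder to avoid dividing by zero
    phi_k = -np.fft.fftn(delta) / k2
    phi_k[0, 0, 0] = 0.0           # the mean of the potential is arbitrary
    return np.fft.ifftn(phi_k).real

# Toy check with a single plane wave, for which phi = -delta / k0**2.
n, box = 16, 100.0
x = np.arange(n) * box / n
k0 = 2.0 * np.pi / box
delta = np.sin(k0 * x)[:, None, None] * np.ones((n, n, n))
phi = pm_potential(delta, box)
```

Forces follow from differencing `phi` (or multiplying by ik in Fourier space); a full PM code also needs mass assignment and a time integrator, which are omitted here.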
["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]
Probabilistic
Local
["CHARM: Creating Halos with Auto-Regressive Multi-stage networks" Pandey et al 2024]
(Slide credit: Matthew Ho)
Posterior resimulations in minutes!
GANs
Deep Belief Networks
2006
VAEs
Normalising Flows
BigGAN
Diffusion Models
2014
2017
2019
2022
A folk music band of anthropomorphic autumn leaves playing bluegrass instruments
Contrastive Learning
2023
2026
"Write a C compiler"
AGI?
["Genie 2: A large-scale foundation model" Parker-Holder et al (2024)]
Probabilistic ML has made high dimensional inference tractable
1024x1024xTime
["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]
Goal: Estimate unknown p(x1) from samples
Base
Target
Transport Map
Base
Data
"Creating noise from data is easy; creating data from noise is generative modeling."
(Yang Song)
Neural Network
Transport Map
Continuity Equation
Interpolant
Base
Data
Neural Network
1) Training
2) Inference
Estimated from samples
(Implicit Likelihood)
["Stochastic Interpolants: A Unifying framework for flows and diffusion" Albergo et al arXiv:2303.08797]
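The two stages (training, then inference) can be sketched on a one-dimensional toy problem. This is a minimal sketch under stated assumptions: a linear interpolant x_t = (1 - t) x0 + t x1, a velocity field fitted by per-time linear least squares instead of a neural network (which is exact only because base and target are both Gaussian here), and Euler integration of the transport ODE. The target mean 2.0 and width 0.5 are hypothetical.

```python
import numpy as np

def fit_velocity(x0, x1, t_grid):
    """Training: regress the interpolant velocity x1 - x0 on x_t at each time."""
    coeffs = []
    for t in t_grid:
        xt = (1.0 - t) * x0 + t * x1           # linear interpolant
        target = x1 - x0                        # d x_t / dt
        design = np.stack([np.ones_like(xt), xt], axis=1)
        sol, *_ = np.linalg.lstsq(design, target, rcond=None)
        coeffs.append(sol)                      # v_t(x) = a_t + b_t * x
    return np.array(coeffs)

def sample(coeffs, t_grid, x_init):
    """Inference: push base samples along the learned velocity field (Euler)."""
    x = x_init.copy()
    dt = t_grid[1] - t_grid[0]
    for a, b in coeffs:
        x = x + dt * (a + b * x)
    return x

rng = np.random.default_rng(0)
n = 20000
x0 = rng.normal(size=n)                         # base samples
x1 = 2.0 + 0.5 * rng.normal(size=n)             # target (data) samples
t_grid = np.linspace(0.0, 1.0, 200, endpoint=False)

coeffs = fit_velocity(x0, x1, t_grid)
samples = sample(coeffs, t_grid, rng.normal(size=n))
print(samples.mean(), samples.std())
```

Only samples from the base and the data are used to fit the velocity; the target density is never evaluated, which is the sense in which the likelihood is implicit.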
Particle Mesh
Dark Matter Only
Gaussian Likelihood
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al NeurIPS 2024 ML for the Physical Sciences]
Adrian Bayer
Mount Fuji?
Chirag Modi
1) Likelihood not necessarily Gaussian
2) Forward model need not be differentiable
3) Amortized
Generative Model: Marginalizing over ICs
Generative Model: Fixing ICs
HMC: Marginalizing over ICs
True
Reconstructed
SBI
HMC
Cross Correlation Coefficient
Large Scale Reconstruction
True
Reconstructed
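The cross-correlation coefficient between a true and a reconstructed field is usually defined as r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)). The block below is a minimal sketch of that estimator on a periodic grid. The function name, the binning in grid units, and the toy white-noise fields are assumptions for illustration, not the pipeline used for the plots.

```python
import numpy as np

def cross_correlation_coefficient(field_a, field_b, n_bins=8):
    """r(k) = P_ab / sqrt(P_aa * P_bb), binned in |k| (grid units)."""
    fa = np.fft.fftn(field_a)
    fb = np.fft.fftn(field_b)
    # Integer wavenumbers along each axis, then |k| on the full grid.
    freqs = [np.fft.fftfreq(n) * n for n in field_a.shape]
    kmag = np.sqrt(sum(k**2 for k in np.meshgrid(*freqs, indexing="ij")))
    # Bins start at 0.5 so the k = 0 mode is excluded.
    edges = np.linspace(0.5, field_a.shape[0] / 2, n_bins + 1)
    which = np.digitize(kmag.ravel(), edges)
    p_ab = (fa * np.conj(fb)).real.ravel()
    p_aa = (np.abs(fa) ** 2).ravel()
    p_bb = (np.abs(fb) ** 2).ravel()
    r = np.empty(n_bins)
    for i in range(1, n_bins + 1):
        sel = which == i
        r[i - 1] = p_ab[sel].sum() / np.sqrt(p_aa[sel].sum() * p_bb[sel].sum())
    k_centres = 0.5 * (edges[1:] + edges[:-1])
    return k_centres, r

rng = np.random.default_rng(0)
true_field = rng.normal(size=(32, 32, 32))
reconstructed = true_field + rng.normal(size=(32, 32, 32))  # toy noisy reconstruction
k, r_self = cross_correlation_coefficient(true_field, true_field)
k, r_noisy = cross_correlation_coefficient(true_field, reconstructed)
```

A perfect reconstruction gives r = 1 at every scale, so the scale at which r(k) drops marks where information about the field has been lost.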
["Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks" Shallue, Eisenstein 2022]
["Initial conditions from galaxies: machine-learning subgrid correction to standard reconstruction" Parker, Bayer, Seljak 2025]
Power Spectrum
Cross Correlation
Peculiar Velocities
True
Reconstructed
Scaling up in volume
DESI Y1 LRG effective volumes are already larger than our sims!
Small Scale Galaxy Bias
Selection
Fibre collisions
Forward Modelling the Survey Systematics
EFT
Adapted from arXiv:1804.03097
Symmetries
Connected to Underlying Physics
Hydro sims
Empirical
Halo Occupation Distribution (HOD)
EFT bias expansion
Matter Density
Galaxy Distribution
Self-Consistent Predictions across observables
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
Subgrid model 1
Subgrid model 2
Subgrid model 3
Subgrid model 4
Gas
Galaxies
Dark Matter
Baryonic fields
Marginalize over a broader set of subgrid physics
Interpolate between simulators
Mingshau Liu
(Ming)
Constrain z via multi-wavelength observations
["Continuous Representations of Baryonic Feedback for Robust Inference from Multiple Simulation Suites" Liu, Cuesta-Lazaro NeurIPS ML4PS 2025]
Trained on:
TNG, SIMBA, Astrid, EAGLE
Encoder
1) Encoder
Gas
Galaxies
Dark Matter
Baryonic fields
2) Probabilistic Decoder
Dark Matter
Baryonic fields
(Test suite)
Gas Density
Temperature
Astrid
EAGLE
Matt Wiemann
Lindsay Smith
Hypothesis
Simulate World
Invisible particles
Extra dimensions
Multi Species ...
Propose Experiment
Simulate
Text: Conceptual Understanding
Trajectories (.csv)
Science Agent (LLM)
Science Agent (LLM)
Outputs
Python Code: Trajectory MSE
["DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking" Wiemann, Smith et al (including CCL)]
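A trajectory-MSE score of the kind named in the diagram could look like the sketch below. It is an assumption about the general shape of such a metric, not the benchmark's actual code. The helper names, the one-dimensional radial set-up, and the starting radius r0 = 10 are hypothetical; the constants 8 and 80 echo the agent transcript in this deck and are chosen so the two candidate laws give the same initial acceleration.

```python
import numpy as np

def trajectory_mse(predicted, observed):
    """Mean squared error between two trajectories of identical shape."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("trajectories must have the same shape")
    return float(np.mean((predicted - observed) ** 2))

def simulate(force_law, r0=10.0, v0=0.0, dt=1e-3, n_steps=2000):
    """Integrate 1D radial motion under acceleration force_law(r)."""
    r, v = r0, v0
    trajectory = np.empty(n_steps)
    for i in range(n_steps):
        v += force_law(r) * dt     # semi-implicit Euler: update velocity first
        r += v * dt
        trajectory[i] = r
    return trajectory

# "Observed" data generated with a 1/r law, then two candidate laws scored.
observed = simulate(lambda r: -8.0 / r)
score_inv_r = trajectory_mse(simulate(lambda r: -8.0 / r), observed)
score_inv_r2 = trajectory_mse(simulate(lambda r: -80.0 / r**2), observed)
print(score_inv_r, score_inv_r2)
```

Both candidates match at the starting radius, so only the evolution of the trajectory separates them. A single-point force measurement cannot distinguish the functional forms.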
You are an expert physicist tasked with discovering scientific laws in a simulated universe. Your goal is to propose experiments, analyse the data they return, and ultimately deduce the underlying scientific law. Note that the laws of physics in this universe may differ from those in our own. You can perform experiments to gather data but must follow the protocol strictly.
If F = G/r²: G = 80. If F = G/r: G = 8. Linear in 1/r!
m=1: a = 0.664 · m=2: a = 0.614 · m=4: a = 0.664 · m=8: a = 0.714 … That's strange.
Consistent pattern: small +y 'extra' force (~0.05) regardless of position … maybe the anchor itself moves.
a_y_extra ≈ +0.05 everywhere on probes. Spot on! So anchor too just experiences g_y = 0.05 … g = (0, 0.05) is a uniform external acceleration on every particle.
You are an expert physicist tasked with discovering scientific laws in a simulated universe. Your goal is to propose experiments, analyse the data they return, and ultimately deduce the underlying scientific law. Note that the laws of physics in this universe may differ from those in our own. You can perform experiments to gather data but must follow the protocol strictly.
Confirmed: a ∝ 1/r.
observed −5.18. WAY too small. Something is very wrong with my understanding!
artifact of integration accuracy, NOT the true law. Probably because of close encounters causing chaos.
I'll just submit my law. The functional form is solid.
Conceptual Understanding
Trajectory Prediction
Random Seeds
You are an expert physicist tasked with discovering scientific laws in a simulated universe. Your goal is to propose experiments, analyse the data they return, and ultimately deduce the underlying scientific law. Note that the laws of physics in this universe may differ from those in our own. You can perform experiments to gather data but must follow the protocol strictly.
a × r seems roughly constant … not quite 1/r.
Let me try a × r²: … Not consistent.
So at large r, a ~ 1/r. At small r, a ~ 1/r².
Actually note the file path in the error message: 'extra_dimensions.csv'! This is a hint! The world might have an extra compactified dimension…
[Model starts fitting Yukawa potential, Bessel functions... At some point fitting tool errors.]
Simulated Data
Observed Data
Alignment Loss
Reconstruction
Statistical Alignment
(OT / Adversarial)
Encoder
Obs
Encoder
Sims
Private Domain Information
Shared Information
Observed Reconstructed
Simulated Reconstructed
Shared Decoder
Shared Decoder
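One possible optimal-transport choice for the statistical alignment term is a sliced Wasserstein distance between the shared latents of the two encoders. The sketch below assumes equal batch sizes and uses random one-dimensional projections; the function name, the latent dimension, and the toy batches are hypothetical, and an adversarial critic would be an alternative that is not shown.

```python
import numpy as np

def sliced_wasserstein(latents_a, latents_b, n_projections=64, rng=None):
    """Squared sliced Wasserstein-2 distance between two equal-size batches."""
    if latents_a.shape != latents_b.shape:
        raise ValueError("this sketch assumes equal batch shapes")
    rng = np.random.default_rng(0) if rng is None else rng
    dim = latents_a.shape[1]
    # Random unit directions onto which both batches are projected.
    directions = rng.normal(size=(n_projections, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # In 1D, optimal transport pairs sorted samples with sorted samples.
    proj_a = np.sort(latents_a @ directions.T, axis=0)
    proj_b = np.sort(latents_b @ directions.T, axis=0)
    return float(np.mean((proj_a - proj_b) ** 2))

rng = np.random.default_rng(1)
z_sims = rng.normal(size=(256, 4))            # toy shared latents from simulations
z_obs_shifted = z_sims + 1.0                  # toy misaligned observation latents
loss_aligned = sliced_wasserstein(z_sims, z_sims)
loss_shifted = sliced_wasserstein(z_sims, z_obs_shifted)
```

In training, a term like this would be added to the two reconstruction losses so that the shared latent distributions of simulations and observations are pulled together while private information stays in the domain-specific channels.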
Idealized Simulations
Observations
+ Scale Dependent Noise
+ Bump
Amplitude
Tilt
Tilt