"Tiny human catches a few photons in a bucket, declares dark energy is dynamic"
Can we confidently break LCDM?
Late Universe
Early Universe
Tension
Early vs Late
Parametric Extensions
[Image Credit: Prof. Wendy Freedman]
Systematics?
-> Shrink error bars
-> Build methods for attribution
A digital twin of our Universe
Observed Galaxy Distribution
Simulated Galaxy Distribution
Field Level Inference
Forward Model
(= no Cosmic Variance)
Optimal constraints
N-point functions
Counts-in-cell
Wavelets
Marked two-point correlation functions (2PCFs)
Voids
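As an illustration of one of the summary statistics listed above, here is a minimal sketch of counts-in-cells on a periodic box. The box size, grid resolution, and uniformly random points are illustrative assumptions, not values from the talk.

```python
import numpy as np

def counts_in_cells(positions, box_size, n_cells):
    """Count points in each cubic cell and return the counts and their PDF."""
    edges = np.linspace(0.0, box_size, n_cells + 1)
    counts, _ = np.histogramdd(positions, bins=(edges, edges, edges))
    counts = counts.astype(int).ravel()
    # pdf[n] is the fraction of cells that contain exactly n points
    pdf = np.bincount(counts) / counts.size
    return counts, pdf

rng = np.random.default_rng(0)
box_size = 100.0  # hypothetical box side length
positions = rng.uniform(0.0, box_size, size=(5000, 3))  # stand-in for galaxies
counts, pdf = counts_in_cells(positions, box_size, n_cells=8)
```

For a clustered galaxy sample the PDF would be broader and more skewed than the near-Poisson shape these random points give.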
Do we really need to infer 10^9 parameters to constrain 5?
Neural Compression
Initial Conditions
Marginal Likelihood
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner, NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]
z: All possible trajectories
CNN
Diffusion
Increasing Noise
Nayantara Mudur
Posterior (NPE)
Likelihood (NLE)
Learning the marginal likelihood is more robust
Learned Likelihood
Initial Conditions
kmax ~ 0.5
DESI LRG-like HOD galaxies
(x 10 HODs / cosmology)
L = 1 Gpc/h
["Detecting Model Misspecification in Cosmology with Scale-Dependent Normalizing Flows" Akhmetzhanova, Cuesta-Lazaro, Mishra-Sharma]
Aizhan Akhmetzhanova (Harvard)
Base
OOD Mock 1
OOD Mock 2
Large Scales
Small Scales
Small Scales
OOD Mock 1
OOD Mock 2
Parameter Inference Bias (Supervised)
OOD Metric (Unsupervised)
Large Scales
Small Scales
Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Predictive Cross Validation:
Cross-Correlation with other probes without Cosmic Variance
[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-gaussianity
Data-driven Subgrid models / Data-driven Systematics
The forward model
Scaling up to survey volumes
Modelling small scale clustering
Survey realism
Model misspecification
Sampling high-dimensional posteriors
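To make the last challenge concrete, here is a minimal sketch of Hamiltonian Monte Carlo with a leapfrog integrator. The standard-normal target, dimension, step size, and number of leapfrog steps are illustrative assumptions. A field-level analysis would replace `log_prob_and_grad` with the log-posterior of a differentiable forward model.

```python
import numpy as np

def hmc_sample(log_prob_and_grad, x0, n_samples, step_size, n_leapfrog, rng):
    """Draw samples with HMC using unit-mass Gaussian momenta."""
    x = np.array(x0, dtype=float)
    logp, grad = log_prob_and_grad(x)
    samples = np.empty((n_samples, x.size))
    for i in range(n_samples):
        p = rng.standard_normal(x.size)
        h_old = -logp + 0.5 * p @ p
        x_new, grad_new = x.copy(), grad
        # Leapfrog: half momentum step, alternating full steps, half momentum step
        p_new = p + 0.5 * step_size * grad_new
        for step in range(n_leapfrog):
            x_new = x_new + step_size * p_new
            logp_new, grad_new = log_prob_and_grad(x_new)
            scale = 0.5 if step == n_leapfrog - 1 else 1.0
            p_new = p_new + scale * step_size * grad_new
        h_new = -logp_new + 0.5 * p_new @ p_new
        # Metropolis correction for the integration error
        if np.log(rng.uniform()) < h_old - h_new:
            x, logp, grad = x_new, logp_new, grad_new
        samples[i] = x
    return samples

def standard_normal_target(x):
    """Hypothetical toy target: log-density and gradient of N(0, I)."""
    return -0.5 * x @ x, -x

rng = np.random.default_rng(0)
samples = hmc_sample(standard_normal_target, np.zeros(20), 2000, 0.2, 10, rng)
```

Each sample costs `n_leapfrog` gradient evaluations, which is why the cost of the forward model dominates when the parameter vector is a full 3D field.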
Adapted from arXiv:1804.03097
Symmetries
Connected to Underlying Physics
Hydro sims
Empirical
Halo Occupation Distribution (HOD)
EFT bias expansion
Matter Density
Galaxy Distribution
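For the HOD entry above, here is a minimal sketch of a commonly used five-parameter occupation model (error-function centrals, power-law satellites). The functional form and all parameter values are assumptions chosen for illustration. They are not necessarily the HOD used in the work cited on these slides.

```python
import numpy as np
from math import erf

_erf = np.vectorize(erf)

def mean_occupation(halo_mass, log_m_min=13.0, sigma_log_m=0.5,
                    m0=1e13, m1=1e14, alpha=1.0):
    """Mean number of central and satellite galaxies per halo (illustrative values)."""
    halo_mass = np.asarray(halo_mass, dtype=float)
    n_cen = 0.5 * (1.0 + _erf((np.log10(halo_mass) - log_m_min) / sigma_log_m))
    excess = np.clip(halo_mass - m0, 0.0, None)  # no satellites below m0
    n_sat = n_cen * (excess / m1) ** alpha
    return n_cen, n_sat

def sample_galaxy_counts(halo_mass, rng, **hod_params):
    """Bernoulli centrals and Poisson satellites drawn around the mean occupation."""
    n_cen, n_sat = mean_occupation(halo_mass, **hod_params)
    centrals = (rng.uniform(size=n_cen.shape) < n_cen).astype(int)
    satellites = rng.poisson(n_sat)
    return centrals, satellites

rng = np.random.default_rng(0)
halo_mass = np.array([1e12, 1e13, 1.1e14, 1e15])  # hypothetical halo masses
n_cen, n_sat = mean_occupation(halo_mass)
centrals, satellites = sample_galaxy_counts(halo_mass, rng)
```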
["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro et al 2025]
Simulated Galaxies
EFT Field Level Fit
Fit:
?
40% Improvement!
x2 survey volume
BOSS + Conservative Priors
BOSS + Simulation Based Priors
Simulation Based Priors
(Slide credit: Matthew Ho)
Simon Ding
Xiaosheng Zhao
Lucas Makinen
Axel Lapel
Adrian Bayer
Guilhem Lavaux
Benjamin Wandelt
Ce Sui
Matthew Ho
Leander Thiele
Rosa Malandrino
Greg Bryan
Nicolas Chartier
Lucia Perez
Chirag Modi
Deaglan Bartlett
Shivam Pandey
Sammy Sharief
Ana Maria Delgado
Anirban Bairagi
Christopher Lovell
Carolina Cuesta-Lazaro
Shy Genel
Francisco Villaescusa-Navarro
Laurence Perreault Levasseur
...
Particle Mesh for Gravity
Gas Properties
Density
Temperature
Galaxy Distribution
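The gravity step of a particle-mesh solver can be sketched as an FFT-based Poisson solve on a periodic grid. This is a minimal illustration with the physical prefactors absorbed into `delta` (an assumption); the grid size and test field are hypothetical.

```python
import numpy as np

def solve_poisson_periodic(delta, box_size):
    """Solve laplacian(phi) = delta on a periodic cubic grid using FFTs."""
    n = delta.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0  # placeholder to avoid dividing by zero at k = 0
    phi_k = -np.fft.fftn(delta) / k2
    phi_k[0, 0, 0] = 0.0  # fix the arbitrary mean of the potential to zero
    return np.fft.ifftn(phi_k).real

# Check against a single sine mode, where the solution is known analytically
n, box_size = 16, 1.0
x = np.arange(n) * box_size / n
phi_true = np.sin(2.0 * np.pi * x / box_size)[:, None, None] * np.ones((n, n, n))
delta = -(2.0 * np.pi / box_size) ** 2 * phi_true
phi = solve_poisson_periodic(delta, box_size)
```

A full particle-mesh code would add mass assignment of particles to the grid, a gradient to obtain forces, and time stepping around this solve.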
["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]
Probabilistic
Local
["CHARM: Creating Halos with Auto-Regressive Multi-stage networks" Pandey et al 2024]
Amanda Lue (Columbia)
Trained on CAMELS 25 Mpc/h -> Inference over 50 Mpc/h
Supernovae Feedback
N-body
Galaxies
(Slide credit: Matthew Ho)
Posterior resimulations in minutes!
GANs
Deep Belief Networks
2006
VAEs
Normalising Flows
BigGAN
Diffusion Models
2014
2017
2019
2022
A folk music band of anthropomorphic autumn leaves playing bluegrass instruments
Contrastive Learning
2023
2026
"Write a C compiler"
AGI?
Goal: Estimate unknown p(x1) from samples
Base
Target
Transport Map
Base
Data
"Creating noise from data is easy; creating data from noise is generative modeling."
(Yang Song)
Neural Network
Transport Map
Continuity Equation
Interpolant
Base
Data
Neural Network
1) Training
2) Inference
Estimated from samples
(Implicit Likelihood)
["Stochastic Interpolants: A Unifying framework for flows and diffusion" Albergo et al arXiv:2303.08797]
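The training and inference steps above can be sketched in one dimension. This toy uses a linear interpolant x_t = (1 - t) x0 + t x1 and, as a loudly stated simplification, replaces the neural network with a per-time linear regression. That is sufficient only because both the base and the hypothetical target are Gaussian, so the conditional velocity is linear in x.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5            # hypothetical 1D Gaussian "data" distribution
n_train, n_steps = 20000, 100

def sample_base(n):
    return rng.standard_normal(n)

def sample_data(n):
    return mu + sigma * rng.standard_normal(n)

# 1) Training: at each time t, regress the interpolant velocity x1 - x0 on x_t.
#    The least-squares fit estimates b(t, x) = E[x1 - x0 | x_t = x].
times = (np.arange(n_steps) + 0.5) / n_steps
coeffs = []
for t in times:
    x0, x1 = sample_base(n_train), sample_data(n_train)
    x_t = (1.0 - t) * x0 + t * x1
    slope, intercept = np.polyfit(x_t, x1 - x0, deg=1)
    coeffs.append((slope, intercept))

# 2) Inference: transport base samples by Euler-integrating dx/dt = b(t, x).
x = sample_base(5000)
for slope, intercept in coeffs:
    x = x + (slope * x + intercept) / n_steps

generated_mean, generated_std = x.mean(), x.std()
```

Only samples from the base and the data are used, never the target density itself, which is the sense in which the likelihood is implicit.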
Particle Mesh
Dark Matter Only
Gaussian Likelihood
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al, NeurIPS 2024 ML for the Physical Sciences]
Adrian Bayer
Mount Fuji?
Chirag Modi
1) Likelihood not necessarily Gaussian
2) Forward model need not be differentiable
3) Amortized
Generative Model: Marginalizing over ICs
Generative Model: Fixing ICs
HMC: Marginalizing over ICs
True
Reconstructed
SBI
HMC
Cross Correlation Coefficient
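The cross-correlation coefficient used to compare true and reconstructed fields is r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)). Below is a minimal sketch on a cubic grid, with wavenumbers in units of the fundamental mode. The grid size, binning, and random test fields are illustrative assumptions.

```python
import numpy as np

def cross_correlation_coefficient(field_a, field_b, n_bins=8):
    """Return bin centres and r(k) for two fields on the same cubic grid."""
    n = field_a.shape[0]
    fa, fb = np.fft.fftn(field_a), np.fft.fftn(field_b)
    freqs = np.fft.fftfreq(n) * n  # integer wavenumbers
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    edges = np.linspace(0.5, n // 2 + 0.5, n_bins + 1)
    idx = np.digitize(kmag, edges) - 1
    valid = (idx >= 0) & (idx < n_bins)  # drops k = 0 and modes beyond Nyquist

    def binned(power):
        return np.bincount(idx[valid], weights=power.ravel()[valid], minlength=n_bins)

    p_ab = binned((fa * np.conj(fb)).real)
    p_aa = binned(np.abs(fa) ** 2)
    p_bb = binned(np.abs(fb) ** 2)
    k_centers = 0.5 * (edges[1:] + edges[:-1])
    return k_centers, p_ab / np.sqrt(p_aa * p_bb)

rng = np.random.default_rng(1)
true_field = rng.standard_normal((16, 16, 16))
noise_field = rng.standard_normal((16, 16, 16))  # unrelated "reconstruction"
k_centers, r_self = cross_correlation_coefficient(true_field, true_field)
_, r_noise = cross_correlation_coefficient(true_field, noise_field)
```

A perfect reconstruction gives r(k) = 1 on all scales, and an uncorrelated one scatters around zero.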
Scaling up in volume
DESI Y1 LRG effective volumes are already larger than our sims!
Small Scale Galaxy Bias
Selection
Fibre collisions
Forward Modelling the Survey Systematics
PT
Large Scale Reconstruction
True
Reconstructed
["Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks" Shallue, Eisenstein 2022]
["Initial conditions from galaxies: machine-learning subgrid correction to standard reconstruction" Parker, Bayer, Seljak 2025]
Power Spectrum
Cross Correlation
Peculiar Velocities
True
Reconstructed
Matt Wiemann
Lindsay Smith
Hypothesis
Simulate World
Invisible particles
Extra dimensions
Multi Species ...
Propose Experiment
Simulate
Text: Conceptual Understanding
Trajectories (.csv)
Science Agent (LLM)
Science Agent (LLM)
Outputs
Python Code: Trajectory MSE
["DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking" Wiemann, Smith et al (including CCL)]
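A trajectory MSE of the kind named above could be computed as follows. This is a minimal sketch; the array layout (time steps by coordinates) and the example numbers are assumptions, not the benchmark's actual scoring code.

```python
import numpy as np

def trajectory_mse(predicted, observed):
    """Mean squared error between two trajectories of identical shape."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("trajectories must have the same shape")
    return float(np.mean((predicted - observed) ** 2))

# Hypothetical trajectories: rows are time steps, columns are (x, y)
observed = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0]])
predicted = observed + 0.1  # a prediction offset by 0.1 in every coordinate
mse_same = trajectory_mse(observed, observed)
mse_offset = trajectory_mse(predicted, observed)
```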
You are an expert physicist tasked with discovering scientific laws in a simulated universe. Your goal is to propose experiments, analyse the data they return, and ultimately deduce the underlying scientific law. Note that the laws of physics in this universe may differ from those in our own. You can perform experiments to gather data but must follow the protocol strictly.
If F = G/r²: G = 80. If F = G/r: G = 8. Linear in 1/r!
m=1: a = 0.664 · m=2: a = 0.614 · m=4: a = 0.664 · m=8: a = 0.714 … That's strange.
Consistent pattern: small +y 'extra' force (~0.05) regardless of position … maybe the anchor itself moves.
a_y_extra ≈ +0.05 everywhere on probes. Spot on! So anchor too just experiences g_y = 0.05 … g = (0, 0.05) is a uniform external acceleration on every particle.
Confirmed: a ∝ 1/r.
observed −5.18. WAY too small. Something is very wrong with my understanding!
artifact of integration accuracy, NOT the true law. Probably because of close encounters causing chaos.
I'll just submit my law. The functional form is solid.
Conceptual Understanding
Trajectory Prediction
Random Seeds
a × r seems roughly constant … not quite 1/r.
Let me try a × r²: … Not consistent.
So at large r, a ~ 1/r. At small r, a ~ 1/r².
Actually note the file path in the error message: 'extra_dimensions.csv'! This is a hint! The world might have an extra compactified dimension…
[Model starts fitting Yukawa potential, Bessel functions... At some point fitting tool errors.]
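The 1/r versus 1/r² question the agent keeps returning to can be checked with a log-log fit of acceleration against separation. Below is a minimal sketch on synthetic data that follows a = G/r with G = 8. The separations and the noiseless data are illustrative assumptions, not the benchmark's measurements.

```python
import numpy as np

def fit_power_law(r, a):
    """Fit |a| = A * r**n by linear regression in log-log space; return (n, A)."""
    slope, intercept = np.polyfit(np.log(r), np.log(np.abs(a)), deg=1)
    return slope, np.exp(intercept)

r = np.linspace(1.0, 20.0, 40)  # hypothetical probe separations
a = 8.0 / r                     # synthetic accelerations with a = G / r, G = 8
exponent, amplitude = fit_power_law(r, a)
```

Fitting small-r and large-r subsets separately would expose a change of slope such as the one described above, where a single global fit would average over it.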
Simulated Data
Observed Data
Alignment Loss
Reconstruction
Statistical Alignment
(OT / Adversarial)
Encoder
Obs
Encoder
Sims
Private Domain Information
Shared Information
Observed Reconstructed
Simulated Reconstructed
Shared Decoder
Shared Decoder
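One possible form of the OT-style statistical alignment term is a sliced Wasserstein distance between the encoder latents of simulations and observations. The sketch below is an assumption about how such a loss could look, not the implementation behind these slides. The latent dimension and the Gaussian stand-in latents are hypothetical.

```python
import numpy as np

def sliced_wasserstein(latents_a, latents_b, n_projections=64, rng=None):
    """Monte Carlo sliced 2-Wasserstein distance between two equal-size samples."""
    rng = np.random.default_rng() if rng is None else rng
    dim = latents_a.shape[1]
    directions = rng.standard_normal((n_projections, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # In 1D, optimal transport matches sorted samples, so sort each projection
    proj_a = np.sort(latents_a @ directions.T, axis=0)
    proj_b = np.sort(latents_b @ directions.T, axis=0)
    return float(np.sqrt(np.mean((proj_a - proj_b) ** 2)))

rng = np.random.default_rng(0)
sim_latents = rng.standard_normal((500, 4))          # stand-in for encoded sims
sim_latents_2 = rng.standard_normal((500, 4))        # second draw, same distribution
obs_latents = rng.standard_normal((500, 4)) + 1.0    # stand-in for shifted observations
d_identical = sliced_wasserstein(sim_latents, sim_latents, rng=rng)
d_same_dist = sliced_wasserstein(sim_latents, sim_latents_2, rng=rng)
d_shifted = sliced_wasserstein(sim_latents, obs_latents, rng=rng)
```

Minimising such a term pulls the two latent distributions together, while the reconstruction losses keep the latents informative.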
Idealized Simulations
Observations
+ Scale Dependent Noise
+ Bump
Amplitude
Tilt
Tilt