Generative Solutions for Cosmic Problems
Flatiron Institute
Institute for Advanced Study
Carol(ina) Cuesta-Lazaro




1-Dimensional


Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments



DESI / SPHEREx
Euclid / LSST
SO / CMB-S4
LIGO / Einstein


The era of Big Data Cosmology
xAstrophysics
HERA / CHIME
SAGA / MaNGA




Galaxy formation
Emitters Census
Reionization


Cosmic Microwave Background
Galaxies / Dwarfs
21 cm
Galaxy Surveys
Gravitational Lensing
Gravitational Waves
AGN Feedback/Supernovae


Carolina Cuesta-Lazaro Flatiron/IAS - Liverpool CDT
We have increasingly precise observations of the cosmos, but our biggest questions remain about what we can't directly observe


Dark Matter
Gas
Bullet Cluster
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

Dark Energy is constant over time
Inflation
5x more collisionless matter than we can see
Dark Matter
Exponential expansion in the very early universe
Expansion is accelerating,
dynamical?
Dark Energy
["Genie 2: A large-scale foundation model" Parker-Holder et al (2024)]

["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high dimensional inference tractable
1024x1024xTime
["Genie 3: A new frontier for world models" Parker-Holder et al (2025)]
Why Now?
Beyond tools
Optimisation
Neural representations
Baryonification

Inflation

Symmetry-preserving ML

Early Universe - JWST

Simulation Based Inference
Epidemiological simulations


Medical Imaging
Natural Language Processing

Exoplanets
Compute
Simulations
Data
ML
Statistics
Physics
What is dark matter made of?
What is driving the accelerated expansion?
How did the Universe begin?
A new way of thinking
about
physical systems

Understanding the Early Universe
Hypothesis Generation
Running the clock backwards
Field-level inference for galaxy surveys
Fast Emulators + Likelihood Models

Machine Learning enables new science in Cosmology

Understanding the Early Universe

Forward Model
Observable
Predict
Infer
Theory Parameters
Inverse mapping


+ MCMC hammer

Dark matter
Dark energy
Inflation
Initial conditions



Carol's optimistic forecast
Field Level inference
A digital twin of our Universe

Observed Galaxy Distribution
Simulated Galaxy Distribution

Field Level Inference
Forward Model
(= no Cosmic Variance)




Why field-level inference?
Optimal constraints
Counts-in-cell
Do we really need to infer 10^9 parameters to constrain ~10?
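For contrast with the full field, counts-in-cells is one example of a compressed summary: the field is reduced to the distribution of object counts per grid cell. A minimal sketch, assuming a periodic cubic box and a toy uniform point cloud (the box size, grid size, and function name are illustrative, not taken from the talk):

```python
import numpy as np

def counts_in_cells(positions, box_size, n_cells):
    """Histogram 3D positions onto a regular grid and return the count in each cell."""
    edges = np.linspace(0.0, box_size, n_cells + 1)
    counts, _ = np.histogramdd(positions, bins=(edges, edges, edges))
    return counts.astype(int)

# Toy "galaxies": uniform random points, an assumption made only for illustration.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 100.0, size=(5000, 3))

counts = counts_in_cells(positions, box_size=100.0, n_cells=4)
# P(N): fraction of cells that contain exactly N objects.
pdf = np.bincount(counts.ravel()) / counts.size
```

The summary has as many numbers as there are distinct count values, whereas field-level inference keeps every cell and, in the explicit approach, every initial-condition mode.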



Compression
Marginal Likelihood
Explicit Likelihood
Implicit Likelihood
Initial Conditions
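One way to read the labels on this slide, in notation that is assumed here rather than taken from the deck ($\theta$ for theory parameters, $z$ for initial conditions, $d$ for the observed field):

```latex
\begin{align}
p(\theta, z \mid d) &\propto p(d \mid \theta, z)\, p(z \mid \theta)\, p(\theta)
  && \text{(explicit likelihood: sample } \theta \text{ and } z \text{ jointly)} \\
p(d \mid \theta) &= \int p(d \mid \theta, z)\, p(z \mid \theta)\, \mathrm{d}z
  && \text{(marginal likelihood: initial conditions integrated out)}
\end{align}
```

An implicit-likelihood approach would learn $p(d \mid \theta)$ or $p(\theta \mid d)$ from simulated $(\theta, d)$ pairs instead of evaluating the integral directly.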
Generative Models 101
Maximize the likelihood of the training samples
Parametric Model


Training Samples
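Maximizing the likelihood of training samples can be sketched with the simplest parametric model, a 1D Gaussian fitted by gradient descent on the negative log-likelihood. The toy data, learning rate, and function names below are assumptions for illustration; a real generative model would replace the Gaussian with a neural network.

```python
import numpy as np

def gaussian_nll(params, samples):
    """Average negative log-likelihood of samples under N(mu, sigma^2)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return np.mean(0.5 * ((samples - mu) / sigma) ** 2 + log_sigma + 0.5 * np.log(2 * np.pi))

def fit_by_gradient_descent(samples, lr=0.1, n_steps=2000):
    """Minimize the NLL over (mu, log_sigma) using analytic gradients."""
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_steps):
        sigma = np.exp(log_sigma)
        z = (samples - mu) / sigma
        grad_mu = -np.mean(z) / sigma           # d NLL / d mu
        grad_log_sigma = 1.0 - np.mean(z ** 2)  # d NLL / d log_sigma
        mu -= lr * grad_mu
        log_sigma -= lr * grad_log_sigma
    return mu, np.exp(log_sigma)

# Toy training samples (illustrative only).
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=10_000)
mu_hat, sigma_hat = fit_by_gradient_descent(samples)
```

For a Gaussian the optimum is known in closed form (the sample mean and standard deviation), which makes this a convenient check that the optimization does what it claims.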
Trained Model

Evaluate probabilities


Low Probability
High Probability

Generate Novel Samples


Simulator
Generative Model
Fast emulators
Inference
Generative Model
Simulator
Generative Models: Simulate and Analyze
Bridging two distributions

Base
Data
"Creating noise from data is easy;
creating data from noise is generative modeling."
Yang Song
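The "easy" direction of the quote can be sketched with a linear interpolant between a base sample and a data sample. This is a generic flow-matching-style construction under assumed names and toy data, not the specific model used in the talk; a neural network would be trained to regress the velocity target given the interpolated point and the time.

```python
import numpy as np

def linear_interpolant(x_base, x_data, t):
    """Point on the straight path from a base sample (t=0) to a data sample (t=1)."""
    return (1.0 - t) * x_base + t * x_data

def velocity_target(x_base, x_data):
    """Time derivative of the linear path; the regression target for a network."""
    return x_data - x_base

rng = np.random.default_rng(0)
x_data = rng.normal(loc=3.0, scale=0.5, size=(1000, 2))  # stand-in for "data"
x_base = rng.normal(size=x_data.shape)                   # Gaussian base samples
t = rng.uniform(size=(1000, 1))                          # one random time per pair

x_t = linear_interpolant(x_base, x_data, t)
v = velocity_target(x_base, x_data)
```

Generation then runs the other way: start from a base sample and integrate the learned velocity from t=0 to t=1.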
Neural Network
6 seconds / sim vs 40 million CPU hours
Fast Emulation





Density Fields
Marginal Likelihoods:
arXiv:2405.05255

Point Clouds
arXiv:2311.17141

Marginal Posteriors:

1) Sampling the Neural Likelihood (NLE) with HMC
2) Directly learning an optimal compression: Neural Posterior (NPE)
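Sampling a likelihood with HMC can be sketched as below. The analytic Gaussian log-likelihood is a stand-in assumption: in the neural-likelihood setting, a trained network would supply the log-probability and its gradient instead. Step size, trajectory length, and names are illustrative choices.

```python
import numpy as np

def hmc(log_prob_and_grad, theta0, n_samples=2000, step_size=0.02, n_leapfrog=10, seed=0):
    """Basic Hamiltonian Monte Carlo with a leapfrog integrator and unit mass matrix."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp, grad = log_prob_and_grad(theta)
    chain = []
    for _ in range(n_samples):
        p = rng.normal(size=theta.shape)              # resample momentum
        h_old = -logp + 0.5 * np.sum(p ** 2)
        theta_new, logp_new, grad_new = theta.copy(), logp, grad.copy()
        p_new = p + 0.5 * step_size * grad_new        # initial half step in momentum
        for i in range(n_leapfrog):
            theta_new = theta_new + step_size * p_new
            logp_new, grad_new = log_prob_and_grad(theta_new)
            if i < n_leapfrog - 1:
                p_new = p_new + step_size * grad_new  # full momentum step
        p_new = p_new + 0.5 * step_size * grad_new    # final half step in momentum
        h_new = -logp_new + 0.5 * np.sum(p_new ** 2)
        if np.log(rng.uniform()) < h_old - h_new:     # Metropolis accept/reject
            theta, logp, grad = theta_new, logp_new, grad_new
        chain.append(theta.copy())
    return np.array(chain)

# Toy data with unit noise; the parameter is the unknown mean (flat prior assumed).
data = np.random.default_rng(1).normal(loc=1.5, scale=1.0, size=50)

def toy_log_likelihood(theta):
    """Stand-in for a learned likelihood: returns log p(data | theta) and its gradient."""
    resid = data - theta[0]
    return -0.5 * np.sum(resid ** 2), np.array([np.sum(resid)])

chain = hmc(toy_log_likelihood, theta0=[0.0])
posterior_samples = chain[200:, 0]  # drop burn-in
```

For this toy problem the posterior is Gaussian with mean equal to the sample mean and standard deviation 1/sqrt(50), so the chain can be checked against the exact answer.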
Learned Likelihood

CNN
Diffusion
Increasing Noise
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner NeurIPS 2023 ML for the physical sciences, arXiv:2405.05255]

Nayantara Mudur


Posterior (NPE)
Likelihood (NLE)
Learning the marginal likelihood is more robust
Learned Likelihood






Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Predictive Cross Validation:
Cross-Correlation with other probes without Cosmic Variance

[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-gaussianity
Why field-level inference?
Data-driven Subgrid models / Data-driven Systematics
"Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants"
Cuesta-Lazaro, Bayer, Albergo et al
NeurIPS ML4PS 2024 Spotlight talk

Particle Mesh
Dark Matter Only
Gaussian Likelihood
Explicit Sampling vs SBI

1) Likelihood not necessarily Gaussian
2) Forward model need not be differentiable
3) Amortized
Generative Model: Marginalizing over ICs
Generative Model: Fixing ICs
HMC: Marginalizing over ICs

True
Reconstructed


Scaling up in volume
Implicit FLI for DESI
DESI Y1 LRG Effective volumes already larger than our sims!
Small Scale Galaxy Bias

How galaxies are selected
Fibre collisions
Forward Modelling the Survey Systematics



EFT
Galaxy Formation

Self-Consistent Predictions across observables
arXiv:1804.03097

X-Ray
Cluster gas mass fractions
Cluster gas density profiles
Sunyaev-Zeldovich
Galaxy Properties
Thermal: integrated electron pressure (hot electrons / big objects)
Star formation + histories
Stellar mass / halo mass relation
FRBs
Integrated electron density

Kinetic: integrated electron density x peculiar velocity
Multi-wavelength Observables
["BaryonBridge: Interpolants models for fast hydrodynamical simulations" Horowitz, Cuesta-Lazaro, Yehia ML4Astro workshop 2025]

Particle Mesh for Gravity
CAMELS Volumes
1000 boxes with varying cosmology and feedback models

Gas Properties

Current model optimised for Lyman Alpha forest
7 GPU minutes for a 50 Mpc simulation
130 million CPU core hours for TNG50
Density
Temperature
Galaxy Distribution

Hydro At Scale
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
Subgrid model 1
Subgrid model 2
Subgrid model 3
Subgrid model 4
Can we learn a general and continuous representation of Baryonic feedback?

Gas
Galaxies




Dark Matter
Baryonic fields
Marginalize over a broader set of subgrid physics
Interpolate between simulators
Mingshau Liu
(Ming)

Constrain z via multi-wavelength observations

Trained on:
TNG, SIMBA, Astrid, EAGLE
Encoder



1) Encoder

Gas
Galaxies




Dark Matter
Baryonic fields
2) Probabilistic Decoder



Dark Matter
Baryonic fields
(Unseen)
Generalizing to unseen simulations: Magneticum




Gas Density
Temperature
Astrid
EAGLE
Interpolating over Simulations
Observation
Question
Hypothesis
Testable Predictions
Gather data
Alter, Expand, Reject Hypothesis
Develop General Theories
[Figure adapted from ArchonMagnus]
High-dimensional data
Simulators as theory models
The Scientific Method in 2025
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]

Dark Energy is constant over time
["An LLM-driven framework for cosmological model-building and exploration" Mudur, Cuesta-Lazaro, Toomey (in prep)]

Can LLMs explore the space of hypotheses?
Propose a model for Dark Energy
Implement it in a Cosmology simulation code: CLASS
Test fit to DESI Observations
Iterate to improve fit
Quintessence, DE/DM interactions....
Must pass a set of general tests for "reasonable" models
Ideally, compare evidence to LCDM.
For now, Bayesian Information Criterion (BIC)
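The BIC comparison can be sketched as follows, using the standard definition BIC = k ln(n) - 2 ln(L_max). The chi-square values, parameter counts, and number of data points are made-up placeholders, not results from DESI or from the framework described here.

```python
import numpy as np

def bic(max_log_likelihood, n_params, n_data):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L_max). Lower is better."""
    return n_params * np.log(n_data) - 2.0 * max_log_likelihood

def bic_from_chi2(chi2_min, n_params, n_data):
    """For Gaussian errors, -2 ln(L_max) equals chi2_min up to a model-independent constant."""
    return chi2_min + n_params * np.log(n_data)

# Hypothetical numbers for illustration only.
n_data = 30
bic_lcdm = bic_from_chi2(chi2_min=15.0, n_params=2, n_data=n_data)
bic_new_model = bic_from_chi2(chi2_min=12.0, n_params=4, n_data=n_data)

# Positive delta: the better fit does not compensate for the extra parameters.
delta_bic = bic_new_model - bic_lcdm
```

BIC only needs the best-fit likelihood, which is why it is a cheap stand-in for the full Bayesian evidence during iteration.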
1
2

Nayantara Mudur (Harvard)
Can LLMs implement new physics models?
Thawing Quintessence
Axion-like Early Dark Energy
Ultra-light scalar field that temporarily acts as dark energy in the early universe
Implementation Challenge:
Dynamic dark energy model: scalar field transitions from "frozen" (cosmological constant-like) to evolving as the universe expands.
Oscillatory behaviour
Can take advantage of existing scalar field implementations in CLASS
+ 43,000 lines of C code
+ 10,000 lines of numerical files
CLASS Challenge:

1) Code compiles + obtains reasonable observables
2) Implementation agrees with target repository
3) Goodness of fit for DESI + Supernovae
4) H0 tension metrics
Curated
1-page description of the model to be implemented, CLASS tips + very explicit units
Paper
Directly from a full paper
If it fails, get feedback from another LLM
Propose a Dark Energy Model

Shortcut: field that produces this?

Propose a Dark Energy Model
Asked for physical motivation. It tried :(
Not true, preferred scale
Reinforcement Learning
How to iterate
Update the base model weights to optimize a scalar reward (s)

DeepSeek R1
Base LLM
(being updated)
What rewards are more advantageous?
Base LLM
(frozen)
Develop basic skills: numerics, theoretical physics, UNIT CONVERSION
Community Effort!
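How a scalar reward becomes a training signal can be sketched with the group-relative advantage used in GRPO, the RL algorithm behind DeepSeek R1: several rollouts are sampled for the same prompt and each reward is standardized against its group. The rewards below are hypothetical pass/fail scores, and the function name is an assumption.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts sampled for the same prompt."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for 8 rollouts, e.g. 1.0 if the generated code compiles and fits.
rewards = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat the group average get a positive advantage and are reinforced; if every rollout receives the same reward, all advantages are zero and the group carries no learning signal, which is one reason the choice of reward matters.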
Evolutionary algorithms
Learning in natural language, reflect on traces and results
Examples: EvoPrompt, FunSearch, AlphaEvolve

How to iterate
["GEPA: Reflective prompt evolution can outperform reinforcement learning" Agrawal et al]

GEPA: Evolutionary
GRPO: RL
+10% improvement over RL with 35x fewer rollouts
Scientific reasoning with LLMs still in its infancy!
Conclusions

1. Cosmological field-level inference can be made scalable with generative models
Can EFT help us scale in volume?

2. We can scale hydrodynamical simulations in volume for the analysis of LSS surveys
Can we leverage multi-wavelength observations?
We can generally make simulators more controllable!
Is resolution too low?

3. What role can LLMs play in Science?

Looking for PhD students and Postdocs interested in AIxAstro
carolina.clzr@gmail.com

CDT-Liverpool-2025
By carol cuesta