1-Dimensional
Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments
DESI / SPHEREx / HETDEX
Euclid / LSST
SO / CMB-S4
LIGO / Einstein
Astrophysics
HERA / CHIME
SAGA / MaNGA
Galaxy formation
Emitters Census
Reionization
Cosmic Microwave Background
Galaxies / Dwarfs
21 cm
Galaxy Surveys
Gravitational Lensing
Gravitational Waves
AGN Feedback/Supernovae
We have increasingly precise observations of the cosmos, but our biggest questions remain about what we can't directly observe
Dark Matter
Gas
Bullet Cluster
["DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations" arXiv:2404.03002]
Dark Energy is constant over time
Inflation
5 times more collisionless matter than we can see
Dark Matter
Exponential expansion in the very early universe
Expansion is accelerating (dynamical?)
Dark Energy
Beyond tools
Optimisation
Neural representations
Baryonification
Inflation
Symmetry-preserving ML
Early Universe - JWST
Simulation Based Inference
Epidemiological simulations
Medical Imaging
Natural Language Processing
Exoplanets
Compute
Simulations
Data
ML
Statistics
Physics
What is dark matter made of?
What is driving the accelerated expansion?
How did the Universe begin?
A new way of thinking about physical systems
Prompting simulators
Running the clock backwards
Probabilistic Debiasing
Field-level inference for galaxy surveys
Fast Emulators + Likelihood Models
Uncertainty Quantification
Machine Learning enables new science in Cosmology
GANs
Deep Belief Networks
2006
VAEs
Normalising Flows
BigGAN
Diffusion Models
2014
2017
2019
2022
A folk music band of anthropomorphic autumn leaves playing bluegrass instruments
Contrastive Learning
2023
["Genie 2: A large-scale foundation model" Parker-Holder et al]
["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high dimensional inference tractable
1024x1024xTime
Dataset Size = 1
Can't poke it in the lab
Simulations
Bayesian statistics
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
Subgrid model 1
Subgrid model 2
Subgrid model 3
Subgrid model 4
We need to understand the baryons!
Simulations
Observations
Guided by observational constraints
Robust Inference
Generative Models:
Beyond Simulation Emulation
Part 1
What is driving the accelerated expansion?
Reconstructing latent features:
Dark matter, ICs...
Part 2
How did the Universe begin?
What is dark matter made of?
Anomaly Detection for new physics searches
Baryonic feedback
Hybrid simulators
Part 3
Future Directions
Breaking LCDM
Predictive hydro sims
[Image Credit: Claire Lamman (CfA/Harvard) / DESI Collaboration]
Forward Model
Observable
Predict
Infer
Theory Parameters
Inverse mapping
+ MCMC hammer
Dark matter
Dark energy
Inflation
Initial conditions
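The predict / infer loop on this slide can be illustrated with a toy problem. The following is a minimal sketch, assuming a hypothetical one-parameter linear forward model with Gaussian noise (neither is from the talk). A random-walk Metropolis-Hastings sampler, one simple form of MCMC, draws posterior samples of the theory parameter given a mock observable.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(theta, x):
    """Hypothetical forward model: maps a theory parameter to an observable."""
    return theta * x

# Mock observation generated from a known "true" parameter plus Gaussian noise.
x = np.linspace(0.0, 1.0, 50)
theta_true, sigma = 2.0, 0.1
observed = forward_model(theta_true, x) + sigma * rng.normal(size=x.size)

def log_posterior(theta):
    """Gaussian log-likelihood plus a flat prior on 0 < theta < 10."""
    if not 0.0 < theta < 10.0:
        return -np.inf
    residual = observed - forward_model(theta, x)
    return -0.5 * np.sum((residual / sigma) ** 2)

def metropolis_hastings(log_prob, start, n_steps=5000, step_size=0.05):
    """Random-walk Metropolis-Hastings: propose a move, then accept or reject it."""
    chain = np.empty(n_steps)
    current, current_lp = start, log_prob(start)
    for i in range(n_steps):
        proposal = current + step_size * rng.normal()
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

chain = metropolis_hastings(log_posterior, start=1.0)
posterior_samples = chain[1000:]  # discard burn-in
print(posterior_samples.mean(), posterior_samples.std())
```

Every step of the chain calls the forward model once, which is why expensive simulators make this kind of inverse mapping costly in practice.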
["Cosmological constraints from density-split clustering in the BOSS CMASS galaxy sample" Paillas, Cuesta-Lazaro et al arXiv:2309.16541 MNRAS]
["Cosmological constraints from the Minkowski functionals of the BOSS CMASS galaxy sample" Liu, Paillas, Cuesta-Lazaro et al arXiv:2501.01698]
["SUNBIRD: A simulation-based model for full-shape density-split clustering" Cuesta-Lazaro et al arXiv:2309.16539 MNRAS]
Carol's optimistic forecast
Maximize the likelihood of the training samples
Parametric Model
Training Samples
Trained Model
Evaluate probabilities
Low Probability
High Probability
Generate Novel Samples
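The three steps on this slide (fit by maximum likelihood, evaluate probabilities, generate novel samples) can be sketched with the simplest possible parametric model. This is an illustrative assumption, not the models used in the talk: a one-dimensional Gaussian, whose maximum-likelihood parameters have a closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training samples from an "unknown" distribution (a Gaussian here, for illustration).
training_samples = rng.normal(loc=3.0, scale=0.5, size=10_000)

# Parametric model: a Gaussian with mean mu and standard deviation sigma.
# For this model the maximum-likelihood parameters are the sample mean and
# the sample standard deviation with ddof=0.
mu_hat = training_samples.mean()
sigma_hat = training_samples.std()

def log_prob(x):
    """Evaluate the trained model's log-density at x."""
    z = (x - mu_hat) / sigma_hat
    return -0.5 * z**2 - np.log(sigma_hat) - 0.5 * np.log(2.0 * np.pi)

def sample(n):
    """Generate novel samples from the trained model."""
    return mu_hat + sigma_hat * rng.normal(size=n)

# A point near the training data gets high probability, a distant one low.
print(log_prob(3.0), log_prob(6.0))
novel_samples = sample(5)
```

Deep generative models replace the Gaussian with a far more flexible density, but the training objective and the two uses of the trained model are the same.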
Simulator
Generative Model
Fast emulators
Testing Theories
Generative Model
Simulator
Reverse diffusion: Denoise previous step
Forward diffusion: Add Gaussian noise (fixed)
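The fixed forward process can be written in closed form. The sketch below assumes a linear variance schedule, a common DDPM-style choice that is not specified in the talk, and shows toy data being driven toward a standard normal distribution. The reverse process, in which a network is trained to predict the added noise, is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed linear variance schedule (illustrative values, not from the talk).
n_steps = 1000
betas = np.linspace(1e-4, 0.02, n_steps)
alpha_bars = np.cumprod(1.0 - betas)  # fraction of signal variance left at step t

def forward_diffuse(x0, t):
    """Sample x_t given x_0: scale the signal down and add Gaussian noise."""
    noise = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

# Toy "data": points far from a standard normal distribution.
x0 = rng.normal(loc=5.0, scale=0.1, size=20_000)
x_mid, _ = forward_diffuse(x0, t=300)
x_end, _ = forward_diffuse(x0, t=n_steps - 1)
print(x0.mean(), x_mid.mean(), x_end.mean())
```

The returned `noise` is the regression target a denoising network would be trained on at each step.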
Prompt: A person half Yoda half Gandalf
["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma, International Conference on Machine Learning ICML AI4Astro 2023, Spotlight talk, arXiv:2311.17141]
Base Distribution
Target Distribution
Simulated Galaxy 3d Map
Prompt:
Mean relative velocity
k Nearest neighbours
Pair separation
Varying cosmological parameters
Physics as a testing ground: Well-understood summary statistics enable rigorous validation of generative models
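One such summary statistic is the pair-separation histogram. The following is a minimal sketch, assuming two hypothetical catalogues of uniformly random points in a unit box with no periodic boundaries, standing in for simulated and generated galaxy positions. It uses brute-force pair counting, which is only practical for small catalogues.

```python
import numpy as np

rng = np.random.default_rng(3)

def pair_counts(positions, bin_edges):
    """Histogram of separations over all unique pairs (brute force, O(N^2))."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    i, j = np.triu_indices(len(positions), k=1)  # count each pair once
    counts, _ = np.histogram(dist[i, j], bins=bin_edges)
    return counts

# Hypothetical stand-ins for a simulated and a generated catalogue.
simulated = rng.uniform(size=(500, 3))
generated = rng.uniform(size=(500, 3))

bin_edges = np.linspace(0.0, np.sqrt(3.0), 21)
dd_simulated = pair_counts(simulated, bin_edges)
dd_generated = pair_counts(generated, bin_edges)

# Ratio of pair counts per separation bin; values near 1 mean the statistics agree.
ratio = dd_generated / np.maximum(dd_simulated, 1)
print(ratio)
```

Comparing such histograms between generated and simulated samples gives a quantitative validation check that does not depend on visual inspection.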
Diffusion model
Diffusion
Pair Counting
Initial Clumpiness
Matter Energy Density
With only O(10^4) galaxies!
More than doubling survey volume
CNN
Diffusion
Increasing Noise
["Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo" Mudur, Cuesta-Lazaro and Finkbeiner NeurIPS 2023 ML for the Physical Sciences, arXiv:2405.05255]
Nayantara Mudur
CNN
Diffusion
Generative model constraints are more robust
6 seconds / sim vs 40 million CPU hours
Fast Emulation:
Parameter constraints:
Diffusion
Pair Counting
Carol's optimistic forecast
High dimensional inference
Alternative Clustering Methods
1 to Many:
Galaxies (Observable)
Dark Matter (Unobservable)
["Debiasing with Diffusion: Probabilistic reconstruction of Dark Matter fields from galaxies" Ono et al (including Cuesta-Lazaro) NeurIPS 2023 ML for the Physical Sciences / ApJ / arXiv:2403.10648]
Victoria Ono
Core F. Park
[Figure: maps labelled Truth, Sampled, and Observed. Power spectrum versus scale (k), with Small and Large marked on the axis, for Galaxies, Truth CDM, and Sample CDM. Cross correlation versus scale (k) for Galaxies x Sample CDM, Galaxies x Truth CDM, and Truth CDM x Sample CDM.]
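The cross-correlation diagnostic can be computed from the Fourier transforms of two fields on the same grid. The sketch below assumes Gaussian random fields as hypothetical stand-ins for the true and sampled dark matter fields, and uses arbitrary units for k and power. It evaluates r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)).

```python
import numpy as np

rng = np.random.default_rng(4)

def binned_cross_power(field_a, field_b, n_bins=8):
    """Spherically averaged cross power of two 3D fields (arbitrary units)."""
    n = field_a.shape[0]
    fa, fb = np.fft.fftn(field_a), np.fft.fftn(field_b)
    power = (fa * np.conj(fb)).real
    # Integer wavenumber magnitude of every Fourier mode.
    k1d = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2)
    edges = np.linspace(0.5, n / 2, n_bins + 1)  # excludes the k = 0 mode
    which = np.digitize(k_mag.ravel(), edges)
    flat = power.ravel()
    binned = np.array([flat[which == b].mean() for b in range(1, n_bins + 1)])
    return 0.5 * (edges[1:] + edges[:-1]), binned

def cross_correlation(field_a, field_b):
    """r(k) = P_ab / sqrt(P_aa P_bb); 1 means perfectly correlated fields."""
    k, p_ab = binned_cross_power(field_a, field_b)
    _, p_aa = binned_cross_power(field_a, field_a)
    _, p_bb = binned_cross_power(field_b, field_b)
    return k, p_ab / np.sqrt(p_aa * p_bb)

truth = rng.normal(size=(32, 32, 32))          # stand-in for the true field
sample = truth + rng.normal(size=truth.shape)  # stand-in for a noisy reconstruction
k, r_self = cross_correlation(truth, truth)
_, r_sample = cross_correlation(truth, sample)
print(r_sample)
```

A field correlated with itself gives r(k) = 1 at every scale, so departures from 1 show the scales at which a reconstruction loses information.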
CAMELS
DESI LRG ~ 20 (Gpc/h)^3
Galaxies
True DM
Sample DM
Size of training simulation
1) Generalising to larger volumes
2) Generalising across galaxy formation models
Model trained on Astrid subgrid model, tested on TNG
Void
Galaxy Cluster
["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys" Park, Mudur, Cuesta-Lazaro et al International Conference on Machine Learning ICML 2024 AI for Science]
Posterior Sample
Posterior Mean
Robust to galaxy formation uncertainties
Inverse
Sampling over 20M dimensions
Each sample would cost 6k CPU hours
Observed Light
Dark Matter
Initial Conditions
early Universe
Cosmological Parameters
theory
?
Observed Density Field
today
True
Reconstructed
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al NeurIPS ML4PS 2024 Spotlight talk]
Generative Model
NF
?
["Stochastic Interpolants: A Unifying Framework for Flows and Diffusions" Albergo, Boffi, Vanden-Eijnden arXiv:2303.08797]
Generative SDE
Sampling over 100M dimensions, jointly with theory parameters
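A stochastic interpolant connects samples from two distributions through a time-dependent path. The following is a minimal sketch, assuming a linear interpolant with a noise term that vanishes at both endpoints. The specific coefficients and the toy arrays are illustrative assumptions, not the configuration used in the cited work.

```python
import numpy as np

rng = np.random.default_rng(5)

def interpolant(x0, x1, t, z, gamma_scale=1.0):
    """x_t = (1 - t) x0 + t x1 + gamma(t) z, with gamma(0) = gamma(1) = 0."""
    gamma = gamma_scale * np.sqrt(2.0 * t * (1.0 - t))
    return (1.0 - t) * x0 + t * x1 + gamma * z

# Hypothetical paired samples: x0 from a base distribution, x1 from the target.
x0 = rng.normal(size=(4, 8))
x1 = rng.normal(loc=3.0, size=(4, 8))
z = rng.normal(size=x0.shape)

for t in (0.0, 0.5, 1.0):
    x_t = interpolant(x0, x1, t, z)
    print(t, x_t.mean())
```

A generative model built this way learns the drift that moves samples along the path from t = 0 to t = 1, so that the endpoints match the two distributions exactly.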
Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Interpretability:
Cross-Correlation with other probes
[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-gaussianity
Late Universe
Early Universe
Tension
2025
["Full-shape analysis with simulation-based priors: cosmological parameters and the structure growth anomaly" Ivanov, Obuljen, Cuesta-Lazaro, Toomey arXiv:2409.10609]
Early vs Late
Parametric Extensions
Challenge: Robust error bars against distribution shifts
Clumpiness
BOSS (Late Universe)
BOSS (Late Universe) + ML
BOSS (Late Universe) + Bk + ML
Early Universe
Tension
[Image Credit: Prof. Wendy Freedman]
["A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering statistics" Krause et al (including Cuesta-Lazaro) arXiv:2405.02252]
["Neural Posterior Estimation with Adversarial Regularization: From Robust Statistics to Missing Physics" Cuesta-Lazaro et al in-prep]
The missing pieces: Beyond parametric searches
Axion Dark Matter
Dark Matter - Baryon Interactions
Primordial Non-Gaussianity
Early Dark Energy
Dark Radiation
[Credit: Sandbox Studio]
A representation learning problem:
where meaningful anomalies become apparent
Text-enhanced representations of X-Ray data
Rafael Martinez Galarza
["Augmenting X-Ray Astronomical Representations with Scientific Knowledge Through Contrastive Learning" Martinez-Galarza et al (including Cuesta-Lazaro) in-prep]
Alex Gagliano
Simulated Energy Sources
Physical parameters
(Temperature, ejecta velocity ...)
Physical Anomalies in Supernovae Lightcurves
Learned Shared Representation
Edgar Vidal
Hybrid Simulators
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing" Balla, Mishra-Sharma, Cuesta-Lazaro et al LoG 2024 & NeurReps at NeurIPS 2024]
Foundation models?
Symmetry preserving architectures?
High resolution simulations too expensive for large training sets
The complex physics of galaxy formation can mimic signals we expect from new fundamental physics, making these effects difficult to disentangle
Hydro simulator
Subgrid model
Learning the Universe - Simons Collaboration
Simulations
Observations
Guided by observational constraints
Robust Inference
Anomaly Detection for new physics searches
Baryonic feedback
Hybrid simulators
Can LLMs close the loop?
Theory
Observations
Reconstructing latent features:
Dark matter, ICs...
Generative Models:
Beyond Simulation Emulation
Part 1
Part 2
Part 3
Future Directions
What is driving the accelerated expansion?
How did the Universe begin?
Breaking LCDM
Predictive hydro sims
What is dark matter made of?
Pre 2022
Post 2022
Meaningful benchmarks for AI Scientists will lead to better AI Scientists
Six months later...
Years since benchmark introduction
Score relative to human performance
Human Performance
1. Hypothesis Generation
Self interacting dark radiation
Early Dark Energy
2. Implementing beyond LCDM models
Compute modified P(k) with CLASS
3. Data analysis
Gathering data and postprocessing
4. Solving inverse problems
Obtaining posteriors and computing metrics for comparison to LCDM
Goal: Resolving the Hubble tension