1-Dimensional
Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments
DESI, DESI-II, Spec-S5
Euclid / LSST
Simons Observatory
CMB-S4
LIGO
Einstein
x Astrophysics
5-Dimensional
Generative models as Fast Emulators + Likelihood Estimators
Reconstructing Dark Matter
["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
ICML AI4Astro 2023, arXiv:2311.17141]
["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys"
Park, Mudur, Cuesta-Lazaro et al ICML 2024 AI for Science]
Aizhan Akhmetzhanova
Helena Brittain
DM Halo masses
Anomaly Detection
AbacusSummit
330 billion particles in a (2 Gpc/h)^3 volume
60 trillion particles across the suite
~8 TB per simulation
15M CPU hours
(TNG50: ~100M CPU hours)
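The particle count and storage figures above can be cross-checked with simple arithmetic. The sketch below assumes 6912 particles per dimension for a base box and six float32 values per particle (three position and three velocity components); neither assumption is stated on the slide, and the quoted ~8 TB may refer to a different data product.

```python
# Back-of-the-envelope check of the simulation sizes quoted above.
# Assumptions (not from the slides): 6912 particles per dimension and
# 6 float32 numbers (3 positions + 3 velocities) stored per particle.
particles_per_dim = 6912
n_particles = particles_per_dim ** 3

bytes_per_particle = 6 * 4  # six 4-byte floats
snapshot_terabytes = n_particles * bytes_per_particle / 1e12

# How many base boxes would add up to 60 trillion particles.
n_equivalent_boxes = 60e12 / n_particles

print(f"{n_particles:.3e} particles per box")
print(f"{snapshot_terabytes:.1f} TB per raw particle snapshot")
print(f"{n_equivalent_boxes:.0f} base-box equivalents in 60 trillion particles")
```

Under these assumptions a single raw snapshot is already of order the quoted per-simulation size, which is why compressed or learned representations are attractive.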
ML Requirements
See Benjamin Wandelt, Matt Ho talks
Symmetry preserving architectures
Defining a continuous latent space for feedback
Prompting simulators
Controllable
Simulators
Continuous fields
Inductive Biases
Learning to represent feedback
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing"
Balla, Mishra-Sharma, Cuesta-Lazaro et al
NeurIPS 2024 NeurReps arXiv:2410.20516]
E(3) Equivariant architectures
Benchmark models
["Geometric and Physical Quantities Improve E(3) Equivariant Message Passing" Brandstetter et al arXiv:2110.02905]
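To make "E(3) equivariant" concrete, here is a minimal NumPy sketch of one message-passing step whose output positions rotate and translate with the input. It is a simplified layer in the spirit of E(n)-equivariant graph networks, not the steerable architecture of Brandstetter et al. and not one of the benchmark models; the function name, weight shapes, and fully connected graph are all illustrative assumptions.

```python
import numpy as np

def equivariant_layer(pos, feat, w_msg, w_pos):
    """One simplified E(3)-equivariant step on a fully connected graph.

    pos:  (N, 3) point positions (equivariant quantity)
    feat: (N, F) per-point scalar features (invariant quantity)
    """
    n, n_feat = feat.shape
    rel = pos[:, None, :] - pos[None, :, :]             # (N, N, 3) vectors x_i - x_j
    dist2 = np.sum(rel ** 2, axis=-1, keepdims=True)    # (N, N, 1) invariant distances
    f_i = np.broadcast_to(feat[:, None, :], (n, n, n_feat))
    f_j = np.broadcast_to(feat[None, :, :], (n, n, n_feat))
    edge_in = np.concatenate([f_i, f_j, dist2], axis=-1)
    msg = np.tanh(edge_in @ w_msg)                      # (N, N, F) invariant messages
    mask = 1.0 - np.eye(n)[:, :, None]                  # drop self-edges
    new_feat = feat + np.sum(mask * msg, axis=1)
    gate = msg @ w_pos                                  # (N, N, 1) invariant scalar gate
    # Positions move along relative vectors scaled by invariant gates.
    new_pos = pos + np.mean(mask * rel * gate, axis=1)
    return new_pos, new_feat

rng = np.random.default_rng(0)
n_points, n_feat = 6, 4
pos = rng.normal(size=(n_points, 3))
feat = rng.normal(size=(n_points, n_feat))
w_msg = rng.normal(size=(2 * n_feat + 1, n_feat)) * 0.1  # hypothetical weights
w_pos = rng.normal(size=(n_feat, 1)) * 0.1

# Random proper rotation (via QR) and translation.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0
shift = rng.normal(size=3)

out_pos, out_feat = equivariant_layer(pos, feat, w_msg, w_pos)
rot_pos, rot_feat = equivariant_layer(pos @ Q.T + shift, feat, w_msg, w_pos)

print("positions equivariant:", np.allclose(rot_pos, out_pos @ Q.T + shift))
print("features invariant:   ", np.allclose(rot_feat, out_feat))
```

The design choice is that learned functions only ever see invariant inputs (features and squared distances), while geometry enters through the relative vectors, so the symmetry holds for any weights.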
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. [...]
methods that continue to scale with increased computation even as the available computation becomes very great. [...]
We want AI agents that can discover like we can, not which contain what we have discovered.
["The Bitter Lesson" Sutton 2019]
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing"
Balla, Mishra-Sharma, Cuesta-Lazaro et al
NeurIPS 2024 NeurReps arXiv:2410.20516]
(MSE errors in units of 1e-3)
No Equivariance
Equivariant
Graph
Nodes
Edges
3D Mesh
Voxels
Both data representations scale badly with increasing resolution
Continuous in space and time
x500 Compression?
Can we store a simulation inside a neural network?
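One way to read "storing a simulation inside a neural network" is a coordinate network that maps continuous (x, y, z, t) to a field value, so storage cost is the parameter count rather than the voxel count. The sketch below only illustrates that bookkeeping: the weights are random and untrained, and the architecture, layer sizes, grid size, and number of snapshots are assumptions, not the setup behind the slide's x500 figure.

```python
import numpy as np

def init_params(rng, n_freq=32, hidden=128):
    """Random (untrained) weights for a coordinate network f(x, y, z, t) -> scalar."""
    params = {"freqs": rng.normal(scale=3.0, size=(4, n_freq))}
    layer_sizes = [(2 * n_freq, hidden), (hidden, hidden), (hidden, 1)]
    for i, (n_in, n_out) in enumerate(layer_sizes):
        params[f"w{i}"] = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
        params[f"b{i}"] = np.zeros(n_out)
    return params

def field(params, coords):
    """Evaluate the field at continuous coordinates of shape (n_points, 4)."""
    phase = 2.0 * np.pi * coords @ params["freqs"]       # Fourier features
    h = np.concatenate([np.sin(phase), np.cos(phase)], axis=-1)
    h = np.tanh(h @ params["w0"] + params["b0"])
    h = np.tanh(h @ params["w1"] + params["b1"])
    return (h @ params["w2"] + params["b2"])[:, 0]

rng = np.random.default_rng(0)
params = init_params(rng)
n_params = sum(p.size for p in params.values())

# Voxel storage grows with the cube of the resolution (times snapshots).
grid_size, n_snapshots = 256, 10                         # hypothetical values
n_voxel_values = grid_size ** 3 * n_snapshots
compression = n_voxel_values / n_params
params_for_x500 = n_voxel_values // 500

# The field can be queried anywhere in space and time, not only on grid nodes.
query = rng.uniform(size=(5, 4))
values = field(params, query)

print(f"{n_params} parameters vs {n_voxel_values} voxel values")
print(f"nominal compression factor: {compression:.0f}")
print(f"parameter budget for x500 compression: {params_for_x500}")
```

Whether a network within that parameter budget can reproduce the field to useful accuracy is an empirical question this sketch does not address.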
Simulator 1
Simulator 2
Dark Matter
Feedback
Approach 1:
Contrastive
Baryonic fields
Approach 2:
Generative
z = Cosmological Mutual Information
z = Baryonic feedback
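For the contrastive approach, a common objective is InfoNCE, which pulls embeddings of matched pairs together and pushes mismatched pairs apart. The sketch below is a generic NumPy version. What counts as a positive pair (for instance, fields that share feedback physics) is an assumption here, since the slides do not specify the pairing or the loss actually used.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a and row i of z_b form the positive pair."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                   # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
batch, dim = 32, 8

# Hypothetical embeddings: view B is a slightly perturbed copy of view A.
z_a = rng.normal(size=(batch, dim))
z_b = z_a + 0.05 * rng.normal(size=(batch, dim))

loss_matched = info_nce(z_a, z_b)
# Rolling the rows breaks every positive pair.
loss_shuffled = info_nce(z_a, np.roll(z_b, 1, axis=0))

print(f"matched pairs:  {loss_matched:.3f}")
print(f"shuffled pairs: {loss_shuffled:.3f}")
```

Minimizing this loss encourages the latent z to keep whatever information is shared within a pair and to discard what differs, which is how the choice of pairs determines what z represents.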
Informative abstractions of the data
Contrastive
Generative
inductive biases
from scratch or from partial observations
Students at MIT are
OVER-CAFFEINATED
NERDS
SMART
ATHLETIC
Baryonic fields
Dark Matter
Generative model
Total matter, gas temperature,
gas pressure, gas metallicity
Encoder
X-Ray
Cluster gas mass fractions
Cluster gas density profiles
Sunyaev-Zeldovich
Galaxy Properties
Thermal: integrated electron pressure (hot electrons / big objects)
Star formation + histories
Stellar mass / halo mass relation
FRBs
Integrated electron density
Kinetic: integrated electron density x peculiar velocity
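The line-of-sight integrals named above can be written out in their standard forms. Sign and redshift conventions vary between papers; here $v_r$ is the line-of-sight peculiar velocity taken positive when receding, $\tau$ is the optical depth, and $l$ is the proper distance along the line of sight, which are assumptions about convention rather than something fixed by the slides.

```latex
% Thermal SZ: Compton-y parameter, integrated electron pressure
y = \frac{\sigma_T}{m_e c^2} \int P_e \, \mathrm{d}l

% Kinetic SZ: integrated electron density times line-of-sight velocity
\frac{\Delta T_{\mathrm{kSZ}}}{T_{\mathrm{CMB}}}
  = -\frac{\sigma_T}{c} \int n_e \, v_r \, e^{-\tau} \, \mathrm{d}l

% FRB dispersion measure: integrated electron density
\mathrm{DM} = \int \frac{n_e}{1+z} \, \mathrm{d}l
```

Because each probe weights the gas differently (pressure, momentum, density), combining them constrains complementary aspects of the same baryonic fields.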
Boryana Hadzhiyska, Alexandra Amon
Daisuke Nagai, Erwin Tin-Hay Lau
Isa Medlock
Chris Lowell?
Stochastic Interpolants
NF
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al NeurIPS 2024 ML for the Physical Sciences]
Adrian Bayer
Mount Fuji?
Chirag Modi
Continuity Equation
Loss requires solving an ODE!
Diffusion, Flow matching, Interpolants... All ways to avoid this at training time
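For reference, the continuity equation and the resulting likelihood can be written as follows, with $\rho_t$ the density at time $t$, $v_t$ the velocity field, $\rho_0$ the base density, and $\rho_1$ the model density. The notation is chosen here for illustration and may differ from the slides' figures.

```latex
% Continuity equation transporting the base density to the data density
\partial_t \rho_t(x) + \nabla \cdot \big( \rho_t(x) \, v_t(x) \big) = 0

% Samples follow the flow
\frac{\mathrm{d}x_t}{\mathrm{d}t} = v_t(x_t)

% Log-likelihood along a trajectory: evaluating it needs the ODE solution x_t
\log \rho_1(x_1) = \log \rho_0(x_0) - \int_0^1 \nabla \cdot v_t(x_t) \, \mathrm{d}t
```

The integral over the trajectory is what makes direct maximum-likelihood training expensive: every gradient step requires integrating the ODE.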
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
Data
Base
Can we regress the velocity field directly?
Turned maximum likelihood into a regression problem!
Interpolant
Stochastic Interpolant
Expectation over all possible paths that go through xt
["Stochastic Interpolants: A Unifying framework for flows and diffusion" Albergo et al arXiv:2303.08797]
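A toy one-dimensional example shows how the velocity field can be obtained by regression alone. The sketch assumes the simplest linear interpolant $x_t = (1-t)\,x_0 + t\,x_1$ without the extra latent noise term of a full stochastic interpolant, a Gaussian base and Gaussian "data", and a linear least-squares fit in place of a neural network (for Gaussians the conditional expectation is exactly linear). All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
mean_data, std_data = 3.0, 2.0                         # hypothetical target distribution

x0 = rng.normal(size=n)                                # base samples
x1 = mean_data + std_data * rng.normal(size=n)         # data samples, independent of x0

def interpolant(x0, x1, t):
    return (1.0 - t) * x0 + t * x1

def fit_velocity(t):
    """Regress the targets d(x_t)/dt = x1 - x0 on x_t with v_t(x) = a * x + b."""
    xt = interpolant(x0, x1, t)
    slope, intercept = np.polyfit(xt, x1 - x0, deg=1)
    return slope, intercept

def exact_velocity(t):
    """Closed form of E[x1 - x0 | x_t = x] for this Gaussian toy problem."""
    var_t = (1.0 - t) ** 2 + (t * std_data) ** 2
    slope = (t * std_data ** 2 - (1.0 - t)) / var_t
    intercept = mean_data - slope * t * mean_data
    return slope, intercept

fitted = fit_velocity(0.5)
exact = exact_velocity(0.5)

def sample(n_samples, n_steps=100):
    """Push base samples through the learned ODE with Euler steps."""
    x = rng.normal(size=n_samples)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        a, b = fit_velocity((k + 0.5) * dt)            # velocity at the step midpoint
        x = x + dt * (a * x + b)
    return x

generated = sample(50_000)
print("fitted (slope, intercept):", fitted)
print("exact  (slope, intercept):", exact)
print("generated mean, std:", generated.mean(), generated.std())
```

No ODE is solved during fitting; the ODE only appears at sampling time, which is the point of turning maximum likelihood into a regression problem.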
?
["Probabilistic Forecasting with Stochastic Interpolants and Foellmer Processes" Chen et al arXiv:2403.10648 (Figure adapted from arXiv:2407.21097)]
Generative SDE
Cross-Correlation True ICs | Pred ICs
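The cross-correlation between true and predicted initial conditions is usually reported per Fourier scale as $r(k) = P_{ab}(k) / \sqrt{P_{aa}(k)\,P_{bb}(k)}$. The sketch below computes it on small random 3D grids; the grid size, binning, and noise level are assumptions for illustration, and wavenumbers are in grid units rather than h/Mpc.

```python
import numpy as np

def cross_correlation(field_a, field_b, n_bins=4):
    """r(k) = P_ab / sqrt(P_aa * P_bb) in spherical shells of |k| (grid units)."""
    fa = np.fft.fftn(field_a)
    fb = np.fft.fftn(field_b)
    freqs = [np.fft.fftfreq(n) * n for n in field_a.shape]
    kx, ky, kz = np.meshgrid(*freqs, indexing="ij")
    k_mag = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2).ravel()

    edges = np.linspace(0.5, field_a.shape[0] / 2, n_bins + 1)
    which = np.digitize(k_mag, edges)                  # bins 1..n_bins are in range

    p_ab = (fa * np.conj(fb)).real.ravel()
    p_aa = (np.abs(fa) ** 2).ravel()
    p_bb = (np.abs(fb) ** 2).ravel()

    r = np.empty(n_bins)
    for i in range(1, n_bins + 1):
        sel = which == i
        r[i - 1] = p_ab[sel].sum() / np.sqrt(p_aa[sel].sum() * p_bb[sel].sum())
    k_centres = 0.5 * (edges[1:] + edges[:-1])
    return k_centres, r

rng = np.random.default_rng(0)
truth = rng.normal(size=(16, 16, 16))                  # stand-in for true ICs
pred = truth + 0.5 * rng.normal(size=truth.shape)      # stand-in for predicted ICs
unrelated = rng.normal(size=truth.shape)

k, r_self = cross_correlation(truth, truth)
_, r_pred = cross_correlation(truth, pred)
_, r_noise = cross_correlation(truth, unrelated)

print("k bins:          ", k)
print("truth vs truth:  ", r_self)
print("truth vs pred:   ", r_pred)
print("truth vs unrelated:", r_noise)
```

A value of r(k) near 1 means phases are recovered at that scale; in reconstructions it typically falls toward 0 at small scales where information has been lost.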
Guided simulations with fuzzy constraints
1. Inductive biases in cosmology can greatly improve constraints
Symmetry-preserving architectures: improved constraining power and simulation efficiency
Continuous fields: memory-efficient data representations and compression methods
2. A general representation for baryonic feedback may inform galaxy formation modelling
Can we constrain it through multi-wavelength observations?
3. Cosmological field-level inference can be made efficient with generative models
Can generally make simulators more controllable!