Machine Learning solutions for Cosmic Problems
IAIFI Fellow, MIT / Center for Astrophysics
Carolina Cuesta-Lazaro
Symmetries, Feedback and Controllable simulations
1-Dimensional
Machine Learning
Secondary anisotropies
Galaxy formation
Intrinsic alignments
DESI, DESI-II, Spec-S5
Euclid / LSST
Simons Observatory
CMB-S4
LIGO
Einstein
The era of Big Data Cosmology
xAstrophysics
5-Dimensional
Carolina Cuesta-Lazaro IAIFI/MIT @ Flatiron 2024
Generative models as Fast Emulators + Likelihood Estimators
Reconstructing Dark Matter
["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
ICML AI4Astro 2023, arXiv:2311.17141]
["3D Reconstruction of Dark Matter Fields with Diffusion Models: Towards Application to Galaxy Surveys"
Park, Mudur, Cuesta-Lazaro et al ICML 2024 AI for Science]
Astrophysics dominates Simulation-based Inference
on Simulations
Aizhan Akhmetzhanova
Helena Brittain
DM Halo masses
Anomaly Detection
The cost of cosmological simulations
AbacusSummit
330 billion particles in 2 Gpc/h volume
60 trillion particles
~8 TB per simulation
15M CPU hours
(TNG50 ~100M CPU hours)
ML Requirements
See Benjamin Wandelt, Matt Ho talks
Symmetry preserving architectures
Defining a continuous latent space for feedback
Prompting simulators
Controllable
Simulators
Continuous fields
Inductive Biases
Learning to represent feedback
["A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing"
Balla, Mishra-Sharma, Cuesta-Lazaro et al
NeurIPS 2024 NeurReps, arXiv:2410.20516]
Symmetry-preserving ML
E(3) Equivariant architectures
Benchmark models
["Geometric and Physical Quantities Improve E(3) Equivariant Message Passing" Brandstetter et al arXiv:2110.02905]
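As a minimal illustration of what E(3) equivariance means for a point cloud (not the benchmark models themselves), the sketch below applies a toy coordinate update built only from relative positions and distances. It then checks that rotating and translating the input rotates and translates the output in the same way. The names `equivariant_update` and `random_rotation` and the Gaussian distance weighting are assumptions chosen for illustration.

```python
import numpy as np


def equivariant_update(pos):
    """Toy E(3)-equivariant coordinate update (illustrative assumption).

    Each point moves along a weighted sum of relative position vectors,
    with weights that depend only on pairwise distances.
    """
    rel = pos[:, None, :] - pos[None, :, :]      # (N, N, 3) relative vectors
    dist2 = np.sum(rel ** 2, axis=-1)            # (N, N) squared distances, invariant
    weights = np.exp(-dist2)                     # invariant scalar weights
    return pos + np.sum(weights[..., None] * rel, axis=1)


def random_rotation(rng):
    """Draw a random proper rotation matrix from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:                     # flip one axis to get det = +1
        q[:, 0] *= -1
    return q


rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
rotation = random_rotation(rng)
shift = rng.normal(size=3)

# Transform the output, or transform the input first: both should agree.
out_then_transform = equivariant_update(pos) @ rotation.T + shift
transform_then_out = equivariant_update(pos @ rotation.T + shift)
print(np.allclose(out_then_transform, transform_then_out))
```

The same check can be run against any candidate architecture to test whether it respects the symmetry exactly or only approximately.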
The bitter lesson by Rich Sutton
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. [...]
methods that continue to scale with increased computation even as the available computation becomes very great. [...]
We want AI agents that can discover like we can, not which contain what we have discovered.
(MSE errors in units of 1e-3)
No Equivariance
Equivariant
Memory scaling point clouds and voxels
Graph
Nodes
Edges
3D Mesh
Voxels
Both data representations scale badly with increasing resolution
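A rough back-of-envelope sketch of why both representations become expensive: voxel memory grows with the cube of the grid side, while graph memory grows with the number of nodes times the number of neighbours per node. The function names, the float32 values, and the 64-bit edge indices below are assumptions for illustration, not figures from the talk.

```python
def voxel_memory_bytes(n_side, channels=1, bytes_per_value=4):
    """Memory of a dense 3D grid with n_side^3 voxels (float32 assumed)."""
    return n_side ** 3 * channels * bytes_per_value


def graph_memory_bytes(n_nodes, k_neighbors, node_features=3,
                       bytes_per_value=4, bytes_per_index=8):
    """Memory of node features plus a k-nearest-neighbour edge list.

    Each edge stores two integer indices (sender, receiver).
    """
    node_bytes = n_nodes * node_features * bytes_per_value
    edge_bytes = n_nodes * k_neighbors * 2 * bytes_per_index
    return node_bytes + edge_bytes


# Doubling the grid resolution multiplies voxel memory by 8.
ratio = voxel_memory_bytes(512) / voxel_memory_bytes(256)

# Hypothetical point cloud: 1000 nodes with 10 neighbours each.
graph_bytes = graph_memory_bytes(1000, 10)
print(ratio, graph_bytes)
```

In practice the activations of a network acting on these inputs usually dominate, but they follow the same scaling with resolution.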
Representing Continuous Fields
Continuous in space and time
x500 Compression?
Can we store a simulation inside a neural network?
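One way to read "storing a simulation inside a neural network" is a coordinate-based network that maps a continuous (x, y, z, t) to a field value. The sketch below is an untrained toy MLP under that assumption: it shows how such a field is queried at arbitrary coordinates and how a nominal compression ratio would be counted. The layer sizes and the 128^3 x 10 snapshot grid are hypothetical, and the ratio is only meaningful once a trained network reproduces the field to acceptable accuracy.

```python
import numpy as np


def mlp_param_count(layer_sizes):
    """Number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


def init_mlp(layer_sizes, rng):
    """Random (untrained) weights, for illustration only."""
    return [(rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out)),
             np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def neural_field(params, coords):
    """Evaluate the field at continuous coordinates of shape (N, 4)."""
    h = coords
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b


layer_sizes = [4, 64, 64, 1]                 # (x, y, z, t) -> one field value
params = init_mlp(layer_sizes, np.random.default_rng(0))

query = np.array([[0.1, 0.2, 0.3, 0.5]])     # any point in space and time
value = neural_field(params, query)

n_params = mlp_param_count(layer_sizes)
n_grid_values = 128 ** 3 * 10                # hypothetical: 128^3 voxels, 10 snapshots
compression = n_grid_values / n_params
print(value.shape, n_params, compression)
```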
Simulator 1
Simulator 2
Dark Matter
Feedback
Approach 1:
Contrastive
Learning the feedback manifold
Baryonic fields
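A common contrastive objective is InfoNCE, sketched below as one possible reading of the contrastive approach. It assumes the two inputs are embeddings of paired views (for example, two fields that share the same feedback), with row i of one batch matched to row i of the other. How pairs are actually constructed in the talk is not specified here, so the pairing and the names `info_nce` and `temperature` are assumptions.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a should match row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                   # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # matched pairs on the diagonal


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))

loss_aligned = info_nce(z, z)           # every pair matches
loss_shuffled = info_nce(z, z[::-1])    # every pair is mismatched
print(loss_aligned, loss_shuffled)
```

The loss is lower when matched pairs sit close together in the embedding space, which is what pushes the encoder towards a representation organised by the shared property.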
Approach 2:
Generative
z = Cosmological Mutual Information
z = Baryonic feedback
Representation Learning
Informative abstractions of the data
Representation Learning a la gradient descent
Contrastive
Generative
inductive biases
from scratch or from partial observations
Students at MIT are
OVER-CAFFEINATED
NERDS
SMART
ATHLETIC
Baryonic fields
Dark Matter
Generative model
Total matter, gas temperature, gas pressure, gas metallicity
Encoder
Multi-wavelength observations
X-Ray: cluster gas mass fractions, cluster gas density profiles
Sunyaev-Zeldovich, thermal: integrated electron pressure (hot electrons / big objects)
Sunyaev-Zeldovich, kinetic: integrated electron density x peculiar velocity
Galaxy properties: star formation + histories, stellar mass / halo mass relation
FRBs: integrated electron density
Boryana Hadzhiyska, Alexandra Amon
Daisuke Nagai, Erwin Tin-Hay Lau
Isa Medlock
Chris Lowell?
Reconstructing dark matter back in time
Stochastic Interpolants
NF
["Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants" Cuesta-Lazaro, Bayer, Albergo et al NeurIPS 2024 ML for the Physical Sciences]
Adrian Bayer
Mount Fuji?
Chirag Modi
Continuous time Normalizing Flows
Continuity Equation
Loss requires solving an ODE!
Diffusion, Flow matching, Interpolants... All ways to avoid this at training time
[Image Credit: "Understanding Deep Learning" Simon J.D. Prince]
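For reference, the standard equations behind a continuous-time normalizing flow can be written as follows. The symbols are assumptions for this note: \rho_t is the density at time t, v_t the velocity field, and x_t a trajectory from the base (t = 0) to the data (t = 1). The last line is why a maximum-likelihood loss needs an ODE solve: the log-density of a sample requires integrating the divergence of v_t along its whole trajectory.

```latex
\begin{aligned}
\partial_t \rho_t(x) + \nabla \cdot \big( \rho_t(x)\, v_t(x) \big) &= 0
  && \text{(continuity equation)} \\
\frac{\mathrm{d} x_t}{\mathrm{d} t} &= v_t(x_t)
  && \text{(sample trajectories)} \\
\frac{\mathrm{d}}{\mathrm{d} t} \log \rho_t(x_t) &= - \nabla \cdot v_t(x_t)
  && \text{(density along a trajectory)} \\
\log \rho_1(x_1) &= \log \rho_0(x_0) - \int_0^1 \nabla \cdot v_t(x_t)\, \mathrm{d} t
  && \text{(likelihood requires solving the ODE)}
\end{aligned}
```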
Data
Base
Can we regress the velocity field directly?
Turned maximum likelihood into a regression problem!
Interpolant
Stochastic Interpolant
Expectation over all possible paths that go through xt
["Stochastic Interpolants: A Unifying framework for flows and diffusion" Albergo et al arXiv:2303.08797]
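A minimal sketch of the regression view, assuming the simplest linear interpolant x_t = (1 - t) x0 + t x1 between a base sample x0 and a data sample x1, without the extra latent-noise term of the general stochastic interpolant. The regression target is then the time derivative x1 - x0, and the loss is a plain mean squared error with no ODE solve. The names `interpolant_batch` and `velocity_regression_loss` are hypothetical, and `velocity_fn` stands in for a neural network.

```python
import numpy as np


def interpolant_batch(x0, x1, t):
    """Points on the linear path between x0 and x1, and their time derivative."""
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0                      # d x_t / d t for the linear interpolant
    return x_t, target


def velocity_regression_loss(velocity_fn, x0, x1, t):
    """Mean squared error between the model velocity and the path derivative."""
    x_t, target = interpolant_batch(x0, x1, t)
    pred = velocity_fn(x_t, t)
    return np.mean(np.sum((pred - target) ** 2, axis=1))


rng = np.random.default_rng(0)
x0 = rng.normal(size=(16, 2))             # base samples
offset = np.array([1.0, -2.0])
x1 = x0 + offset                          # toy "data": a shifted copy of the base
t = rng.uniform(size=16)

# For this toy pairing the exact velocity is the constant offset.
loss_exact = velocity_regression_loss(
    lambda x_t, t: np.broadcast_to(offset, x_t.shape), x0, x1, t)
loss_zero = velocity_regression_loss(
    lambda x_t, t: np.zeros_like(x_t), x0, x1, t)
print(loss_exact, loss_zero)
```

With independent draws of x0 and x1, many paths cross the same x_t, and the minimiser of this loss is the average velocity over those paths, which is the expectation referred to above.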
?
["Probabilistic Forecasting with Stochastic Interpolants and Foellmer Processes" Chen et al arXiv:2403.10648 (Figure adapted from arXiv:2407.21097)]
Generative SDE
Cross-Correlation True ICs | Pred ICs
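A common way to quantify this kind of comparison is the Fourier-space cross-correlation coefficient r(k) = P_ab(k) / sqrt(P_aa(k) P_bb(k)), which equals 1 where the two fields agree in phase and amplitude ratio. The sketch below is a generic NumPy version on a periodic cubic grid, with k in units of the fundamental frequency. The function name, the binning, and the noisy "predicted" field are assumptions for illustration, not the pipeline used for the figure.

```python
import numpy as np


def cross_correlation_coefficient(field_a, field_b, n_bins=8):
    """Binned r(k) between two fields on the same periodic cubic grid."""
    n = field_a.shape[0]
    fa = np.fft.fftn(field_a)
    fb = np.fft.fftn(field_b)

    freqs = np.fft.fftfreq(n) * n                     # integer wavenumbers
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    k_mag = np.sqrt(kx ** 2 + ky ** 2 + kz ** 2).ravel()

    edges = np.linspace(0.5, n / 2, n_bins + 1)       # skips the k = 0 mode
    bin_index = np.digitize(k_mag, edges)

    p_ab = (fa * np.conj(fb)).real.ravel()            # cross power
    p_aa = (np.abs(fa) ** 2).ravel()                  # auto powers
    p_bb = (np.abs(fb) ** 2).ravel()

    r = np.empty(n_bins)
    for b in range(1, n_bins + 1):
        mask = bin_index == b
        r[b - 1] = p_ab[mask].sum() / np.sqrt(p_aa[mask].sum() * p_bb[mask].sum())
    return 0.5 * (edges[1:] + edges[:-1]), r


rng = np.random.default_rng(0)
true_ics = rng.normal(size=(16, 16, 16))
# Hypothetical stand-in for a reconstruction: the truth plus white noise.
pred_ics = true_ics + 0.5 * rng.normal(size=true_ics.shape)

k_centers, r_k = cross_correlation_coefficient(true_ics, pred_ics)
_, r_self = cross_correlation_coefficient(true_ics, true_ics)
print(k_centers, r_k)
```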
Guided simulations with fuzzy constraints
Simulate what you need
(and sometimes what you want)
Conclusions
1. Inductive biases in cosmology can greatly improve constraints
Symmetry preserving architectures: + constraining power and simulation efficiency
Continuous fields: memory efficient data representations and compression methods
2. A general representation for baryonic feedback may inform galaxy formation modelling
Can we constrain it through multi-wavelength observations?
3. Cosmological field level inference can be made efficient with generative models
Can generally make simulators more controllable!