Redshift surveys in a nutshell
Learning summary statistics with ML
Carolina Cuesta-Lazaro
Newcastle Astro Journal Club
Collaborators: Cheng-Zong Ruan, Yosuke Kobayashi, Enrique Paillas, Alexander Eggemeier, Pauline Zarrouk, Sownak Bose, Takahiro Nishimichi, Baojiu Li, Carlton Baugh
Medical Imaging
Epidemiology: Agent Based simulations
OBSERVED
SIMULATED
Cosmology
Simulations
HPC
Science question
Statistics ML
Fifth forces modify structure growth
GROWTH
- GRAVITY
- FIFTH FORCE
+ EXPANSION
Credit: Cartoon depicting Willem de Sitter as Lambda from Algemeen Handelsblad (1930).
Credit: https://arxiv.org/abs/1912.09383
Resolving tensions
Early Universe
~linear
Gravity
Late Universe
Non-linear
Credit: S. Codis+16
Non-linearity = PT predictions inaccurate
Credit: S. Codis+16
Non-Gaussianity
Second moment not optimal
Machine Learning as a solution to
- Non-linearities: produce accurate predictions based on N-body simulations
- Non-Gaussianity: extract cosmological information at the field level
Cosmology =
Main Assumptions
- Galaxies don't impact dark matter clustering
- Number of galaxies depends on halo mass only
- We don't know the Initial Conditions
- Data is very high dimensional
- Large number of parameters to constrain
- N-body sims are extremely slow to run! (Sampling parameter space requires > O(10^6) calls)
Cosmology =
Galaxy =
?
Summarise the data
N-body simulations
Likelihood evaluations
Credit: https://cs231n.github.io/convolutional-networks/
What to emulate?
- Flexibility: Vary galaxy tracers, and their cross-correlations. Marginalising over g requires flexible g!
- 1% accuracy / 1-sigma accuracy: emulator only as good as data used for training
- Model clustering and mapping between real and redshift space separately
Neural Net
Analytical
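As a rough illustration of what "emulating" means here, the sketch below fits a small neural network to the outputs of a cheap stand-in function that plays the role of the N-body measurements. Everything in it is an assumption made for illustration: `toy_simulator`, the two input parameters, the five output bins and the network size are hypothetical and are not the emulator used in this work.

```python
import numpy as np

def toy_simulator(theta):
    """Hypothetical stand-in for an expensive N-body measurement.

    Maps 2 parameters (amplitude, slope) to a 5-bin power-law 'statistic'.
    """
    r = np.linspace(1.0, 2.0, 5)
    return theta[:, :1] * r ** (-theta[:, 1:2])

def init_mlp(rng, n_in, n_hidden, n_out):
    # One hidden layer; sizes are arbitrary illustrative choices.
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"], h

def train(params, x, y, lr=0.05, n_steps=2000):
    losses = []
    for _ in range(n_steps):
        pred, h = forward(params, x)
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation for the mean-squared-error loss.
        g_out = 2.0 * err / err.size
        g_h = (g_out @ params["W2"].T) * (1.0 - h ** 2)
        params["W2"] -= lr * (h.T @ g_out)
        params["b2"] -= lr * g_out.sum(axis=0)
        params["W1"] -= lr * (x.T @ g_h)
        params["b1"] -= lr * g_h.sum(axis=0)
    return params, losses

rng = np.random.default_rng(0)
theta_train = rng.uniform(0.5, 1.5, size=(256, 2))  # training design
y_train = toy_simulator(theta_train)                # "simulation" outputs
params, losses = train(init_mlp(rng, 2, 16, 5), theta_train, y_train)
prediction, _ = forward(params, theta_train)
```

The trained network can only interpolate within the range and quality of `theta_train` and `y_train`, which is the sense in which an emulator is only as good as its training data.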
Cosmology =
Neural Network Emulator
1) Very fast -> MCMC
2) Halo-Galaxy mapping modelled very accurately
3) Allows for flexible implementations of Halo-Galaxy connection
4) Modelling RSD through the Streaming Model simplifies the functions the emulator needs to learn
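For reference, the streaming model mentioned in point 4 is usually written as a line-of-sight convolution of the real-space correlation function with the pairwise velocity distribution. The notation below is an assumption chosen for this note (the slides do not fix symbols): velocities are expressed in distance units, and the perpendicular separation is unchanged by redshift-space distortions.

```latex
\[
1 + \xi^{s}(s_\perp, s_\parallel)
  = \int_{-\infty}^{\infty} \mathrm{d}r_\parallel \,
    \left[ 1 + \xi^{r}(r) \right] \,
    \mathcal{P}\!\left( v_\parallel = s_\parallel - r_\parallel \mid r_\perp, r_\parallel \right),
\qquad
r = \sqrt{s_\perp^{2} + r_\parallel^{2}}, \quad r_\perp = s_\perp .
\]
```

In this form the emulator only needs to learn real-space quantities (the correlation function and the ingredients of the velocity distribution), while the mapping to redshift space is done by the integral.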
Galaxy =
Cosmology
Centrals
Satellites
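A common way to write a halo-mass-only galaxy-halo connection split into centrals and satellites is a halo occupation distribution (HOD). The sketch below uses the widely used five-parameter form of Zheng et al. (2007); this is an assumption, since the slides do not state which parametrisation is used, and all default parameter values are hypothetical.

```python
import math

def mean_centrals(log_m, log_m_min=13.0, sigma_log_m=0.5):
    """Mean number of central galaxies in a halo of mass 10**log_m.

    Smoothed step function; parameter values are illustrative only.
    """
    return 0.5 * (1.0 + math.erf((log_m - log_m_min) / sigma_log_m))

def mean_satellites(log_m, log_m0=13.0, log_m1=14.0, alpha=1.0,
                    log_m_min=13.0, sigma_log_m=0.5):
    """Mean number of satellites: a power law in mass, modulated by centrals."""
    m, m0, m1 = 10.0 ** log_m, 10.0 ** log_m0, 10.0 ** log_m1
    if m <= m0:
        return 0.0  # no satellites below the cut-off mass
    n_cen = mean_centrals(log_m, log_m_min, sigma_log_m)
    return n_cen * ((m - m0) / m1) ** alpha

# Mean occupation for a few halo masses (log10 of mass).
occupation = [(lm, mean_centrals(lm), mean_satellites(lm))
              for lm in (12.0, 13.0, 14.0, 15.0)]
```

Both functions depend on halo mass only, matching the assumption listed earlier in the deck that the number of galaxies depends on halo mass only.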
How much information are we throwing away by summarising in two-point functions?
How much information are we throwing away by summarising the data?
Density-dependent clustering
Clusters
Voids
[Figure marked PRELIMINARY; axis tick values omitted]
Input
x
Neural network
f
Representation
(Summary statistic)
r = f(x)
Output
o = g(r)
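The two-stage structure above, r = f(x) followed by o = g(r), can be sketched as a composition of two maps. The shapes and the random, untrained weights below are hypothetical and only illustrate the data flow from a high-dimensional input to a low-dimensional representation and then to the output.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 64-dimensional input, 4 summaries, 2 outputs.
W_f = rng.normal(size=(64, 4))
W_g = rng.normal(size=(4, 2))

def f(x):
    """Compress the input x into a low-dimensional representation r."""
    return np.tanh(x @ W_f)

def g(r):
    """Map the representation r to the output o."""
    return r @ W_g

x = rng.normal(size=(10, 64))  # batch of 10 mock inputs
r = f(x)                       # learned summary statistic
o = g(r)                       # final output
```

Keeping `f` and `g` separate means the intermediate `r` can be inspected on its own, which is what makes it usable as a summary statistic.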
Increased interpretability through structured inputs
Modelling cross-correlations
ML and cosmology
- ML to accelerate non-linear predictions: allow MCMC sampling of non-linear scales
- Precision of future surveys: what and how we emulate will have an impact on cosmological constraints
- Can ML extract **all** the information that there is at the field-level in the non-linear regime?
- Compare data and simulations, point us to the missing pieces?
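To make the first point concrete, here is a minimal Metropolis sampler in which every likelihood evaluation calls a fast emulator instead of a simulation. It is a sketch under stated assumptions: `emulator` is a hypothetical analytic stand-in, and the mock data, flat prior bounds and step size are arbitrary choices made for illustration.

```python
import numpy as np

def emulator(theta):
    """Hypothetical fast surrogate: power law in 5 bins (amplitude, slope)."""
    r = np.linspace(1.0, 2.0, 5)
    return theta[0] * r ** (-theta[1])

def log_posterior(theta, data, sigma):
    # Flat prior on [0, 3] for both parameters (illustrative choice).
    if np.any(theta < 0.0) or np.any(theta > 3.0):
        return -np.inf
    resid = data - emulator(theta)
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(log_prob, theta0, n_steps, step_size, rng):
    chain = np.empty((n_steps, theta0.size))
    theta, lp = theta0.copy(), log_prob(theta0)
    for i in range(n_steps):
        proposal = theta + step_size * rng.normal(size=theta.size)
        lp_new = log_prob(proposal)  # one emulator call per step
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain[i] = theta
    return chain

rng = np.random.default_rng(2)
truth = np.array([1.0, 1.5])
sigma = 0.05
data = emulator(truth) + sigma * rng.normal(size=5)  # mock data vector
chain = metropolis(lambda t: log_posterior(t, data, sigma),
                   np.array([1.2, 1.2]), 5000, 0.05, rng)
```

Each step costs one emulator call, so chains with many steps remain cheap in a way that would not be feasible with direct N-body runs.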
Copy of deck
By carol cuesta