Redshift surveys in a nutshell
Learning summary statistics with ML
Carolina Cuesta-Lazaro
19th January 2022 - Waterloo Astronomy Seminar
Collaborators: Cheng-Zong Ruan, Yosuke Kobayashi, Alexander Eggemeier, Pauline Zarrouk, Sownak Bose, Takahiro Nishimichi, Baojiu Li, Carlton Baugh
The golden days of Cosmology:
A five parameter Universe
Initial Conditions
Dynamics
Dark energy
Dark matter
Ordinary matter
Amplitude of the initial density field
Scale dependence
Linear
Credit: NASA / WMAP SCIENCE TEAM
GALAXY CLUSTERING
GRAVITATIONAL WAVES
GRAVITATIONAL LENSING
Early Universe
~linear
Gravity
Late Universe
Non-linear
Credit: S. Codis+16
Non-linearity = perturbation theory (PT) predictions become inaccurate
Non-Gaussianity
Second moment not optimal
Machine Learning as a solution to
- Non-linearities: produce accurate predictions based on N-body simulations
- Non-Gaussianity: extract cosmological information at the field level
Space-time
geometry
Energy content
Adding new degrees of freedom
- To the energy content (dynamic) DARK ENERGY
- To the way space-time geometry reacts to the energy content MODIFIED GRAVITY (FIFTH FORCES)
?
Fifth forces modify structure growth
GROWTH
- GRAVITY
- FIFTH FORCE
+ EXPANSION
Credit: Cartoon depicting Willem de Sitter as Lambda from Algemeen Handelsblad (1930).
Cosmology =
Main Assumptions
- Galaxies don't impact dark matter clustering
- Number of galaxies depends on halo mass only
- We don't know the Initial Conditions
- Data is very high dimensional
- Large number of parameters to constrain
- N-body sims are extremely slow to run! (sampling parameter space requires > O(10^6) calls)
Cosmology =
Galaxy =
?
Summarise the data
N-body simulations
How to emulate?
Credit: James Hensman
Optimize the marginal likelihood: Analytical solution!
Pros
- Easy to get going
- Small number of free parameters
Cons
- Scales badly with training set size O(n^3)
- Scales badly with number of input features
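As a concrete illustration of the Gaussian-process approach (a minimal sketch with mock data, not the emulator from this talk), scikit-learn's `GaussianProcessRegressor` maximises the analytic log marginal likelihood when fitting, and its cubic cost in the number of training points is exactly the scaling limitation noted above:

```python
# Minimal GP emulator sketch: map a cosmological parameter to a mock
# summary statistic. The training data here are toy stand-ins.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.2, 0.4, size=(50, 1))      # e.g. samples of Omega_m
y = np.sin(20 * X[:, 0]) + 0.1 * X[:, 0]     # mock training statistic

kernel = ConstantKernel(1.0) * RBF(length_scale=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y)  # optimises the marginal likelihood; O(n^3) in training size

X_test = np.linspace(0.2, 0.4, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)  # predictions with uncertainty
```

The predictive standard deviation comes for free, which is one reason GPs are easy to get going with for small training sets.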
Credit: https://cs231n.github.io/convolutional-networks/
[Diagram: loss value as a function of network weights, with Network A and Network B converging to different minima]
Pros
- Fast, does not scale with n
- Can model large input features
Cons
- Prone to overfitting: But ways to avoid it
- "Harder" to train (requires more exploration)
What to emulate?
- Flexibility: vary galaxy tracers and their cross-correlations. Marginalising over g requires a flexible g!
- Accuracy: 1% (1-sigma) accuracy. The emulator is only as good as the data used for training
- Simplify the input/output relation through physical models
Neural Net
Analytical
The Streaming Model
PAIRWISE VELOCITY
DISTRIBUTION
Probability of finding a pair of galaxies at distance r
Virial motions within halos
Infall towards halos
On large scales, the pairwise velocity distribution is a slowly varying function of the pair separation
n = 4 reproduces clustering down to small scales
INFALL
OUTFLOW
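The streaming model above can be evaluated numerically: the redshift-space correlation function follows from convolving the real-space clustering with the line-of-sight pairwise velocity distribution. In this sketch the real-space correlation function and the Gaussian velocity PDF are toy stand-ins (with mean infall and dispersion as slowly varying functions of separation), not the emulated ingredients from the talk:

```python
# Toy streaming-model integration:
#   1 + xi_s(s_perp, s_par) = \int dr_par [1 + xi_r(r)] P(s_par - r_par | r)
import numpy as np

def xi_real(r):
    # toy real-space correlation function, xi(r) ~ (r0 / r)^gamma
    return (5.0 / np.maximum(r, 1e-3)) ** 1.8

def pairwise_pdf(v, r_par, r):
    # Gaussian line-of-sight pairwise velocity PDF (toy mean infall
    # and dispersion, both slowly varying with separation)
    mean = -2.0 * r_par / np.maximum(r, 1e-3) * np.exp(-r / 30.0)
    sigma = 3.0 + 2.0 * np.exp(-r / 20.0)
    return np.exp(-0.5 * ((v - mean) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def xi_redshift(s_perp, s_par, r_par_grid):
    # integrate over the real-space line-of-sight separation
    r = np.hypot(s_perp, r_par_grid)
    integrand = (1 + xi_real(r)) * pairwise_pdf(s_par - r_par_grid, r_par_grid, r)
    dr = r_par_grid[1] - r_par_grid[0]
    return np.sum(integrand) * dr - 1

r_par_grid = np.linspace(-100, 100, 2001)
val = xi_redshift(10.0, 5.0, r_par_grid)  # xi_s at one (s_perp, s_par) point
```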
Two representative extensions to General Relativity:
- The background expansion is the same as LCDM
- One parameter to describe deviations from LCDM
How do these vary with cosmological parameters on small scales?
Described by four parameters
Code available on GitHub soon!
Likelihood evaluations
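To make the cost concrete, here is a toy Metropolis-Hastings run (everything illustrative, including the Gaussian stand-in likelihood): even a short chain requires thousands of likelihood evaluations, which is hopeless if each one is an N-body run but trivial if each one is an emulator call.

```python
# Toy MCMC over one parameter, counting likelihood evaluations.
import numpy as np

rng = np.random.default_rng(3)

def log_like(om):
    # stand-in for "emulator prediction compared to data"
    return -0.5 * ((om - 0.3) / 0.02) ** 2

n_steps, n_calls = 5000, 0
chain = np.empty(n_steps)
om = 0.25
for i in range(n_steps):
    prop = om + 0.01 * rng.normal()      # random-walk proposal
    n_calls += 1                          # one new likelihood call per step
    if np.log(rng.uniform()) < log_like(prop) - log_like(om):
        om = prop                         # Metropolis accept
    chain[i] = om
# thousands of evaluations even for this 1-parameter toy problem
```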
But... how much information are we ignoring?
Credit: ChangHoon Hahn et al https://arxiv.org/abs/2012.02200
P
B
r
r1
r2
r3
Credit: Sihao Cheng et al https://arxiv.org/pdf/2006.08561.pdf
Input
x
Neural network
f
Representation
(Summary statistic)
r = f(x)
Output
o = g(r)
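The f/g decomposition above can be sketched by training one network with a narrow bottleneck (an illustrative toy, not the architecture from the talk): the layers before the bottleneck play the role of f, the bottleneck activations are the learned summary statistic r, and the layers after it are g.

```python
# Toy learned-summary sketch: compress mock "fields" x into a
# low-dimensional representation and predict mock parameters.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
theta = rng.uniform(size=(300, 2))                 # mock parameters
proj = rng.normal(size=(2, 32))
x = theta @ proj + 0.1 * rng.normal(size=(300, 32))  # mock high-dim data

# The 3-unit middle layer acts as the summary statistic r = f(x);
# the output head plays the role of o = g(r).
net = MLPRegressor(hidden_layer_sizes=(64, 3, 64),
                   max_iter=3000, random_state=0)
net.fit(x, theta)
o = net.predict(x[:4])   # o = g(f(x))
```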
Increased interpretability through structured inputs
Modelling cross-correlations
ML and cosmology
- ML to accelerate non-linear predictions: allow MCMC sampling of non-linear scales
- Precision of future surveys: what and how we emulate will have a big impact on cosmological constraints
- Can ML extract **all** the information that there is at the field-level in the non-linear regime?
- Compare data and simulations, point us to the missing pieces?