Redshift surveys in a nutshell

Learning summary statistics with machine learning

Carolina Cuesta-Lazaro

13th December 2021 - MPA

Collaborators: Cheng-Zong Ruan, Yosuke Kobayashi, Alexander Eggemeier, Pauline Zarrouk, Sownak Bose, Takahiro Nishimichi, Baojiu Li, Carlton Baugh

(\vec{\theta}_i, z_i)
z_i = z_{\mathrm{Cosmological}} + z_{\mathrm{Doppler}}
\chi(z) = \int_0^z \frac{dz'}{H(z')} + \frac{v_{\mathrm{pec}}}{aH(a)}
\chi_i
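
As a rough numerical illustration of the mapping above, the sketch below converts a cosmological redshift and a line-of-sight peculiar velocity into a redshift-space distance. The flat LCDM parameters, the function names and the explicit factor of c (the slide formula uses units with c = 1) are assumptions made for this example only.

```python
import numpy as np

# Illustrative flat LCDM parameters (assumed, not taken from the slides).
OMEGA_M = 0.3
H0 = 100.0          # km/s per Mpc/h, so distances come out in Mpc/h
C_KMS = 299792.458  # speed of light in km/s


def hubble(z):
    """H(z) in km/s per Mpc/h for flat LCDM with matter and Lambda only."""
    return H0 * np.sqrt(OMEGA_M * (1.0 + z) ** 3 + 1.0 - OMEGA_M)


def comoving_distance(z, n_steps=2048):
    """chi(z) = c * int_0^z dz' / H(z'), trapezoidal rule, in Mpc/h."""
    zs = np.linspace(0.0, z, n_steps)
    integrand = C_KMS / hubble(zs)
    dz = zs[1] - zs[0]
    return dz * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))


def redshift_space_distance(z_cos, v_pec):
    """Add the line-of-sight shift v_pec / (a H(a)); v_pec in km/s."""
    a = 1.0 / (1.0 + z_cos)
    return comoving_distance(z_cos) + v_pec / (a * hubble(z_cos))


# A galaxy at z = 0.5 receding along the line of sight at 300 km/s.
chi_real = comoving_distance(0.5)
chi_obs = redshift_space_distance(0.5, 300.0)
```

The difference `chi_obs - chi_real` is the apparent displacement along the line of sight that redshift space distortions introduce for this galaxy.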

Fifth forces modify structure growth

GROWTH 

- GRAVITY

- FIFTH FORCE

+ EXPANSION

Credit: Cartoon depicting Willem de Sitter as Lambda, from Algemeen Handelsblad (1930).

Cosmology = \{\vec{c}\}

Main Assumptions

1) Galaxies don't impact dark matter clustering

2) Number of galaxies depends on halo mass only -> Assembly bias?

1) We don't know the Initial Conditions

2) Data is very high dimensional

3) Impact of unknowns (baryonic physics)

4) N-body sims extremely slow to run!

Cosmology = \{\vec{c}\}

Galaxy = \{\vec{g}\}

\Omega_M

Summarise the data

1. Modelling Redshift Space Distortions

The Streaming Model

1+\xi^S(s_\perp, s_\parallel) = \int dr_\parallel \left(1 + \xi^R(r)\right) \red{\mathcal{P}(v_\parallel=s_\parallel-r_\parallel|r_\perp, r_\parallel)}

PAIRWISE VELOCITY DISTRIBUTION
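
The streaming integral above can be evaluated numerically once a real-space correlation function and a pairwise velocity distribution are chosen. The sketch below assumes a toy power-law \xi^R(r) and a Gaussian velocity distribution, which is a common simplification and not the distribution advocated in this talk. Velocities are expressed in distance units, v / (aH), and all function names are hypothetical.

```python
import numpy as np


def xi_real(r):
    """Toy real-space correlation function (power law; an assumption)."""
    return (r / 5.0) ** -1.8


def gaussian_pdf(v, mean, sigma):
    """Gaussian stand-in for the pairwise velocity distribution."""
    norm = sigma * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * ((v - mean) / sigma) ** 2) / norm


def xi_redshift(s_perp, s_par, mean_v, sigma_v, half_width=60.0, n=4001):
    """1 + xi^S = int dr_par (1 + xi^R(r)) P(s_par - r_par | r_perp, r_par).

    mean_v is a callable giving the mean radial pairwise velocity at
    separation r (negative for infall). Requires s_perp > 0.
    """
    r_par = np.linspace(s_par - half_width, s_par + half_width, n)
    r = np.sqrt(s_perp ** 2 + r_par ** 2)
    mu = r_par / r  # projects the radial velocity onto the line of sight
    pdf = gaussian_pdf(s_par - r_par, mean_v(r) * mu, sigma_v)
    integrand = (1.0 + xi_real(r)) * pdf
    dr = r_par[1] - r_par[0]
    total = dr * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return total - 1.0


# Constant infall of 2 Mpc/h with a dispersion of 3 Mpc/h (toy numbers).
xi_s = xi_redshift(10.0, 10.0, lambda r: -2.0 * np.ones_like(r), 3.0)
```

With zero mean velocity and a very narrow distribution, the integral returns the real-space value \xi^R(s), which is a useful sanity check.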

r
v_{\parallel,1}
v_{\parallel,2}
v_{\parallel} = v_{\parallel,1} - v_{\parallel,2}
s
s_{\parallel} = v_{\parallel} + r_{\parallel}
\xi(r)
\xi(s_\perp, s_\parallel)

Probability of finding a pair of galaxies at distance r

v_\parallel < 0

INFALL

v_\parallel > 0

OUTFLOW

1+\xi^S(s_\perp, s_\parallel) = \int dr_\parallel \left(1 + \xi^R(r)\right) \red{\mathcal{P}(v_\parallel=s_\parallel-r_\parallel|r_\perp, r_\parallel)}
\xi^S(s_\perp, s_\parallel) \approx \xi^R(s) + \sum_{n=1}^{\infty} \frac{(-1)^n}{n!} \frac{d^n}{d s_\parallel^n} \left( (1+ \xi^R(s)) m_n(s) \right)

On large scales, slowly varying function of r_\parallel
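
Written out to second order, the expansion of the streaming model reads as follows. Here m_n is assumed to denote the n-th moment of the pairwise velocity distribution, a definition the slides do not state explicitly.

```latex
m_n(r_\perp, r_\parallel) = \int dv_\parallel \, v_\parallel^n \, \mathcal{P}(v_\parallel | r_\perp, r_\parallel)

\xi^S(s_\perp, s_\parallel) \approx \xi^R(s)
  - \frac{d}{d s_\parallel} \left( (1 + \xi^R(s)) m_1(s) \right)
  + \frac{1}{2} \frac{d^2}{d s_\parallel^2} \left( (1 + \xi^R(s)) m_2(s) \right)
```

The first correction involves only the mean pairwise velocity and the second only the second moment, so on large scales the full shape of the distribution matters little.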

Two representative modified gravity (MG) models, f(R) and nDGP:

- The background expansion is the same as in ΛCDM

- One parameter describes deviations from ΛCDM

(same large-scale real-space clustering)

1+\xi^S(s_\perp, s_\parallel) = \int dr_\parallel \left(1 + \blue{\xi^R(r)}\right) \red{\mathcal{P}(v_\parallel=s_\parallel-r_\parallel|r_\perp, r_\parallel)}

How do these vary with cosmological parameters on small scales?

Described by four parameters

2. Simulation-based models 

Cosmology = \{\vec{c}\}

\vec{c}_i

Neural Network Emulator

\vec{c}, \mathrm{redshift}, M_h
\xi^R_{hh}(r|M_h)
v_{hh}(r|M_h)

1) Very fast -> MCMC

2) Halo-Galaxy mapping modelled very accurately

3) Allows for flexible implementations of Halo-Galaxy connection

4) Modelling RSD through the Streaming Model simplifies the functions the emulator needs to learn

\xi^R_{hh}(r|c_i, M_h)
v_{hh}(r|c_i, M_h)
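
To make the emulator interface concrete, the sketch below shows the forward pass of a small fully connected network that maps cosmological parameters, redshift and halo mass to a correlation function on a grid of separations. The architecture, layer sizes, input ordering and random untrained weights are all assumptions for illustration; the slides do not specify them.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Random weights for a fully connected network (untrained)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((w, np.zeros(n_out)))
    return params


def emulate(params, cosmology, redshift, log10_mh):
    """Map (c, redshift, M_h) to a vector of values, one per r bin."""
    x = np.concatenate([np.atleast_1d(cosmology), [redshift, log10_mh]])
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)  # hidden layers
    w, b = params[-1]
    return x @ w + b            # linear output layer


N_COSMO = 5    # hypothetical number of cosmological parameters
N_R_BINS = 30  # hypothetical number of separation bins
params = init_mlp([N_COSMO + 2, 64, 64, N_R_BINS])
xi_hh = emulate(params, np.array([0.3, 0.05, 0.7, 0.96, 0.8]), 0.5, 13.0)
```

In practice the weights would be fitted to measurements from a suite of N-body simulations; once trained, each evaluation is a handful of matrix products, which is what makes MCMC sampling feasible.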

Galaxy = \{\vec{g}\}
\xi_{gg} \propto \int d M_h W(\vec{g}_j, M_h) \xi_{hh}(M_h)
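
The schematic integral above can be sketched as a weighted average of halo correlation functions over halo mass. The weight W is assumed here to be a toy halo mass function multiplied by a smooth occupation step, the integration is done in log10 M_h with the Jacobian absorbed into W, and every name and number is hypothetical.

```python
import numpy as np


def integrate(y, x):
    """Trapezoidal rule along the first axis of y."""
    dx = np.diff(x)
    return np.tensordot(dx, 0.5 * (y[1:] + y[:-1]), axes=(0, 0))


def halo_weights(log10_mh, log10_mmin, sigma_logm, slope=-1.0):
    """W(g, M_h): toy mass function times a smooth occupation step."""
    mass_function = 10.0 ** (slope * (log10_mh - 12.0))
    occupation = 1.0 / (1.0 + np.exp(-(log10_mh - log10_mmin) / sigma_logm))
    return mass_function * occupation


def xi_gg(log10_mh, weights, xi_hh):
    """Weighted mean of xi_hh(r | M_h); xi_hh has shape (n_mass, n_r)."""
    numerator = integrate(weights[:, None] * xi_hh, log10_mh)
    return numerator / integrate(weights, log10_mh)


log10_mh = np.linspace(11.0, 15.0, 81)
r = np.logspace(0.0, 2.0, 20)
bias = 1.0 + 0.5 * (log10_mh - 12.0)  # toy: massive halos cluster more
xi_hh = bias[:, None] * (r[None, :] / 5.0) ** -1.8
w = halo_weights(log10_mh, log10_mmin=12.5, sigma_logm=0.3)
xi_galaxies = xi_gg(log10_mh, w, xi_hh)
```

Changing the galaxy parameters only changes the weights, so the halo statistics from the emulator can be reused across galaxy samples.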

WORK IN PROGRESS

3. Complementary summary statistics

Simplify the model by separating different environments

Voids

Clusters

r [h^{-1} \mathrm{Mpc}]

Assumed density splits identified in real space
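
One simple way to separate environments is to count galaxies in spheres around random query points and split the points into quantiles of that count. The sketch below does this by brute force in real space, without periodic boundaries. The smoothing radius, the number of quantiles and the function name are assumptions, and this is not necessarily the procedure used for the results shown.

```python
import numpy as np


def density_split(galaxies, queries, radius, n_quantiles=5):
    """Label each query point by the quantile of its local galaxy count."""
    # Squared distances between all pairs, shape (n_queries, n_galaxies).
    diff = queries[:, None, :] - galaxies[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    counts = np.sum(d2 < radius ** 2, axis=1)
    edges = np.quantile(counts, np.linspace(0.0, 1.0, n_quantiles + 1))
    # Interior edges only, so labels run from 0 (emptiest) to n_quantiles - 1.
    labels = np.searchsorted(edges[1:-1], counts, side="right")
    return counts, labels


rng = np.random.default_rng(1)
galaxies = rng.uniform(0.0, 100.0, size=(2000, 3))  # toy uniform catalogue
queries = rng.uniform(0.0, 100.0, size=(500, 3))
counts, labels = density_split(galaxies, queries, radius=10.0)
```

The lowest label picks out void-like regions and the highest cluster-like ones; correlation functions can then be measured separately for each label.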

How much information is still missing?

\Omega_M
\Omega_\Lambda
\sigma_8

Input x

Neural network f

Representation (summary statistic) r = f(x)

Output o = g(r)

Invariance to known unknowns

 

Increased interpretability through structured inputs

Modelling cross-correlations

Conclusions

  • Redshift Space Distortions allow us to constrain gravity models
  • We need to account for the non-Gaussian real-to-redshift-space mapping to correctly model deviations from GR

 

  • Can we learn the optimal summary statistic through Machine Learning?
  • Current constraints can be improved by extending models to smaller pair separations through N-body simulations

 

  • How limited are we by our Halo-Galaxy connection assumptions?
