SCALABLE BAYESIAN INFERENCE WITH DIFFERENTIABLE SIMULATORS FOR THE
COSMOLOGICAL ANALYSIS OF THE DESI SPECTROSCOPIC SURVEY

Hugo SIMON-ONFROY, 
PhD student supervised by Arnaud DE MATTIA and François LANUSSE

CSI, 2024/10/18


The universe recipe (so far)

$$\frac{H(a)}{H_0} = \sqrt{\Omega_r\, a^{-4} + (\Omega_b + \Omega_c)\, a^{-3} + \Omega_\kappa\, a^{-2} + \Omega_\Lambda}$$

instantaneous expansion rate

energy content

Cosmological principle + Einstein equation
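For illustration, the dimensionless expansion rate can be coded directly in JAX (the density values below are placeholder numbers, not measurements):

import jax.numpy as jnp

def E(a, Om_r=8e-5, Om_b=0.05, Om_c=0.25, Om_k=0.0, Om_L=0.7):
    # H(a)/H0: each density dilutes with its own power of the scale factor a
    return jnp.sqrt(Om_r / a**4 + (Om_b + Om_c) / a**3 + Om_k / a**2 + Om_L)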

+ Inflation

\(\delta_L \sim \mathcal G(0, \mathcal P)\)

\(\sigma_8:= \sigma[\delta_L * \boldsymbol 1_{r \leq 8}]\)

initial field

primordial power spectrum

std. of fluctuations smoothed at \(8 \text{ Mpc/h}\)

\(\Omega := \{ \Omega_c, \Omega_b, \Omega_\Lambda, H_0, \sigma_8, n_s,...\}\)

Linear matter spectrum

Structure growth

$$\begin{align*}\operatorname{\boldsymbol{H}}(\Omega\mid \delta_g) &= \boldsymbol{H}(\delta_g \mid \Omega) + \boldsymbol{H}(\Omega) - \boldsymbol{H}(\delta_g)\\&= \boldsymbol{H}(\Omega) - \boldsymbol{I}(\Omega;\delta_g) \leq \boldsymbol{H}(\Omega)\end{align*}$$

\(\boldsymbol{H}(X)\) = missing info on \(X\)

[Information diagram: \(\boldsymbol H(\Omega)\) and \(\boldsymbol H(\delta_g)\) overlap in the mutual information \(\boldsymbol I(\Omega;\delta_g)\); the remainder \(\boldsymbol H(\Omega\mid\delta_g)\) is the info on \(\Omega\) still missing after observing \(\delta_g\).]

A high-dimensional inference problem

\(\Omega := \{ \Omega_c, \Omega_b, \Omega_\Lambda, H_0, \sigma_8, n_s,...\}\)

Linear matter spectrum

Structure growth

  • Cosmological model links the cosmology \(\Omega\) to the initial field \(\delta_L\) to the galaxy density field \(\delta_g\)
  • Cosmological parameter inference is obtained by marginalizing the full posterior over the initial field: $$\boldsymbol{p}(\Omega \mid \delta_g) = \int \boldsymbol{p}(\Omega, \delta_L \mid \delta_g) \;\mathrm d \delta_L$$

How to perform this marginalization?

Bayes, information flavor

[Information diagram: entropies \(\boldsymbol H(X)\), \(\boldsymbol H(Y_1)\), \(\boldsymbol H(Y_2)\) overlap in the mutual informations \(\boldsymbol I(X;Y_1)\) and \(\boldsymbol I(X;Y_2\mid Y_1)\), leaving the conditional entropies \(\boldsymbol H(X\mid Y_1)\) and \(\boldsymbol H(X\mid Y_1,Y_2)\).]

\(\boldsymbol{H}(X)\) = missing information on \(X\) = number of bits needed to communicate \(X\)

$$\begin{align*}\operatorname{\boldsymbol{H}}(X\mid Y) &= \boldsymbol{H}(Y \mid X) + \boldsymbol{H}(X) - \boldsymbol{H}(Y)\\&= \boldsymbol{H}(X) - \boldsymbol{I}(X;Y) \leq \boldsymbol{H}(X)\end{align*}$$

Use summary statistics

Marginalize then sample

  • build a marginal model linking cosmo \(\Omega\) and some summary stat \(s\), then sample from the simpler \(\boldsymbol p(\Omega \mid s)\) (see the sketch below)
  • trade-off: analytical/variational tractability vs. info content of \(s\)
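As an illustration of marginalize-then-sample, a minimal NumPyro sketch of a marginal model with a Gaussian likelihood on a power-spectrum summary \(s\); theory_pk and the covariance C are hypothetical placeholders, not the actual analysis code:

import numpyro
import numpyro.distributions as dist

def marginal_model(s_obs=None, C=None):
    # priors on a couple of cosmological parameters
    Omega_m = numpyro.sample('Omega_m', dist.Uniform(0.1, 0.5))
    sigma8 = numpyro.sample('sigma8', dist.Uniform(0.5, 1.1))
    # hypothetical theory/emulator prediction for the summary stat
    s_theory = theory_pk(Omega_m, sigma8)
    # Gaussian likelihood: tractable, but only as informative as s itself
    numpyro.sample('s', dist.MultivariateNormal(s_theory, covariance_matrix=C), obs=s_obs)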

summary stat inference

A tractable candidate: the power spectrum


We gotta pump this information up

  • Field-level: all the data
  • CNN, GNN...: learn the stat
  • WST, 1D-PDFs, Holes...: multiscale count
  • Peak, Void, Split, Cluster...: object correlations
  • 3PCF, Bispectrum: more correlations
  • 2PCF, Power spectrum: standard analysis

[Information scale from \(0\) to \(\boldsymbol H(\delta_g)\)]

  • At large scales, the matter density field is almost Gaussian, so the power spectrum is an almost lossless compression, and it is relatively tractable.
  • To probe the smaller, non-Gaussian scales, two strategies:
  1. Marginalize then sample
    • build a marginal model linking cosmo \(\Omega\) and some summary stat \(s\)
    • trade-off analytical/variational tractability vs. info content of \(s\)
  2. Sample then marginalize
    • full history reconstruction without info loss
    • requires simulating LSS formation for each sample

Compress or not compress

How much information can we still gain (not lose)?

Model based field-level inference

summary stat inference

More information is better, but how much better?

Simulation Based Inference from summary stat
(build a surrogate model from simulations)

Model Based Inference at the field level
(explicitly solve)


For the full field, marginalize-then-sample is hardly tractable, so instead sample then marginalize:

  • sample jointly \(\Omega\) and \(\delta_L\), then marginalize over the \(\delta_L\) samples (see the sketch below)
  • full history reconstruction without info loss
  • requires simulating LSS formation for each sample
  • probing the non-Gaussian scales of interest in the DESI volume requires \(\operatorname{dim}(\delta_L)\geq 1024^3\)
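A minimal sketch of this marginalization, assuming a joint NumPyro MCMC over sites 'Omega_m', 'sigma8' and 'delta_L' has already been run (site names are illustrative):

# joint draws over (Omega, delta_L): dict of site name -> array of samples
samples = mcmc.get_samples()
# keeping only the cosmology components *is* the marginalization over delta_L
omega_samples = {name: samples[name] for name in ('Omega_m', 'sigma8')}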

A high-dimensional sampling problem

Some useful programming tools

  • NumPyro
    • Probabilistic Programming Language (PPL)
    • Powered by JAX
    • Integrated samplers
  • JAX
    • GPU acceleration
    • Just-In-Time (JIT) compilation acceleration
    • Automatic vectorization/parallelization
    • Automatic differentiation
  1. Prior on
    • Cosmology \(\Omega\)
    • Initial field \(\delta_L\)
    • Dark matter-galaxy connection (Lagrangian galaxy biases) \(b\)
  2. Initialize matter particles
  3. LSS formation (LPT+PM)
  4. Populate matter field with galaxies
  5. Galaxy peculiar velocities (RSD)
  6. Observational noise

Let's build a cosmological model

  • Fast and differentiable model thanks to JaxPM
  • No need to approximate the likelihood of your favorite stat from simulations: the simulation is the likelihood
  • Still, \(\simeq 1024^3\) parameters is huge!
  • Some scalable methods proposed by Lavaux+2018, Bayer+2023

Which sampling methods can scale to high dimensions?
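A minimal NumPyro sketch of the six-step model above (a rough illustration, not the actual analysis code: lpt_pm_forward and noise_std are hypothetical placeholders standing in for the JaxPM-based simulator):

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

mesh_shape = (64, 64, 64)  # as in the benchmark setting
noise_std = 0.1            # placeholder noise level

def field_level_model(obs=None):
    # 1. priors on cosmology, initial field, and galaxy biases
    Omega_m = numpyro.sample('Omega_m', dist.Uniform(0.1, 0.5))
    sigma8 = numpyro.sample('sigma8', dist.Uniform(0.5, 1.1))
    delta_L = numpyro.sample('delta_L', dist.Normal(jnp.zeros(mesh_shape), 1.).to_event(3))
    b1 = numpyro.sample('b1', dist.Normal(1., 0.5))
    # 2.-5. differentiable forward model: particles, LPT+PM gravity, biasing, RSD
    delta_g = lpt_pm_forward(Omega_m, sigma8, delta_L, b1)  # hypothetical JaxPM wrapper
    # 6. Gaussian observational noise
    numpyro.sample('obs', dist.Normal(delta_g, noise_std).to_event(3), obs=obs)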


Why care about differentiable model?

  • Classical MCMCs
    • agnostic random moves + MH acceptance step = blind natural selection
    • small moves yield correlated samples
  • SOTA MCMCs rely on the gradient of the model log-probability to drive the dynamics towards high-density regions.

  • gradient descent alone → posterior mode
  • Brownian motion alone → exploding Gaussian
  • gradient descent + Brownian motion = Langevin dynamics → posterior
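A minimal JAX sketch of one unadjusted Langevin step, assuming a model log-density logp_fn: the gradient term is the descent part, the noise term the Brownian part.

import jax
import jax.numpy as jnp

def langevin_step(key, q, logp_fn, eps):
    # drift towards high density (gradient descent part)...
    drift = 0.5 * eps**2 * jax.grad(logp_fn)(q)
    # ...plus random exploration (Brownian part)
    noise = eps * jax.random.normal(key, q.shape)
    return q + drift + noise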

Hamiltonian Monte Carlo (HMC)

  • To travel farther, add inertia; this yields less correlated chains.
    • the sample particle at position \(q\) now has a momentum \(p\) and a mass matrix \(M\)
    • the target \(\boldsymbol{p}(q)\) becomes \(\boldsymbol{p}(q , p) := e^{-\mathcal H(q,p)}\), with Hamiltonian $$\mathcal H(q,p) := -\log \boldsymbol{p}(q) + \frac 1 2 p^\top M^{-1} p$$
    • at each step, resample the momentum \(p \sim \mathcal N(0,M)\)
    • let \((q,p)\) follow the Hamiltonian dynamics for a time length \(L\); the arrival point becomes the new MH proposal (see the leapfrog sketch below).
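A minimal leapfrog sketch of this Hamiltonian dynamic, assuming an identity mass matrix \(M = I\) (so \(\dot q = p\) and \(\dot p = \nabla \log \boldsymbol p(q)\)); an illustration, not the actual sampler implementation:

import jax

def leapfrog(q, p, logp_fn, eps, n_steps):
    grad_logp = jax.grad(logp_fn)
    p = p + 0.5 * eps * grad_logp(q)      # initial half momentum kick
    for _ in range(n_steps - 1):
        q = q + eps * p                   # full position drift
        p = p + eps * grad_logp(q)        # full momentum kick
    q = q + eps * p                       # last position drift
    p = p + 0.5 * eps * grad_logp(q)      # final half momentum kick
    return q, p                           # MH-accept using H(q, p)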

Variations around HMC

  • No U-Turn Sampler (NUTS)
    • trajectory length \(L\) auto-tuned
    • samples drawn along trajectory
  • NUTS within Gibbs (NUTSGibbs), i.e. alternating sampling over parameter subsets.


[HMC diagram: 1) mass \(M\) particle at \(q\), 2) random kick \(p\), 3) Hamiltonian dynamic]

Benchmarking

  • Model setting: \(64^3\) mesh, \((640\textrm{ Mpc/h})^3\) box, 1LPT, 2nd-order Lagrangian bias expansion, RSD, and Gaussian observational noise.
  • Parameter space: initial field \(\delta_L\), cosmology \(\Omega\! =\! \{\Omega_m, \sigma_8\}\), and galaxy biases \(b\!=\!\{b_1,b_2,b_{s^2},b_{\nabla^2}\}\); a total of \(64^3 + 2 + 4\) parameters.
  • For NUTSGibbs: sampling is split between \(\delta_L\) and the rest (common in the literature).

  • Results suggest no particular advantage to splitting the sampling between the initial field and the rest; cf. Simon-Onfroy et al., in prep.

[Figure: joint reconstruction of the initial field, yielding a posterior over the full universe history.]

[Figure: number of model evaluations to yield one effective sample; the higher the worse.]

Activities 2023-2024

  • Schools
    • Rodolphe Clédassou (2023, 2024)
    • Bayes@CIRM (2023)
  • Teaching
    • M1 Maths and L2 Biostats at UPS
  • Conferences and workshops
    • ML in Astro IAP/CCA
    • Poster at Cosmo21
    • Talk at EDSU Tools
    • Talk at DESI meeting
    • Bayes intro at Cosmostat
  • Visit at Flatiron Institute
  • DESI observing (cloudy...)

...and what's next

  • 1st author paper (< December 2024)

    • benchmark more proposed samplers
    • compare to SBI approaches
  • Longer term
    • applications on DESI data (survey selection function)
    • annealing/diffusion based sampler


Recap...

  • Field-level inference may be relevant to fully capture the cosmological information in the data.

  • Leverage modern computational tools to build a fast and differentiable cosmological model.

  • A standardized benchmark for comparing MCMC samplers on field-level inference tasks, to select methods suited to Stage-IV galaxy surveys.

  • At large scales, the matter density field is almost Gaussian, so the power spectrum is an almost lossless compression.
  • At smaller scales however, the matter density field is non-Gaussian.

Gaussianity and beyond

2 fields, 1 power spectrum: Gaussian or N-body?

$$\boxed{\min_s \operatorname{\boldsymbol{H}}(\Omega\mid s(\delta_g))} = \boldsymbol{H}(\Omega)  - \max_s \boldsymbol{I}(\Omega  ; s(\delta_g))$$

Which stats are relevant for cosmo inference?

[Information diagram: \(\boldsymbol H(\Omega)\) against \(\boldsymbol H(\delta_g)\), the latter split into Gaussianities and non-Gaussianities. Candidate stats \(s_1\), \(s_2\) and the power spectrum \(\mathcal P\) are compared: a relevant stat has high mutual info with \(\Omega\) even if low info (e.g. \(\mathcal P\) on Gaussian scales); a stat with both high info and high mutual info is also relevant; a stat with high info but low mutual info is irrelevant.]

Compress or not compress

So JAX in practice?

  • GPU accelerate
  • JIT compile
  • Vectorize/Parallelize
  • Auto-diff

import jax
import jax.numpy as np  # drop-in NumPy replacement, then enjoy

def function(x):                # any pure array-in/array-out function
    return np.sum(x**2)

function = jax.jit(function)    # function is so fast now!
gradient = jax.grad(function)   # too bad if you love chain ruling by hand
vfunction = jax.vmap(function)  # vectorize over a batch axis
pfunction = jax.pmap(function)  # parallelize across devices: for-loops are for-losers

So NumPyro in practice?

from jax import jit, vmap, grad, random
from numpyro import sample, distributions as dist, infer, render_model
from numpyro.handlers import seed, condition
from numpyro.infer.util import log_density

def model():
    z = sample('Z', dist.Normal(0, 1))
    x = sample('X', dist.Normal(z**2, 1))
    return x

render_model(model, render_distributions=True)

# condition the model on a value of X drawn from a seeded run (site names are case-sensitive)
x_sample = dict(X=seed(model, 42)())
obs_model = condition(model, x_sample)

# log-density of the conditioned model, and its compiled, vectorized score function
logp_fn = lambda z: log_density(obs_model, (), {}, {'Z': z})[0]
score_vfn = jit(vmap(grad(logp_fn)))

# run NUTS on the conditioned model
kernel = infer.NUTS(obs_model)
mcmc = infer.MCMC(kernel, num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(43))
samples = mcmc.get_samples()
  • Probabilistic Programming Language
  • JAX machinery
  • Integrated samplers
  • Effective Sample Size (ESS)
    • number of i.i.d. samples that would yield the same statistical power.
    • For a sample sequence of size \(N\) with autocorrelations \(\rho_t\): $$N_\textrm{eff} = \frac{N}{1+2 \sum_{t=1}^{+\infty}\rho_t}$$ so aim for samples that are as uncorrelated as possible (see the sketch below).
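A naive sketch of this estimate from the empirical autocorrelations (illustration only; robust estimators such as numpyro.diagnostics.effective_sample_size should be preferred in practice):

import numpy as np

def ess(chain):
    x = np.asarray(chain) - np.mean(chain)
    n = len(x)
    # empirical autocovariances at lags 0..n-1, normalized so that rho_0 = 1
    acov = np.correlate(x, x, mode='full')[n - 1:] / n
    rho = acov / acov[0]
    # truncate the sum at the first negative autocorrelation
    first_neg = np.argmax(rho[1:] < 0) + 1 if np.any(rho[1:] < 0) else n
    return n / (1 + 2 * np.sum(rho[1:first_neg]))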

  • The main limiting computational factor is the model evaluation (e.g. N-body), so MCMC efficiency is characterized by \(N_\text{eval} / N_\text{eff}\), the number of model evaluations per effective sample.

How to compare samplers?
