Simulation-based inference

= Likelihood-free inference, implicit inference

Why?

Likelihoods are rarely gaussian, specially true for non-two point functions

Large parameter spaces -> MCMC hugely expensive

Density estimation

How?

\theta

Simulated Data

Data

Prior

Posterior

x_\mathrm{obs}

P(\theta|x_\mathrm{obs})

Forwards

Inverse

Inference = Optimisation

\mathcal{L}(\mathcal{D})

\mathcal{L}(\mathcal{D}|\theta)

\mathcal{P}(\theta|\mathcal{D})

MCMC or another density estimator

Can also estimate posterior directly!

Density estimator

x = \green{f_\phi}(z)

Image Credit: https://lilianweng.github.io/posts/2018-10-13-flow-models/

\mathcal{L}(\mathcal{D}) = - \frac{1}{\vert\mathcal{D}\vert}\sum_{\mathbf{x} \in \mathcal{D}} \log p(\mathbf{x})

Maximize the data likelihood

NeuralNet

Normalising flows

p(\mathbf{x}) = p_z(\mathbf{z}) \left\vert \det \dfrac{d \mathbf{z}}{d \mathbf{x}} \right\vert = p_z(f^{-1}(\mathbf{x})) \left\vert \det \dfrac{d f^{-1}}{d \mathbf{x}} \right\vert

f must be invertible

J efficient to compute

1-D

n-D

p(\mathbf{x}) = p_z(f^{-1}(\mathbf{x})) \left\vert \det J(f^{-1}) \right\vert

x = f(z), \, z = f^{-1}(x)

p(\mathbf{x}) = p_z(\mathbf{z}) \left\vert \det \dfrac{d \mathbf{z}}{d \mathbf{x}} \right\vert

p(\mathbf{x}|\theta) = p_z(f^{-1}(\mathbf{x}|\theta)) \left\vert \det J(f^{-1}|\theta) \right\vert

Sequentially improve

Sample from

p(\mathcal{G}|\mathcal{C}, x_\mathrm{obs})

Refine accuracy on HOD parameters close to the data

Issue -> we estimate

p(\mathcal{G},\mathcal{C}| x_\mathrm{obs})

Targeting the posterior or the likelihood?

Pros posterior

Fast sampling (no need MCMC)

Middle ground: Likelihood + Variational Inference

Pros likelihood

combine multiple observations
use without retraining if the prior is changed
easier to adapt to sequential models

Implicit Likelihood vs Mean emulators

No likelihood assumption -> Requires simulations with different seeds

Assume Gaussian likelihood and estimate the covariance matrix from a different set of simulations

Learn mean relation -> fixed seed simulations useful

Learn likelihood or posterior directly

S(X|\theta)

P(\theta|S(X))

Hybrid for Abacus Summit

\{\mathcal{C}, \mathcal{G} \}

f_\phi

\mathcal{L}(S|\mathcal{C}, \mathcal{D})

Gaussian or estimated from fixed cosmology

Density estimator / Variational Inference

p(\mathcal{C}, \mathcal{D}|S)

f^\prime_\phi

Loss

Loss (MSE)

Simulation-based inference

By carol cuesta

Simulation-based inference

Simulation-based inference

Simulation-based inference

More from carol cuesta