Simulation-based inference
= Likelihood-free inference, implicit inference
Why?
- Likelihoods are rarely gaussian, specially true for non-two point functions
- Large parameter spaces -> MCMC hugely expensive
Density estimation
How?
\theta
Simulated Data
Data
Prior
Posterior
x_\mathrm{obs}
x
P(\theta|x_\mathrm{obs})
Forwards
Inverse
Inference = Optimisation
\mathcal{L}(\mathcal{D})
\mathcal{L}(\mathcal{D}|\theta)
\mathcal{P}(\theta|\mathcal{D})
MCMC or another density estimator
Can also estimate posterior directly!
Density estimator
x = \green{f_\phi}(z)
\mathcal{L}(\mathcal{D}) = - \frac{1}{\vert\mathcal{D}\vert}\sum_{\mathbf{x} \in \mathcal{D}} \log p(\mathbf{x})
Maximize the data likelihood
NeuralNet
Normalising flows
p(\mathbf{x}) = p_z(\mathbf{z}) \left\vert \det \dfrac{d \mathbf{z}}{d \mathbf{x}} \right\vert
= p_z(f^{-1}(\mathbf{x})) \left\vert \det \dfrac{d f^{-1}}{d \mathbf{x}} \right\vert
f must be invertible
J efficient to compute
1-D
n-D
p(\mathbf{x}) = p_z(f^{-1}(\mathbf{x})) \left\vert \det J(f^{-1}) \right\vert
x = f(z), \, z = f^{-1}(x)
p(\mathbf{x}) = p_z(\mathbf{z}) \left\vert \det \dfrac{d \mathbf{z}}{d \mathbf{x}} \right\vert
p(\mathbf{x}|\theta) = p_z(f^{-1}(\mathbf{x}|\theta)) \left\vert \det J(f^{-1}|\theta) \right\vert
Sequentially improve
Sample from
p(\mathcal{G}|\mathcal{C}, x_\mathrm{obs})
Refine accuracy on HOD parameters close to the data
Issue -> we estimate
p(\mathcal{G},\mathcal{C}| x_\mathrm{obs})
Targeting the posterior or the likelihood?
Pros posterior
- Fast sampling (no need MCMC)
Middle ground: Likelihood + Variational Inference
Pros likelihood
- combine multiple observations
- use without retraining if the prior is changed
- easier to adapt to sequential models
Implicit Likelihood vs Mean emulators
- No likelihood assumption -> Requires simulations with different seeds
- Assume Gaussian likelihood and estimate the covariance matrix from a different set of simulations
- Learn mean relation -> fixed seed simulations useful
- Learn likelihood or posterior directly
S(X|\theta)
P(\theta|S(X))
Hybrid for Abacus Summit
\{\mathcal{C}, \mathcal{G} \}
f_\phi
S
\mathcal{L}(S|\mathcal{C}, \mathcal{D})
Gaussian or estimated from fixed cosmology
Density estimator / Variational Inference
p(\mathcal{C}, \mathcal{D}|S)
f^\prime_\phi
Loss
Loss
Loss (MSE)
Simulation-based inference
By carol cuesta
Simulation-based inference
- 382