Seminar at IP2I, Lyon, France
May 24, 2024
Justine Zeghal
credit: Villasenor et al. 2023
Bayes theorem:
We want to infer the parameters that generated an observation
And run an MCMC to get the posterior.
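For reference, a minimal statement of Bayes' theorem as it is used throughout (writing θ for the cosmological parameters and x for the observation, which is my notation here):

```latex
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
```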
Problem:
we do not have an analytic marginal likelihood that maps the cosmological parameters to what we observe
Credit: ESA
Classical way of performing Bayesian Inference in Cosmology:
Power Spectrum & Gaussian Likelihood
Credit: arxiv.org/abs/1807.06205
On large scales, the Universe is close to a Gaussian field and the 2-point function is a near sufficient statistic.
However, on small scales where non-linear evolution gives rise to a highly non-Gaussian field, this summary statistic is not sufficient anymore.
Proof: full-field inference yields tighter constraints.
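For concreteness, the classical analysis typically assumes a Gaussian likelihood on the measured power spectrum; a schematic version (the data vector d, model prediction μ(θ) and covariance Σ are my notation):

```latex
\log \mathcal{L}(\theta) = -\tfrac{1}{2}\,\big(d - \mu(\theta)\big)^{\top} \Sigma^{-1} \big(d - \mu(\theta)\big) + \text{const}
```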
We can build a simulator to map the cosmological parameters to the data.
Simulator: parameters → data (prediction)
Inference: data → parameters
But we still lack the explicit marginal likelihood
Explicit simulator → explicit joint likelihood, but the marginal likelihood is intractable → two options: explicit or implicit inference.
Black box simulator → no explicit likelihood at all → only one option: implicit inference.
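To make the intractability concrete, a hedged sketch (the latent variables z, e.g. the random initial conditions of the simulation, are my notation): the simulator gives an explicit joint likelihood p(x, z | θ), but the marginal likelihood requires integrating over all latent variables,

```latex
p(x \mid \theta) = \int p(x, z \mid \theta)\, \mathrm{d}z ,
```

which is intractable when z is high-dimensional.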
Explicit simulator
Explicit joint likelihood
Sample the joint posterior through MCMC:
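As an illustration of what this looks like in practice, a minimal NumPyro sketch (the toy model, shapes and names are mine, not the actual analysis): the parameters θ and the latent variables z are sampled jointly with NUTS, and the marginal posterior on θ is obtained by simply discarding the z samples.

```python
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(x_obs=None):
    # cosmological parameters (toy prior)
    theta = numpyro.sample("theta", dist.Normal(jnp.zeros(2), 1.0))
    # latent variables, e.g. initial conditions of the field (toy, low-dimensional here)
    z = numpyro.sample("z", dist.Normal(jnp.zeros(16), 1.0))
    # toy "simulator": deterministic map from (theta, z) to the data
    x_mean = theta[0] + theta[1] * z
    numpyro.sample("x", dist.Normal(x_mean, 0.1), obs=x_obs)

x_obs = jnp.ones(16)  # placeholder observation
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(jax.random.PRNGKey(0), x_obs=x_obs)
theta_samples = mcmc.get_samples()["theta"]  # marginal posterior on theta, z discarded
```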
Drawbacks:
Whether the simulator is explicit or a black box:
From a set of simulations, we can approximate the quantity of interest (the likelihood or the posterior) thanks to machine learning.
The algorithm is the same for each method (a minimal sketch follows the list):
1) Draw N parameters from the prior
2) Draw N simulations from the simulator
3) Train a neural network on the pairs (parameters, simulations) to approximate the quantity of interest
4) Approximate the posterior from the learned quantity
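A minimal, self-contained sketch of these four steps for NPE. Everything here is a toy stand-in: the 1D simulator, the conditional Gaussian used in place of a normalizing flow, and the plain gradient-descent loop are my simplifications.

```python
import jax
import jax.numpy as jnp

# 1) Draw N parameters from the prior (toy 1D Gaussian prior).
key = jax.random.PRNGKey(0)
key, k1, k2 = jax.random.split(key, 3)
N = 5000
theta = jax.random.normal(k1, (N, 1))                 # theta_i ~ p(theta)

# 2) Draw N simulations from a toy simulator x = theta + noise.
x = theta + 0.5 * jax.random.normal(k2, (N, 1))       # x_i ~ p(x | theta_i)

# 3) Train a model to approximate the posterior p(theta | x) (NPE).
#    Toy conditional Gaussian instead of a normalizing flow: theta | x ~ N(w*x + b, sigma^2).
def log_prob(params, theta, x):
    w, b, log_sigma = params
    mu = w * x + b
    return -0.5 * ((theta - mu) / jnp.exp(log_sigma)) ** 2 - log_sigma - 0.5 * jnp.log(2 * jnp.pi)

def loss(params, theta, x):
    # negative log posterior of the training pairs (theta_i, x_i)
    return -jnp.mean(log_prob(params, theta, x))

params = (jnp.array(0.0), jnp.array(0.0), jnp.array(0.0))
grad_fn = jax.jit(jax.grad(loss))
for _ in range(2000):
    grads = grad_fn(params, theta, x)
    params = tuple(p - 1e-2 * g for p, g in zip(params, grads))

# 4) Approximate the posterior at the observation by evaluating the learned model.
x_obs = jnp.array([[0.3]])
theta_grid = jnp.linspace(-3, 3, 200).reshape(-1, 1)
posterior = jnp.exp(log_prob(params, theta_grid, x_obs))
```

For NLE one would instead fit p(x | θ) in step 3 and then recover the posterior with the prior (e.g. by MCMC).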
We will focus on the Neural Likelihood Estimation and Neural Posterior Estimation methods
We need a model that can approximate a distribution from its samples, and that is easy to evaluate and to sample: a normalizing flow.
reference: https://blog.evjang.com/2019/07/nf-jax.html
The complex distribution is linked to the simple one through the change of variable formula (written below); the variational parameters are those of the mapping.
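A hedged statement of the formula (the notation is mine: z for the simple base variable, f_φ for the learned invertible mapping with parameters φ, and x = f_φ(z)):

```latex
p_\phi(x) = p_z\!\left(f_\phi^{-1}(x)\right)\, \left| \det \frac{\partial f_\phi^{-1}(x)}{\partial x} \right|
```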
The NF is trained from samples of the true distribution only! This is super nice: it allows us to approximate the posterior distribution from simulations ONLY. But simulations can sometimes be very expensive, and training an NF requires a lot of them.
ICML 2022 Workshop on Machine Learning for Astrophysics
Justine Zeghal, François Lanusse, Alexandre Boucaud,
Benjamin Remy and Eric Aubourg
JAX: a framework for automatic differentiation following the NumPy API, running on GPU.
On top of it, a probabilistic programming library powered by JAX.
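A minimal illustration of why this matters here (the toy function is mine, not the actual simulator): JAX gives gradients of any NumPy-style computation for free.

```python
import jax
import jax.numpy as jnp

def toy_simulator(theta):
    # stand-in for a differentiable simulator: parameters -> scalar summary
    return jnp.sum(jnp.sin(theta) ** 2)

grad_simulator = jax.grad(toy_simulator)  # gradient w.r.t. theta via automatic differentiation
print(grad_simulator(jnp.array([0.1, 0.5])))
```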
With a few simulations it's hard to approximate the posterior distribution.
→ we need more simulations
BUT if we have a few simulations and their gradients (also known as the score),
then it's possible to get an idea of the shape of the distribution.
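Here the score is understood as the gradient of the joint log-likelihood with respect to the parameters (this precise definition is my reading of the setup, with x the simulation and θ the parameters):

```latex
\nabla_\theta \log p(\theta, x)
```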
Normalizing flows are trained by minimizing the negative log likelihood.
But to train the NF, we want to use both the simulations and their gradients.
Problem: the gradients of current NFs lack expressivity.
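For concreteness, the standard NF training loss, together with a hedged sketch of how a gradient (score-matching) term can be added; the weight λ and the exact form of the extra term are assumptions on my part:

```latex
\mathcal{L}_{\mathrm{NLL}} = -\,\mathbb{E}_{(\theta,\,x)}\big[ \log p_\phi(\theta \mid x) \big],
\qquad
\mathcal{L} = \mathcal{L}_{\mathrm{NLL}}
+ \lambda\; \mathbb{E}_{(\theta,\,x)}\Big[ \big\| \nabla_\theta \log p_\phi(\theta \mid x) - \nabla_\theta \log p(\theta, x) \big\|^2 \Big]
```

Note that ∇_θ log p(θ | x) = ∇_θ log p(θ, x), since the evidence does not depend on θ, which is what makes the joint score usable as a target.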
→ On a toy Lotka-Volterra model, the gradients help to constrain the shape of the distribution.
Justine Zeghal, Denise Lanzieri, François Lanusse, Alexandre Boucaud, Gilles Louppe, Eric Aubourg, and
The LSST Dark Energy Science Collaboration (LSST DESC)
We developed a fast and differentiable (JAX) log-normal mass maps simulator
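A minimal sketch of the log-normal trick this kind of simulator relies on (a flat-sky toy version; all names and the shift parameter are mine, not the actual implementation): generate a Gaussian field, then exponentiate it so the one-point distribution becomes log-normal.

```python
import jax
import jax.numpy as jnp

def lognormal_map(key, n=64, sigma_g=0.5, shift=1.0):
    # Gaussian random field (white noise here; a real simulator would impose a target power spectrum)
    g = sigma_g * jax.random.normal(key, (n, n))
    # log-normal transform: zero-mean field with log-normal one-point statistics
    return shift * (jnp.exp(g - 0.5 * sigma_g**2) - 1.0)

kappa = lognormal_map(jax.random.PRNGKey(0))
# Because everything is jax.numpy, gradients flow through the simulator:
grad_fn = jax.grad(lambda s: jnp.var(lognormal_map(jax.random.PRNGKey(0), sigma_g=s)))
print(grad_fn(0.5))
```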
Explicit joint likelihood
(from the simulator)
(requires a lot of additional simulations)
→ For this particular problem, the gradients from the simulator are too noisy to help.
→ Implicit inference (NLE) requires 1 500 simulations.
→ It is better to use NLE without gradients than NLE with gradients.
→ Explicit and implicit full-field inference yield the same posterior.
→ Explicit full-field inference requires 630 000 simulations (HMC in high dimension).
→ Implicit full-field inference requires 1 500 simulations, plus a maximum of 100 000 simulations to build sufficient statistics.
Denise Lanzieri, Justine Zeghal, T. Lucas Makinen, François Lanusse, Alexandre Boucaud and Jean-Luc Starck
Simulator → summary statistics (compression)
It is only a matter of the loss function you use to train your compressor.
Regression
Mutual information maximization
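A hedged sketch of these two families of compression losses (the Gaussian variational posterior and all names are mine; the paper's exact losses may differ): regression trains the compressor to predict the parameters directly, while mutual-information maximization trains it so that a variational posterior of the parameters given the compressed summary is as sharp as possible.

```python
import jax.numpy as jnp

def regression_loss(theta_pred, theta_true):
    # mean squared error between predicted and true parameters
    return jnp.mean((theta_pred - theta_true) ** 2)

def vmim_loss(mu, log_sigma, theta_true):
    # negative log of a Gaussian variational posterior q(theta | t) with mean mu and
    # std exp(log_sigma), both functions of the compressed summary t; minimizing it
    # maximizes a lower bound on the mutual information between theta and t
    return jnp.mean(0.5 * ((theta_true - mu) / jnp.exp(log_sigma)) ** 2 + log_sigma)
```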
Log-normal LSST Y10-like differentiable simulator
Benchmark procedure:
1. We compress using one of the 4 losses.
2. We compare their extraction power by comparing their posteriors.
For this, we use a neural-based likelihood-free approach, which is fixed for all the compression strategies.
Explicit likelihood → explicit inference; implicit likelihood (simulator only) → implicit inference.
Simulator → summary statistics