Justine Zeghal
justine.zeghal@umontreal.ca
Bayesian Deep Learning for Cosmology and Time Domain Astrophysics, 3rd ed., Paris, May 22
The simplest model that best describes our observations is ΛCDM.
Relying only on a few parameters:
Comprising ordinary matter, cold dark matter (CDM), and dark energy (Λ), which explains the accelerated expansion.
Goal: determine the value of those parameters based on our observations.
Credit: ESA
For which we have an analytical likelihood function.
This likelihood function connects our compressed observations to the cosmological parameters.
Bayes theorem:
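The equation on the slide is the standard form of Bayes' theorem, connecting the posterior over cosmological parameters θ to the likelihood of the (compressed) observations x:

```latex
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \;\propto\; p(x \mid \theta)\, p(\theta)
```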
The traditional way of constraining cosmological parameters misses information.
This results in less precise constraints on the cosmological parameters.
Credit: Natalia Porqueres
DES Y3 Results (with SBI).
Bayes theorem:
We can build a simulator to map the cosmological parameters to the data.
Prediction
Inference
Simulator
Depending on the simulator's nature, we can either perform
Simulator
Explicit joint likelihood
Initial conditions of the Universe
Large Scale Structure
Needs an explicit simulator to sample the joint posterior through MCMC:
We need to sample in extremely high-dimensional spaces
→ gradient-based sampling schemes.
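As a minimal sketch of a gradient-based sampling scheme, here is unadjusted Langevin dynamics on a toy 2-D Gaussian target (real analyses would use more sophisticated samplers such as HMC in much higher dimension; all names here are illustrative):

```python
import numpy as np

def grad_log_prob(x):
    # Toy target: standard 2-D Gaussian, so grad log p(x) = -x.
    return -x

def langevin_sampler(n_steps=20000, step=0.05, seed=0):
    """Unadjusted Langevin dynamics:
    x <- x + (step/2) * grad log p(x) + sqrt(step) * noise.
    Only the gradient of the log-density is needed, not its normalization.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    samples = np.empty((n_steps, 2))
    for i in range(n_steps):
        x = x + 0.5 * step * grad_log_prob(x) + np.sqrt(step) * rng.standard_normal(2)
        samples[i] = x
    return samples

samples = langevin_sampler()
# After burn-in, the chain explores the unit Gaussian: mean ~ 0, variance ~ 1.
```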
Depending on the simulator’s nature we can either perform
Simulator
It does not matter whether the simulator is explicit or implicit, because all we need are simulations.
This approach typically involves 2 steps:
1) Compression of the high-dimensional data into summary statistics, without losing cosmological information!
2) Implicit inference on these summary statistics to approximate the posterior.
Summary statistics
Simulator
Which full-field inference methods require the fewest simulations?
How to build sufficient statistics?
Can we perform implicit inference with fewer simulations?
How to deal with model misspecification?
ICML 2022 Workshop on Machine Learning for Astrophysics
Justine Zeghal, François Lanusse, Alexandre Boucaud,
Benjamin Remy and Eric Aubourg
1) Draw N parameters
2) Draw N simulations
3) Train a neural density estimator on these pairs to approximate the quantity of interest
4) Approximate the posterior from the learned quantity
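The four steps above can be sketched end to end on a toy problem. As a hedged stand-in for the neural density estimator, this uses a linear-Gaussian fit instead of a normalizing flow; the simulator and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Draw N parameters from the prior, theta ~ N(0, 1).
N = 20000
theta = rng.normal(0.0, 1.0, N)

# 2) Draw N simulations x ~ p(x | theta) (toy simulator: x = theta + noise).
x = theta + 0.5 * rng.standard_normal(N)

# 3) Train a conditional density estimator q(theta | x).
#    Toy stand-in for a normalizing flow: a linear-Gaussian fit.
coeffs = np.polyfit(x, theta, 1)          # least-squares slope and intercept
resid = theta - np.polyval(coeffs, x)
sigma = resid.std()

# 4) Approximate the posterior at an observation x_obs.
#    Analytic check for this toy model: the posterior is N(0.8 * x_obs, 0.2).
x_obs = 1.0
post_mean, post_std = np.polyval(coeffs, x_obs), sigma
```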
Change of Variable Formula:
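For reference, the change-of-variable formula used by normalizing flows, with z the base variable and f the learned mapping (x = f(z)):

```latex
p_x(x) = p_z\!\left(f^{-1}(x)\right)\,\left|\det \frac{\partial f^{-1}(x)}{\partial x}\right|
```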
We need to learn the mapping to approximate the complex distribution.
From simulations only!
A lot of simulations…
Truth
Approximation
With a few simulations it's hard to approximate the posterior distribution.
→ we need more simulations
BUT if we have a few simulations
and the gradients (also known as the score),
then it's possible to get an idea of the shape of the distribution.
Normalizing flows are trained by minimizing the negative log-likelihood:
But to train the NF, we want to use both simulations and the gradients from the simulator:
Problem: the gradients of current NFs lack expressivity.
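One way to write the two ingredients as a single training objective: the usual negative log-likelihood plus a score-matching penalty that uses the simulator's gradients (the weight λ and the exact notation here are my own, not necessarily those of the paper):

```latex
\mathcal{L}(\phi) =
-\,\mathbb{E}_{(\theta, x)}\big[\log q_\phi(\theta \mid x)\big]
\;+\; \lambda\,\mathbb{E}_{(\theta, x)}\Big[\big\|\nabla_\theta \log q_\phi(\theta \mid x) - \nabla_\theta \log p(\theta \mid x)\big\|^2\Big]
```

where the target score ∇θ log p(θ|x) = ∇θ log p(x|θ) + ∇θ log p(θ) is available from a differentiable simulator and the prior.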
A metric
We use the Classifier 2-Sample Tests (C2ST) metric.
distribution 1
distribution 2
Requirement: the true distribution is needed.
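A minimal sketch of the C2ST idea: train a classifier to distinguish samples from the two distributions and report held-out accuracy — ~0.5 means the distributions are indistinguishable, ~1.0 means they are easily separated. Here a simple k-NN classifier stands in for whatever classifier the papers actually use; all names are illustrative:

```python
import numpy as np

def c2st_knn(s1, s2, k=5, seed=0):
    """Classifier 2-Sample Test with a k-nearest-neighbour classifier."""
    rng = np.random.default_rng(seed)
    X = np.concatenate([s1, s2])
    if X.ndim == 1:
        X = X[:, None]
    y = np.concatenate([np.zeros(len(s1)), np.ones(len(s2))])
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    n_train = len(X) // 2
    Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    correct = 0
    for xi, yi in zip(Xte, yte):
        d = np.linalg.norm(Xtr - xi, axis=1)       # distances to training set
        vote = ytr[np.argsort(d)[:k]].mean() > 0.5  # majority vote (k odd: no ties)
        correct += (vote == yi)
    return correct / len(Xte)

rng = np.random.default_rng(1)
same = c2st_knn(rng.normal(0, 1, 1000), rng.normal(0, 1, 1000))  # ~0.5: indistinguishable
diff = c2st_knn(rng.normal(0, 1, 1000), rng.normal(4, 1, 1000))  # ~1.0: well separated
```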
→ On a toy Lotka–Volterra model, the gradients help to constrain the distribution shape.
Without gradients
With gradients
Which full-field inference methods require the fewest simulations?
How to build sufficient statistics?
Can we perform implicit inference with fewer simulations?
How to deal with model misspecification?
Justine Zeghal, Denise Lanzieri, François Lanusse, Alexandre Boucaud, Gilles Louppe, Eric Aubourg, Adrian E. Bayer
and The LSST Dark Energy Science Collaboration (LSST DESC)
We developed a fast and differentiable (JAX) log-normal mass maps simulator.
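The actual simulator is a JAX log-normal mass-map simulator; as a hedged illustration of the log-normal transform at its core, here is a minimal numpy sketch (the power spectrum, amplitude, and all names are placeholders, not the DESC pipeline):

```python
import numpy as np

def lognormal_map(n=128, slope=-1.5, amp=0.02, seed=0):
    """Toy log-normal mass map: Gaussian random field -> shifted log-normal.

    delta = exp(g - sigma^2/2) - 1 so that <delta> ~ 0 and delta > -1,
    mimicking a density-contrast field.
    """
    rng = np.random.default_rng(seed)
    # Gaussian field with a power-law power spectrum via FFT filtering.
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = 1.0                      # avoid division by zero at k = 0
    power = amp * k**slope
    power[0, 0] = 0.0                  # zero the mean (k = 0) mode
    white = np.fft.fft2(rng.standard_normal((n, n)))
    g = np.fft.ifft2(white * np.sqrt(power)).real
    # Log-normal transform of the Gaussian field g.
    sigma2 = g.var()
    return np.exp(g - sigma2 / 2.0) - 1.0

delta = lognormal_map()
```

Written in JAX instead of numpy, the same transform is differentiable end to end, which is what makes the gradients of the simulator available.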
Explicit inference is theoretically guaranteed to converge asymptotically to the truth.
Explicit inference and implicit inference yield comparable constraints.
C2ST = 0.6!
To use the C2ST we need the true posterior distribution.
→ We use the explicit full-field posterior.
Why?
(from the simulator)
→ For this particular problem, the gradients from the simulator are too noisy to help.
→ Implicit inference requires 1500 simulations.
→ In the case of perfect gradients, it does not significantly help.
→ For a simple distribution, all the simulations seem to help locate the posterior distribution.
→ No, it does not help to reduce the number of simulations, because the gradients of the simulator are too noisy.
→ Even with marginal gradients, the gain is not significant.
→ For now, we know that implicit inference requires 1500 simulations.
What about explicit inference?
→ Explicit inference requires … simulations.
Which full-field inference methods require the fewest simulations?
How to build sufficient statistics?
Can we perform implicit inference with fewer simulations?
How to deal with model misspecification?
Denise Lanzieri*, Justine Zeghal*, T. Lucas Makinen, François Lanusse, Alexandre Boucaud and Jean-Luc Starck
* equal contributions
It is only a matter of the loss function used to train the compressor.
Definition: Sufficient Statistic
Which learns a moment of the posterior distribution.
Mean Squared Error (MSE) loss:
→ Approximates the mean of the posterior.
Mean Absolute Error (MAE) loss:
→ Approximates the median of the posterior.
The mean is not guaranteed to be a sufficient statistic.
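The MSE and MAE claims follow from the standard argument that these losses are minimized pointwise by the conditional mean and conditional median, respectively:

```latex
\arg\min_{t}\; \mathbb{E}\big[(\theta - t)^2 \mid x\big] = \mathbb{E}[\theta \mid x],
\qquad
\arg\min_{t}\; \mathbb{E}\big[\,|\theta - t|\, \mid x\big] = \operatorname{median}(\theta \mid x)
```

so a compressor trained with the MSE loss converges to the posterior mean, and one trained with the MAE loss to the posterior median.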
By definition:
→ Maximizing the mutual information should build sufficient statistics according to the definition.
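The definition of sufficiency and its mutual-information form (by the data-processing inequality, I(θ; t(x)) ≤ I(θ; x), with equality exactly when t is sufficient):

```latex
t(x)\ \text{sufficient} \iff p(\theta \mid x) = p\big(\theta \mid t(x)\big) \iff I\big(\theta;\, t(x)\big) = I(\theta;\, x)
```

which is why maximizing I(θ; t(x)) over the compressor pushes t(x) toward a sufficient statistic.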
Log-normal LSST Y10-like differentiable simulator
Benchmark procedure:
1. We compress using one of the losses.
2. We compare their extraction power by comparing their posteriors.
For this, we use implicit inference, which is fixed for all the compression strategies.
Which full-field inference methods require the fewest simulations?
How to build sufficient statistics?
Can we perform implicit inference with fewer simulations?
How to deal with model misspecification?
Justine Zeghal, Benjamin Remy, Laurence Perreault-Levasseur, Yashar Hezaveh
Preliminary results*
What happens when the simulation model differs from the true physical model?
With full-field inference, we now rely only on simulations, and we work at the pixel level.
We cannot escape this, as there may be physics that we do not understand or cannot model computationally.
A way to correct this bias is to learn a mapping that transforms one simulation into another.
We would like it to be the optimal transport (OT) mapping, in the sense that the simulation is minimally transformed to match its PM counterpart.
OT Flow Matching enables learning an OT mapping between two distributions.
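In conditional flow matching one regresses a velocity field onto straight-line displacements between paired samples (notation here is my own, in the spirit of Tong et al. 2023):

```latex
x_t = (1 - t)\,x_0 + t\,x_1,
\qquad
\mathcal{L}_{\mathrm{CFM}} = \mathbb{E}_{t,\,(x_0,\,x_1) \sim \pi}\,\big\|\,v_\phi(t, x_t) - (x_1 - x_0)\,\big\|^2
```

where π is a coupling of the two distributions; choosing π as the (minibatch) optimal transport coupling, rather than an independent pairing, is what makes this OT flow matching.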
Need to learn discrete transformations
Need to learn a continuous transformation
Credit: https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html
Credit: Tong et al., 2023
Credit: Albergo et al., 2023
Flow Matching
Optimal Transport Flow Matching
Preliminary Results
Which full-field inference methods require the fewest simulations?
How to build sufficient statistics?
Can we perform implicit inference with fewer simulations?
How to deal with model misspecification?
Gradients can be beneficial, depending on your simulation model.
Explicit inference requires 100 times more simulations than implicit inference.
Mutual Information Maximization
We can learn an optimal transport mapping.