Copy of Tri-state Cosmology x machine learning journal club

Implicit full-field inference for LSST weak Lensing

Justine Zeghal

Supervisors: François Lanusse, Alexandre Boucaud, Eric Aubourg

Tri-state Cosmology x machine learning journal club

January 19, Paris, France

Full-field inference 2 ways..

Bayesian hierarchical modeling

Full-field inference 2 ways..

\theta

\theta

z

z

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

x

x

Full-field inference 2 ways..

Bayesian hierarchical modeling

\theta

\theta

z

z

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

Explicit joint likelihood

p(x| \theta, z)

p(x| \theta, z)

Full-field inference 2 ways..

Bayesian hierarchical modeling

x

x

z

z

And then run an MCMC to get the posterior:

\underbrace{p(\theta|x)}_{\text{posterior}}

\underbrace{p(\theta|x)}_{\text{posterior}}

\underbrace{p(\theta)}_{\text{prior}}

\underbrace{p(\theta)}_{\text{prior}}

\underbrace{p(x|\theta)}_{\text{likelihood}}

\underbrace{p(x|\theta)}_{\text{likelihood}}

\propto

\propto

And then run an MCMC to get the posterior:

Bayesian hierarchical modeling

\theta

\theta

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

Explicit joint likelihood

p(x| \theta, z)

p(x| \theta, z)

Full-field inference 2 ways..

x

x

It provides exact results but necessitates the BHM to be differentiable and requires a lot of simulations.

And then run an MCMC to get the posterior:

\underbrace{p(\theta|x)}_{\text{posterior}}

\underbrace{p(\theta|x)}_{\text{posterior}}

\underbrace{p(\theta)}_{\text{prior}}

\underbrace{p(\theta)}_{\text{prior}}

\underbrace{p(x|\theta)}_{\text{likelihood}}

\underbrace{p(x|\theta)}_{\text{likelihood}}

\propto

\propto

And then run an MCMC to get the posterior:

Bayesian hierarchical modeling

z

z

\theta

\theta

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

Explicit joint likelihood

p(x| \theta, z)

p(x| \theta, z)

Full-field inference 2 ways..

x

x

Implicit inference with sufficient statistics

Full-field inference 2 ways..

Implicit inference with sufficient statistics

z

z

\theta

\theta

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

Full-field inference 2 ways..

x

x

\theta

\theta

z

z

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

Implicit inference with sufficient statistics

Full-field inference 2 ways..

Simulator

x

x

\theta

\theta

Summary statistics

t = f_{\varphi}(x)

t = f_{\varphi}(x)

z

z

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

Implicit inference with sufficient statistics

Full-field inference 2 ways..

Simulator

x

x

\theta

\theta

z

z

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

(\theta_i, t_i)_{i=1...N}

(\theta_i, t_i)_{i=1...N}

Implicit inference with sufficient statistics

Full-field inference 2 ways..

Simulator

(\theta|

(\theta|

p_{\Phi}

p_{\Phi}

f_{\varphi}(x)

f_{\varphi}(x)

)

)

And use neural-based likelihood-free approaches to get the posterior

using only:

Summary statistics

t = f_{\varphi}(x)

t = f_{\varphi}(x)

x

x

\theta

\theta

z

z

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

And use neural-based likelihood-free approaches to get the posterior

using only:

(\theta_i, t_i)_{i=1...N}

(\theta_i, t_i)_{i=1...N}

Implicit inference with sufficient statistics

Full-field inference 2 ways..

Simulator

(\theta|

(\theta|

p_{\Phi}

p_{\Phi}

f_{\varphi}(x)

f_{\varphi}(x)

)

)

Summary statistics

t = f_{\varphi}(x)

t = f_{\varphi}(x)

x

x

\theta

\theta

z

z

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

And use neural-based likelihood-free approaches to get the posterior

using only:

(\theta_i, t_i)_{i=1...N}

(\theta_i, t_i)_{i=1...N}

Implicit inference with sufficient statistics

Full-field inference 2 ways..

Simulator

(\theta|

(\theta|

p_{\Phi}

p_{\Phi}

f_{\varphi}(x)

f_{\varphi}(x)

)

)

Summary statistics

t = f_{\varphi}(x)

t = f_{\varphi}(x)

x

x

\theta

\theta

z

z

f

f

\sigma^2

\sigma^2

\mathcal{N}

\mathcal{N}

And use neural-based likelihood-free approaches to get the posterior

using only:

(\theta_i, t_i)_{i=1...N}

(\theta_i, t_i)_{i=1...N}

Implicit inference with sufficient statistics

Full-field inference 2 ways..

Simulator

(\theta|

(\theta|

p_{\Phi}

p_{\Phi}

f_{\varphi}(x)

f_{\varphi}(x)

)

)

Summary statistics

t = f_{\varphi}(x)

t = f_{\varphi}(x)

x

x

\underbrace{p(\theta|x)}_{\text{posterior}}

\underbrace{p(\theta|x)}_{\text{posterior}}

\underbrace{p(\theta)}_{\text{prior}}

\underbrace{p(\theta)}_{\text{prior}}

\underbrace{p(x|\theta)}_{\text{likelihood}}

\underbrace{p(x|\theta)}_{\text{likelihood}}

\propto

\propto

Optimal Neural Summarisation for Full-Field Cosmological Implicit Inference

Denise Lanzieri, Justine Zeghal

T. Lucas Makinen, Alexandre Boucaud, François Lanusse, and Jean-Luc Starck

x

x

How to extract all the information?

It is only a matter of the loss function you use to train your compressor..

t = f_{\varphi}(x)

t = f_{\varphi}(x)

\text{A statistic } t \text{ is said to be sufficient for the parameters } \theta \text{ if }

\text{A statistic } t \text{ is said to be sufficient for the parameters } \theta \text{ if }

\text{Sufficient statistic}

\text{Sufficient statistic}

p(\theta \: | \: x) = p(\theta \: | \: t) \: \text{ with } \: t=f(x)

p(\theta \: | \: x) = p(\theta \: | \: t) \: \text{ with } \: t=f(x)

We developed a fast and differentiable (JAX) log-normal mass maps simulator

For our benchmark: a Differentiable Mass Maps Simulator

sbi_lens

Numerical results

1. We compress using one of the 5 losses.

Benchmark procedure:

2. We compare their extraction power by comparing their posteriors.

For this, we use a neural-based likelihood-free approach, which is fixed for all the compression strategies.

p(\theta \: | \: x) = p(\theta \: | \: t) \: \text{ with } \: t=f(x)

p(\theta \: | \: x) = p(\theta \: | \: t) \: \text{ with } \: t=f(x)

Simulation-Efficient Implicit Inference.

Is differentiability useful?

Justine Zeghal

Denise Lanzieri, Alexandre Boucaud, François Lanusse, and Eric Aubourg

do gradients help implicit inference methods?

With a few simulations it's hard to approximate the posterior distribution.

→ we need more simulations

BUT if we have a few simulations

and the gradients

(also know as the score)

\nabla_{\theta} \log p(\theta | x)

\nabla_{\theta} \log p(\theta | x)

then it's possible to have an idea of the shape of the distribution.

do gradients help implicit inference methods?

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

But to train the NF, we want to use both simulations and gradients

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

But to train the NF, we want to use both simulations and gradients

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta, z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta, z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

But to train the NF, we want to use both simulations and gradients

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta, z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta, z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

Problem: the gradient of current NFs lack expressivity

But to train the NF, we want to use both simulations and gradients

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta, z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta, z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

Problem: the gradient of current NFs lack expressivity

But to train the NF, we want to use both simulations and gradients

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta,z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta,z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

Problem: the gradient of current NFs lack expressivity

But to train the NF, we want to use both simulations and gradients

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta,z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta,z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

Problem: the gradient of current NFs lack expressivity

But to train the NF, we want to use both simulations and gradients

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta,z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta,z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

Normalizing flows are trained by minimizing the negative log likelihood:

do gradients help implicit inference methods?

Problem: the gradient of current NFs lack expressivity

But to train the NF, we want to use both simulations and gradients

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

- \mathbb{E}_{p(x)}\left[ \log\left(p^{\phi}(\theta | x)\right) \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta,z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]

+ \: \lambda \: \displaystyle \mathbb{E}\left[ \parallel \nabla_{\theta} \log p(\theta,z |x) - \underbrace{\nabla_{\theta} \log p^{\phi}(\theta |x)}\parallel_2^2 \right]