Impact of Weak Lensing Mass Mapping Algorithms on Cosmology Inference

June 11th, 2025


Andreas Tersenov

COLOURS workshop, Saclay










If you are seeing this as a PDF, a nicer version of the slides is available at andreastersenov.github.io/talks/COLOURS_Saclay_2025/

The problem in question

  • Weak lensing inference with higher-order statistics (HOS) often relies on mass maps.
  • Producing these maps is an ill-posed inverse problem — there is no unique solution.
  • Current surveys use simplistic algorithms because they are computationally efficient.

This raises an important question:

Does the choice of mass mapping algorithm affect the inferred cosmological parameters?
Or does it not matter, as long as the same operator is applied to both data and simulations?

How to test this properly?


We need a pipeline that can: reconstruct mass maps from the same shear data with different algorithms, measure higher-order statistics on the resulting maps, and infer cosmological parameters from those statistics.

Introduction - Weak Lensing

  • WL = Observational technique in cosmology for studying the matter distribution in the universe

  • Principle: deflection of light from distant galaxies by gravitational fields → causes image distortion

  • Weak → subtle & coherent distortions of background galaxy shapes

  • WL provides a direct measurement of the gravitational distortion.
  • WL enables us to probe the cosmic structure, investigate the nature of dark matter, and constrain cosmological parameters.

Shear & Convergence

Weak Lensing Distortions
Convergence $\kappa$
$$ \kappa = \frac{1}{2} (\partial_1 \partial_1 + \partial_2 \partial_2) \psi = \frac12 \nabla^2 \psi $$

difficult to measure

Shear $\gamma$
$$ \gamma_1 = \frac{1}{2} (\partial_1 \partial_1 - \partial_2 \partial_2) \psi, \,\, \gamma_2 = \partial_1 \partial_2 \psi $$

⟶ can be measured by statistical analysis of galaxy shapes

Shear estimation

Galaxy shapes as estimators for gravitational shear
$$ e = \gamma + e_i \qquad \mbox{ with } \qquad e_i \sim \mathcal{N}(0, I)$$
  • We measure the ellipticity $e$ of galaxies as an estimator of the gravitational shear $\gamma$

Relation between $\kappa$ and $\gamma$

  • From convergence to shear: $\,\, \gamma_i = \hat{P}_i \kappa$
  • From shear to convergence: $\,\, \kappa = \hat{P}_1 \gamma_1 + \hat{P}_2 \gamma_2$

$$ \hat{P}_1(k)=\dfrac{k_1^2 - k_2^2}{k^2}, \,\, \hat{P}_2(k)=\dfrac{2k_1k_2}{k^2} $$

The Weak Lensing Mass-Mapping as an Inverse Problem

Shear $\gamma$
Convergence $\kappa$
$$\gamma_1 = \frac{1}{2} (\partial_1^2 - \partial_2^2) \ \psi \quad;\quad \gamma_2 = \partial_1 \partial_2 \ \psi \quad;\quad \kappa = \frac{1}{2} (\partial_1^2 + \partial_2^2) \ \psi$$
$$\boxed{\gamma = \mathbf{P} \kappa}$$

Kaiser-Squires inversion

  • Mass-mapping problem: recover an estimate of the convergence from measurements of the shear $$\kappa = \dfrac12 \Delta \psi, \,\, \gamma_1 = \dfrac12 \left( \partial_1^2 \psi - \partial_2^2 \psi \right), \,\, \gamma_2 = \partial_1 \partial_2 \psi $$ $$\Rightarrow \kappa = \Delta^{-1} \left[ \left( \partial_1^2 - \partial_2^2 \right) \gamma_1 + 2\partial_1 \partial_2 \gamma_2 \right] $$
Kaiser-Squires estimator:
$$ \tilde{\kappa}_E+i \tilde{\kappa}_B=\left(\frac{k_1^2-k_2^2}{k^2}+i \frac{2 k_1 k_2}{k^2}\right)\left(\tilde{\gamma}_1+i \tilde{\gamma}_2\right)=\mathbf{P}\left(\tilde{\gamma}_1+i \tilde{\gamma}_2\right) $$

$\tilde{\kappa}_E$ and $\tilde{\kappa}_B$ are the Fourier transforms of the convergence E- and B-modes.
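As a concrete illustration, here is a minimal NumPy sketch of the KS estimator on a periodic grid, assuming already-binned shear maps (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def kaiser_squires(gamma1, gamma2):
    """Kaiser-Squires inversion on a periodic grid (minimal sketch).

    Returns E- and B-mode convergence maps from binned shear maps.
    """
    ny, nx = gamma1.shape
    k1, k2 = np.meshgrid(np.fft.fftfreq(nx), np.fft.fftfreq(ny))
    ksq = k1**2 + k2**2
    ksq[0, 0] = 1.0  # avoid division by zero; the k=0 mode is unconstrained
    p1 = (k1**2 - k2**2) / ksq   # \hat{P}_1
    p2 = 2.0 * k1 * k2 / ksq     # \hat{P}_2
    g1, g2 = np.fft.fft2(gamma1), np.fft.fft2(gamma2)
    # conjugate operator: real part reproduces kappa_E = P1*g1 + P2*g2
    kappa_ft = (p1 - 1j * p2) * (g1 + 1j * g2)
    kappa_ft[0, 0] = 0.0  # mass-sheet degeneracy: fix the mean to zero
    kappa = np.fft.ifft2(kappa_ft)
    return kappa.real, kappa.imag  # E modes, B modes
```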

From galaxies to mass maps

Mass mapping as an inverse problem


  • Binned data: $\gamma = F^* P F \, \kappa$
  • Unbinned data: $\gamma = T^* P F \, \kappa$


$T =$ Non-Equispaced Discrete Fourier Transform (NDFT)


$$ \tilde{\boldsymbol{\kappa}}=\arg\min _{\boldsymbol{\kappa}}\|\boldsymbol{\gamma}-\mathbf{A} \boldsymbol{\kappa}\|_{\boldsymbol{\Sigma}_n}^2 $$ with $\mathbf{A} = \boldsymbol{F}^* \mathbf{P} \boldsymbol{F}$



There is no unique and stable solution: the problem is ill-posed.

Kaiser-Squires inversion

Advantages:
  • Simple linear operator
  • Very easy to implement in Fourier space
  • Optimal, in theory
Practical difficulties:
  • Shear measurements are discrete, noisy, and irregularly sampled

  • We actually measure the reduced shear: $g = \gamma/(1-\kappa)$

  • Masks and integration over a subset of $\mathbb{R}^2$ lead to border errors ⇒ missing data problem

  • Convergence is recoverable up to a constant ⇒ mass-sheet degeneracy problem

Inpainting

  • Inpainting methods → techniques to fill in missing data in images by estimating the missing values from the known ones.
  • A way to deal with the missing data problem in mass mapping and with the border effects in the convergence field.
  • Implement a sparse inpainting based on the DCT (assume that the full $\kappa$-field is sparse in the DCT dictionary → apply an iterative thresholding algorithm to recover the missing data).
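A minimal sketch of this DCT inpainting scheme, assuming a boolean mask of observed pixels and a linearly decreasing threshold (the exact schedule in the actual implementation may differ):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_inpaint(kappa, mask, n_iter=100):
    """Sparse inpainting by iterative DCT thresholding (illustrative sketch).

    mask: boolean array, True where data are observed.
    """
    x = np.where(mask, kappa, 0.0)
    lam_max = np.abs(dctn(x, norm='ortho')).max()
    for i in range(n_iter):
        lam = lam_max * (1.0 - i / n_iter)      # linearly decreasing threshold
        alpha = dctn(x, norm='ortho')
        alpha[np.abs(alpha) < lam] = 0.0         # hard thresholding in DCT domain
        x = idctn(alpha, norm='ortho')
        x = np.where(mask, kappa, x)             # keep observed pixels fixed
    return x
```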

Bayesian reconstruction

  • Mass mapping problem → statistical inference problem
  • Goal: infer most probable value of $\kappa$-field given observed shear data
$$ \underbrace{p(\boldsymbol{\kappa} \mid \boldsymbol{\gamma}, \mathcal{M})}_{\text {Posterior }} \propto \underbrace{p(\gamma \mid \boldsymbol{\kappa}, \mathcal{M})}_{\text {likelihood }} \,\, \underbrace{p(\boldsymbol{\kappa} \mid \mathcal{M})}_{\text {prior }} $$ $\mathcal{M}$: cosmological model

  • Maximum A Posteriori solution: $\hat{\kappa} = \arg \max_\kappa \, \log p(\gamma\mid\kappa) + \log p(\kappa)$

  • In practice: this is usually solved iteratively by alternating two steps:
    • Move toward better data fit (gradient of likelihood)
    • Enforce the prior using a proximal operator

Proximal operator
  • Acts as a "smart denoiser" by finding the closest solution that satisfies the prior.
  • Example: sparsity prior → proximal operator performs thresholding to enforce sparsity in the solution.
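To make the two alternating steps concrete, here is a schematic forward-backward iteration with a simple $\ell_1$ (sparsity) prior; `A`, `At` and all names are placeholders, and a real mass-mapping code would threshold wavelet coefficients rather than pixels:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm: shrinks coefficients toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def proximal_gradient(gamma, A, At, lam, tau, n_iter=200):
    """Forward-backward splitting with a sparsity prior (schematic sketch).

    A, At: forward operator and its adjoint (assumed given, real-valued);
    tau must satisfy tau <= 1/||A||^2 for convergence.
    """
    kappa = np.zeros_like(At(gamma))
    for _ in range(n_iter):
        grad = At(A(kappa) - gamma)                            # data-fit step
        kappa = soft_threshold(kappa - tau * grad, tau * lam)  # prior (prox) step
    return kappa
```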

Wiener filter

  • Assumes prior on $\kappa$ → Gaussian random field $p_{\rm Gauss}(\kappa) = \frac{1}{\sqrt{\det 2\pi \mathbf{S}}} \exp{\left( - \frac12 \tilde{\kappa}^{\dagger} \mathbf{S}^{-1} \tilde{\kappa} \right)}$

  • Likelihood (assuming uncorrelated, Gaussian noise) → also Gaussian $p(\boldsymbol{\gamma} \mid \boldsymbol{\kappa}) = \frac{1}{\sqrt{\det 2\pi \mathbf{N}}} \exp{\left[ -\frac12 (\boldsymbol{\gamma} - \mathbf{A} \boldsymbol{\kappa})^{\dagger} \mathbf{N}^{-1} ( \boldsymbol{\gamma} - \mathbf{A} \boldsymbol{\kappa} ) \right]}$

  • The Gaussian prior encodes the assumption that the fluctuations in the $\kappa$-field are well described by a Gaussian random field, with power spectrum given by the cosmological model


Convergence power spectrum
$$ \langle \tilde{\kappa}(\boldsymbol{k}) \tilde{\kappa}^*(\boldsymbol{k}') \rangle = (2\pi)^2 \delta_D(\boldsymbol{k} - \boldsymbol{k}') P_{\kappa}(k) $$ Statistical measure of the spatial distribution of the convergence field → quantifies the amplitude of the fluctuations in $\kappa$ as a function of their spatial scale

Wiener filter

Wiener solution of the inverse problem
$$\hat{\kappa}_{\text {wiener }}=\arg \min _\kappa\left\|\Sigma^{-1 / 2}\left(\gamma-\mathbf{F}^* \mathbf{P F} \kappa\right)\right\|_2^2-\log p_{\text {Gaussian }}(\kappa)$$

  • This solution corresponds to the maximum a posteriori (MAP) solution under the assumption of a Gaussian prior on $\kappa$, and it matches the mean of the Gaussian posterior.


Wiener reconstruction
$$\hat{\kappa}_{\text {wiener }}=\mathbf{S} \mathbf{P}^{\dagger} \left[\mathbf{P} \mathbf{S} \mathbf{P}^{\dagger} + \mathbf{N} \right]^{-1} \tilde{\gamma}$$
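For intuition, a minimal sketch of this reconstruction in the idealized case of a periodic grid, no mask, and diagonal signal and noise covariances in Fourier space, so the matrix inverse reduces to a per-mode division (`p_kappa` and `n_level` are illustrative inputs):

```python
import numpy as np

def wiener_fourier(gamma1, gamma2, p_kappa, n_level):
    """Wiener filtering in Fourier space (sketch: periodic grid, no mask).

    p_kappa: 2-D array of the prior power spectrum P_kappa(k) on the FFT grid;
    n_level: flat noise power. With diagonal S and N, the Wiener formula
    S P^dagger [P S P^dagger + N]^{-1} becomes a per-mode filter.
    """
    ny, nx = gamma1.shape
    k1, k2 = np.meshgrid(np.fft.fftfreq(nx), np.fft.fftfreq(ny))
    ksq = k1**2 + k2**2
    ksq[0, 0] = 1.0
    p = (k1**2 - k2**2) / ksq + 2j * k1 * k2 / ksq   # P operator, |P| = 1
    gamma_ft = np.fft.fft2(gamma1 + 1j * gamma2)
    filt = p_kappa * np.conj(p) / (p_kappa * np.abs(p)**2 + n_level)
    kappa_ft = filt * gamma_ft
    kappa_ft[0, 0] = 0.0
    return np.fft.ifft2(kappa_ft).real
```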

Sparse recovery

  • Decomposes the signal into another domain (dictionary), where it is sparse

  • Implement the wavelet transform → decomposes the signal into wavelet functions (waveforms of limited duration with an average value of zero)
Wavelets of different scales
  • Use starlet wavelets → represent well structures resembling the $\kappa$ of a DM halo (positive & isotropic)

  • The application of the sparsity prior enforces a cosmological model in which the matter field is a combination of spherically symmetric DM halos

MCALens

  • Models the $\kappa$-field as a sum of a Gaussian and a non-Gaussian component $$\boxed{\kappa = \underbrace{\kappa_{\rm G}}_{\text{Standard Wiener filter approach}} + \underbrace{\kappa_{\rm NG}}_{\text{Modified wavelet approach}}}$$ $$\min _{\kappa_G, \kappa_{N G}}\left\|\gamma-\mathbf{A}\left(\kappa_G+\kappa_{N G}\right)\right\|_{\Sigma_n}^2+C_{\mathrm{G}}\left(\kappa_G\right)+C_{\mathrm{NG}}\left(\kappa_{N G}\right) $$

  • MCA (Morphological Component Analysis) performs an alternating minimization scheme:

    • Estimate $\mathbf{\kappa}_{\mathrm{G}}$ assuming $\mathbf{\kappa}_{\mathrm{NG}}$ is known: $$\min _{\kappa_G}\left\|\left(\gamma-\mathbf{A}\kappa_{N G}\right) -\mathbf{A}\kappa_{G} \right\|_{\Sigma_n}^2+C_{\mathrm{G}}\left(\kappa_G\right)$$
    • Estimate $\mathbf{\kappa}_{\mathrm{NG}}$ assuming $\mathbf{\kappa}_{\mathrm{G}}$ is known: $$\min _{\kappa_{NG}}\left\|\left(\gamma-\mathbf{A}\kappa_{G}\right) -\mathbf{A}\kappa_{NG} \right\|_{\Sigma_n}^2+C_{\mathrm{NG}}\left(\kappa_{NG}\right)$$
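Schematically, the alternation can be written as follows (a sketch only, with `solve_gaussian` and `solve_sparse` standing in for the Wiener-like and sparsity-constrained sub-solvers; not the actual MCALens code):

```python
import numpy as np

def mca_alternate(gamma, A, solve_gaussian, solve_sparse, shape, n_iter=30):
    """MCA-style alternating minimization (schematic sketch).

    solve_gaussian(residual): Wiener-like solution for the Gaussian part;
    solve_sparse(residual):   sparsity-constrained solution for the
                              non-Gaussian part. Both are assumed given.
    """
    kappa_g = np.zeros(shape)
    kappa_ng = np.zeros(shape)
    for _ in range(n_iter):
        # Gaussian component, with the non-Gaussian one held fixed
        kappa_g = solve_gaussian(gamma - A(kappa_ng))
        # non-Gaussian component, with the Gaussian one held fixed
        kappa_ng = solve_sparse(gamma - A(kappa_g))
    return kappa_g + kappa_ng
```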

Overview of mass mapping methods

  • Wiener filter: Prior on $\kappa$ → Gaussian random field, Likelihood → Gaussian, Solution → corresponds to MAP for Gaussian $\kappa$

  • Sparse recovery: Prior on $\kappa$ → sparse in wavelet basis, Enforces model where matter field is a combination of DM halos

  • MCALens: Models $\kappa$-field as sum of a Gaussian and a non-Gaussian component, Uses Morphological Component Analysis

  • Deep Learning Methods:
    • DeepMass: CNN with a U-Net-based architecture, prior from simulations
    • DeepPosterior: Probabilistic mass mapping with deep generative models, Prior from 2pt statistics modelling at large scales & Deep Learning on simulations for small scales, Sampling with Annealed HMC

Does the choice of mass-mapping method matter for cosmology?

Mass mapping impact: A. Tersenov, L. Baumont, J.-L. Starck, M. Kilbinger, doi.org/10.1051/0004-6361/202553707

Mass mapping methods:

Summary statistics

\[ \begin{array}{ll} \hline \hline \text{Method} & \text{RMSE} \downarrow \\ \hline \text{KS } & 1.1 \times 10^{-2} \\ \text{iKS } & 1.1 \times 10^{-2} \\ \text{MCALens} & 9.8 \times 10^{-3} \\ \hline \end{array} \]

Higher Order Statistics: Peak Counts

Cosmology with maps
  • Peaks: local maxima of the SNR field $\nu = \dfrac{\left(\mathcal{W} \ast \kappa \right)(\theta_{\rm ker})}{\sigma_n^{\rm filt}}$

  • Peaks trace regions where the value of $\kappa$ is high → they are associated with massive structures

Peak functions

  • The peak function is the number of peaks in a map as a function of the SNR $\nu$
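A minimal sketch of how the peak function can be computed from an SNR map (a peak here is a pixel larger than its 8 neighbours; names are illustrative):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def peak_function(snr_map, bins):
    """Count peaks per SNR bin (ties on flat plateaus count as peaks here)."""
    is_peak = (snr_map == maximum_filter(snr_map, size=3))
    counts, _ = np.histogram(snr_map[is_peak], bins=bins)
    return counts
```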

Wavelet peaks

  • We consider a multi-scale analysis instead of a single-scale analysis
  • Apply (instead of a Gaussian filter) a starlet transform → allows us to represent an image $I$ as a sum of wavelet coefficient images and a coarse-resolution image
Starlet transform
  • Allows for the simultaneous processing of data at different scales → efficiency
  • Each wavelet band covers a different frequency range, which leads to an almost diagonal peak count covariance matrix

Wavelet peak functions

Starlet $\ell_1$-norm

  • New higher order summary statistic for weak lensing observables
  • Provides a fast multi-scale calculation of the full void and peak distribution
  • $\ell_1$-norm = sum of absolute values of the starlet decomposition coefficients of a weak lensing map
Wavelet $\ell_1$-norm $$ \boxed{ \ell_1^{j,i} = \sum_{u=1}^{\textrm{\# coef}(\mathcal{S}_{j,i})} \left| \mathcal{S}_{j,i} [u] \right| = \| \mathcal{S}_{j,i} \|_1} \, , \,\, \mathcal{S}_{j,i} = \{ w_{j,k} \mid B_i \leq w_{j,k} < B_{i+1} \}$$ where $w_{j,k}$ are the starlet coefficients at scale $j$ and $[B_i, B_{i+1})$ is the $i$-th SNR bin
  • Information encoded in all pixels
  • Automatically includes peaks and voids
  • Multi-scale approach
  • Avoids the problem of defining peaks and voids
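In code, the statistic is just a binned sum of absolute coefficients; a minimal sketch, assuming the input band is one starlet scale already normalized to SNR units:

```python
import numpy as np

def l1_norm_per_bin(wavelet_band, bin_edges):
    """Starlet l1-norm summary: sum of |coefficients| in each SNR bin."""
    w = wavelet_band.ravel()
    l1 = np.zeros(len(bin_edges) - 1)
    for i in range(len(bin_edges) - 1):
        sel = (w >= bin_edges[i]) & (w < bin_edges[i + 1])
        l1[i] = np.sum(np.abs(w[sel]))
    return l1
```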

Inference with HOS

  • Use $\texttt{cosmoSLICS}$ simulations: suite designed for the analysis of WL data beyond the standard 2pt cosmic shear

  • $\texttt{cosmoSLICS}$ cover a wide parameter space in $\left[ \Omega_m, \sigma_8, w_0, h \right]$.

  • For Bayesian inference → use a Gaussian likelihood with a cosmology-independent covariance, and a flat prior.

  • To predict each HOS at a new set of parameters → employ interpolation with a Gaussian Process Regressor (GPR), as sketched below
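A minimal emulator sketch with scikit-learn; the talk does not specify the implementation, so the library choice and all names (`train_emulator`, `theta`, `stats`) are assumptions for illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# theta: (n_cosmo, 4) array of [Omega_m, sigma_8, w_0, h] training nodes;
# stats: (n_cosmo, n_bins) mean HOS data vectors measured at those nodes.
def train_emulator(theta, stats):
    """Fit a GPR mapping cosmological parameters to the HOS data vector."""
    kernel = ConstantKernel() * RBF(length_scale=np.ones(theta.shape[1]))
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(theta, stats)  # scikit-learn handles multi-output targets
    return gpr

# prediction at a new point in parameter space:
# mu = emulator.predict(theta_new.reshape(1, -1))
```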



Noise & Covariance

  • We consider Gaussian, but non-white noise → the noise depends on the number of galaxies in each pixel $$\sigma_n = \dfrac{\sigma_{\epsilon}}{\sqrt{2 n_{\rm gal} A_{\rm pix}}}$$
  • We incorporate masks
  • Calculate covariance: $$ \boxed{C_{ij} = \sum_{r=1}^N \dfrac{\left( x_i^r - \mu_i \right) \left( x_j^r - \mu_j \right)}{N-1}} \, , \,\,\,\mu_i = \dfrac{1}{N} \sum_r x_i^r $$

From data to contours

Contours
  • $\mu$: expected theoretical prediction, $d$: data array (mean over realizations of a HOS), $C$: covariance matrix
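The contour machinery then reduces to evaluating a Gaussian log-likelihood over the parameter space; a minimal sketch (names illustrative):

```python
import numpy as np

def gaussian_loglike(d, mu, cov):
    """Gaussian log-likelihood (up to a constant) used for the contours.

    d: observed data vector (mean HOS over realizations),
    mu: emulator prediction at the sampled parameters, cov: fixed covariance.
    """
    r = d - mu
    return -0.5 * r @ np.linalg.solve(cov, r)
```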

So does the choice of the mass mapping algorithm matter for the final constraints?

The (standard) mono-scale peak counts

Mono-scale peak counts

\[ \begin{array}{lccc} \hline \hline \text{FoM} & \text{KS} & \text{iKS} & \text{MCALens} \\ \hline \hline \left(\Omega_m, h\right) & 476 & 453 & 450 \\ \left(\Omega_m, w_0\right) & 152 & 141 & 233 \\ \left(\Omega_m, \sigma_8\right) & 1323 & 1285 & 1740 \\ \left(h, w_0\right) & 55 & 63 & 87 \\ \left(h, \sigma_8\right) & 336 & 292 & 293 \\ \left(w_0, \sigma_8\right) & 75 & 72 & 124 \\ \hline \left(\Omega_m, h, w_0, \sigma_8\right) & 492 & 444 & 578 \\ \hline \end{array} \]

Wavelet multi-scale peak counts

Wavelet multi-scale peak counts

\[ \begin{array}{lccc} \hline \hline \text{FoM} & \text{KS} & \text{iKS} & \text{MCALens} \\ \hline \hline \left(\Omega_m, h\right) & 670 & 702 & 2159 \\ \left(\Omega_m, w_0\right) & 247 & 244 & 1051 \\ \left(\Omega_m, \sigma_8\right) & 2414 & 2517 & 9039 \\ \left(h, w_0\right) & 82 & 80 & 259 \\ \left(h, \sigma_8\right) & 411 & 433 & 1335 \\ \left(w_0, \sigma_8\right) & 131 & 129 & 577 \\ \hline \left(\Omega_m, h, w_0, \sigma_8\right) & 758 & 755 & 1947 \\ \hline \end{array} \]

Where does this improvement come from?


Summary statistics

Part 2

A plug-and-play approach with fast uncertainty quantification for weak lensing mass mapping


H. Leterme, A. Tersenov, J. Fadili, and J.-L. Starck (in prep.)

Wait... what about Deep Learning for Mass Mapping?


Yep, people have tried it! ...And it works!

Example: DeepMass

DeepMass idea

So what's the problem?

Deep Learning for Mass Mapping?


\[ \begin{array}{llcccc} \hline \hline \text{Mass mapping method} & \text{Type} & \text{Accurate} & \text{Flexible} & \text{Fast rec.} & \text{Fast UQ} \\ \hline \text{Iterative Wiener} & \text{Model-driven (Gauss. prior)} & & & & \\ \text{MCALens} & \text{Model-driven (Gauss. + sparse)} & & & & \\ \text{DeepMass} & \text{Data-driven (U-Net)}^{*} & & & & \\ \text{DeepPosterior} & \text{Data-driven (U-Net + MCMC)} & & & & \\ \text{MMGAN} & \text{Data-driven (GAN)}^{*} & & & & \\ \text{What we'd like} & \text{Data-driven} & & & & \\ \hline \end{array} \]

Plug-and-Play Mass Mapping

Main idea:
  • Use the PnP framework: replace the prox by an off-the-shelf deep denoiser trained on simulations
  $$ \kappa^{n+1} = \mathrm{prox}_{\tau g} \left( \kappa^n - \tau \nabla f(\kappa^n) \right) $$ $$ \kappa^{n+1} = F_{\Theta} \left( \kappa^n - \tau \mathbf{B} \left( \mathbf{A} \kappa^n - \gamma \right) \right) $$ $$ \kappa^{n+1} = \mathrm{Denoiser} \left( \kappa^n + \mathrm{Data \, residual} \right) $$
  • The iteration converges towards a fixed point $\hat{\kappa}$
  • If we choose $\mathbf{B} = \mathbf{A}^T \Sigma^{-1}$ → training phase independent of the noise covariance matrix
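Putting the pieces together, a minimal sketch of the PnP forward-backward loop; `A`, `B` and `denoiser` are placeholders for the forward operator, the preconditioner $\mathbf{B} = \mathbf{A}^T \Sigma^{-1}$, and the pretrained network $F_\Theta$:

```python
import numpy as np

def pnp_forward_backward(gamma, A, B, denoiser, shape, tau, n_iter=100):
    """Plug-and-play forward-backward iteration (schematic sketch)."""
    kappa = np.zeros(shape)
    for _ in range(n_iter):
        residual = B(A(kappa) - gamma)            # data-fidelity gradient step
        kappa = denoiser(kappa - tau * residual)  # prior enforced by denoising
    return kappa
```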

PnP Mass Mapping

Instead of explicitly writing down a prior, we learn what a "likely" $\kappa$ looks like from simulations and enforce it through denoising.

Implementation

Training
  • Denoiser models implemented: DRUNet & SUNet
  • Trained on κTNG and cosmoSLICS simulations, using pairs of $(\kappa_{\rm true}, \gamma_{\rm obs})$ as training data
PnP Iterations
Mass Mapping RMSE Comparison

How do we estimate uncertainties?

Step 1
  • We train a second neural network to estimate the posterior variance of $\kappa$: $$ \arg\min_{\Omega} \mathbb{E} \left[ \left\| G_{\Omega}(\gamma) - \big(\kappa - F_{\Theta}(\gamma) \big)^2 \right\|^2_2 \right] $$
  • Trained on simulated pairs $(\kappa, \gamma)$, just like the denoiser — but now focused on uncertainty.
  • This gives fast, pixel-wise error bars
  • Uncertainty estimation is fast — just one extra iteration after $\hat{\kappa}$ is computed → adds almost no overhead
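In code, the Step 1 objective amounts to regressing the squared residual of the point estimator; a minimal PyTorch-style sketch (the use of torch and all names are assumptions for illustration):

```python
import torch

def variance_loss(g_net, f_net, gamma, kappa):
    """Moment-network loss for the posterior variance (illustrative sketch).

    f_net: trained (frozen) point estimator F_Theta;
    g_net: second network G_Omega that learns the squared residual.
    """
    with torch.no_grad():
        residual_sq = (kappa - f_net(gamma)) ** 2
    return torch.mean((g_net(gamma) - residual_sq) ** 2)
```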

Step 2
  • Neural networks tend to produce miscalibrated uncertainties.
  • We apply conformal quantile regression (CQR) to adjust uncertainty intervals so they have guaranteed statistical coverage.
  • CQR uses a held-out calibration set to compute a correction factor for each pixel.
  • Result: Reliable, data-driven uncertainty maps — with built-in coverage guarantees.
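A minimal sketch of the split-conformal calibration step; for brevity this version computes a single global correction factor, whereas the talk's CQR computes one per pixel (names are illustrative):

```python
import numpy as np

def conformal_correction(kappa_cal, kappa_hat, sigma_hat, alpha=0.1):
    """Split-conformal calibration of pixel-wise error bars (sketch).

    Scores |kappa - kappa_hat| / sigma_hat are computed on a held-out
    calibration set; the corrected interval kappa_hat +/- q * sigma_hat
    then covers the truth with probability >= 1 - alpha.
    """
    scores = (np.abs(kappa_cal - kappa_hat) / sigma_hat).ravel()
    n = scores.size
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level)
```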

Uncertainty bounds

[Figures: uncertainty bounds, error bar calibration, miscoverage rate]

Conclusions

  • Investigated how different mass mapping algorithms affect cosmological inference using HOS from WL data.
  • Constructed new mass mapping algorithm based on the PnP formalism.

Results
  • With a state-of-the-art mass-mapping method (MCALens) we obtain a $\sim 157\%$ improvement in FoM over KS.
  • Increase in constraining power comes from the more accurate recovery of the smaller scales.
  • Wavelet Peak Counts: Provide tighter constraints than single-scale peak counts.
  • PnP Mass Mapping:
    • Provides a fast, flexible, and accurate mass mapping algorithm that can be used with any denoiser.
    • Provides fast and reliable uncertainties using moment networks and conformal quantile regression.

Takeaway
  • Mass-mapping Matters: Choosing an advanced mass mapping method significantly enhances constraints on cosmological parameters from HOS.
