From Data to Insights

Jeremias Sulam

Trustworthy methods for modern biomedical imaging

50 years ago ... 

first CT scan
ELECTRIC & MUSICAL INDUSTRIES

50 years ago ... 

imaging
diagnostics

complete hardware & software description

human expert diagnosis and recommendations

imaging was "simple"

... 50 years forward 

Data

Compute & Hardware

Sensors & Connectivity

Research & Engineering

data-driven imaging
automatic analysis and recommendations
societal implications

Problems in trustworthy biomedical imaging

inverse problems

uncertainty quantification

model-agnostic interpretability

robustness

generalization

policy & regulation

Demographic fairness

hardware & protocol optimization


inverse problems

y = A x^* + v \qquad \text{(measurements)}

\hat x = \arg\min_x \frac 12 \| y - A x \|^2_2 + R(x) \qquad \text{(reconstruction)}

= \arg\min_x ~-\log p(y|x) - \log p(x)

= \arg\max_x~ p(x|y)

\text{MAP estimate when } R(x) \propto -\log p_x(x) \text{ (the prior)}

\hat x = \arg\min_x \frac 12 \| y - A x \|^2_2 + R(x)

Proximal Gradient Descent:       \( x^{t+1} = \text{prox}_R \left(x^t - \eta  A^\top(Ax^t-y)\right) \)

\text{prox}_R \left( u \right) = \arg\min_x \frac 12 \|u - x\|_2^2 + R(x)
= \texttt{MAP}(x|u), \qquad u = x + v

... a denoiser

\({\color{red}f_\theta}\): off-the-shelf denoiser

[Venkatakrishnan et al., 2013; Zhang et al., 2017b; Meinhardt et al., 2017; Zhang et al., 2021; Gilton, Ongie, Willett, 2019; Kamilov et al., 2023b; Terris et al., 2023; S Hurault et al. 2021, Ongie et al, 2020; ...]

Plug and Play: Implicit Priors


Question 1)

What are these black-box functions computing? And what have they learned about the data?

When will \(f_\theta(x)\) compute a \(\text{prox}_R(x)\), and for what \(R(x)\)?

Theorem [Fang, Buchanan, S.]

Let \(f_\theta : \mathbb R^n\to\mathbb R^n\) be a network with \(f_\theta (x) = \nabla \psi_\theta (x)\), where \(\psi_\theta : \mathbb R^n \to \mathbb R\) is convex and differentiable (an ICNN).

Then,

1. Existence of a regularizer:
\(\exists ~R_\theta : \mathbb R^n \to \mathbb R\), not necessarily convex, such that \(f_\theta(x) \in \text{prox}_{R_\theta}(x)\).

2. Computability:
We can compute \(R_{\theta}(x)\) by solving a convex problem.
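A minimal PyTorch sketch of the parameterization the theorem assumes: a convex potential \(\psi_\theta\) (an ICNN with nonnegative hidden weights and convex, nondecreasing activations) whose gradient is the network \(f_\theta\). Layer sizes and initialization are illustrative:

```python
import torch
import torch.nn as nn

class GradientOfICNN(nn.Module):
    """f_theta(x) = grad psi_theta(x) for a convex potential psi_theta."""
    def __init__(self, n, width=64):
        super().__init__()
        self.W0 = nn.Linear(n, width)
        self.Wz = nn.Parameter(0.1 * torch.rand(width, width))  # clamped >= 0
        self.W1 = nn.Linear(n, width)
        self.a = nn.Parameter(0.1 * torch.rand(width))          # clamped >= 0
        self.act = nn.Softplus()  # smooth, convex, nondecreasing

    def potential(self, x):
        # Nonnegative weights on hidden activations keep psi convex in x.
        z = self.act(self.W0(x))
        z = self.act(z @ self.Wz.clamp(min=0).T + self.W1(x))
        return z @ self.a.clamp(min=0)  # psi_theta(x), one scalar per sample

    def forward(self, x):
        # f_theta(x) = grad_x psi_theta(x), obtained by autodiff.
        x = x.detach().requires_grad_(True)
        psi = self.potential(x).sum()
        return torch.autograd.grad(psi, x, create_graph=True)[0]
```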

Learned Proximals: revisiting PnP

Goal: train a denoiser \(f(y)\approx x\).

\text{Let } y = x+v , \quad ~ x\sim p_x, ~~v \sim \mathcal N(0,\sigma^2I)

How do we find \(f(x) = \text{prox}_R(x)\) for the "correct" \(R(x) \propto -\log p_x(x)\)?

Theorem [Fang, Buchanan, S.]

Let the Proximal Matching Loss be
\[\ell^\gamma_\text{PM} (f(y),x) = 1- \frac{c}{\gamma^{2n}} \exp\left( -\frac{\|f(y)-x\|_2^2}{\gamma} \right).\]

Then,
\[f^* = \arg\min_{f} \lim_{\gamma \searrow 0}~ \mathbb E_{x,y} \left[ \ell^\gamma_\text{PM}(f(y),x)\right]\]
satisfies, almost surely,
\[f^*(y) = \arg\max_c p_{x|y}(c) \triangleq \text{prox}_{-\sigma^2\log p_x}(y).\]
Learned Proximal Networks

Example: recovering a prior

\text{Sample } y = x+v,~ \text{ with } x \sim \text{Laplace}(0,1) \text{ and } v \sim \mathcal N(0,\sigma^2)

Fang, Buchanan & S. What's in a Prior? Learned Proximal Networks for Inverse Problems, ICLR 2024.
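For this example the target is known in closed form: with a Laplace prior, \(R(x) \propto \|x\|_1\), and the MAP denoiser \(\text{prox}_{\sigma^2\|\cdot\|_1}\) is soft-thresholding, which the learned network should approximately recover. A small sanity check (illustrative values):

```python
import numpy as np

def soft_threshold(u, tau):
    # Exact prox of tau*||x||_1: the MAP denoiser for a Laplace prior
    # under Gaussian noise, with tau = sigma^2.
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

sigma = 0.3
u = np.linspace(-1.0, 1.0, 5)
print(soft_threshold(u, sigma**2))  # the map an LPN should approximate
```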

Learned Proximal Networks in inverse problems

\hat x = \arg\min_x \frac 12 \| y - A x \|^2_2 + \hat{R}_\theta(x)

Convergence guarantees

Fang, Buchanan & S. What's in a Prior? Learned Proximal Networks for Inverse Problems, ICLR 2024.

Understanding the learned model provides new insights:

Figure: \(R_\theta(x)\) evaluated on increasingly perturbed images: \(R_\theta(x) = 0.0\), \(127.37\), \(274.13\), \(290.45\).

Take-home message 1

  • Learned Proximal Networks (LPNs) provide data-dependent proximal operators
  • Allow characterization of the learned priors

Problems in trustworthy biomedical imaging

Demographic fairness

Inputs (features):            \(X\in\mathcal X \subset \mathbb R^d\)

Responses (labels):        \(Y\in\mathcal Y = \{0,1\}\)

Sensitive attributes:       \(Z \in \mathcal Z \subseteq \mathbb R^k \)  (sex, race, age, etc.)

                                          \((X,Y,Z) \sim \mathcal D\)

E.g.: \(Z_1: \) biological sex, \(X_1: \) BMI; then

\( g(Z,X) = \boldsymbol{1}\{Z_1 = 1 \land X_1 > 35 \}: \) women with BMI > 35

Goal: ensure that \(f\) is fair w.r.t groups \(g \in \mathcal G\)

Demographic fairness

Group memberships      \( \mathcal G = \{ g:\mathcal X \times \mathcal Z \to \{0,1\} \} \)

Predictor     \( f(X) : \mathcal X \to [0,1]\)  (e.g. likelihood of X having disease Y)

  • Group/Associative Fairness
           Predictors should not have very different (error) rates among groups
         [Calders et al, '09][Zliobaite, '15][Hardt et al, '16]
  • Individual Fairness
           Similar individuals/patients should have similar outputs
           [Dwork et al, '12][Fleisher, '21][Petersen et al, '21]
  • Causal Fairness
           Predictors should be fair in a counterfactual world
          [Nabi & Shpitser, '18][Nabi et al, '19][Plecko & Bareinboim, '22]
  • Multiaccuracy/Multicalibration
           Predictors should be approximately unbiased/calibrated for every group
           [Kim et al, '20][Hebert-Johnson et al, '18][Globus-Harris et al, '22]


Demographic fairness

Observation 1:
measuring (& correcting) for MA/MC requires samples over \((X,Y,Z)\)

Definition:            \(\text{MA} (f,g) = \big| \mathbb E [ g(X,Z) (f(X) - Y) ] \big|  \)

\(f\) is \((\mathcal G,\alpha)\)-multiaccurate if   \( \max_{g\in\mathcal G} \text{MA}(f,g) \leq \alpha \)

Definition:             \(\text{MC} (f,g) = \mathbb E\left[ \big| \mathbb E [ g(X,Z) (f(X) - Y) | f(X) = v] \big| \right]  \)

\(f\) is \((\mathcal G,\alpha)\)-multicalibrated if   \( \max_{g\in\mathcal G} \text{MC}(f,g) \leq \alpha \)
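A sketch of the empirical versions of these quantities, assuming arrays of predictions `f_pred`, labels `y`, and a 0/1 group indicator `g_mask`; the binning of \(f(X)\) for MC is an illustrative discretization:

```python
import numpy as np

def ma_violation(f_pred, y, g_mask):
    # Empirical MA(f, g) = | E[ g(X, Z) (f(X) - Y) ] |
    return abs(np.mean(g_mask * (f_pred - y)))

def mc_violation(f_pred, y, g_mask, n_bins=10):
    # Empirical MC(f, g): average absolute violation within bins of
    # f(X) = v, weighted by the mass of each bin.
    bins = np.clip((f_pred * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        idx = bins == b
        if idx.any():
            total += idx.mean() * abs(np.mean(g_mask[idx] * (f_pred[idx] - y[idx])))
    return total
```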

Observation 2: That's not always possible...

sex and race attributes missing

  • We might want to conceal \(Z\) on purpose, or might need to

We only observe samples over \((X,Y)\), used to obtain a predictor \(\hat Y = f(X)\) of \(Y\)

Fairness in partially observed regimes

\( \text{MSE}(f) = \mathbb E [(Y-f(X))^2 ] \)

A developer provides us with proxies  \( \color{Red} \hat{g} : \mathcal X \to \{0,1\} \)

\( \text{err}(\hat g) = \mathbb P [{\color{Red}\hat g(X)} \neq {\color{blue}g(X,Z)} ] \)

Question 2)

Can we (and how do we) use \(\hat g\) to measure (and correct) \( (\mathcal G,\alpha)\)-MA/MC?

 [Awasthi et al, '21][Kallus et al, '22][Zhu et al, '23][Bharti et al, '24]

Fairness in partially observed regimes

Theorem [Bharti, Clemens-Sewall, Yi, S.]

With access to \((X,Y)\sim \mathcal D_{\mathcal{XY}}\), proxies \( \hat{\mathcal G}\) and predictor \(f\)

 

\[ \max_{\color{Blue}g\in\mathcal G} MC(f,{\color{blue}g}) \leq \max_{\color{red}\hat g\in \hat{\mathcal{G}} } B(f,{\color{red}\hat g}) + MC(f,{\color{red}\hat g}) \]

 

with \(B(f,\hat g) = \min \left( \text{err}(\hat g), \sqrt{MSE(f)\cdot \text{err}(\hat g)} \right) \)

  • Practical/computable upper bounds (a toy computation below)
  • Multicalibrating w.r.t. \(\hat{\mathcal G}\) provably improves the upper bound
    [Gopalan et al. (2022)][Roth (2022)]
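A toy computation of the bound (hypothetical numbers), showing how an accurate proxy keeps the certificate tight:

```python
import numpy as np

def mc_upper_bound(mse_f, err_g_hat, mc_f_g_hat):
    # max_g MC(f, g) <= B(f, g_hat) + MC(f, g_hat), with
    # B(f, g_hat) = min( err(g_hat), sqrt( MSE(f) * err(g_hat) ) )
    return min(err_g_hat, np.sqrt(mse_f * err_g_hat)) + mc_f_g_hat

# e.g. MSE(f) = 0.10, err(g_hat) = 0.02, MC(f, g_hat) = 0.03:
print(mc_upper_bound(0.10, 0.02, 0.03))  # B = 0.02, so the bound is 0.05
```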

Fairness in partially observed regimes

CheXpert: predicting abnormal findings in chest X-rays
(without access to race or biological sex)

\(f(X): \) likelihood of \(X\) having  \(\texttt{pleural effusion}\)

Demographic fairness

Take-home message 2

  • Proxies can be very useful in certifying max. fairness violations
  • Can allow for simple post-processing corrections


Problems in trustworthy biomedical imaging

Setting:

\((X,Y) \in \mathcal X \times \mathcal Y\), \((X,Y) \sim P_{X,Y}\)

\(\hat{Y} = f(X) : \mathcal X \to \mathcal Y\)

  • What features are important for this prediction?

  • What does importance mean, exactly?

model-agnostic interpretability

How can we explain black-box predictors with semantic features?

Is the presence of \(\color{Blue}\texttt{edema}\) important for \(\hat Y = \texttt{lung opacity}\)?

Is the presence of \(\color{magenta}\texttt{devices}\) important for \(\hat Y = \texttt{lung opacity}\), given that there is \(\color{blue}\texttt{edema}\) in the image?

(predicted classes: lung opacity, cardiomegaly, fracture, no finding)

Post-hoc Interpretability Methods vs. Interpretable by construction

Semantic Interpretability of classifiers

Concept Bank:     \(C = [c_1, c_2, \dots, c_m] \in \mathbb R^{d\times m}\)

Embeddings:     \(H = f(X) \in \mathbb R^d\)

Semantics:     \(Z = C^\top H \in \mathbb R^m\)
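In code, the semantics are a single matrix product; a sketch assuming a `(d,)` embedding and a `(d, m)` concept bank with unit-norm columns:

```python
import numpy as np

def concept_scores(h, C):
    # Z = C^T H: one activation per semantic concept, where the columns
    # of C are concept directions (e.g., CAVs or text embeddings).
    return C.T @ h
```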

Semantic Interpretability of classifiers

Question 3 (last!)

How can we provide (local) notions of importance that allow for (efficient) statistical testing with valid guarantees (Type 1 error/FDR control)?

Concept Bank:     \(C = [c_1, c_2, \dots, c_m] \in \mathbb R^{d\times m}\)

Concept Activation Vectors

(Kim et al, 2018)

\(c_\text{cute}\)

Semantic Interpretability of classifiers

Vision-language models

(CLIP, BLIP, etc... )

Concept Bank:     \(C = [c_1, c_2, \dots, c_m] \in \mathbb R^{d\times m}\)

Semantic Interpretability of classifiers

Vision-language models

(training)

[Radford et al, 2021]

Semantic Interpretability of classifiers

[Bhalla et al, "Splice", 2024]

Concept Bottleneck Models (CBMs)

[Koh et al '20, Yang et al '23, Yuan et al '22 ]

  • Need to engineer a (large) concept bank
  • Performance hit w.r.t. original predictor

\(\tilde{Y} = \hat w^\top Z\)

\(\hat w_j\) is the importance of the \(j^{th}\) concept

Precise notions of semantic importance

\(C = \{\text{``cute''}, \text{``whiskers''}, \dots \}\)

Global Importance

\(H^G_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j \), i.e. \(g(f(X)) \perp\!\!\!\perp c_j^\top f(X) \)

Global Conditional Importance

\(H^{GC}_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j | Z_{-j}\), i.e. \(g(f(X)) \perp\!\!\!\perp c_j^\top f(X) ~|~ C_{-j}^\top f(X)\)

Precise notions of semantic importance

"The classifier (its distribution) does not change if we condition

on concepts \(S\) vs on concepts \(S\cup\{j\} \)"

\(C = \{\texttt{cute}, \texttt{whiskers}, \dots \}\)

Local Conditional Importance

\[H^{j,S}_0:~ g({\tilde H_{S \cup \{j\}}}) \overset{d}{=} g(\tilde H_S), \qquad \tilde H_S \sim P_{H|Z_S = C_S^\top f(x)} \]

Tightly related to Shapley values

[Teneggi et al, The Shapley Value Meets Conditional Independence Testing, 2023]

Example: for \(\hat{Y}_\text{gas pump}\), lowercase entries \(z\) are held fixed at their observed values and uppercase entries \(Z\) are resampled:

\(\tilde{Z}_S = [z_\text{text}, z_\text{old}, Z_\text{dispenser}, Z_\text{trumpet}, Z_\text{fire}, \dots ] \)

\(\tilde{Z}_{S\cup \{j\}} = [z_\text{text}, z_\text{old}, z_\text{dispenser}, Z_\text{trumpet}, Z_\text{fire}, \dots ] \qquad (j = \texttt{dispenser})\)

Testing by betting 

\(H^G_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j     \iff      P_{\hat{Y},Z_j} = P_{\hat{Y}} \times P_{Z_j}\)

Testing importance via two-sample tests

\(H^{GC}_{0,j} : \hat{Y} \perp\!\!\!\perp Z_j | Z_{-j}     \iff      P_{\hat{Y}Z_jZ_{-j}} = P_{\hat{Y}\tilde{Z}_j{Z_{-j}}}\)

\(\tilde{Z}_j \sim P_{Z_j|Z_{-j}}\)

[Shaer et al, 2023]

[Teneggi et al, 2023]

\[H^{j,S}_0:~ g({\tilde H_{S \cup \{j\}}}) \overset{d}{=} g(\tilde H_S), \qquad \tilde H_S \sim P_{H|Z_S = C_S^\top f(x)} \]

Testing by betting

Goal: Test a null hypothesis \(H_0\) at significance level \(\alpha\)

Standard testing by p-values

Collect data, then test, and reject if \(p \leq \alpha\)

[Grünwald 2019, Shafer 2021, Shaer et al. 2023, Shekhar and Ramdas 2023. Podkopaev et al., 2023]

Testing by betting

Goal: Test a null hypothesis \(H_0\) at significance level \(\alpha\)

Online testing by e-values

Any-time valid inference, track and reject when \(e\geq 1/\alpha\)

  • Consider a wealth process
    \(K_0 = 1;\)
    \(\text{for}~ t = 1, \dots \\ \quad K_t = K_{t-1}(1+\kappa_t v_t)\)

Fair game (test martingale): \(~~\mathbb E_{H_0}[\kappa_t | \text{Everything seen}_{t-1}] = 0\)

\(v_t \in (0,1):\) betting fraction

\(\kappa_t \in [-1,1]\) payoff

[Grünwald 2019, Shafer 2021, Shaer et al. 2023, Shekhar and Ramdas 2023. Podkopaev et al., 2023]

\(\mathbb P_{H_0}[\exists t \in \mathbb N: K_t \geq 1/\alpha]\leq \alpha\)

     Data efficient

     Rank induced by rejection time
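A minimal sketch of the wealth process and its anytime-valid stopping rule, assuming a stream of payoffs \(\kappa_t \in [-1,1]\) with conditional mean zero under \(H_0\) and a fixed betting fraction (both illustrative choices):

```python
def betting_test(payoffs, v=0.5, alpha=0.05):
    # K_t = K_{t-1} (1 + v * kappa_t) is a nonnegative test martingale
    # under H_0, so Ville's inequality gives
    # P_{H_0}[ exists t : K_t >= 1/alpha ] <= alpha.
    K = 1.0
    for t, kappa in enumerate(payoffs, start=1):
        K *= 1.0 + v * kappa
        if K >= 1.0 / alpha:
            return t   # reject H_0 at time t (anytime valid)
    return None        # wealth never reaches 1/alpha: fail to reject
```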

Testing by betting via SKIT (Podkopaev et al., 2023)

Online testing by e-values

\(v_t \in (0,1):\) betting fraction

\(H_0: ~ P = Q\)

\(\kappa_t = \tanh({\color{teal}\rho(X_t)} - {\color{teal}\rho(Y_t)})\)

Payoff function

\({\color{black}\text{MMD}(P,Q)} : \text{ Maximum Mean Discrepancy}\)

\({\color{teal}\rho} = \underset{\rho\in \mathcal R:\|\rho\|_\mathcal R\leq 1}{\arg\sup} ~\mathbb E_P [\rho(X)] - \mathbb E_Q[\rho(Y)]\)

\( K_t = K_{t-1}(1+\kappa_t v_t)\)


\(X_t \sim P, Y_t \sim Q\)
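A sketch of the SKIT-style payoff with a plug-in witness, assuming a Gaussian kernel and that \(\rho\) is estimated from past samples only (kernel and bandwidth are illustrative):

```python
import numpy as np

def rbf(u, w, h=1.0):
    # Gaussian kernel used to estimate the MMD witness function.
    return np.exp(-np.sum((u - w) ** 2) / (2 * h ** 2))

def skit_payoff(x_t, y_t, x_past, y_past, h=1.0):
    # Plug-in MMD witness rho(u) = mean_k k(u, X_k) - mean_k k(u, Y_k),
    # fit on past samples; kappa_t = tanh(rho(X_t) - rho(Y_t)) lies in
    # [-1, 1] and has conditional mean zero under H_0: P = Q, since
    # X_t and Y_t are then exchangeable.
    def rho(u):
        return (np.mean([rbf(u, x, h) for x in x_past])
                - np.mean([rbf(u, y, h) for y in y_past]))
    return np.tanh(rho(x_t) - rho(y_t))
```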

Results: Local Testing

Important Semantic Concepts (Reject \(H_0\))

Unimportant Semantic Concepts (Fail to reject)

 

  • Type 1 error control
  • False discovery rate control

Figure: concepts ranked by rejection time; rejection rate shown from 0.0 to 1.0.

CheXpert: validating BiomedVLP

What concepts does BiomedVLP find important to predict lung opacity?


Results: RSNA Brain CT Hemorrhage

Figure: four example CT scans (Hemorrhage, No Hemorrhage, Hemorrhage, Hemorrhage); for each, the five hemorrhage subtypes (intraparenchymal, subdural, subarachnoid, intraventricular, epidural) are ranked by rejection time and marked (+) rejected or (-) not rejected.

Take-home message 3

  • Model-agnostic interpretability can be posed as local hypothesis tests
  • Online efficient testing procedures for statistical control 

Results: Imagenette

Semantic comparison of vision-language models


Problems in trustworthy biomedical imaging

model-agnostic interpretability

uncertainty quantification

policy & regulation

hardware & protocol optimization

inverse problems

Demographic fairness

robustness

generalization

data-driven imaging
automatic analysis and recommendations
societal implications

[Sulam et al, NeurIPS '20] [Lai et al, MICCAI '20] [Fang et al, MIA '20] [Xu et al, Nat. Met. '20] [Teneggi et al, TMLR '22] [Teneggi et al, TPAMI '22] [Muthukumar et al, COLT '23] [Muthukumar et al, SIMODS '23] [Teneggi et al, ICML '23] [Bharti et al, NeurIPS '23] [Fang et al, ICLR '24] [Pal et al, NeurIPS '24] [Pal et al, TMLR '24] [Teneggi et al, NeurIPS '24] [Bharti et al, CPAL '25] [Teneggi et al, arXiv '25] [Bharti et al, arXiv '25] [Wang et al, CPAL '25] [Wang et al, Patterns '25] [Muthukumar et al, CVPR '25]

Societal Constraints

Many more open questions...

  • Formal frameworks for interpretability for decision making (in medical imaging)
  • Understanding social implications of algorithms in the wild
  • Efficient and robust diffusion models

Thank you for hosting me

Learned Proximal Networks

Example 2: a prior for CT

Example 2: priors for images (\(R(\tilde{x})\))

Learned Proximal Networks: convergence guarantees for PnP

Theorem (PGD with Learned Proximal Networks)

\hat x = \arg\min_x \frac 12 \| y - A x \|^2_2 + \hat{R}(x)
x^{t+1} = \text{prox}_{\hat R} \left(x^t - \eta A^T(Ax^t - y)\right)

Let \(f_\theta = \text{prox}_{\hat{R}}\) (with \(\alpha>0\)) have smooth activations, and let \(0<\eta<1/\sigma^2_{\max}(A)\).

\text{Then } \exists x^* : \lim_{t\to\infty} x^t = x^* \text{ and } f_\theta(x^* - \eta A^T(Ax^*-y)) = x^*

(Analogous results hold for ADMM)

Denoiser in a box: diffusion

Measurements:
\[y = Ax + \epsilon,~\epsilon \sim \mathcal{N}(0, \sigma^2\mathbb{I})\]

Reconstruction:
\[\hat{x} = F(y) \sim \mathcal{P}_y\]

Hopefully \(\mathcal{P}_y \approx p(x \mid y)\), but not needed!

Question 3)

How much uncertainty is there in the samples \(\hat x \sim \mathcal P_y?\)

Question 4)

How far will the samples \(\hat x \sim \mathcal P_y\) be from the true \(x\)?

Conformal guarantees for diffusion models

Lemma

Given \(m\) samples from \(\mathcal P_y\), let

\[\mathcal{I}(y)_j = \left[ Q_{y_j}\left(\frac{\lfloor(m+1)\alpha/2\rfloor}{m}\right), Q_{y_j}\left(\frac{\lceil(m+1)(1-\alpha/2)\rceil}{m}\right)\right]\]

Then \(\mathcal I(y)\) provides entrywise coverage for a new sample \(\hat x \sim \mathcal P_y\), i.e.

\[\mathbb{P}\left[\hat{x}_j \in \mathcal{I}(y)_j\right] \geq 1 - \alpha\]

Figure: per-entry intervals \(\mathcal{I}(y)\) from lower bound \(l(y)\) to upper bound \(u(y)\), values in \([0,1]\).
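A sketch of the lemma's construction, assuming `samples` is an `(m, d)` array of draws \(\hat x \sim \mathcal P_y\):

```python
import numpy as np

def entrywise_intervals(samples, alpha=0.05):
    # Empirical quantile interval per entry j, as in the lemma:
    # [ Q_j( floor((m+1)alpha/2)/m ), Q_j( ceil((m+1)(1-alpha/2))/m ) ]
    m = samples.shape[0]
    q_lo = np.floor((m + 1) * alpha / 2) / m
    q_hi = min(np.ceil((m + 1) * (1 - alpha / 2)) / m, 1.0)
    return (np.quantile(samples, q_lo, axis=0),
            np.quantile(samples, q_hi, axis=0))
```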

(distribution free); cf. [Feldman, Bates, Romano, 2023]

Figure: for a measurement \(y\), the lower and upper bounds and the resulting interval lengths \(|\mathcal I(y)_j|\).

Conformal guarantees for diffusion models

Figure: the ground-truth \(x_j\) is contained in the interval \(\mathcal{I}(y)_j\) (values in \([0,1]\)).

Conformal guarantees for diffusion models

Question 4)

How far will the samples \(\hat x \sim \mathcal P_y\) be from the true \(x\)?

Conformal guarantees for diffusion models


Risk Controlling Prediction Set

      For risk level \(\epsilon\) and failure probability \(\delta\), the intervals \(\mathcal{I}(y)\) are an RCPS if

\[\mathbb{P}\left[\mathbb{E}\left[\text{fraction of pixels not in intervals}\right] \leq \epsilon\right] \geq 1 - \delta\]

[Angelopoulos et al, 2022]



Conformal guarantees for diffusion models

[Angelopoulos et al, 2022]

Figure: inflating each interval \(\mathcal{I}(y)_j\) by \(\lambda\) until the ground-truth \(x_j\) is contained (values in \([0,1]\)).

Procedure:

\[\hat{\lambda} = \inf\{\lambda \in \mathbb{R}:~ \hat{\text{risk}}_{\mathcal S_{cal}}(\lambda') \leq \epsilon,~\forall \lambda' \geq \lambda \}\]

[Angelopoulos et al, 2022]

single \(\lambda\) for all \(\mathcal I(y)_j\)!
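A sketch of the calibration scan, assuming arrays of per-pixel bounds and ground truths on a calibration set; for brevity it thresholds the plain empirical risk, whereas the \((\epsilon,\delta)\) guarantee requires an upper confidence bound \(\hat{\text{risk}}^+\) [Angelopoulos et al, 2022]:

```python
import numpy as np

def calibrate_lambda(low, up, x_true, eps, lambdas):
    # The risk (fraction of pixels outside the inflated intervals) is
    # nonincreasing in lambda, so scan from large to small and keep the
    # smallest lambda that still controls the risk.
    lam_hat = None
    for lam in np.sort(lambdas)[::-1]:
        inside = (x_true >= low - lam) & (x_true <= up + lam)
        if 1.0 - inside.mean() <= eps:
            lam_hat = lam
        else:
            break
    return lam_hat
```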


\(\mathcal{I}_{\lambda}(y)_j = [\text{low}_j - \lambda, \text{up}_j + \lambda]\)

Conformal guarantees for diffusion models

\(K\)-RCPS: High-dimensional Risk Control

From a scalar \(\lambda \in \mathbb{R}\) to a vector \(\bm{\lambda} \in \mathbb{R}^d\):

\(\mathcal{I}_{\lambda}(y)_j = [\text{low}_j - \lambda, \text{up}_j + \lambda] ~~\rightarrow~~ \mathcal{I}_{\bm{\lambda}}(y)_j = [\text{low}_j - \lambda_j, \text{up}_j + \lambda_j]\)

Procedure:

1. Find anchor point
\[\tilde{\bm{\lambda}}_K = \underset{\bm{\lambda} \in \mathbb R^K}{\arg\min}~\sum_{k \in [K]}\lambda_k~\quad\text{s.t.}~~~\hat{\text{risk}}^+_{S_{opt}}(\bm{\lambda}) \leq \epsilon\]

2. Choose
\[\hat{\beta} = \inf\{\beta \in \mathbb{R}:~\hat{\text{risk}}_{S_{cal}}^+(\tilde{\bm{\lambda}}_K + \beta'\mathbf{1}) \leq \epsilon,~\forall~ \beta' \geq \beta\}\]

Guarantee:    the \(\mathcal{I}_{\tilde{\bm{\lambda}}_K,\hat{\beta}}(y)_j \) are \((\epsilon,\delta)\)-RCPS, i.e.
\[\mathbb{P}\left[\mathbb{E}\left[\text{fraction of pixels not in intervals}\right] \leq \epsilon\right] \geq 1 - \delta\]

Figure: conformalized uncertainty maps for \(K=4\) and \(K=8\).

Conformal guarantees for diffusion models
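Step 2 of the procedure has the same one-dimensional structure as the scalar case; a sketch assuming the anchor `lam_K` is already expanded to per-pixel values, again with the empirical risk standing in for \(\hat{\text{risk}}^+\):

```python
import numpy as np

def choose_beta(lam_K, low, up, x_true, eps, betas):
    # Scan the scalar offset beta added to the anchor vector lambda_K
    # and keep the smallest beta whose calibration risk stays below eps.
    beta_hat = None
    for beta in np.sort(betas)[::-1]:
        lam = lam_K + beta
        inside = (x_true >= low - lam) & (x_true <= up + lam)
        if 1.0 - inside.mean() <= eps:
            beta_hat = beta
        else:
            break
    return beta_hat
```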

cf. [Kiyani et al, 2024]
Teneggi, Tivnan, Stayman, S. How to trust your diffusion model: A convex optimization approach to conformal risk control. ICML 2023.