From Data to Insights
Jeremias Sulam
Trustworthy methods for modern biomedical imaging


ARISE Network Virtual Meeting

50 years ago ...

first CT scan, by EMI (Electric & Musical Industries)
50 years ago ...

imaging: complete hardware & software description
diagnostics: human expert diagnosis and recommendations

imaging was "simple"
... 50 years forward

Data
Compute & Hardware
Sensors & Connectivity
Research & Engineering

data-driven imaging
automatic analysis and recommendations
societal implications



data-driven imaging
automatic analysis and recommendations
societal implications

Problems in trustworthy biomedical imaging
- inverse problems
- uncertainty quantification
- model-agnostic interpretability
- robustness
- generalization
- policy & regulation
- demographic fairness
- hardware & protocol optimization







Demographic fairness
Inputs (features): \(X\in\mathcal X \subset \mathbb R^d\)
Responses (labels): \(Y\in\mathcal Y = \{0,1\}\)
Sensitive attributes: \(Z \in \mathcal Z \subseteq \mathbb R^k \) (sex, race, age, etc.)
\((X,Y,Z) \sim \mathcal D\)
Predictor: \( f : \mathcal X \to [0,1]\) (e.g. the likelihood of \(X\) having disease \(Y\))
Group memberships: \( \mathcal G = \{ g:\mathcal X \times \mathcal Z \to \{0,1\} \} \)
E.g., for \(Z_1:\) biological sex and \(X_1:\) BMI, \( g(Z,X) = \boldsymbol{1}\{Z_1 = 1 \land X_1 > 35 \} \) selects women with BMI > 35
Goal: ensure that \(f\) is fair w.r.t. the groups \(g \in \mathcal G\)
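As a small illustration of such group-membership functions, here is a minimal Python sketch; the feature layout (BMI in the first column of \(X\), binarized sex in the first column of \(Z\)) and the function name are illustrative assumptions, not part of the talk.

```python
import numpy as np

def g_women_high_bmi(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Group membership g(Z, X) = 1{Z_1 = 1 and X_1 > 35}.

    Illustrative layout: Z[:, 0] encodes biological sex (1 = female),
    X[:, 0] encodes BMI.
    """
    return ((Z[:, 0] == 1) & (X[:, 0] > 35)).astype(float)

# Example: three patients; only the second is in the group.
X = np.array([[28.0], [41.2], [37.5]])
Z = np.array([[1], [1], [0]])
print(g_women_high_bmi(X, Z))  # [0. 1. 0.]
```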
- Group/Associative Fairness: predictors should not have very different (error) rates among groups [Calders et al, '09][Zliobaite, '15][Hardt et al, '16]
- Individual Fairness: similar individuals/patients should have similar outputs [Dwork et al, '12][Fleisher, '21][Petersen et al, '21]
- Causal Fairness: predictors should be fair in a counterfactual world [Nabi & Shpitser, '18][Nabi et al, '19][Plecko & Bareinboim, '22]
- Multiaccuracy/Multicalibration: predictors should be approximately unbiased/calibrated for every group [Kim et al, '20][Hebert-Johnson et al, '18][Globus-Harris et al, '22]
Demographic fairness
Definition: \(\text{MA} (f,g) = \big| \mathbb E [ g(X,Z) (f(X) - Y) ] \big| \)
\(f\) is \((\mathcal G,\alpha)\)-multiaccurate if \( \max_{g\in\mathcal G} \text{MA}(f,g) \leq \alpha \)
Definition: \(\text{MC} (f,g) = \mathbb E_{v \sim f(X)}\left[ \big| \mathbb E [ g(X,Z) (f(X) - Y) \mid f(X) = v] \big| \right] \)
\(f\) is \((\mathcal G,\alpha)\)-multicalibrated if \( \max_{g\in\mathcal G} \text{MC}(f,g) \leq \alpha \)

Observation 1: measuring (& correcting) for MA/MC requires samples over \((X,Y,Z)\)
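A minimal sketch of plug-in estimators for these two quantities from finite samples over \((X,Y,Z)\); the conditioning on \(f(X)=v\) is approximated by binning the predictions into equal-width bins, and all names are illustrative.

```python
import numpy as np

def ma_violation(f_x, y, g):
    """Empirical MA(f, g) = | E[ g(X, Z) * (f(X) - Y) ] |."""
    return abs(np.mean(g * (f_x - y)))

def mc_violation(f_x, y, g, n_bins=10):
    """Empirical MC(f, g): average over prediction bins v of
    | E[ g(X, Z) * (f(X) - Y) | f(X) in bin v ] |, weighted by bin mass."""
    bins = np.clip((f_x * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            total += mask.mean() * abs(np.mean(g[mask] * (f_x[mask] - y[mask])))
    return total

def max_violation(f_x, y, groups, metric=ma_violation):
    """f is (G, alpha)-multiaccurate (or -multicalibrated, with metric=mc_violation)
    if this maximum over the finite collection `groups` is <= alpha."""
    return max(metric(f_x, y, g) for g in groups)
```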





Observation 2: That's not always possible...
(e.g., sex and race attributes missing)
Fairness in partially observed regimes
- We might want to conceal \(Z\) on purpose, or might need to
- We observe samples over \((X,Y)\) and fit a predictor \(\hat Y = f(X)\) of \(Y\), with \( \text{MSE}(f) = \mathbb E [(Y-f(X))^2 ] \)
- A developer provides us with proxies \( {\color{Red} \hat{g}} : \mathcal X \to \{0,1\} \), with error \( \text{err}(\hat g) = \mathbb P [{\color{Red}\hat g(X)} \neq {\color{blue}g(X,Z)} ] \)
[Awasthi et al, '21][Kallus et al, '22][Zhu et al, '23][Bharti et al, '24]
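A short sketch of empirical versions of the two quantities above; note that \(\text{err}(\hat g)\) compares the proxy with the true group membership, so in this regime it would have to be reported by the proxy developer or estimated on a small auxiliary sample where \(Z\) is available (an assumption of this sketch).

```python
import numpy as np

def mse(f_x, y):
    """Empirical MSE(f) = E[(Y - f(X))^2]; computable from (X, Y) samples alone."""
    return float(np.mean((y - f_x) ** 2))

def proxy_error(g_hat_x, g_xz):
    """Empirical err(g_hat) = P[g_hat(X) != g(X, Z)]; needs some samples where
    the true group membership g(X, Z) is known (e.g. a small audit set)."""
    return float(np.mean(g_hat_x != g_xz))
```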
Question
Can we (and how) use \(\hat g\) to measure (and correct) \( (\mathcal G,\alpha)\)-MA/MC?
Fairness in partially observed regimes
Theorem [Bharti, Clemens-Sewall, Yi, S.]
With access to \((X,Y)\sim \mathcal D_{\mathcal{XY}}\), proxies \( \hat{\mathcal G}\), and a predictor \(f\):
\[ \underbrace{\max_{\color{blue}g\in\mathcal G} \text{MC}(f,{\color{blue}g})}_{\text{true error}} \;\leq\; \underbrace{\max_{\color{red}\hat g\in \hat{\mathcal{G}} } \left\{ B(f,{\color{red}\hat g}) + \text{MC}(f,{\color{red}\hat g}) \right\}}_{\text{worst-case error}} \]
with \(B(f,\hat g) = \min \left( \text{err}(\hat g), \sqrt{\text{MSE}(f)\cdot \text{err}(\hat g)} \right) \)
- Practical, computable upper bound
- Multicalibrating w.r.t. \(\hat{\mathcal G}\) provably improves the upper bound
[Gopalan et al. (2022)][Roth (2022)]
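A sketch of how this bound might be evaluated in practice, under the reading that the right-hand maximum is taken over the sum \(B(f,\hat g) + \text{MC}(f,\hat g)\): only \((X,Y)\) samples, the proxies evaluated on \(X\), and their (assumed given) errors \(\text{err}(\hat g)\) are needed. Function names and the binned MC estimator are illustrative, not the paper's implementation.

```python
import numpy as np

def mc_violation(f_x, y, g, n_bins=10):
    """Empirical MC(f, g), approximating the conditioning on f(X) by binning."""
    bins = np.clip((f_x * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            total += mask.mean() * abs(np.mean(g[mask] * (f_x[mask] - y[mask])))
    return total

def bound_B(mse_f, err_g_hat):
    """B(f, g_hat) = min( err(g_hat), sqrt(MSE(f) * err(g_hat)) )."""
    return min(err_g_hat, float(np.sqrt(mse_f * err_g_hat)))

def certified_upper_bound(f_x, y, proxy_groups, proxy_errs, n_bins=10):
    """Computable upper bound on max_{g in G} MC(f, g):
    max over proxies g_hat of [ B(f, g_hat) + MC(f, g_hat) ].

    proxy_groups: list of arrays g_hat(X) evaluated on the sample.
    proxy_errs:   corresponding err(g_hat) values (assumed given).
    """
    mse_f = float(np.mean((y - f_x) ** 2))
    return max(
        bound_B(mse_f, err) + mc_violation(f_x, y, g_hat, n_bins)
        for g_hat, err in zip(proxy_groups, proxy_errs)
    )
```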
Fairness in partially observed regimes

CheXpert: Predicting abnormal findings in chest X-rays
(without access to race or biological sex)

\(f(X): \) likelihood of \(X\) having \(\texttt{pleural effusion}\)

Demographic fairness
Take-home message
- Proxies can be very useful for certifying maximal fairness violations
- They can allow for simple post-processing corrections





Thanks to the ARISE Community!
