RankSEG: A Consistent Ranking-based Framework for Segmentation
Ben Dai
The Chinese University of Hong Kong
From Classification to Segmentation
 Recall the classification problem
 Data: \( \mathbf{X} \in \mathbb{R}^d \to Y \in \{0,1\} \)
 Decision function: \( \delta(\mathbf{X}): \mathbb{R}^d \to \{0,1\} \)
 Evaluation:
$$ \text{Acc}( \delta) = \mathbb{E}( \mathbf{1}( Y = \delta(\mathbf{X}) )) $$
What is the "best" decision function? Bayes Rule!
$$ \delta^* = \text{argmax}_{\delta} \ \text{Acc}(\delta) \ \to \ \delta^*(\mathbf{x}) = \mathbf{1}( p(\mathbf{x}) \geq 0.5 ) $$
Plugin rule:
$$\widehat{\delta}(\mathbf{x}) = \mathbf{1}( q(\mathbf{x}) \geq 0.5 ), \quad q(\mathbf{x}) \text{ is an estimator of } p(\mathbf{x}) $$
$$ p(\mathbf{x}) = \mathbb{P}(Y=1 \mid \mathbf{X}=\mathbf{x})$$
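As a minimal illustration of the plug-in rule above (names such as `plugin_rule` and the example probabilities are my own, not from the slides), the classifier simply thresholds an estimated probability at 0.5:

```python
import numpy as np

def plugin_rule(q):
    """Plug-in Bayes classifier: predict 1 whenever the estimated
    probability q(x) of P(Y = 1 | X = x) is at least 0.5."""
    return (np.asarray(q) >= 0.5).astype(int)

# hypothetical estimated probabilities for four inputs
q_hat = [0.2, 0.7, 0.5, 0.49]
print(plugin_rule(q_hat).tolist())  # -> [0, 1, 1, 0]
```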
Segmentation
Long et al. (2015) Fully convolutional networks for semantic segmentation
 Input: \(\mathbf{X} \in \mathbb{R}^d\)
 Outcome: \(\mathbf{Y} \in \{0,1\}^d\)

Segmentation function:
 \( \pmb{\delta}: \mathbb{R}^d \to \{0,1\}^d\)
 \( \pmb{\delta}(\mathbf{X}) = ( \delta_1(\mathbf{X}), \cdots, \delta_d(\mathbf{X}) )^\intercal \)

Predicted segmentation:
 \( I(\pmb{\delta}(\mathbf{X})) = \{j: \delta_j(\mathbf{X}) = 1 \}\)
Goal: learn segmentation decision function \( \pmb{\delta} \)
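The predicted segmentation \(I(\pmb{\delta}(\mathbf{X}))\) defined above is just the index set of pixels labeled 1. A one-line sketch (function name and example vector are mine, for illustration only):

```python
import numpy as np

def predicted_segmentation(delta_x):
    """I(delta(X)) = {j : delta_j(X) = 1}, the set of pixel
    indices that the segmentation rule labels as foreground."""
    return set(np.flatnonzero(np.asarray(delta_x)).tolist())

delta_x = [0, 1, 1, 0, 1]  # a hypothetical binary decision vector
print(predicted_segmentation(delta_x))  # -> {1, 2, 4}
```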
Evaluation
The Dice and IoU metrics are the widely used evaluation measures in the segmentation literature:
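For reference, one common population form of the smoothed Dice metric, consistent with the \(\text{Dice}_\gamma\) notation used on the later slides (here \(\gamma \geq 0\) is a smoothing parameter; the IoU analogue replaces the denominator by the size of the union), is:

$$ \text{Dice}_\gamma(\pmb{\delta}) = \mathbb{E}\left[ \frac{2\,|I(\mathbf{Y}) \cap I(\pmb{\delta}(\mathbf{X}))| + \gamma}{|I(\mathbf{Y})| + |I(\pmb{\delta}(\mathbf{X}))| + \gamma} \right] $$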
Existing Framework
 Given training data \( \{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^{n}\), most existing methods characterize segmentation as a classification problem:
Classification-based loss
Dice-approximating loss
Bayes Segmentation Rule
We discuss Dice-segmentation at the population level, and present its Bayes segmentation rule akin to the Bayes classifier.
To begin with, we introduce some notation:
 Segmentation probability for the \(j\)-th pixel:
$$ p_j(\mathbf{x}) := \mathbb{P}(Y_j = 1 \mid \mathbf{X} = \mathbf{x})$$
 \({B}_j(\mathbf{x})\) is a Bernoulli random variable with success probability \(p_{j}(\mathbf{x})\)
$$ \pmb{\delta}^* = \text{argmax}_{\pmb{\delta}} \ \text{Dice}_\gamma ( \pmb{\delta})$$
Bayes Segmentation Rule
Theorem 1 (Dai and Li, 2023). A segmentation rule \(\pmb{\delta}^*\) is a global maximizer of \(\text{Dice}_\gamma(\pmb{\delta})\) if and only if it satisfies that
\( \tau^*(\mathbf{x}) \) is called optimal segmentation volume, defined as
where \(J_\tau(\mathbf{x})\) is the index set of the \(\tau\)-largest probabilities, \(\Gamma(\mathbf{x}) = \sum_{j=1}^d {B}_{j}(\mathbf{x})\), and \( {\Gamma}_{-j}(\mathbf{x}) = \sum_{j' \neq j} {B}_{j'}(\mathbf{x})\) are Poisson-binomial random variables.
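The Poisson-binomial variables \(\Gamma(\mathbf{x})\) appearing in Theorem 1 have an exact pmf computable by direct convolution; a minimal sketch (the function name and the \(O(d^2)\) recursion are standard, not taken from the paper's implementation):

```python
import numpy as np

def poisson_binomial_pmf(p):
    """pmf of Gamma = sum_j B_j with independent B_j ~ Bernoulli(p_j),
    computed by the standard O(d^2) convolution recursion."""
    pmf = np.zeros(len(p) + 1)
    pmf[0] = 1.0
    for pj in p:
        # convolve the running pmf with the Bernoulli(pj) pmf
        pmf[1:] = pmf[1:] * (1 - pj) + pmf[:-1] * pj
        pmf[0] *= 1 - pj
    return pmf

print(poisson_binomial_pmf([0.5, 0.5]).tolist())  # -> [0.25, 0.5, 0.25]
```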
Bayes Segmentation Rule
The Dice measure is separable w.r.t. \(j\)
Bayes Segmentation Rule
Obs: both the Bayes segmentation rule \(\pmb{\delta}^*(\mathbf{x})\) and the optimal volume function \(\tau^*(\mathbf{x})\) are achievable when the conditional probability \(\mathbf{p}(\mathbf{x}) = ( p_1(\mathbf{x}), \cdots, p_d(\mathbf{x}) )^\intercal\) is well estimated.
Bayes Segmentation Rule
RankDice inspired by Thm 1 (plug-in rule):

Ranking the conditional probability \(p_j(\mathbf{x})\)

Searching for the optimal volume \(\tau(\mathbf{x})\) of the segmented features
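The two steps above can be sketched end-to-end. Note this is an illustrative Monte-Carlo stand-in for the exact Poisson-binomial computation of the expected Dice (the function name, the simulation approach, and the toy probabilities are my own, not the paper's algorithm):

```python
import numpy as np

def rank_dice_mc(q, gamma=1.0, n_sim=2000, seed=0):
    """Sketch of the two RankDice steps, with simulation replacing
    the exact expected-Dice evaluation:
      (i)  rank the estimated probabilities q_j(x);
      (ii) pick the volume tau whose top-tau set maximizes the
           (simulated) expected Dice."""
    q = np.asarray(q, dtype=float)
    d = len(q)
    order = np.argsort(-q)                          # step (i): ranking
    rng = np.random.default_rng(seed)
    Y = (rng.random((n_sim, d)) < q).astype(float)  # simulate Y | X = x
    best_tau, best_val = 0, -np.inf
    for tau in range(d + 1):                        # step (ii): volume search
        delta = np.zeros(d)
        delta[order[:tau]] = 1.0
        inter = Y @ delta                           # |I(Y) ∩ I(delta)|
        dice = (2 * inter + gamma) / (Y.sum(axis=1) + tau + gamma)
        if dice.mean() > best_val:
            best_val, best_tau = dice.mean(), tau
    delta = np.zeros(d, dtype=int)
    delta[order[:best_tau]] = 1
    return delta, best_tau

delta, tau = rank_dice_mc([0.9, 0.8, 0.1])
print(delta.tolist(), tau)  # -> [1, 1, 0] 2
```

Note that no fixed 0.5 threshold appears: the volume \(\tau\) adapts to the probability profile of each input.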
RankDice: Framework
RankDice: Algo
 Fast evaluation of Poisson-binomial r.v.
 Quick search for \(\tau \in \{0,1,\cdots, d\}\)
Note that (6) can be rewritten as:
RankDice: Algo
In practice, the DFT-CF method is generally recommended for computing the cdf. The RF1 method can also be used when n < 1000, since its computing time differs little from DFT-CF. The RNA method is recommended when n > 2000 and the cdf must be evaluated many times; as shown in the numerical study, the RNA method approximates the cdf well for large n and is more computationally efficient.
Hong. (2013) On computing the distribution function for the Poisson binomial distribution
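As a toy illustration of the DFT-CF idea (a minimal sketch, not Hong's implementation; the function name is mine): sample the characteristic function of the Poisson-binomial at \(d+1\) Fourier frequencies, then invert with an FFT to recover the pmf and cdf.

```python
import numpy as np

def pb_cdf_dft(p):
    """Poisson-binomial cdf via the DFT of the characteristic
    function (a sketch of the DFT-CF idea in Hong, 2013)."""
    p = np.asarray(p, dtype=float)
    d = len(p)
    # characteristic function at the (d + 1) Fourier frequencies
    omega = np.exp(2j * np.pi * np.arange(d + 1) / (d + 1))
    chi = np.prod(1 - p[:, None] + p[:, None] * omega[None, :], axis=0)
    pmf = np.fft.fft(chi).real / (d + 1)  # invert the DFT to the pmf
    return np.cumsum(pmf)

print(np.round(pb_cdf_dft([0.5, 0.5]), 6).tolist())  # -> [0.25, 0.75, 1.0]
```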
RankDice: Algo (Early Stop)
Lemma 3 (Dai and Li, 2023). If \(\sum_{s=1}^{\tau} \widehat{q}_{j_s}(\mathbf{x}) \geq (\tau + \gamma + d) \widehat{q}_{j_{\tau+1}}(\mathbf{x})\), then \(\bar{\pi}_\tau(\mathbf{x}) \geq \bar{\pi}_{\tau'}(\mathbf{x})\) for all \(\tau' > \tau\).
Early stop!
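The early-stop check is cheap to evaluate inside the volume search; an illustrative helper (the inequality is transcribed as stated on the slide; the function name and interface are mine, not the paper's):

```python
def can_stop(q_sorted, tau, gamma, d):
    """Lemma-3 style check as stated on the slide: if
    sum_{s <= tau} q_(s) >= (tau + gamma + d) * q_(tau+1),
    then no tau' > tau can improve the objective, so the
    volume search may terminate at tau.  q_sorted is the
    estimated probabilities sorted in decreasing order."""
    return sum(q_sorted[:tau]) >= (tau + gamma + d) * q_sorted[tau]

# sorted probabilities with a sharp drop after the second entry
q_sorted = [0.95, 0.90, 0.002, 0.001]
print(can_stop(q_sorted, tau=2, gamma=1.0, d=4))  # -> True
```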
RankDice: Algo (TRNA)
It is unnecessary to compute all \(\mathbb{P}(\widehat{\Gamma}_{-j}(\mathbf{x}) = l)\) and \(\mathbb{P}(\widehat{\Gamma}(\mathbf{x}) = l)\) for \(l=1, \cdots, d\), since they are negligibly close to zero when \(l\) is too small or too large.
Truncation!
\(\widehat{\sigma}^2(\mathbf{x}) = \sum_{j=1}^d \widehat{q}_j(\mathbf{x}) (1 - \widehat{q}_j(\mathbf{x})) \to \infty \quad \text{as } d \to \infty \)
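Since \(\widehat{\sigma}^2(\mathbf{x})\) grows with \(d\), a normal-type approximation becomes accurate. A sketch of one standard refined normal approximation with a skewness correction, in the spirit of the RNA method (this is my hedged reconstruction, not the paper's code; the function name is mine):

```python
import math
import numpy as np

def pb_cdf_rna(p, k):
    """Refined normal approximation to the Poisson-binomial cdf:
    Phi(v) plus a third-moment (skewness) correction term, with
    continuity correction v = (k + 0.5 - mu) / sigma."""
    p = np.asarray(p, dtype=float)
    mu = p.sum()
    sigma = math.sqrt((p * (1 - p)).sum())
    skew = (p * (1 - p) * (1 - 2 * p)).sum() / sigma**3
    v = (k + 0.5 - mu) / sigma
    phi = math.exp(-v * v / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(v / math.sqrt(2)))
    return min(max(Phi + skew * (1 - v * v) * phi / 6, 0.0), 1.0)

# symmetric case: Gamma ~ Binomial(20, 0.5); exact P(Gamma <= 10) ~ 0.5881
print(pb_cdf_rna([0.5] * 20, 10))
```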
RankDice: Algo (BlindA)
GPU via CUDA
RankDice: Theory
Fisher consistency or Classification-Calibration
(Lin, 2004; Zhang, 2004; Bartlett et al., 2006)
Classification
Segmentation
RankDice: Experiments
 Three segmentation benchmarks: VOC, CityScapes, Kvasir
Source: Visual Object Classes Challenge 2012 (VOC2012)
Source: The Cityscapes Dataset: Semantic Understanding of Urban Street Scenes
Jha et al. (2020) Kvasir-SEG: A segmented polyp dataset
RankDice: Experiments

Three segmentation benchmarks: VOC, CityScapes, Kvasir
 Standard benchmarks, NOT cherry-picks
 Three commonly used DL models: DeepLabV3+, PSPNet, FCN
DeepLab: Chen et al. (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
PSPNet: Zhao et al (2017) Pyramid Scene Parsing Network
FCN: Long et al. (2015) Fully convolutional networks for semantic segmentation
RankDice: Experiments

Three segmentation benchmarks: VOC, CityScapes, Kvasir
 Standard benchmarks, NOT cherry-picks
 Three commonly used DL models: DeepLabV3+, PSPNet, FCN
 The proposed framework vs. the existing frameworks
 Based on the same trained neural networks
 No implementation tricks
 Open-source Python module and code
 All trained neural networks available for free download
RankDice: Experiments
The optimal threshold is NOT fixed at 0.5; it is adaptive across different images/inputs
mRankDice
Long et al. (2015) Fully convolutional networks for semantic segmentation
mRankDice
 Probabilistic model: multi-class or multi-label
 Decision rule: overlapping / non-overlapping
More Results...
 mRankDice: extensions and challenges
 RankIoU
 Simulation
 Probability calibration
 ....
Contribution

To the best of our knowledge, the proposed ranking-based segmentation framework, RankDice, is the first consistent segmentation framework with respect to the Dice metric.

Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation.

We establish a theoretical foundation of segmentation with respect to the Dice metric, including the Bayes rule, Dice-calibration, and a convergence rate of the excess risk for the proposed RankDice framework, and show that the existing methods are inconsistent.

Our experiments suggest that the improvement of RankDice over the existing frameworks is significant.
“There is Nothing More Practical Than A Good Theory.”
— Kurt Lewin
Thank you!