RankSEG: A Consistent Ranking-based Framework for Segmentation
Ben Dai
The Chinese University of Hong Kong

From Classification to Segmentation
- Recall the classification problem
- Data: features \( \mathbf{X} \in \mathbb{R}^d \), label \( Y \in \{0,1\} \)
- Decision function: \( \delta: \mathbb{R}^d \to \{0,1\} \)
- Evaluation:
$$ \text{Acc}( \delta) = \mathbb{E}( \mathbf{1}( Y = \delta(\mathbf{X}) )) $$
What is the "best" decision function? Bayes Rule!
$$ \delta^* = \argmax_{\delta} \ \text{Acc}(\delta) \ \to \ \delta^*(\mathbf{x}) = \mathbf{1}( p(\mathbf{x}) \geq 0.5 ) $$
Plug-in rule:
$$\widehat{\delta}(\mathbf{x}) = \mathbf{1}( q(\mathbf{x}) \geq 0.5 ), \quad q(\mathbf{x}) \text{ is an estimator of } p(\mathbf{x}) := \mathbb{P}(Y=1 \mid \mathbf{X}=\mathbf{x}) $$
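The plug-in rule is a few lines of code; `q` below is a toy sigmoid stand-in for any fitted probability estimator (illustrative only, not a fitted model):

```python
import numpy as np

def plug_in_rule(q, X, threshold=0.5):
    """Plug-in Bayes classifier: predict 1 iff the estimated
    probability q(x) of P(Y=1 | X=x) is at least 1/2."""
    return (q(X) >= threshold).astype(int)

# Toy estimator q(x): a logistic link on a 1-d feature (illustrative only).
q = lambda X: 1.0 / (1.0 + np.exp(-X[:, 0]))

X = np.array([[-2.0], [0.0], [3.0]])
print(plug_in_rule(q, X))  # -> [0 1 1]
```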
Segmentation

Long, et al. (2015) Fully convolutional networks for semantic segmentation
- Input: \(\mathbf{X} \in \mathbb{R}^d\)
- Outcome: \(\mathbf{Y} \in \{0,1\}^d\)

Segmentation function:
- \( \pmb{\delta}: \mathbb{R}^d \to \{0,1\}^d\)
- \( \pmb{\delta}(\mathbf{X}) = ( \delta_1(\mathbf{X}), \cdots, \delta_d(\mathbf{X}) )^\intercal \)

Predicted segmentation:
- \( I(\pmb{\delta}(\mathbf{X})) = \{j: \delta_j(\mathbf{X}) = 1 \}\)
Goal: learn segmentation decision function \( \pmb{\delta} \)
Evaluation
The Dice and IoU metrics are introduced and widely used in the literature. Writing \(I(\mathbf{Y}) = \{j: Y_j = 1\}\) for the true segmented set,
$$ \text{Dice}_\gamma(\pmb{\delta}) = \mathbb{E}\left[ \frac{2\,|I(\pmb{\delta}(\mathbf{X})) \cap I(\mathbf{Y})| + \gamma}{|I(\pmb{\delta}(\mathbf{X}))| + |I(\mathbf{Y})| + \gamma} \right], \quad \text{IoU}_\gamma(\pmb{\delta}) = \mathbb{E}\left[ \frac{|I(\pmb{\delta}(\mathbf{X})) \cap I(\mathbf{Y})| + \gamma}{|I(\pmb{\delta}(\mathbf{X})) \cup I(\mathbf{Y})| + \gamma} \right] $$
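For a single image, the (γ-smoothed) Dice and IoU of a predicted mask against the ground truth reduce to counting set sizes; a minimal sketch:

```python
import numpy as np

def dice(pred, truth, gamma=0.0):
    """Smoothed Dice: (2|A ∩ B| + γ) / (|A| + |B| + γ)."""
    inter = np.sum(pred & truth)
    return (2 * inter + gamma) / (pred.sum() + truth.sum() + gamma)

def iou(pred, truth, gamma=0.0):
    """Smoothed IoU: (|A ∩ B| + γ) / (|A ∪ B| + γ)."""
    inter = np.sum(pred & truth)
    union = np.sum(pred | truth)
    return (inter + gamma) / (union + gamma)

pred  = np.array([1, 1, 0, 0], dtype=bool)   # predicted mask, flattened
truth = np.array([1, 0, 1, 0], dtype=bool)   # true mask, flattened
print(dice(pred, truth))  # 2*1 / (2+2) = 0.5
print(iou(pred, truth))   # 1 / 3
```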
Existing Framework
- Given training data \( \{\mathbf{x}_i, \mathbf{y}_i\}_{i=1, \cdots, n}\), most existing methods characterize segmentation as a classification problem, trained with one of two types of loss:
- Classification-based loss
- Dice-approximating loss
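As a concrete instance of a Dice-approximating loss, here is a minimal soft-Dice sketch; the plain-NumPy form and the smoothing constant are illustrative, not the exact loss of any cited framework:

```python
import numpy as np

def soft_dice_loss(probs, truth, gamma=1.0):
    """Soft-Dice loss: 1 - (2 Σ q_j y_j + γ) / (Σ q_j + Σ y_j + γ).
    Replacing hard predictions with probabilities makes it differentiable,
    so it can be minimized directly during network training."""
    inter = np.sum(probs * truth)
    return 1.0 - (2 * inter + gamma) / (probs.sum() + truth.sum() + gamma)

probs = np.array([0.9, 0.8, 0.1, 0.2])  # predicted probabilities q_j
truth = np.array([1.0, 1.0, 0.0, 0.0])  # true labels y_j
print(soft_dice_loss(probs, truth))     # ≈ 0.12
```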
Bayes Segmentation Rule
We discuss Dice-segmentation at the population level and present its Bayes segmentation rule, akin to the Bayes classifier.
To begin with, we introduce some notation:
- Segmentation probability for the \(j\)-th pixel: \( p_j(\mathbf{x}) := \mathbb{P}(Y_j = 1 | \mathbf{X} = \mathbf{x})\)
- \({B}_j(\mathbf{x})\) is a Bernoulli random variable with success probability \(p_{j}(\mathbf{x})\)
$$ \pmb{\delta}^* = \text{argmax}_{\pmb{\delta}} \ \text{Dice}_\gamma ( \pmb{\delta})$$
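Sums of these independent, non-identical Bernoullis, such as \(\Gamma(\mathbf{x}) = \sum_j B_j(\mathbf{x})\) used later, follow a Poisson-binomial distribution; a quick simulation with toy probabilities (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.9, 0.7, 0.2, 0.05])      # toy p_j(x)
B = rng.random((100_000, p.size)) < p    # independent Bernoulli draws B_j(x)
Gamma = B.sum(axis=1)                    # Poisson-binomial draws of Γ(x)

print(Gamma.mean())                      # ≈ Σ p_j = 1.85
```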
Bayes Segmentation Rule
Theorem 1 (Dai and Li, 2023). A segmentation rule \(\pmb{\delta}^*\) is a global maximizer of \(\text{Dice}_\gamma(\pmb{\delta})\) if and only if it segments the \(\tau^*(\mathbf{x})\) pixels with the largest conditional probabilities, that is, \( I(\pmb{\delta}^*(\mathbf{x})) = J_{\tau^*(\mathbf{x})}(\mathbf{x}) \).
\( \tau^*(\mathbf{x}) \) is called the optimal segmentation volume, defined as the volume \(\tau \in \{0, 1, \cdots, d\}\) that maximizes the expected Dice score,
where \(J_\tau(\mathbf{x})\) is the index set of the \(\tau\)-largest probabilities, \(\Gamma(\mathbf{x}) = \sum_{j=1}^d {B}_{j}(\mathbf{x})\), and \( {\Gamma}_{- j}(\mathbf{x}) = \sum_{j' \neq j} {B}_{j'}(\mathbf{x})\) are Poisson-binomial random variables.
Bayes Segmentation Rule
The Dice measure is separable w.r.t. \(j\)
Bayes Segmentation Rule
Observation: both the Bayes segmentation rule \(\pmb{\delta}^*(\mathbf{x})\) and the optimal volume function \(\tau^*(\mathbf{x})\) are achievable when the conditional probability \(\mathbf{p}(\mathbf{x}) = ( p_1(\mathbf{x}), \cdots, p_d(\mathbf{x}) )^\intercal\) is well-estimated.
RankDice inspired by Thm 1 (plug-in rule):
- Ranking the conditional probabilities \(p_j(\mathbf{x})\)
- Searching for the optimal volume of the segmented features \(\tau(\mathbf{x})\)
RankDice: Framework

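The two steps of the framework can be sketched as follows. This is a simplified Monte Carlo version: the `rankdice_mc` name and the simulation-based expectation are illustrative, while the actual algorithm evaluates the expected Dice exactly via Poisson-binomial distributions.

```python
import numpy as np

def rankdice_mc(q, gamma=0.0, n_sim=20_000, seed=0):
    """RankDice sketch: (i) rank pixels by estimated probability q_j,
    (ii) choose the volume tau maximizing the expected Dice score.
    Expectation estimated here by Monte Carlo for brevity only."""
    rng = np.random.default_rng(seed)
    d = q.size
    order = np.argsort(-q)                  # step (i): ranking
    Y = rng.random((n_sim, d)) < q          # simulate Y | X = x from q
    y_size = Y.sum(axis=1)
    # expected Dice of the empty prediction (tau = 0)
    best_val = np.mean(gamma / (y_size + gamma)) if gamma > 0 else 0.0
    best_tau = 0
    inter = np.zeros(n_sim)
    for tau in range(1, d + 1):             # step (ii): volume search
        inter = inter + Y[:, order[tau - 1]]
        val = np.mean((2 * inter + gamma) / (tau + y_size + gamma))
        if val > best_val:
            best_val, best_tau = val, tau
    delta = np.zeros(d, dtype=int)
    delta[order[:best_tau]] = 1             # segment the top-tau pixels
    return delta, best_tau

q = np.array([0.95, 0.60, 0.55, 0.10, 0.05])
delta, tau = rankdice_mc(q)
print(delta, tau)
```

Note the prediction is always a top-\(\tau\) set of the ranked probabilities; only the volume \(\tau\) needs to be searched.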
RankDice: Algo

- Fast evaluation of Poisson-binomial r.v.
- Quick search for \(\tau \in \{0,1,\cdots, d\}\)

RankDice: Algo
In practice, the DFT–CF method is generally recommended for computing. The RF1 method can also be used when n < 1000, because there is not much difference in computing time from the DFT–CF method. The RNA method is recommended when n > 2000 and the cdf needs to be evaluated many times. As shown in the numerical study, the RNA method can approximate the cdf well, when n is large, and is more computationally efficient.
Hong (2013) On computing the distribution function for the Poisson binomial distribution
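The DFT–CF method evaluates the Poisson-binomial pmf through the discrete Fourier transform of its characteristic function; a minimal sketch (function name is illustrative):

```python
import numpy as np

def poisson_binomial_pmf(p):
    """Poisson-binomial pmf via the DFT of the characteristic function
    (the DFT-CF method): pmf_l = (1/(n+1)) Σ_k φ(ω^k) ω^{-kl},
    with ω the (n+1)-th root of unity and φ(w) = Π_j (1 - p_j + p_j w)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    k = np.arange(n + 1)
    w = np.exp(2j * np.pi * k / (n + 1))               # roots of unity
    phi = np.prod(1 - p[:, None] + p[:, None] * w, axis=0)
    pmf = np.fft.fft(phi).real / (n + 1)               # fft uses ω^{-kl}
    return np.clip(pmf, 0.0, 1.0)

print(poisson_binomial_pmf([0.5, 0.5]))  # ≈ [0.25, 0.5, 0.25]
```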
RankDice: Algo-Early Stop
Lemma 3 (Dai and Li, 2023). If \(\sum_{s=1}^{\tau} \widehat{q}_{j_s}(\mathbf{x}) \geq (\tau + \gamma + d) \widehat{q}_{j_{\tau+1}}(\mathbf{x})\), then \(\bar{\pi}_\tau(\mathbf{x}) \geq \bar{\pi}_{\tau'}(\mathbf{x})\) for all \(\tau' >\tau\)

Early stop!
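Lemma 3 gives a cheap sufficient condition to check during the volume search; a sketch with toy probabilities and a hypothetical helper name:

```python
import numpy as np

def can_stop(q_sorted, tau, gamma, d):
    """Sufficient condition of Lemma 3: once the top-tau probability mass
    reaches (tau + gamma + d) times the next candidate probability, no
    larger volume can improve the expected Dice, so the search can stop."""
    if tau >= d:
        return True  # no candidates left
    return q_sorted[:tau].sum() >= (tau + gamma + d) * q_sorted[tau]

# q already sorted in decreasing order (toy values)
q_sorted = np.array([0.99, 0.95, 0.90, 1e-4, 1e-5])
d, gamma = q_sorted.size, 1.0
tau = next(t for t in range(1, d + 1) if can_stop(q_sorted, t, gamma, d))
print(tau)  # -> 3: the search can stop after volume 3
```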
RankDice: Algo-TRNA
It is unnecessary to compute all \(\mathbb{P}(\widehat{\Gamma}_{-j}(\mathbf{x}) = l)\) and \(\mathbb{P}(\widehat{\Gamma}(\mathbf{x}) = l)\) for \(l = 1, \cdots, d\), since they are negligibly close to zero when \(l\) is too small or too large. Truncation!
\(\widehat{\sigma}^2(\mathbf{x}) = \sum_{j=1}^d \widehat{q}_j(\mathbf{x}) (1 - \widehat{q}_j(\mathbf{x})) \to \infty \quad \text{as } d \to \infty \)
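A sketch of the truncated refined normal approximation (TRNA): the cdf is a normal cdf with a skewness correction (following Hong, 2013), and the pmf is only evaluated inside a \(\mu \pm c\sigma\) window. The window width `c` and helper names are illustrative choices:

```python
import math
import numpy as np

def rna_cdf(l, p):
    """Refined normal approximation (RNA) to the Poisson-binomial cdf:
    normal cdf with a continuity and skewness correction."""
    mu = p.sum()
    sig = math.sqrt(np.sum(p * (1 - p)))
    g1 = np.sum(p * (1 - p) * (1 - 2 * p)) / sig**3   # skewness
    x = (l + 0.5 - mu) / sig
    Phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return min(1.0, max(0.0, Phi + g1 * (1 - x * x) * phi / 6))

def trna_pmf(p, c=5.0):
    """Truncated RNA: evaluate the pmf only inside mu ± c*sigma, since
    the mass outside this window is negligible (the truncation above)."""
    mu, sig = p.sum(), math.sqrt(np.sum(p * (1 - p)))
    lo = max(0, int(mu - c * sig))
    hi = min(p.size, int(mu + c * sig) + 1)
    cdf = [rna_cdf(l, p) for l in range(lo - 1, hi + 1)]
    return lo, np.diff(cdf)

rng = np.random.default_rng(1)
p = rng.uniform(0.05, 0.95, size=500)
lo, pmf = trna_pmf(p)
print(lo, pmf.sum())  # pmf over the window sums to ≈ 1
```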
RankDice: Algo-BlindA

GPU via CUDA
RankDice: Theory

Fisher consistency or Classification-Calibration
(Lin, 2004; Zhang, 2004; Bartlett et al., 2006)
Classification vs. Segmentation
RankDice: Experiments
- Three segmentation benchmarks: VOC, CityScapes, Kvasir

Source: Visual Object Classes Challenge 2012 (VOC2012)

Source: The Cityscapes Dataset: Semantic Understanding of Urban Street Scenes


Jha et al (2020) Kvasir-seg: A segmented polyp dataset
RankDice: Experiments
- Three segmentation benchmarks: VOC, CityScapes, Kvasir
- Standard benchmarks, NOT cherry-picked
- Three commonly used DL models: DeepLab-V3+, PSPNet, FCN

DeepLab: Chen et al (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

PSPNet: Zhao et al (2017) Pyramid Scene Parsing Network

FCN: Long, et al. (2015) Fully convolutional networks for semantic segmentation
RankDice: Experiments
- Three segmentation benchmarks: VOC, CityScapes, Kvasir
- Standard benchmarks, NOT cherry-picked
- Three commonly used DL models: DeepLab-V3+, PSPNet, FCN
- The proposed framework vs. the existing frameworks
- Based on the same trained neural networks
- No implementation tricks
- Open-source Python module and code
- All trained neural networks available for free download
RankDice: Experiments

The optimal threshold is NOT 0.5 and NOT fixed: it is adaptive over different images/inputs.
mRankDice

Long, et al. (2015) Fully convolutional networks for semantic segmentation
- Probabilistic model: multiclass or multilabel
- Decision rule: overlapping / non-overlapping
More Results...
- mRankDice: extension and challenge
- RankIoU
- Simulation
- Probability calibration
- ...
Contribution
- To the best of our knowledge, the proposed ranking-based segmentation framework, RankDice, is the first consistent segmentation framework with respect to the Dice metric.
- Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation.
- We establish a theoretical foundation of segmentation with respect to the Dice metric, including the Bayes rule, Dice-calibration, and a convergence rate of the excess risk for the proposed RankDice framework, and show the inconsistency of the existing methods.
- Our experiments suggest that the improvement of RankDice over the existing frameworks is significant.
“There is Nothing More Practical Than A Good Theory.”
— Kurt Lewin




Thank you!

