Ronald Fisher (1890 - 1962)
A statistic is said to be a consistent estimate of any parameter, if when calculated from an indefinitely large sample it tends to be accurately equal to that parameter.
- Fisher (1925) Theory of Statistical Estimation
\( \hat{\theta}(X_1, \cdots, X_n) \xrightarrow{\mathbb{P}} \theta_0, \quad n \to \infty \)
Probability Consistency (PC)
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics.
Consistency -- A statistic satisfies the criterion of consistency, if, when it is calculated from the whole population, it is equal to the required parameter.
- Fisher (1922)
\( \hat{\theta} = \hat{\theta}(X_1, \cdots, X_n) \)
Fisher / Fisherian view: a statistic as a functional of the empirical cdf,
\( \hat{\theta}= \hat{\theta}(F_n) \)
\( \hat{\theta}(F_n) \xrightarrow{\mathbb{P}} \theta(F), \quad n \to \infty \)
\(\theta(F) = \theta_0\)
Fisher Consistency (FC)
Gerow, K. (1989): In fact, for many years, Fisher took his two definitions to be describing the same thing... It took Fisher 34 years to polish the definitions of consistency to their present form.
Glivenko–Cantelli theorem (1933): \( \sup_x |F_n(x) - F(x)| \to 0 \) almost surely, so \( \hat{\theta}(F_n) \xrightarrow{\mathbb{P}} \theta(F) \) for continuous functionals.
The Hodges–Le Cam example is PC but not FC!
Rao, C. R. (1962). Apparent Anomalies and Irregularities in Maximum Likelihood Estimation.
With continuous functionals, FC iff PC.
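A minimal example: the sample mean as a plug-in functional,
$$ \theta(F) = \int x \, \mathrm{d}F(x), \qquad \hat{\theta} = \theta(F_n) = \frac{1}{n}\sum_{i=1}^n X_i. $$
Evaluated at the true \(F\), \(\theta(F) = \mathbb{E}(X_1) = \theta_0\), so the sample mean is FC; the law of large numbers then gives PC.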
Not all estimators can be easily expressed as a functional of the empirical cdf.
The argument also relies on iid assumptions.
Classification
$$ Acc( \delta) = \mathbb{E}\big( \mathbb{I}( Y = \delta(\mathbf{X}) )\big) $$
Due to the computational difficulty, direct optimization is often infeasible.
Consistency. What ensures that your estimator achieves good accuracy?
Suppose we develop a classifier as a functional of the empirical cdf, that is, \( \hat{\delta}(F_n) \).
$$Acc \big ( \hat{\delta}(F_n) \big) \xrightarrow{\mathbb{P}} Acc \big( \delta(F) \big) = \max_{\delta} Acc(\delta)$$
FC
How can we develop an FC method?
Approach 1 (Plug-in rule).
$$ \delta^* = \delta^*(F) \to \delta^*(F_n)$$
$$ \delta^* = \argmax_{\delta} \ Acc(\delta) \ \ \to \ \delta^*(\mathbf{x}) = \mathbb{I}( p(\mathbf{x}) \geq 0.5 ) $$
$$ p(\mathbf{x}) = \mathbb{P}(Y=1|\mathbf{X}=\mathbf{x})$$
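Why the 0.5 threshold (standard derivation by conditioning on \(\mathbf{X}\)):
$$ Acc(\delta) = \mathbb{E}\Big[ p(\mathbf{X}) \, \mathbb{I}\big(\delta(\mathbf{X}) = 1\big) + \big(1 - p(\mathbf{X})\big) \, \mathbb{I}\big(\delta(\mathbf{X}) = 0\big) \Big], $$
which is maximized pointwise by predicting the more likely class, i.e., \( \delta^*(\mathbf{x}) = \mathbb{I}( p(\mathbf{x}) \geq 0.5 ) \).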
$$ \hat{\delta}(\mathbf{x}) = \mathbb{I}( \hat{p}_n(\mathbf{x}) \geq 0.5 ) $$
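A minimal plug-in sketch (assuming numpy and scikit-learn are available; logistic regression is just one hypothetical choice of \( \hat{p}_n \)):

```python
# Plug-in rule sketch: estimate p_hat(x) = P(Y=1 | X=x), then threshold at 0.5.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                    # features
p = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))       # true p(x) = P(Y=1 | X=x)
y = rng.binomial(1, p)                           # labels

clf = LogisticRegression().fit(X, y)             # one possible estimator of p(x)
p_hat = clf.predict_proba(X)[:, 1]               # \hat{p}_n(x)
delta_hat = (p_hat >= 0.5).astype(int)           # plug-in classifier
print("training accuracy:", (delta_hat == y).mean())
```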
Approach 2 (ERM via a surrogate loss).
$$ \hat{f}_{\phi} = \argmin_f \mathbb{E}_n \phi\big(Y f(\mathbf{X}) \big) $$
$$ f_\phi = \argmin_f \mathbb{E} \phi\big(Y f(\mathbf{X}) \big)$$
$$Acc \big ( \hat{\delta}(F_n) \big) \xrightarrow{\mathbb{P}} Acc \big( \delta(F) \big) = \max_{\delta} Acc(\delta)$$
FC leads to conditions for "consistent" surrogate losses
Theorem. (Bartlett et al. (2006); informal) Let \(\phi\) be convex. \(\phi\) is "consistent" iff it is differentiable at 0 and \( \phi'(0) < 0 \).
Convex loss. Zhang (2004), Lugosi and Vayatis (2004), Steinwart (2005)
Non-convex loss. Mason et al. (1999), Shen et al. (2003)
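For example, the usual convex surrogates satisfy this condition:
$$ \phi_{\text{hinge}}(t) = \max(0, 1-t), \ \phi'(0) = -1; \qquad \phi_{\text{logistic}}(t) = \log(1 + e^{-t}), \ \phi'(0) = -\tfrac{1}{2}; \qquad \phi_{\text{exp}}(t) = e^{-t}, \ \phi'(0) = -1. $$
All are convex and differentiable at 0 with \( \phi'(0) < 0 \), hence "consistent".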
Applications: medical image segmentation, autonomous vehicles, agriculture.
Input: \(\mathbf{X} \in \mathbb{R}^d\)
Outcome: \(\mathbf{Y} \in \{0,1\}^d\)
Segmentation function: \( \pmb{\delta}(\cdot) = (\delta_1(\cdot), \cdots, \delta_d(\cdot))^\intercal : \mathbb{R}^d \to \{0,1\}^d \)
Predicted segmentation set: \( \{ j : \delta_j(\mathbf{x}) = 1 \} \)
Probabilistic model:
$$ Y_j | \mathbf{X}=\mathbf{x} \sim \text{Bern}\big(p_j(\mathbf{x})\big), \qquad p_j(\mathbf{x}) := \mathbb{P}(Y_j = 1 | \mathbf{X} = \mathbf{x})$$
The Dice and IoU metrics are widely used in practice.
Goal: learn a segmentation function \( \pmb{\delta} \) maximizing Dice / IoU.
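For reference, a common population-level (\(\gamma\)-smoothed) form is sketched below; the exact smoothing convention is an assumption here and may differ slightly from Dai and Li (2023):
$$ \text{Dice}_\gamma(\pmb{\delta}) = \mathbb{E}\left[ \frac{2 \sum_{j=1}^d \delta_j(\mathbf{X}) Y_j + \gamma}{\sum_{j=1}^d \delta_j(\mathbf{X}) + \sum_{j=1}^d Y_j + \gamma} \right], \qquad \text{IoU}_\gamma(\pmb{\delta}) = \mathbb{E}\left[ \frac{\sum_{j=1}^d \delta_j(\mathbf{X}) Y_j + \gamma}{\sum_{j=1}^d \big( \delta_j(\mathbf{X}) + Y_j - \delta_j(\mathbf{X}) Y_j \big) + \gamma} \right]. $$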
Classification-based losses: CE, CE + Focal.
We aim to leverage the principles of FC to develop a consistent segmentation method.
Recall (Plug-in rule in classification).
$$ \delta^* = \argmax_{\delta} \ Acc(\delta) $$
$$ \delta^* = \delta^*(F) \to \delta^*(F_n)$$
Plug-in rule in segmentation:
$$ \pmb{\delta}^* = \text{argmax}_{\pmb{\delta}} \ \text{Dice}_\gamma ( \pmb{\delta})$$
Bayes segmentation rule
What form would the Bayes segmentation rule take?
Theorem 1 (Dai and Li, 2023). A segmentation rule \(\pmb{\delta}^*\) is a global maximizer of \(\text{Dice}_\gamma(\pmb{\delta})\) if and only if, for each input \(\mathbf{x}\), it segments the \(\tau^*(\mathbf{x})\) features with the largest conditional probabilities, that is, \( \delta^*_j(\mathbf{x}) = \mathbb{I}\big( j \in J_{\tau^*(\mathbf{x})}(\mathbf{x}) \big) \).
\( \tau^*(\mathbf{x}) \) is called the optimal segmentation volume; it maximizes the conditional expected Dice over the volume \(\tau\) (see Dai and Li, 2023, for the explicit formula),
where \(J_\tau(\mathbf{x})\) is the index set of the \(\tau\)-largest probabilities, \(\Gamma(\mathbf{x}) = \sum_{j=1}^d {B}_{j}(\mathbf{x})\) and \( {\Gamma}_{- j}(\mathbf{x}) = \sum_{j' \neq j} {B}_{j'}(\mathbf{x})\) are Poisson-binomial random variables, with \(B_j(\mathbf{x}) \sim \text{Bern}\big(p_j(\mathbf{x})\big)\) independently.
The Dice measure is separable w.r.t. \(j\)
Obs: both the Bayes segmentation rule \(\pmb{\delta}^*(\mathbf{x})\) and the optimal volume function \(\tau^*(\mathbf{x})\) are achievable when the conditional probability \(\mathbf{p}(\mathbf{x}) = ( p_1(\mathbf{x}), \cdots, p_d(\mathbf{x}) )^\intercal\) is well-estimated
RankDice inspired by Thm 1 (plug-in rule)
Ranking the conditional probability \(p_j(\mathbf{x})\)
searching for the optimal volume of the segmented features \(\tau(\mathbf{x})\)
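A naive Monte-Carlo sketch of this ranking + volume-search idea (illustration only: `rank_volume_segment` is a hypothetical helper, and the actual RankDice algorithm evaluates the Poisson-binomial distribution exactly, as discussed below):

```python
# For one input x with estimated probabilities p_hat, segment the top-tau features,
# choosing tau to maximize a Monte-Carlo estimate of the expected (gamma-smoothed) Dice.
import numpy as np

def rank_volume_segment(p_hat, gamma=1.0, n_mc=2000, seed=0):
    rng = np.random.default_rng(seed)
    d = len(p_hat)
    order = np.argsort(-p_hat)                       # rank features by p_hat_j(x)
    Y_mc = rng.binomial(1, p_hat, size=(n_mc, d))    # simulate Y | X = x under the model
    best_tau, best_dice = 0, -np.inf
    for tau in range(d + 1):
        delta = np.zeros(d)
        delta[order[:tau]] = 1                       # segment the tau top-ranked features
        inter = Y_mc @ delta                         # |delta ∩ Y| for each draw
        dice = np.mean((2 * inter + gamma) / (tau + Y_mc.sum(axis=1) + gamma))
        if dice > best_dice:
            best_tau, best_dice = tau, dice
    delta_hat = np.zeros(d, dtype=int)
    delta_hat[order[:best_tau]] = 1
    return delta_hat, best_tau

p_hat = np.array([0.9, 0.8, 0.55, 0.3, 0.1])         # hypothetical estimated probabilities
print(rank_volume_segment(p_hat))
```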
Note that (6) can be rewritten in terms of the Poisson-binomial probabilities \(\mathbb{P}(\widehat{\Gamma}_{-j}(\mathbf{x}) = l)\) and \(\mathbb{P}(\widehat{\Gamma}(\mathbf{x}) = l)\), so the key computational step is evaluating a Poisson-binomial distribution:
In practice, the DFT–CF method is generally recommended for computing. The RF1 method can also be used when n < 1000, because there is not much difference in computing time from the DFT–CF method. The RNA method is recommended when n > 2000 and the cdf needs to be evaluated many times. As shown in the numerical study, the RNA method can approximate the cdf well, when n is large, and is more computationally efficient.
Hong. (2013) On computing the distribution function for the Poisson binomial distribution
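As a point of reference, the Poisson-binomial pmf can also be computed exactly by iterated convolution; a minimal \(O(d^2)\) sketch (illustration only, not the DFT–CF / RF1 / RNA methods quoted above):

```python
# pmf of Gamma = sum_j B_j with independent B_j ~ Bern(p_j), by iterated convolution.
import numpy as np

def poisson_binomial_pmf(p):
    pmf = np.array([1.0])                     # distribution of an empty sum
    for pj in p:
        pmf = np.convolve(pmf, [1 - pj, pj])  # add one Bernoulli component
    return pmf                                # pmf[l] = P(Gamma = l), l = 0, ..., d

p = np.array([0.9, 0.8, 0.55, 0.3, 0.1])
print(poisson_binomial_pmf(p), poisson_binomial_pmf(p).sum())  # pmf sums to 1
```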
Lemma 3 (Dai and Li, 2023). If \(\sum_{s=1}^{\tau} \widehat{q}_{j_s}(\mathbf{x}) \geq (\tau + \gamma + d) \widehat{q}_{j_{\tau+1}}(\mathbf{x})\), then \(\bar{\pi}_\tau(\mathbf{x}) \geq \bar{\pi}_{\tau'}(\mathbf{x})\) for all \(\tau' >\tau\)
Early stop!
It is unnecessary to compute all \(\mathbb{P}(\widehat{\Gamma}_{-j}(\mathbf{x}) = l)\) and \(\mathbb{P}(\widehat{\Gamma}(\mathbf{x}) = l)\) for \(l=1, \cdots, d\), since they are negligibly close to zero when \(l\) is too small or too large.
Truncation!
Normal approximation: \(\widehat{\sigma}^2(\mathbf{x}) = \sum_{j=1}^d \widehat{q}_j(\mathbf{x}) (1 - \widehat{q}_j(\mathbf{x})) \to \infty \) as \( d \to \infty \), so the Poisson-binomial probabilities can be approximated by a normal distribution with an \(o_P(1)\) error.
GPU parallel execution via CUDA.
Fisher consistency or Classification-Calibration (Lin, 2004; Zhang, 2004; Bartlett et al., 2006): from classification to segmentation (Dice-calibration).
Source: Visual Object Classes Challenge 2012 (VOC2012)
Source: The Cityscapes Dataset: Semantic Understanding of Urban Street Scenes
Jha et al. (2020) Kvasir-SEG: A segmented polyp dataset
DeepLab: Chen et al (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
PSPNet: Zhao et al (2017) Pyramid Scene Parsing Network
FCN: Long et al (2015) Fully convolutional networks for semantic segmentation
The optimal threshold is NOT fixed at 0.5; it is adaptive over different images/inputs.
To the best of our knowledge, the proposed ranking-based segmentation framework, RankDice, is the first consistent segmentation framework with respect to the Dice metric.
Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation.
We establish a theoretical foundation of segmentation with respect to the Dice metric, including the Bayes rule, Dice-calibration, and a convergence rate of the excess risk for the proposed RankDice framework, and show the inconsistency of existing methods.
Our experiments suggest that the improvement of RankDice over the existing frameworks is significant.
If you like RankDice, please star 🌟 our GitHub repository. Thank you for your support!