RankSEG: A Consistent Ranking-based Framework for Segmentation

(Joint with Chunlin Li)

Ben Dai

The Chinese University of Hong Kong

From Classification to Segmentation

  • Recall the classification problem:
    • Data: \( (\mathbf{X}, Y) \) with \( \mathbf{X} \in \mathbb{R}^d \) and \( Y \in \{0,1\} \)
    • Decision function: \( \delta: \mathbb{R}^d \to \{0,1\} \)
    • Evaluation:

$$ \text{Acc}( \delta) = \mathbb{E}( \mathbf{1}( Y = \delta(\mathbf{X}) )) $$

What is the "best" decision function? Bayes Rule!

$$ \delta^* = \text{argmax}_{\delta} \ \text{Acc}(\delta) \ \to \ \delta^*(\mathbf{x}) = \mathbf{1}( p(\mathbf{x}) \geq 0.5 ), \quad \text{where } p(\mathbf{x}) = \mathbb{P}(Y=1|\mathbf{X}=\mathbf{x}) $$

Plug-in rule:

$$\widehat{\delta}(\mathbf{x}) = \mathbf{1}( q(\mathbf{x}) \geq 0.5 ), \quad q(\mathbf{x}) \text{ is an estimator of } p(\mathbf{x}) $$
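A quick numerical check of the Bayes and plug-in rules, as a sketch under hypothetical assumptions: the true \(p(\mathbf{x})\) is a logistic curve, and the estimate \(q(\mathbf{x})\) is \(p(\mathbf{x})\) plus small noise. Given \(\mathbf{X}=\mathbf{x}\), predicting 1 is correct with probability \(p(\mathbf{x})\) and predicting 0 with probability \(1-p(\mathbf{x})\), so expected accuracy can be computed exactly on sampled inputs. Thresholding the true \(p\) at 0.5 picks the larger of the two at every \(\mathbf{x}\), so no other threshold can beat it.

```python
import numpy as np

def p_true(x):
    # Hypothetical true conditional probability P(Y = 1 | X = x): a logistic curve
    return 1.0 / (1.0 + np.exp(-2.0 * x))

def threshold_rule(prob, t=0.5):
    # Decision function 1(prob >= t); t = 0.5 gives the Bayes (or plug-in) rule
    return (prob >= t).astype(int)

def expected_accuracy(x, decision):
    # Given X = x: predicting 1 is correct w.p. p(x), predicting 0 w.p. 1 - p(x)
    p = p_true(x)
    return float(np.mean(np.where(decision == 1, p, 1.0 - p)))

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)

# Plug-in: replace p(x) by an estimate q(x); here a hypothetical noisy version of p(x)
q = np.clip(p_true(x) + rng.normal(scale=0.05, size=x.shape), 0.0, 1.0)

acc_bayes = expected_accuracy(x, threshold_rule(p_true(x)))
acc_plugin = expected_accuracy(x, threshold_rule(q))
acc_other = {t: expected_accuracy(x, threshold_rule(p_true(x), t)) for t in (0.3, 0.7)}
print(acc_bayes, acc_plugin, acc_other)
```

The plug-in rule loses accuracy only on inputs where \(q\) and \(p\) fall on different sides of 0.5, which is why a well-estimated \(q\) suffices.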

Segmentation

Long et al. (2015) Fully convolutional networks for semantic segmentation

  • Input: \(\mathbf{X} \in \mathbb{R}^d\)
  • Outcome: \(\mathbf{Y} \in \{0,1\}^d\)
  • Segmentation function:
    • \( \pmb{\delta}: \mathbb{R}^d \to \{0,1\}^d\)
    • \( \pmb{\delta}(\mathbf{X}) = ( \delta_1(\mathbf{X}), \cdots, \delta_d(\mathbf{X}) )^\intercal \)
  • Predicted segmentation:
    • \( I(\pmb{\delta}(\mathbf{X})) = \{j: \delta_j(\mathbf{X}) = 1 \}\)

Goal: learn a segmentation decision function \( \pmb{\delta} \)

Evaluation

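The Dice and IoU metrics, which measure the overlap between the ground truth and the predicted segmentation, are widely used in the literature:

$$ \text{Dice}_\gamma(\pmb{\delta}) = \mathbb{E}\Big( \frac{2 \mathbf{Y}^\intercal \pmb{\delta}(\mathbf{X}) + \gamma}{\|\mathbf{Y}\|_1 + \|\pmb{\delta}(\mathbf{X})\|_1 + \gamma} \Big), \qquad \text{IoU}_\gamma(\pmb{\delta}) = \mathbb{E}\Big( \frac{\mathbf{Y}^\intercal \pmb{\delta}(\mathbf{X}) + \gamma}{\|\mathbf{Y}\|_1 + \|\pmb{\delta}(\mathbf{X})\|_1 - \mathbf{Y}^\intercal \pmb{\delta}(\mathbf{X}) + \gamma} \Big) $$

where \(\gamma \geq 0\) is a smoothing parameter.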
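A per-image sketch of the two metrics, assuming the smoothed forms \((2\mathbf{y}^\intercal \pmb{\delta} + \gamma)/(\|\mathbf{y}\|_1 + \|\pmb{\delta}\|_1 + \gamma)\) for Dice and \((\mathbf{y}^\intercal \pmb{\delta} + \gamma)/(\|\mathbf{y}\|_1 + \|\pmb{\delta}\|_1 - \mathbf{y}^\intercal \pmb{\delta} + \gamma)\) for IoU. Averaging over a batch of images gives a Monte Carlo estimate of the expectation. Function names are hypothetical.

```python
import numpy as np

def dice_score(y, delta, gamma=1.0):
    """Smoothed Dice for one image; y and delta are binary masks of the same shape."""
    y = np.asarray(y, dtype=float).ravel()
    delta = np.asarray(delta, dtype=float).ravel()
    inter = float(y @ delta)  # y^T delta: number of correctly segmented pixels
    # gamma > 0 avoids 0/0 when both masks are empty
    return (2.0 * inter + gamma) / (y.sum() + delta.sum() + gamma)

def iou_score(y, delta, gamma=1.0):
    """Smoothed IoU for one image: (|intersection| + gamma) / (|union| + gamma)."""
    y = np.asarray(y, dtype=float).ravel()
    delta = np.asarray(delta, dtype=float).ravel()
    inter = float(y @ delta)
    return (inter + gamma) / (y.sum() + delta.sum() - inter + gamma)

def empirical_metric(metric, Y, D, gamma=1.0):
    # Sample average over images: estimates the expectation E(...)
    return float(np.mean([metric(y, d, gamma) for y, d in zip(Y, D)]))

print(dice_score([1, 1, 0, 0], [1, 0, 1, 0], gamma=0.0))
print(iou_score([1, 1, 0, 0], [1, 0, 1, 0], gamma=0.0))
```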

Existing Framework

  • Given training data \( \{\mathbf{x}_i, \mathbf{y}_i\} _{i=1, \cdots, n}\), most existing methods characterize segmentation as a classification problem:
    • Classification-based loss
    • Dice-approximating loss
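To make the existing pipeline concrete, here is a sketch assuming one common representative of each loss family: pixel-wise binary cross-entropy (classification-based) and the soft Dice loss (Dice-approximating). The slide does not specify the losses; these choices and the function names are illustrative. In this sketch, as in the thresholding-based framework, the segmentation is obtained by thresholding the estimated probabilities at a fixed 0.5, whichever loss was used for training.

```python
import numpy as np

def bce_loss(y, q, eps=1e-7):
    """Pixel-wise binary cross-entropy, averaged over pixels (classification-based loss)."""
    y = np.asarray(y, dtype=float)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(q) + (1 - y) * np.log(1 - q)))

def soft_dice_loss(y, q, gamma=1.0):
    """1 - Dice with the binary mask replaced by probabilities q (Dice-approximating loss)."""
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(1.0 - (2.0 * np.sum(y * q) + gamma) / (np.sum(y) + np.sum(q) + gamma))

def threshold_segmentation(q, t=0.5):
    """Prediction step of the thresholding framework: delta_j(x) = 1(q_j(x) >= t)."""
    return (np.asarray(q) >= t).astype(int)

print(threshold_segmentation([0.2, 0.5, 0.9]))
```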

Bayes Segmentation Rule

We discuss Dice-segmentation at the population level and present its Bayes segmentation rule, akin to the Bayes classifier.

To begin with, we introduce some notation:

  • Segmentation probability for the \(j\)-th pixel: \( p_j(\mathbf{x}) :=  \mathbb{P}(Y_j = 1 | \mathbf{X} = \mathbf{x}) \)
  • \({B}_j(\mathbf{x})\) is a Bernoulli random variable with success probability \(p_{j}(\mathbf{x})\)

The Bayes segmentation rule is

$$ \pmb{\delta}^* = \text{argmax}_{\pmb{\delta}} \ \text{Dice}_\gamma ( \pmb{\delta})$$

Bayes Segmentation Rule

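Theorem 1 (Dai and Li, 2023). A segmentation rule \(\pmb{\delta}^*\) is a global maximizer of \(\text{Dice}_\gamma(\pmb{\delta})\) if and only if it satisfies that

$$ \delta^*_j(\mathbf{x}) = \begin{cases} 1, & \text{if } j \in J_{\tau^*(\mathbf{x})}(\mathbf{x}), \\ 0, & \text{otherwise}. \end{cases} $$

\( \tau^*(\mathbf{x}) \) is called the optimal segmentation volume, defined as

$$ \tau^*(\mathbf{x}) = \text{argmax}_{\tau \in \{0,1,\cdots,d\}} \ \bar{\pi}_\tau(\mathbf{x}), \quad \bar{\pi}_\tau(\mathbf{x}) = \sum_{j \in J_\tau(\mathbf{x})} \sum_{l=0}^{d-1} \frac{2 p_j(\mathbf{x})}{\tau + \gamma + l + 1} \mathbb{P}\big(\Gamma_{-j}(\mathbf{x}) = l\big) + \sum_{l=0}^{d} \frac{\gamma}{\tau + \gamma + l} \mathbb{P}\big(\Gamma(\mathbf{x}) = l\big), $$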

where \(J_\tau(\mathbf{x})\) is the index set of the \(\tau\)-largest probabilities, \(\Gamma(\mathbf{x}) = \sum_{j=1}^d {B}_{j}(\mathbf{x})\), and \( {\Gamma}_{- j}(\mathbf{x}) = \sum_{j' \neq j} {B}_{j'}(\mathbf{x})\) are Poisson-binomial random variables.

Bayes Segmentation Rule

The Dice measure is separable w.r.t. \(j\)
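One way to see the separability, assuming \(\text{Dice}_\gamma(\pmb{\delta}) = \mathbb{E}\big( (2 \mathbf{Y}^\intercal \pmb{\delta}(\mathbf{X}) + \gamma)/(\|\mathbf{Y}\|_1 + \|\pmb{\delta}(\mathbf{X})\|_1 + \gamma) \big)\) and that \(Y_1, \cdots, Y_d\) are conditionally independent given \(\mathbf{X} = \mathbf{x}\) (so that \((Y_j)\) has the law of \((B_j(\mathbf{x}))\) and \(\|\mathbf{Y}\|_1\) that of \(\Gamma(\mathbf{x})\)): fix \(\mathbf{x}\), let \(I = I(\pmb{\delta}(\mathbf{x}))\) and \(\tau = |I|\). Using \(B_j \Gamma = B_j (\Gamma_{-j} + 1)\) and the independence of \(B_j\) and \(\Gamma_{-j}\),

$$ \begin{aligned}
\mathbb{E}\Big( \tfrac{2 \sum_{j \in I} B_j(\mathbf{x}) + \gamma}{\Gamma(\mathbf{x}) + \tau + \gamma} \Big)
&= \sum_{j \in I} \mathbb{E}\Big( \tfrac{2 B_j(\mathbf{x})}{\Gamma(\mathbf{x}) + \tau + \gamma} \Big) + \mathbb{E}\Big( \tfrac{\gamma}{\Gamma(\mathbf{x}) + \tau + \gamma} \Big) \\
&= \sum_{j \in I} 2 p_j(\mathbf{x}) \, \mathbb{E}\Big( \tfrac{1}{\Gamma_{-j}(\mathbf{x}) + \tau + \gamma + 1} \Big) + \sum_{l=0}^{d} \tfrac{\gamma}{\tau + \gamma + l} \mathbb{P}\big(\Gamma(\mathbf{x}) = l\big) \\
&= \sum_{j \in I} \sum_{l=0}^{d-1} \tfrac{2 p_j(\mathbf{x})}{\tau + \gamma + l + 1} \mathbb{P}\big(\Gamma_{-j}(\mathbf{x}) = l\big) + \sum_{l=0}^{d} \tfrac{\gamma}{\tau + \gamma + l} \mathbb{P}\big(\Gamma(\mathbf{x}) = l\big).
\end{aligned} $$

For a fixed volume \(\tau\), the objective is a sum of per-pixel terms, and swapping a pixel for one with a larger \(p_j(\mathbf{x})\) never decreases it, which is why the top-\(\tau\) index set \(J_\tau(\mathbf{x})\) is optimal once \(\tau\) is fixed.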

Obs: both the Bayes segmentation rule \(\pmb{\delta}^*(\mathbf{x})\) and the optimal volume function \(\tau^*(\mathbf{x})\) depend on the data only through the conditional probability \(\mathbf{p}(\mathbf{x}) = ( p_1(\mathbf{x}), \cdots, p_d(\mathbf{x}) )^\intercal\), so both are achievable when \(\mathbf{p}(\mathbf{x})\) is well estimated.

RankDice inspired by Thm 1 (plug-in rule)

  1. Ranking the conditional probability \(p_j(\mathbf{x})\)

  2. Searching for the optimal volume \(\tau(\mathbf{x})\) of the segmented features
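A minimal, unoptimized sketch of the plug-in rule: rank the estimated probabilities \(\widehat{q}_j(\mathbf{x})\), evaluate \(\bar{\pi}_\tau\) for every \(\tau\) with \(p_j\) replaced by \(\widehat{q}_j\), and keep the top-\(\widehat{\tau}\) pixels. It assumes the smoothed \(\text{Dice}_\gamma\) with conditionally independent pixels and computes every Poisson-binomial pmf exactly by dynamic programming, which costs roughly \(O(d^4)\) overall and is only meant for small \(d\). The function names are hypothetical and are not the API of the authors' package.

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(probs[j]) variables (dynamic programming)."""
    pmf = np.zeros(len(probs) + 1)
    pmf[0] = 1.0
    for p in probs:
        # Add one Bernoulli(p): P(S' = k) = P(S = k)(1 - p) + P(S = k - 1) p
        pmf[1:] = pmf[1:] * (1.0 - p) + pmf[:-1] * p
        pmf[0] *= 1.0 - p
    return pmf

def pi_bar(q, tau, gamma=1.0):
    """Plug-in pi_bar_tau: expected Dice_gamma of segmenting the tau largest q_j."""
    q = np.asarray(q, dtype=float)
    d = q.size
    value = 0.0
    if gamma > 0:  # smoothing term: gamma * E[1 / (Gamma + tau + gamma)]
        value += gamma * np.sum(poisson_binomial_pmf(q) / (tau + gamma + np.arange(d + 1)))
    for j in np.argsort(-q)[:tau]:  # J_tau: indices of the tau largest probabilities
        pmf_minus_j = poisson_binomial_pmf(np.delete(q, j))  # law of Gamma_{-j}
        value += 2.0 * q[j] * np.sum(pmf_minus_j / (tau + gamma + np.arange(d) + 1))
    return float(value)

def rankdice(q, gamma=1.0):
    """Step 1: rank q. Step 2: pick the volume tau maximizing pi_bar. Return mask and tau."""
    q = np.asarray(q, dtype=float)
    scores = [pi_bar(q, tau, gamma) for tau in range(q.size + 1)]
    tau_hat = int(np.argmax(scores))
    mask = np.zeros(q.size, dtype=int)
    mask[np.argsort(-q)[:tau_hat]] = 1
    return mask, tau_hat

mask, tau_hat = rankdice([0.9, 0.6, 0.45, 0.2, 0.05], gamma=1.0)
print(mask, tau_hat)
```

Note that the selected pixels are always a prefix of the ranking, but how many are kept depends on the whole vector \(\widehat{\mathbf{q}}(\mathbf{x})\), not on a fixed cutoff.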

RankDice: Framework

RankDice: Algo

  1. Fast evaluation of Poisson-binomial r.v.
  2. Quick search for \(\tau \in \{0,1,\cdots, d\}\)

Note that (6) can be rewritten as:

In practice, the DFT–CF method is generally recommended for computing. The RF1 method can also be used when n < 1000, because there is not much difference in computing time from the DFT–CF method. The RNA method is recommended when n > 2000 and the cdf needs to be evaluated many times. As shown in the numerical study, the RNA method can approximate the cdf well when n is large, and it is more computationally efficient.

(Here n is the number of Bernoulli summands, i.e., \(d\) in our notation.)

Hong (2013) On computing the distribution function for the Poisson binomial distribution
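A sketch of the DFT–CF idea for step 1 (fast evaluation of a Poisson-binomial r.v.): evaluate the characteristic function \(\chi(k) = \prod_j (1 - p_j + p_j e^{i\omega k})\), \(\omega = 2\pi/(d+1)\), on \(k = 0, \cdots, d\), then recover \(\mathbb{P}(\Gamma = l) = \frac{1}{d+1}\sum_k e^{-i\omega l k}\chi(k)\) with a single FFT. It assumes NumPy and forms the complex product directly, which is fine for moderate \(d\); a log-modulus/argument formulation may be more stable for very large \(d\). The function name is hypothetical.

```python
import numpy as np

def pb_pmf_dft_cf(probs):
    """Poisson-binomial pmf via the DFT of its characteristic function (DFT-CF)."""
    probs = np.asarray(probs, dtype=float)
    n = probs.size
    omega = 2.0 * np.pi / (n + 1)
    k = np.arange(n + 1)
    # chi[k] = prod_j (1 - p_j + p_j * exp(i * omega * k)), shape (n + 1,)
    chi = np.prod(1.0 - probs[None, :] + probs[None, :] * np.exp(1j * omega * k[:, None]), axis=1)
    # np.fft.fft computes sum_k chi[k] * exp(-2*pi*i*k*l/(n+1)), i.e. the inversion formula
    pmf = np.fft.fft(chi).real / (n + 1)
    return np.clip(pmf, 0.0, None)  # drop tiny negative round-off

pmf = pb_pmf_dft_cf([0.1, 0.5, 0.9])
print(pmf, pmf.sum())
```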

RankDice: Algo-Early Stop

Lemma 3 (Dai and Li, 2023). If \(\sum_{s=1}^{\tau} \widehat{q}_{j_s}(\mathbf{x}) \geq (\tau + \gamma + d) \widehat{q}_{j_{\tau+1}}(\mathbf{x})\), then \(\bar{\pi}_\tau(\mathbf{x}) \geq \bar{\pi}_{\tau'}(\mathbf{x})\) for all \(\tau' >\tau\)

Early stop!
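A sketch of the \(\tau\)-search with the Lemma 3 check: scan \(\tau = 0, 1, \cdots\) over the probabilities sorted in decreasing order and stop as soon as \(\sum_{s=1}^{\tau} \widehat{q}_{j_s}(\mathbf{x}) \geq (\tau + \gamma + d)\, \widehat{q}_{j_{\tau+1}}(\mathbf{x})\), since no larger volume can do better. It reuses an exact (slow) evaluation of \(\bar{\pi}_\tau\) under the assumed smoothed \(\text{Dice}_\gamma\) and conditional independence; all names are hypothetical.

```python
import numpy as np

def pb_pmf(probs):
    """Exact Poisson-binomial pmf by dynamic programming."""
    pmf = np.zeros(len(probs) + 1)
    pmf[0] = 1.0
    for p in probs:
        pmf[1:] = pmf[1:] * (1.0 - p) + pmf[:-1] * p
        pmf[0] *= 1.0 - p
    return pmf

def pi_bar_sorted(qs, tau, gamma):
    """pi_bar_tau for probabilities qs sorted in decreasing order (top-tau = first tau)."""
    d = qs.size
    value = 0.0
    if gamma > 0:
        value += gamma * np.sum(pb_pmf(qs) / (tau + gamma + np.arange(d + 1)))
    for j in range(tau):
        value += 2.0 * qs[j] * np.sum(pb_pmf(np.delete(qs, j)) / (tau + gamma + np.arange(d) + 1))
    return float(value)

def search_tau(q, gamma=1.0, early_stop=True):
    """Return (tau_hat, number of tau values evaluated)."""
    qs = np.sort(np.asarray(q, dtype=float))[::-1]
    d = qs.size
    scores = []
    prefix = 0.0  # running sum of the tau largest probabilities
    for tau in range(d + 1):
        scores.append(pi_bar_sorted(qs, tau, gamma))
        if tau == d:
            break
        # Lemma 3: sum_{s<=tau} q_(s) >= (tau + gamma + d) * q_(tau+1) => no larger tau wins
        if early_stop and prefix >= (tau + gamma + d) * qs[tau]:
            break
        prefix += qs[tau]
    return int(np.argmax(scores)), len(scores)

q = np.concatenate([[0.95, 0.9, 0.85], np.full(40, 0.01)])
print(search_tau(q, early_stop=True), search_tau(q, early_stop=False))
```

With a few confident pixels and many near-zero ones, the check fires after a handful of \(\tau\) values instead of all \(d + 1\).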

RankDice: Algo-TRNA

It is unnecessary to compute all \(\mathbb{P}(\widehat{\Gamma}_{-j}(\mathbf{x}) = l)\) and \(\mathbb{P}(\widehat{\Gamma}(\mathbf{x}) = l)\) for \(l=0, 1, \cdots, d\), since they are negligibly close to zero when \(l\) is too small or too large.

Truncation!

\(\widehat{\sigma}^2(\mathbf{x}) = \sum_{j=1}^d \widehat{q}_j(\mathbf{x}) (1 - \widehat{q}_j(\mathbf{x})) \to \infty \quad \text{as } d \to \infty \)

\(o_P(1)\)
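A sketch of truncation combined with the refined normal approximation (RNA) discussed by Hong (2013): approximate the Poisson-binomial cdf at \(k\) by \(G((k + 0.5 - \mu)/\sigma)\), where \(G(u) = \Phi(u) + \kappa (1 - u^2)\phi(u)/6\), \(\mu = \sum_j \widehat{q}_j(\mathbf{x})\), \(\sigma^2 = \widehat{\sigma}^2(\mathbf{x})\), and \(\kappa = \sigma^{-3}\sum_j \widehat{q}_j(1-\widehat{q}_j)(1-2\widehat{q}_j)\), and only evaluate the pmf on the window \(\mu \pm c\,\sigma\). The window constant \(c\) and the function name are hypothetical; the exact truncation rule of TRNA may differ.

```python
import math
import numpy as np

def rna_pmf_truncated(probs, c=5.0):
    """RNA to the Poisson-binomial pmf, evaluated only on l in [mu - c*sigma, mu + c*sigma]."""
    p = np.asarray(probs, dtype=float)
    mu = float(p.sum())
    sigma = math.sqrt(float(np.sum(p * (1.0 - p))))  # assumes some p_j strictly in (0, 1)
    kappa = float(np.sum(p * (1.0 - p) * (1.0 - 2.0 * p))) / sigma**3  # skewness
    erf = np.vectorize(math.erf)

    def G(u):
        # Skewness-corrected normal cdf
        phi = np.exp(-u**2 / 2.0) / math.sqrt(2.0 * math.pi)
        Phi = 0.5 * (1.0 + erf(u / math.sqrt(2.0)))
        return Phi + kappa * (1.0 - u**2) * phi / 6.0

    # Truncation: outside this window the pmf is negligible
    lo = max(0, math.floor(mu - c * sigma))
    hi = min(p.size, math.ceil(mu + c * sigma))
    support = np.arange(lo, hi + 1)
    pmf = G((support + 0.5 - mu) / sigma) - G((support - 0.5 - mu) / sigma)
    return support, np.clip(pmf, 0.0, None)

support, pmf = rna_pmf_truncated(np.random.default_rng(3).uniform(0.1, 0.9, size=500))
print(support.size, pmf.sum())
```

Since \(\widehat{\sigma}^2(\mathbf{x})\) grows with \(d\), the window covers about \(2c\,\widehat{\sigma}(\mathbf{x})\) values instead of \(d + 1\).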

RankDice: Algo-BlindA

GPU via CUDA

\(o_P(1)\)

RankDice: Theory

Fisher consistency or Classification-Calibration

(Lin, 2004; Zhang, 2004; Bartlett et al., 2006)

Classification

Segmentation
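A toy illustration of why thresholding at 0.5 is not Dice-consistent even with exact probabilities. It assumes the smoothed \(\text{Dice}_\gamma\) with \(\gamma = 1\) and \(d = 10\) conditionally independent pixels, each with \(p_j(\mathbf{x}) = 0.4\); the numbers come from this hypothetical setting only, not from the paper's experiments. Thresholding segments nothing, while the best volume over \(\tau \in \{0, \cdots, d\}\) does much better (all pixels tie, so any \(\tau\) of them form a top-\(\tau\) set).

```python
from math import comb

def expected_dice_iid(d, p, tau, gamma=1.0):
    """E[Dice_gamma] of segmenting tau of d pixels when every Y_j ~ Bernoulli(p) independently.
    a = true positives among the tau selected pixels, r = positives among the other d - tau."""
    total = 0.0
    for a in range(tau + 1):
        pa = comb(tau, a) * p**a * (1 - p) ** (tau - a)
        for r in range(d - tau + 1):
            pr = comb(d - tau, r) * p**r * (1 - p) ** (d - tau - r)
            total += pa * pr * (2 * a + gamma) / (a + r + tau + gamma)
    return total

d, p, gamma = 10, 0.4, 1.0
dice_threshold = expected_dice_iid(d, p, 0, gamma)  # 1(p_j >= 0.5) selects no pixel
dice_best = max(expected_dice_iid(d, p, tau, gamma) for tau in range(d + 1))
print(dice_threshold, dice_best)
```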

RankDice: Experiments

  • Three segmentation benchmarks: VOC, CityScapes, Kvasir
    • Standard benchmarks, NOT cherry-picks
  • Three commonly used DL models: DeepLab-V3+, PSPNet, FCN
  • The proposed framework  VS.  the existing frameworks
    • Based on the same trained neural networks
    • No implementation tricks
    • Open-source Python module and code
    • All trained neural networks available for free download

Source: Visual Object Classes Challenge 2012 (VOC2012)

Source: The Cityscapes Dataset: Semantic Understanding of Urban Street Scenes

Jha et al. (2020) Kvasir-seg: A segmented polyp dataset

DeepLab: Chen et al. (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

PSPNet: Zhao et al. (2017) Pyramid Scene Parsing Network

FCN: Long et al. (2015) Fully convolutional networks for semantic segmentation

The optimal threshold is NOT fixed at 0.5; it adapts to each image/input.

mRankDice

Long et al. (2015) Fully convolutional networks for semantic segmentation

  1. Probabilistic model: multiclass or multilabel
  2. Decision rule: overlapping / non-overlapping

More Results...

  • mRankDice: extension and challenge
  • RankIoU
  • Simulation
  • Probability calibration
  • ....

Contribution

  • To the best of our knowledge, the proposed ranking-based segmentation framework, RankDice, is the first consistent segmentation framework with respect to the Dice metric.

  • Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation.

  • We establish a theoretical foundation of segmentation with respect to the Dice metric, including the Bayes rule, Dice-calibration, and the convergence rate of the excess risk for the proposed RankDice framework, and we establish inconsistency results for the existing methods.

  • Our experiments suggest that RankDice significantly improves over the existing frameworks.

Thank you!

rankseg

By statmlben
