## From Classification to Segmentation

• Recall the classification problem
• Data: $$\mathbf{X} \in \mathbb{R}^d$$, label $$Y \in \{0,1\}$$
• Decision function: $$\delta(\mathbf{X}): \mathbb{R}^d \to \{0,1\}$$
• Evaluation:

$$\text{Acc}( \delta) = \mathbb{E}( \mathbf{1}( Y = \delta(\mathbf{X}) ))$$

What is the "best" decision function? Bayes Rule!

$$\delta^* = \text{argmax}_{\delta} \ \text{Acc}(\delta) \ \to \ \delta^*(\mathbf{x}) = \mathbf{1}( p(\mathbf{x}) \geq 0.5 )$$

Plug-in rule:

$$\widehat{\delta}(\mathbf{x}) = \mathbf{1}( q(\mathbf{x}) \geq 0.5 ), \quad q(\mathbf{x}) \text{ is an estimator of } p(\mathbf{x})$$

where $$p(\mathbf{x}) = \mathbb{P}(Y=1|\mathbf{X}=\mathbf{x})$$
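
As a toy illustration, the plug-in rule is a single thresholding step (the probabilities `q` below are hypothetical estimates, e.g. from a fitted logistic model):

```python
import numpy as np

def plugin_rule(q, threshold=0.5):
    """Plug-in classifier: threshold an estimate q(x) of p(x) = P(Y=1|X=x).

    `q` holds hypothetical estimated conditional probabilities; in practice
    they would come from a trained model.
    """
    return (np.asarray(q) >= threshold).astype(int)

# Four inputs with estimated probabilities
q = [0.9, 0.3, 0.5, 0.1]
print(plugin_rule(q))  # [1 0 1 0]
```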

## Segmentation

Long et al. (2015) Fully convolutional networks for semantic segmentation

• Input: $$\mathbf{X} \in \mathbb{R}^d$$
• Outcome: $$\mathbf{Y} \in \{0,1\}^d$$
• Segmentation function:
• $$\pmb{\delta}: \mathbb{R}^d \to \{0,1\}^d$$
• $$\pmb{\delta}(\mathbf{X}) = ( \delta_1(\mathbf{X}), \cdots, \delta_d(\mathbf{X}) )^\intercal$$
• Predicted segmentation:
• $$I(\pmb{\delta}(\mathbf{X})) = \{j: \delta_j(\mathbf{X}) = 1 \}$$

Goal: learn the segmentation decision function $$\pmb{\delta}$$

## Evaluation

The Dice and IoU metrics are introduced and widely used in the literature:
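
For reference, the standard population-level definitions, writing $$I(\mathbf{Y}) = \{j: Y_j = 1\}$$ for the true segmented set:

$$\text{Dice}(\pmb{\delta}) = \mathbb{E}\left[ \frac{2\, |I(\pmb{\delta}(\mathbf{X})) \cap I(\mathbf{Y})|}{|I(\pmb{\delta}(\mathbf{X}))| + |I(\mathbf{Y})|} \right], \qquad \text{IoU}(\pmb{\delta}) = \mathbb{E}\left[ \frac{|I(\pmb{\delta}(\mathbf{X})) \cap I(\mathbf{Y})|}{|I(\pmb{\delta}(\mathbf{X})) \cup I(\mathbf{Y})|} \right]$$

The smoothed variant $$\text{Dice}_\gamma$$ used below adds a constant $$\gamma \geq 0$$ to both numerator and denominator.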

## Existing Framework

• Given training data $$\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1, \cdots, n}$$, most existing methods characterize segmentation as a pixel-wise classification problem, trained with either:
• Classification-based losses (e.g., cross-entropy)
• Dice-approximating losses (e.g., soft-Dice)

## Bayes Segmentation Rule

We discuss Dice-segmentation at the population level, and present its Bayes segmentation rule akin to the Bayes classifier.

To begin with, we introduce some notation:

• Segmentation probability for the $$j$$-th pixel:

$$p_j(\mathbf{x}) := \mathbb{P}(Y_j = 1 | \mathbf{X} = \mathbf{x})$$

• $${B}_j(\mathbf{x})$$ is a Bernoulli random variable with success probability $$p_{j}(\mathbf{x})$$

$$\pmb{\delta}^* = \text{argmax}_{\pmb{\delta}} \ \text{Dice}_\gamma ( \pmb{\delta})$$

## Bayes Segmentation Rule

Theorem 1 (Dai and Li, 2023). A segmentation rule $$\pmb{\delta}^*$$ is a global maximizer of $$\text{Dice}_\gamma(\pmb{\delta})$$ if and only if it segments the pixels with the $$\tau^*(\mathbf{x})$$ largest conditional probabilities, that is, $$\delta^*_j(\mathbf{x}) = \mathbf{1}( j \in J_{\tau^*(\mathbf{x})}(\mathbf{x}) )$$, where $$\tau^*(\mathbf{x})$$ is called the optimal segmentation volume,

where $$J_\tau(\mathbf{x})$$ is the index set of the $$\tau$$-largest probabilities, $$\Gamma(\mathbf{x}) = \sum_{j=1}^d {B}_{j}(\mathbf{x})$$, and $${\Gamma}_{- j}(\mathbf{x}) = \sum_{j' \neq j} {B}_{j'}(\mathbf{x})$$ are Poisson-binomial random variables.

## Bayes Segmentation Rule

The Dice measure is separable w.r.t. $$j$$

## Bayes Segmentation Rule

Obs: both the Bayes segmentation rule $$\pmb{\delta}^*(\mathbf{x})$$ and the optimal volume function $$\tau^*(\mathbf{x})$$ are achievable when the conditional probability $$\mathbf{p}(\mathbf{x}) = ( p_1(\mathbf{x}), \cdots, p_d(\mathbf{x}) )^\intercal$$ is well-estimated

## Bayes Segmentation Rule

RankDice inspired by Thm 1 (plug-in rule):

1. Ranking the conditional probability $$p_j(\mathbf{x})$$

2. Searching for the optimal volume of the segmented features $$\tau(\mathbf{x})$$
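
These two steps can be sketched as follows; this is a simplified illustration that estimates the expected Dice by Monte Carlo, whereas the actual algorithm evaluates the Poisson-binomial distributions exactly (the input `q` stands for estimated probabilities from a trained network):

```python
import numpy as np

def rankdice_plugin(q, gamma=1.0, n_mc=4000, seed=0):
    """Two-step plug-in rule: (1) rank estimated pixel probabilities,
    (2) search the volume tau maximizing the expected Dice, estimated
    here by Monte Carlo over Y_j ~ Bernoulli(q_j) (a sketch; the exact
    algorithm uses Poisson-binomial computations instead)."""
    q = np.asarray(q, dtype=float)
    d = q.size
    order = np.argsort(-q)                  # Step 1: rank p_j(x) descending
    rng = np.random.default_rng(seed)
    Y = rng.random((n_mc, d)) < q[order]    # samples, columns in ranked order
    gamma_tot = Y.sum(axis=1)               # |I(Y)| for each sample
    # overlap[:, tau] = |top-tau pixels ∩ I(Y)| per sample, tau = 0..d
    overlap = np.hstack([np.zeros((n_mc, 1)), np.cumsum(Y, axis=1)])
    # Step 2: expected Dice_gamma for each candidate volume tau = 0..d
    taus = np.arange(d + 1)
    dice = ((2 * overlap + gamma) / (taus + gamma_tot[:, None] + gamma)).mean(axis=0)
    tau_star = int(np.argmax(dice))
    mask = np.zeros(d, dtype=bool)
    mask[order[:tau_star]] = True           # segment the top-tau* pixels
    return mask, tau_star
```

For instance, with `q = [0.9, 0.8, 0.1, 0.05]` the search selects the two high-probability pixels rather than thresholding each at 0.5.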

## RankDice: Algo

1. Fast evaluation of Poisson-binomial r.v.
2. Quick search for $$\tau \in \{0,1,\cdots, d\}$$


## RankDice: Algo

In practice, the DFT–CF method is generally recommended for computing. The RF1 method can also be used when n < 1000, because there is not much difference in computing time from the DFT–CF method. The RNA method is recommended when n > 2000 and the cdf needs to be evaluated many times. As shown in the numerical study, the RNA method can approximate the cdf well when n is large, and is more computationally efficient.

Hong. (2013) On computing the distribution function for the Poisson binomial distribution
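
For illustration, the DFT–CF method admits a short implementation: evaluate the characteristic function of the Poisson-binomial count at the roots of unity and invert with an FFT (a sketch, not Hong's reference code):

```python
import numpy as np

def poisson_binomial_pmf(p):
    """DFT-CF: exact pmf of Gamma = sum_j B_j with B_j ~ Bernoulli(p_j),
    via the characteristic function evaluated at the (d+1)-th roots of
    unity, inverted by an FFT."""
    p = np.asarray(p, dtype=float)
    d = p.size
    l = np.arange(d + 1)
    # xi_l = prod_j (1 - p_j + p_j * exp(2*pi*i*l/(d+1)))
    xi = np.prod(1 - p[:, None] + p[:, None] * np.exp(2j * np.pi * l / (d + 1)), axis=0)
    pmf = np.fft.fft(xi).real / (d + 1)
    return np.clip(pmf, 0.0, 1.0)   # clamp tiny negative round-off

# Sanity check against Binomial(2, 0.5)
print(poisson_binomial_pmf([0.5, 0.5]))  # ≈ [0.25, 0.5, 0.25]
```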

## RankDice: Algo-Early Stop

Lemma 3 (Dai and Li, 2023). If $$\sum_{s=1}^{\tau} \widehat{q}_{j_s}(\mathbf{x}) \geq (\tau + \gamma + d) \widehat{q}_{j_{\tau+1}}(\mathbf{x})$$, then $$\bar{\pi}_\tau(\mathbf{x}) \geq \bar{\pi}_{\tau'}(\mathbf{x})$$ for all $$\tau' >\tau$$

Early stop!

## RankDice: Algo-TRNA

It is unnecessary to compute all $$\mathbb{P}(\widehat{\Gamma}_{-j}(\mathbf{x}) = l)$$ and $$\mathbb{P}(\widehat{\Gamma}(\mathbf{x}) = l)$$ for $$l=1, \cdots, d$$, since they are negligibly close to zero when $$l$$ is too small or too large.

Truncation!
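
A minimal sketch of such a truncation window via the normal approximation; the cutoff `z = 6` is an illustrative choice, not the paper's tuned threshold:

```python
import math

def truncation_window(q, z=6.0):
    """Keep only counts l within z standard deviations of the mean of the
    Poisson-binomial Gamma(x); by a normal approximation, probabilities
    outside this window are negligible (z = 6 is an illustrative cutoff)."""
    d = len(q)
    mu = sum(q)
    sigma = math.sqrt(sum(qi * (1.0 - qi) for qi in q))
    lo = max(0, math.floor(mu - z * sigma))
    hi = min(d, math.ceil(mu + z * sigma))
    return lo, hi

print(truncation_window([0.5] * 100))  # (20, 80)
```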

## RankDice: Algo-TRNA

$$\widehat{\sigma}^2(\mathbf{x}) = \sum_{j=1}^d \widehat{q}_j(\mathbf{x}) (1 - \widehat{q}_j(\mathbf{x})) \to \infty \quad \text{as } d \to \infty$$

GPU parallel execution via CUDA

## RankDice: Theory

Fisher consistency, or classification-calibration (Lin, 2004; Zhang, 2004; Bartlett et al., 2006), extends from classification to segmentation, where the analogous notion is Dice-calibration.

## RankDice: Experiments

• Three segmentation benchmarks: VOC, CityScapes, Kvasir

Source: Visual Object Classes Challenge 2012 (VOC2012)

Source: The Cityscapes Dataset: Semantic Understanding of Urban Street Scenes

Jha et al. (2020) Kvasir-SEG: A segmented polyp dataset

## RankDice: Experiments

• Three segmentation benchmarks: VOC, CityScapes, Kvasir
• Standard benchmarks, NOT cherry-picked
• Three commonly used DL models: DeepLab-V3+, PSPNet, FCN

DeepLab: Chen et al. (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

PSPNet: Zhao et al. (2017) Pyramid Scene Parsing Network

FCN: Long et al. (2015) Fully convolutional networks for semantic segmentation

## RankDice: Experiments

• Three segmentation benchmarks: VOC, CityScapes, Kvasir
• Standard benchmarks, NOT cherry-picked
• Three commonly used DL models: DeepLab-V3+, PSPNet, FCN
• The proposed framework vs. the existing frameworks
• Based on the same trained neural networks
• No implementation tricks
• Open-source code

## RankDice: Experiments

The optimal threshold is NOT fixed at 0.5; it adapts across different images/inputs.

## mRankDice

Long et al. (2015) Fully convolutional networks for semantic segmentation

## mRankDice

1. Probabilistic model: multiclass or multilabel
2. Decision rule: overlapping / non-overlapping

## More Results...

• mRankDice: extensions and challenges
• RankIoU
• Simulation
• Probability calibration
• ....

## Contribution

• To the best of our knowledge, the proposed ranking-based segmentation framework, RankDice, is the first consistent segmentation framework with respect to the Dice metric.

• Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation.

• We establish a theoretical foundation of segmentation with respect to the Dice metric, including the Bayes rule, Dice-calibration, and a convergence rate of the excess risk for the proposed RankDice framework, and show that the existing methods are inconsistent.

• Our experiments suggest that the improvement of RankDice over the existing frameworks is significant.

“There is Nothing More Practical Than A Good Theory.”

— Kurt Lewin

By statmlben
