$$ \text{Acc}(\delta) = \mathbb{E}\big( \mathbf{1}( Y = \delta(\mathbf{X}) ) \big), \quad p(\mathbf{x}) = \mathbb{P}(Y=1 \mid \mathbf{X}=\mathbf{x}) $$
What is the "best" decision function? Bayes rule!
$$ \delta^* = \text{argmax}_{\delta} \ \text{Acc}(\delta) \ \to \ \delta^*(\mathbf{x}) = \mathbf{1}( p(\mathbf{x}) \geq 0.5 ) $$
Plug-in rule:
$$ \widehat{\delta}(\mathbf{x}) = \mathbf{1}( q(\mathbf{x}) \geq 0.5 ), \quad q(\mathbf{x}) \text{ is an estimator of } p(\mathbf{x}) $$
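The plug-in rule is a one-liner in practice; a minimal NumPy sketch, where `q` stands for any hypothetical estimated probabilities:

```python
import numpy as np

def plug_in_rule(q, threshold=0.5):
    """Plug-in Bayes classifier: predict 1 wherever the estimated
    probability q(x) of P(Y=1 | X=x) is at least the threshold."""
    return (np.asarray(q) >= threshold).astype(int)

# q: hypothetical estimated probabilities for four inputs
q = np.array([0.1, 0.5, 0.49, 0.9])
print(plug_in_rule(q))  # [0 1 0 1]
```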
Long et al. (2015) Fully convolutional networks for semantic segmentation
Goal: learn segmentation decision function \( \pmb{\delta} \)
The Dice and IoU metrics are introduced and widely used in the literature:
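For concreteness, both metrics on binary masks can be computed as follows (a minimal NumPy sketch; the smoothing constant `gamma` guarding the empty-mask case is illustrative):

```python
import numpy as np

def dice(pred, target, gamma=1e-8):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks,
    with a small smoothing term gamma for the empty-mask case."""
    pred, target = np.asarray(pred), np.asarray(target)
    inter = np.sum(pred * target)
    return (2 * inter + gamma) / (pred.sum() + target.sum() + gamma)

def iou(pred, target, gamma=1e-8):
    """Intersection over union |A ∩ B| / |A ∪ B| for binary masks."""
    pred, target = np.asarray(pred), np.asarray(target)
    inter = np.sum(pred * target)
    union = pred.sum() + target.sum() - inter
    return (inter + gamma) / (union + gamma)

pred   = np.array([1, 1, 0, 0])
target = np.array([1, 0, 1, 0])
# intersection = 1, |pred| = 2, |target| = 2, union = 3
print(dice(pred, target))  # ≈ 0.5
print(iou(pred, target))   # ≈ 1/3
```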
Classification-based loss
Dice-approximating loss
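One common instance of a Dice-approximating loss is the soft-Dice loss, which replaces the hard mask by the predicted probabilities so the metric becomes differentiable; this sketch shows the generic form, not necessarily the exact surrogate any particular paper uses:

```python
import numpy as np

def soft_dice_loss(probs, target, gamma=1.0):
    """Soft (differentiable) Dice loss: substitute predicted
    probabilities for the hard segmentation mask so that the Dice
    objective can be optimized by gradient descent.
    gamma is the usual smoothing constant."""
    probs, target = np.asarray(probs, float), np.asarray(target, float)
    inter = np.sum(probs * target)
    dice = (2 * inter + gamma) / (probs.sum() + target.sum() + gamma)
    return 1.0 - dice
```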
We discuss Dice segmentation at the population level and present its Bayes segmentation rule, akin to the Bayes classifier.
To begin with, we introduce some notation:
$$ p_j(\mathbf{x}) := \mathbb{P}(Y_j = 1 \mid \mathbf{X} = \mathbf{x}) $$
$$ \pmb{\delta}^* = \text{argmax}_{\pmb{\delta}} \ \text{Dice}_\gamma ( \pmb{\delta}) $$
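For reference, the objective being maximized can be written out explicitly; the following is one standard form of the population \(\gamma\)-Dice (following the definition in Dai and Li, 2023 — the slide's own equation is not reproduced here, so treat the exact constants as an assumption):

$$ \text{Dice}_\gamma(\pmb{\delta}) = \mathbb{E}\left[ \frac{2 \sum_{j=1}^d Y_j \, \delta_j(\mathbf{X}) + \gamma}{\sum_{j=1}^d Y_j + \sum_{j=1}^d \delta_j(\mathbf{X}) + \gamma} \right] $$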
Theorem 1 (Dai and Li, 2023). A segmentation rule \(\pmb{\delta}^*\) is a global maximizer of \(\text{Dice}_\gamma(\pmb{\delta})\) if and only if it satisfies that
\( \tau^*(\mathbf{x}) \) is called the optimal segmentation volume, defined as
where \(J_\tau(\mathbf{x})\) is the index set of the \(\tau\) largest probabilities, \(B_j(\mathbf{x}) \sim \text{Bernoulli}(p_j(\mathbf{x}))\) are independent, and \(\Gamma(\mathbf{x}) = \sum_{j=1}^d {B}_{j}(\mathbf{x})\) and \( {\Gamma}_{- j}(\mathbf{x}) = \sum_{j' \neq j} {B}_{j'}(\mathbf{x})\) are Poisson-binomial random variables.
The Dice measure is separable w.r.t. \(j\)
Obs: both the Bayes segmentation rule \(\pmb{\delta}^*(\mathbf{x})\) and the optimal volume function \(\tau^*(\mathbf{x})\) are achievable when the conditional probability \(\mathbf{p}(\mathbf{x}) = ( p_1(\mathbf{x}), \cdots, p_d(\mathbf{x}) )^\intercal\) is well estimated.
RankDice inspired by Thm 1 (plug-in rule)
Ranking the conditional probability \(p_j(\mathbf{x})\)
Searching for the optimal volume of the segmented features \(\tau(\mathbf{x})\)
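The two steps above can be sketched in a few lines. For brevity, this illustrative version scores each candidate volume \(\tau\) with a naive plug-in approximation of the expected Dice (replacing the Poisson-binomial expectation by cumulative probabilities) rather than the exact computation of the paper; function names and the \(\gamma\) constant are illustrative:

```python
import numpy as np

def rankdice_sketch(q, gamma=1.0):
    """Illustrative RankDice: (1) rank voxels by estimated probability
    q_j(x); (2) search over the volume tau, scoring each tau with a
    naive plug-in proxy for the expected Dice; (3) segment the
    top-tau ranked voxels."""
    q = np.asarray(q, float)
    order = np.argsort(-q)              # step 1: rank probabilities
    cum = np.cumsum(q[order])           # cumulative top-tau probability mass
    taus = np.arange(0, len(q) + 1)
    # naive proxy: E[2 * sum_{top-tau} Y_j] ~ 2 * cumulative probability
    scores = (2 * np.concatenate(([0.0], cum)) + gamma) / (taus + q.sum() + gamma)
    tau_star = int(np.argmax(scores))   # step 2: optimal volume
    delta = np.zeros_like(q, dtype=int)
    delta[order[:tau_star]] = 1         # step 3: segment top-tau voxels
    return delta, tau_star

delta, tau = rankdice_sketch([0.9, 0.8, 0.1, 0.05])
print(delta, tau)  # [1 1 0 0] 2
```

Note that the selected volume (here 2) can differ from simple 0.5-thresholding, which is exactly the point of ranking plus volume search.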
Note that (6) can be rewritten as:
In practice, the DFT–CF method is generally recommended for computing the cdf. The RF1 method can also be used when n < 1000, since there is not much difference in computing time from the DFT–CF method. The RNA method is recommended when n > 2000 and the cdf needs to be evaluated many times. As shown in the numerical study, the RNA method approximates the cdf well when n is large, and is more computationally efficient.
Hong. (2013) On computing the distribution function for the Poisson binomial distribution
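For reference, the exact pmf of a Poisson-binomial sum \(\Gamma = \sum_j B_j\) can always be computed by a simple \(O(d^2)\) convolution recursion (in the spirit of Hong's recursive formulas; DFT–CF and RNA trade this exactness for speed):

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """Exact pmf of Gamma = sum_j B_j with B_j ~ Bernoulli(probs[j]),
    independent, via the O(d^2) convolution recursion."""
    pmf = np.array([1.0])               # pmf of the empty sum: P(0) = 1
    for p in probs:
        # convolve the running pmf with the next Bernoulli(p)
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1 - p)       # case B_j = 0
        new[1:]  += pmf * p             # case B_j = 1
        pmf = new
    return pmf

print(poisson_binomial_pmf([0.5, 0.5]))  # [0.25 0.5  0.25]
```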
Lemma 3 (Dai and Li, 2023). If \(\sum_{s=1}^{\tau} \widehat{q}_{j_s}(\mathbf{x}) \geq (\tau + \gamma + d) \widehat{q}_{j_{\tau+1}}(\mathbf{x})\), then \(\bar{\pi}_\tau(\mathbf{x}) \geq \bar{\pi}_{\tau'}(\mathbf{x})\) for all \(\tau' > \tau\).
Early stop!
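The Lemma 3 condition is a one-line check on the sorted probabilities; a hedged sketch, where `q_sorted` is assumed to be in decreasing order:

```python
def can_stop(q_sorted, tau, gamma, d):
    """Lemma 3 stopping check: once the top-tau probability mass
    dominates the next candidate by the stated margin, no larger
    volume can improve the objective, so the search can stop."""
    return sum(q_sorted[:tau]) >= (tau + gamma + d) * q_sorted[tau]
```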
It is unnecessary to compute all \(\mathbb{P}(\widehat{\Gamma}_{-j}(\mathbf{x}) = l)\) and \(\mathbb{P}(\widehat{\Gamma}(\mathbf{x}) = l)\) for \(l=1, \cdots, d\), since they are negligibly close to zero when \(l\) is too small or too large.
Truncation!
\(\widehat{\sigma}^2(\mathbf{x}) = \sum_{j=1}^d \widehat{q}_j(\mathbf{x}) (1 - \widehat{q}_j(\mathbf{x})) \to \infty \quad \text{as } d \to \infty \)
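Because this variance grows with \(d\), a normal approximation to the Poisson-binomial cdf becomes increasingly accurate. A sketch in the spirit of Hong's (2013) refined normal approximation (RNA), with continuity and first-order skewness corrections; the exact correction term should be checked against the reference:

```python
import math

def rna_cdf(probs, k):
    """Refined normal approximation to P(Gamma <= k) for the
    Poisson-binomial sum Gamma = sum_j B_j, B_j ~ Bernoulli(probs[j])."""
    mu = sum(probs)
    var = sum(p * (1 - p) for p in probs)
    sigma = math.sqrt(var)
    # skewness coefficient of Gamma
    g = sum(p * (1 - p) * (1 - 2 * p) for p in probs) / sigma**3
    x = (k + 0.5 - mu) / sigma          # continuity correction
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))
    # Edgeworth-type skewness correction, clipped to [0, 1]
    return min(1.0, max(0.0, Phi + g * (1 - x * x) * phi / 6))
```

For identical probabilities the sum is Binomial, which gives an easy sanity check against the exact cdf.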
GPU via CUDA
Fisher consistency or Classification-Calibration
(Lin, 2004; Zhang, 2004; Bartlett et al., 2006)
Classification
Segmentation
Source: Visual Object Classes Challenge 2012 (VOC2012)
Source: The Cityscapes Dataset: Semantic Understanding of Urban Street Scenes
Jha et al. (2020) Kvasir-SEG: A segmented polyp dataset
DeepLab: Chen et al (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
PSPNet: Zhao et al (2017) Pyramid Scene Parsing Network
FCN: Long, et al. (2015) Fully convolutional networks for semantic segmentation
The optimal threshold is NOT fixed at 0.5; it is adaptive over different images/inputs
To the best of our knowledge, the proposed ranking-based segmentation framework, RankDice, is the first consistent segmentation framework with respect to the Dice metric.
Three numerical algorithms with GPU parallel execution are developed to implement the proposed framework in large-scale and high-dimensional segmentation.
We establish a theoretical foundation for segmentation with respect to the Dice metric, including the Bayes rule, Dice-calibration, and a convergence rate of the excess risk for the proposed RankDice framework, and show the inconsistency of existing methods.
Our experiments suggest that the improvement of RankDice over the existing frameworks is significant.
“There is Nothing More Practical Than A Good Theory.”
— Kurt Lewin