[Figure: Fusion Module $F_γ$ with inputs $r$ and $s$ and output $f$. Panel labels: Atrous Spatial Pyramid Pooling, Semantic CE Loss, Dualtask Loss.]

$f=p(y|s, r)=F_γ(s,r)∈R^{C×H×W}$

$L^{θ φ,γ}=λ_1L^{θ,φ}_{BCE}(s,\hat{s})+λ_2L^{θ φ,γ}_{CE}(\hat{y}, f)$, where $λ_1=10$ and $λ_2=1$
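As a rough illustration of how the two terms combine, here is a minimal plain-Python sketch. The function names, the flat-list data layout, the mean reduction over pixels, and the `eps` clamp are all assumptions made for the example; only the weights $λ_1=10$ and $λ_2=1$ come from the slide.

```python
import math

def bce_loss(s, s_hat, eps=1e-7):
    """Mean binary cross-entropy between predicted boundary probabilities s
    and binary ground-truth boundaries s_hat (both flat lists of pixels)."""
    total = 0.0
    for p, t in zip(s, s_hat):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(s)

def ce_loss(y_hat, f, eps=1e-7):
    """Mean cross-entropy between ground-truth class indices y_hat and
    per-pixel class distributions f (one list of probabilities per pixel)."""
    total = 0.0
    for label, probs in zip(y_hat, f):
        total += -math.log(max(probs[label], eps))
    return total / len(y_hat)

def dual_task_loss(s, s_hat, y_hat, f, lam1=10.0, lam2=1.0):
    """lam1 * BCE(s, s_hat) + lam2 * CE(y_hat, f); mean reduction is assumed."""
    return lam1 * bce_loss(s, s_hat) + lam2 * ce_loss(y_hat, f)
```

For example, `dual_task_loss([0.5, 0.5], [1, 0], [0], [[0.5, 0.5]])` weights the boundary term ten times more heavily than the semantic term.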

Let $ζ∈R^{H×W}$ be a potential that represents whether a particular pixel belongs to a semantic boundary in the input image $I$. It is computed as

$ζ=\frac{1}{\sqrt{2}}||\nabla(G ∗ \underset{k}{\mathrm{argmax}} p(y^k|r,s))||$, where $G$ denotes a Gaussian filter. $\hat{ζ}$ is the corresponding ground-truth (GT) binary mask.
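The forward computation of $ζ$ can be sketched in plain Python as follows. This is only an illustration under several assumptions: the 3-tap Gaussian kernel with $σ=1$, the replicated borders, the central-difference gradient, and all function names are choices made for the example, not details taken from the slides. The hard argmax used here is not differentiable, so a trainable version would need a differentiable relaxation.

```python
import math

def gaussian_kernel(radius=1, sigma=1.0):
    """Normalized 1-D Gaussian weights (radius and sigma are assumed values)."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def clamp(i, n):
    """Replicate the border by clamping an index into [0, n - 1]."""
    return min(max(i, 0), n - 1)

def smooth(img, kernel):
    """Separable Gaussian blur G * img on an H x W nested list."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[j + r] * img[y][clamp(x + j, w)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[j + r] * rows[clamp(y + j, h)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def boundary_potential(probs):
    """probs[k][y][x] holds p(y^k | r, s); returns zeta as an H x W nested list."""
    k_n, h, w = len(probs), len(probs[0]), len(probs[0][0])
    # Per-pixel argmax over classes gives a label map.
    labels = [[float(max(range(k_n), key=lambda k: probs[k][y][x]))
               for x in range(w)] for y in range(h)]
    g = smooth(labels, gaussian_kernel())
    zeta = []
    for y in range(h):
        row = []
        for x in range(w):
            # Central differences with indices clamped at the image border.
            gx = (g[y][clamp(x + 1, w)] - g[y][clamp(x - 1, w)]) / 2.0
            gy = (g[clamp(y + 1, h)][x] - g[clamp(y - 1, h)][x]) / 2.0
            row.append(math.hypot(gx, gy) / math.sqrt(2.0))
        zeta.append(row)
    return zeta
```

On a two-class map split down the middle, the returned values are largest in the columns next to the split and zero far from it, which matches the intended reading of $ζ$ as a boundary indicator.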

$L^{θ φ,γ}_{reg→} = λ_3\sum_{p+}|ζ(p+) − \hat{ζ}(p+)|$

$L^{θ φ,γ}_{reg←} = λ_4\sum_{k,p} 1_{s_p} [\hat{y}^k_p \log p(y^k_p|r, s)]$

$L^{θ φ,γ}=L^{θ φ,γ}_{reg→}+L^{θ φ,γ}_{reg←}$

Here $p+$ is the set of all non-zero pixel coordinates, $1_s=\{1: s > thrs\}$ is an indicator function, $thrs$ is a confidence threshold set to $0.8$, and $λ_3=1, λ_4=1$.
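A minimal plain-Python sketch of the two regularization terms is given below. It makes several assumptions that the slides do not settle: pixels are stored as flat lists, $p+$ is taken to be the pixels where either $ζ$ or $\hat{ζ}$ is non-zero, $\hat{y}$ is given as class indices (so the sum over $k$ picks out the ground-truth class), and the function names and the `eps` clamp are illustrative. The second term follows the formula literally, with no leading minus sign.

```python
import math

def reg_boundary(zeta, zeta_hat, lam3=1.0):
    """lam3 * sum over p+ of |zeta(p+) - zeta_hat(p+)|.
    p+ is assumed to be the pixels where either map is non-zero."""
    total = 0.0
    for z, z_hat in zip(zeta, zeta_hat):
        if z != 0.0 or z_hat != 0.0:
            total += abs(z - z_hat)
    return lam3 * total

def reg_semantic(s, y_hat, probs, lam4=1.0, thrs=0.8, eps=1e-7):
    """lam4 * sum over pixels with s_p > thrs of log p(y_p = ground truth | r, s).
    Written with the sign used in the formula; a conventional cross-entropy
    to be minimized would be the negation of this value."""
    total = 0.0
    for s_p, label, p in zip(s, y_hat, probs):
        if s_p > thrs:  # indicator 1_{s_p}: keep only confident boundary pixels
            total += math.log(max(p[label], eps))
    return lam4 * total

def reg_loss(zeta, zeta_hat, s, y_hat, probs):
    """Sum of the two terms with lam3 = lam4 = 1 and thrs = 0.8."""
    return reg_boundary(zeta, zeta_hat) + reg_semantic(s, y_hat, probs)
```

With `s = [0.9, 0.5]`, only the first pixel passes the $0.8$ threshold, so only its class probability contributes to the second term.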

Gated Shape CNNs for Semantic Segmentation arXiv 12 Jul 2019 Project Page

What's the idea?

Architecture

Regular Stream $R_θ(I)$

Shape Stream $S_φ$

Fusion Module $F_γ$

Atrous (Dilated) Conv and Atrous Spatial Pyramid Pooling

Results

Benchmark

Questions?