Gated Shape CNNs
for Semantic Segmentation
12 Jul 2019
What's the idea?
Problem:
Current methods process color, shape and texture information together in a single stream, i.e. we are losing information
Solution:
Let's simply* process shape in a parallel stream
* with novel Gated Conv Layer, Atrous Spatial Pyramid Pooling, Dual Task Regularizer and other cool stuff
Architecture
Regular Stream \(R_θ(I)\)
Input Image $$I∈R^{3×H×W}$$
Output feature representation
$$r∈R^{C×\frac{H}{m}×\frac{W}{m}}$$, where \(m\) is stride
Shape Stream \(S_φ\)
Output boundary map
$$s∈R^{H×W}$$
[Diagram: Shape Stream. Regular-stream features \(r_0, …, r_{m-2}, r_{m-1}, r_m\) are fed in via concat. Each gated layer is labelled concat → conv 1x1 → \(σ\) → \(\circ\) → \(\bigoplus\) → conv and yields \(\hat{s}_1, …, \hat{s}_{m-2}, \hat{s}_{m-1}, \hat{s}_m\). A final conv 1x1 is supervised by the Edge BCE Loss; the image gradients \(\nabla I\) enter through concat → conv 1x1.]
\(\circ\) is an element-wise product
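The gating labels in the diagram (concat, conv 1x1, \(σ\), \(\circ\), \(\bigoplus\), conv) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular features `r` are assumed to be already resized to the spatial size of `s`, the gate is assumed to be a single-channel map, and the last conv is assumed to be 1x1. The helper names `conv1x1` and `gated_conv` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight, bias):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def gated_conv(s, r, gate_w, gate_b, out_w, out_b):
    """One gating step on shape features s (C_s, H, W) using regular features r (C_r, H, W)."""
    # concat -> conv 1x1 -> sigmoid: attention map with values in (0, 1)
    alpha = sigmoid(conv1x1(np.concatenate([s, r], axis=0), gate_w, gate_b))
    # element-wise product with the gate, followed by the residual sum
    gated = s * alpha + s
    # final conv (assumed 1x1 here to keep the sketch short)
    return conv1x1(gated, out_w, out_b), alpha

# Usage with random weights (shapes are illustrative only)
rng = np.random.default_rng(0)
s = rng.normal(size=(4, 8, 8))
r = rng.normal(size=(6, 8, 8))
s_next, alpha = gated_conv(
    s, r,
    gate_w=rng.normal(size=(1, 10)), gate_b=np.zeros(1),
    out_w=rng.normal(size=(4, 4)), out_b=np.zeros(4),
)
```

Because the gate multiplies `s` before the residual sum, a gate value near 0 passes the shape features through unchanged, while a value near 1 doubles them.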
Fusion Module \(F_γ\)
[Diagram: \(r\) and \(s\) enter the fusion module (Atrous Spatial Pyramid Pooling); attached losses: Semantic CE Loss and Dualtask Loss.]
\(f=p(y|s, r)=F_γ(s,r)∈R^{C×H×W}\)
\(L^{θ,φ,γ}=λ_1L^{θ,φ}_{BCE}(s,\hat{s})+λ_2L^{θ,φ,γ}_{CE}(\hat{y}, f)\), where \(λ_1=10, λ_2=1\)
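A small NumPy sketch of this weighted sum, for intuition only. It assumes `s` and `f` already hold probabilities (after sigmoid and softmax respectively), that `s_hat` is a binary ground-truth boundary map, that `y_hat` holds integer class labels, and that both losses are averaged over pixels. The function names are hypothetical.

```python
import numpy as np

def bce_loss(s, s_hat, eps=1e-7):
    """Mean binary cross-entropy. s: predicted boundary map (H, W), s_hat: binary GT (H, W)."""
    s = np.clip(s, eps, 1.0 - eps)
    return float(-np.mean(s_hat * np.log(s) + (1.0 - s_hat) * np.log(1.0 - s)))

def ce_loss(f, y_hat, eps=1e-7):
    """Mean cross-entropy. f: class probabilities (classes, H, W), y_hat: integer GT labels (H, W)."""
    rows, cols = np.indices(y_hat.shape)
    p_true = f[y_hat, rows, cols]  # probability assigned to the true class at each pixel
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

def joint_loss(s, s_hat, f, y_hat, lambda_1=10.0, lambda_2=1.0):
    """lambda_1 * BCE(s, s_hat) + lambda_2 * CE(y_hat, f), with the weights from the slide."""
    return lambda_1 * bce_loss(s, s_hat) + lambda_2 * ce_loss(f, y_hat)

# Usage: a maximally uncertain prediction on a 3x3 image with two classes
s = np.full((3, 3), 0.5)
s_hat = np.zeros((3, 3))
f = np.full((2, 3, 3), 0.5)
y_hat = np.zeros((3, 3), dtype=int)
total = joint_loss(s, s_hat, f, y_hat)
```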
Let \(ζ∈R^{H×W}\) be a potential that represents whether a particular pixel belongs to a semantic boundary in the input image \(I\):
\(ζ=\frac{1}{\sqrt{2}}\|\nabla(G ∗ \underset{k}{\mathrm{argmax}}\, p(y^k|r,s))\|\), where \(G\) denotes a Gaussian filter. \(\hat{ζ}\) is the ground-truth (GT) binary mask.
\(L^{θ,φ,γ}_{reg→} = λ_3\sum_{p+}|ζ(p+) − \hat{ζ}(p+)|\)
\(L^{θ,φ,γ}_{reg←} = λ_4\sum_{k,p} 1_{s_p} [\hat{y}^k_p \log p(y^k_p|r, s)]\)
\(L^{θ,φ,γ}=L^{θ,φ,γ}_{reg→}+L^{θ,φ,γ}_{reg←}\)
where \(p+\) is the set of all non-zero pixel coordinates, \(1_s=\{1: s > thrs\}\) is an indicator with confidence threshold \(thrs=0.8\), and \(λ_3=1, λ_4=1\)
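The two regularizer terms can be evaluated in NumPy roughly as follows. This is a forward-only sketch under several assumptions: the hard argmax is used directly (it is not differentiable, so training code would need a relaxation), the Gaussian filter is a small separable kernel with zero padding, \(p+\) is taken to be the pixels that are non-zero in either potential, and \(\hat{ζ}\) is passed in as a ready-made array. The second term is transcribed with the sign as written on the slide; a cross-entropy penalty to be minimized would be the negative of that sum. All function names are hypothetical.

```python
import numpy as np

def gaussian_blur(x, sigma=1.0, radius=2):
    """Separable Gaussian filter G on a 2-D array (zero padding; each side must be >= 2*radius+1)."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    x = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, x)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, x)

def boundary_potential(label_map):
    """zeta = 1/sqrt(2) * ||grad(G * label_map)|| for an (H, W) label map."""
    gy, gx = np.gradient(gaussian_blur(label_map.astype(float)))
    return np.sqrt(gx ** 2 + gy ** 2) / np.sqrt(2.0)

def dual_task_regularizer(p, s, y_hat, zeta_hat, thrs=0.8, lambda_3=1.0, lambda_4=1.0):
    """p: class probabilities (classes, H, W); s: boundary map (H, W);
    y_hat: integer GT labels (H, W); zeta_hat: GT boundary potential (H, W)."""
    zeta = boundary_potential(np.argmax(p, axis=0))
    # p+ : assumed here to be pixels that are non-zero in either potential
    p_plus = (zeta > 0) | (zeta_hat > 0)
    reg_fwd = lambda_3 * float(np.sum(np.abs(zeta - zeta_hat)[p_plus]))
    # 1_s : keep only pixels where the shape stream exceeds the confidence threshold
    confident = s > thrs
    rows, cols = np.indices(y_hat.shape)
    log_p_true = np.log(np.clip(p[y_hat, rows, cols], 1e-7, 1.0))
    # sign as written on the slide (no leading minus)
    reg_bwd = lambda_4 * float(np.sum(log_p_true[confident]))
    return reg_fwd + reg_bwd, reg_fwd, reg_bwd

# Usage: two-class image split down the middle, prediction agrees with the labels
y_hat = np.zeros((6, 6), dtype=int)
y_hat[:, 3:] = 1
rows, cols = np.indices(y_hat.shape)
p = np.full((2, 6, 6), 0.1)
p[y_hat, rows, cols] = 0.9
total, reg_fwd, reg_bwd = dual_task_regularizer(
    p, s=np.ones((6, 6)), y_hat=y_hat, zeta_hat=boundary_potential(y_hat)
)
```

In the usage above the predicted segmentation matches the labels, so the first term vanishes and only the thresholded term contributes.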
Atrous (Dilated) Conv and
Atrous Spatial Pyramid Pooling
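An atrous convolution spaces the kernel taps `rate` pixels apart, which enlarges the receptive field without adding weights; a pyramid runs several rates in parallel on the same input. The NumPy sketch below is single-channel and purely illustrative. The rates in the usage are arbitrary, and a real ASPP block typically has further branches and a projection layer that are not shown. The names `atrous_conv2d` and `aspp` are hypothetical.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate=1):
    """Single-channel 2-D atrous convolution with zero padding (output has the size of x).
    x: (H, W); kernel: (k, k) with odd k; rate: spacing between kernel taps."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(k):
        for j in range(k):
            # shifted view of the padded input for tap (i, j)
            out += kernel[i, j] * xp[i * rate : i * rate + H, j * rate : j * rate + W]
    return out

def aspp(x, kernels, rates):
    """Apply parallel atrous branches to the same input and stack them along a channel axis."""
    return np.stack([atrous_conv2d(x, w, r) for w, r in zip(kernels, rates)], axis=0)

# Usage: a single impulse shows where a 3x3 kernel with rate 2 places its nine taps
x = np.zeros((9, 9))
x[4, 4] = 1.0
spread = atrous_conv2d(x, np.ones((3, 3)), rate=2)
pyramid = aspp(x, [np.ones((3, 3))] * 3, rates=(1, 2, 3))
```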
Results
Benchmark
Questions?
https://t.me/NikitaDetkov
Gated Shape CNN
By Nikita