Gated Shape CNNs
for Semantic Segmentation
12 Jul 2019
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6436618/intro.jpg)
What's the idea?
Problem:
Current methods process color, shape, and texture information all together, i.e., we are losing information
Solution:
Let's simply* process shape in a parallel stream
* with novel Gated Conv Layer, Atrous Spatial Pyramid Pooling, Dual Task Regularizer and other cool stuff
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6436815/architecture.jpg)
Architecture
Regular Stream \(R_θ(I)\)
Input Image $$I∈R^{3×H×W}$$
Output feature representation
$$r∈R^{C×\frac{H}{m}×\frac{W}{m}}$$, where \(m\) is stride
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6438198/fcn_layers.png)
Shape Stream \(S_φ\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6437081/pasted-from-clipboard.png)
Output boundary map
$$s∈R^{H×W}$$
Diagram labels (Gated Conv Layers): regular-stream features \(r_0\), ..., \(r_{m-2}\), \(r_{m-1}\), \(r_m\) feed the gates
Each gate: concat → conv 1x1 → \(σ\) → \(\circ\) → \(\bigoplus\) → conv, giving \(\hat{s_1}\), ..., \(\hat{s_{m-2}}\), \(\hat{s_{m-1}}\), \(\hat{s_m}\)
\(\circ\) is an element-wise product
Output head: conv 1x1, Edge BCE Loss
Image gradients \(\nabla I\): concat → conv 1x1
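The gate chain in the shape-stream diagram (concat, conv 1x1, \(σ\), \(\circ\), \(\bigoplus\), conv) can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the function names, the channel counts, the use of a 1x1 kernel for the final conv, and the assumption that \(r\) has already been resized to the spatial size of \(s\) are all choices made here for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def gated_conv_layer(s, r, w_gate, w_out):
    """One gate of the shape stream (hypothetical helper).

    s: (C_s, H, W) shape-stream features
    r: (C_r, H, W) regular-stream features, same H and W as s (assumed)
    w_gate: (1, C_s + C_r) weights of the 1x1 conv producing the gate
    w_out: (C_out, C_s) weights of the final conv (1x1 here by assumption)
    """
    x = np.concatenate([s, r], axis=0)     # concat
    alpha = sigmoid(conv1x1(x, w_gate))    # conv 1x1, then sigma: (1, H, W)
    gated = s * alpha + s                  # element-wise product, then sum
    return conv1x1(gated, w_out), alpha    # final conv

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 8, 8))
r = rng.normal(size=(6, 8, 8))
w_gate = rng.normal(size=(1, 10))
w_out = rng.normal(size=(4, 4))
s_hat, alpha = gated_conv_layer(s, r, w_gate, w_out)
print(s_hat.shape, alpha.shape)
```

The gate \(α\) has a single channel, so it acts as a per-pixel attention map that is broadcast over all channels of \(s\).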
Fusion Module \(F_γ\)
Diagram labels: inputs \(r\) and \(s\); Atrous Spatial Pyramid Pooling; Semantic CE Loss; Dualtask Loss
\(f=p(y|s, r)=F_γ(s,r)∈R^{K×H×W}\), where \(K\) is the number of semantic classes
\(L^{θ φ,γ}=λ_1L^{θ,φ}_{BCE}(s,\hat{s})+λ_2L^{θ φ,γ}_{CE}(\hat{y}, f)\), where \(λ_1=10, λ_2=1\)
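As a rough sketch of how the two terms combine, the snippet below computes a weighted sum of a binary cross-entropy on the boundary map and a cross-entropy on the class distribution, using \(λ_1=10, λ_2=1\) from the slide. The function names, the mean reduction over pixels, and the clipping constant `eps` are assumptions made for this example.

```python
import numpy as np

def bce_loss(s, s_hat, eps=1e-7):
    """Mean binary cross-entropy. s: predicted boundary probabilities (H, W),
    s_hat: ground-truth boundaries in {0, 1} with the same shape."""
    s = np.clip(s, eps, 1 - eps)
    return float(-np.mean(s_hat * np.log(s) + (1 - s_hat) * np.log(1 - s)))

def ce_loss(f, y_hat, eps=1e-7):
    """Mean cross-entropy. f: class probabilities (K, H, W),
    y_hat: integer ground-truth labels (H, W)."""
    _, H, W = f.shape
    rows, cols = np.indices((H, W))
    p_true = f[y_hat, rows, cols]          # probability of the true class per pixel
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

def joint_loss(s, s_hat, f, y_hat, lam1=10.0, lam2=1.0):
    """lam1 * BCE(s, s_hat) + lam2 * CE(y_hat, f)."""
    return lam1 * bce_loss(s, s_hat) + lam2 * ce_loss(f, y_hat)

s = np.full((4, 4), 0.5)                   # uninformative boundary prediction
s_hat = np.zeros((4, 4))
s_hat[1, :] = 1
f = np.full((3, 4, 4), 1 / 3)              # uniform distribution over 3 classes
y_hat = np.zeros((4, 4), dtype=int)
print(joint_loss(s, s_hat, f, y_hat))
```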
Let \(ζ∈R^{H×W}\) be a potential that represents whether a particular pixel belongs to a semantic boundary in the input image \(I\):
\(ζ=\frac{1}{\sqrt{2}}||\nabla(G ∗ \underset{k}{\mathrm{argmax}}\, p(y^k|r,s))||\), where \(G\) denotes a Gaussian filter and \(\hat{ζ}\) is the corresponding GT binary mask.
\(L^{θ φ,γ}_{reg→} = λ_3\sum_{p+}|ζ(p+) − \hat{ζ}(p+)|\)
\(L^{θ φ,γ}_{reg←} = λ_4\sum_{k,p} 1_{s_p} [\hat{y}^k_p \log p(y^k_p|r, s)]\)
\(L^{θ φ,γ}=L^{θ φ,γ}_{reg→}+L^{θ φ,γ}_{reg←}\)
where \(p+\) contains the set of all non-zero pixel coordinates and \(1_s=\{1: s > thrs\}\), where \(thrs=0.8\) is a confidence threshold and \(λ_3=1, λ_4=1\)
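The two regularizer terms can be sketched in NumPy as follows. Several things here are assumptions made for the example rather than details from the slides: the Gaussian kernel size and sigma, edge padding, taking \(p+\) as pixels that are non-zero in either \(ζ\) or \(\hat{ζ}\), and the leading minus sign in the second term (added so that it behaves as a loss to minimize). A hard argmax is also not differentiable, so a trainable version would need a relaxation that is not shown.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian filter G with edge padding (parameters assumed)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def boundary_potential(f):
    """zeta = 1/sqrt(2) * ||grad(G * argmax_k p(y^k | r, s))||. f: (K, H, W)."""
    labels = f.argmax(axis=0).astype(float)
    gy, gx = np.gradient(gaussian_blur(labels))
    return np.sqrt(gx ** 2 + gy ** 2) / np.sqrt(2)

def reg_forward(zeta, zeta_hat, lam3=1.0):
    """lam3 * sum over p+ of |zeta - zeta_hat| (p+: non-zero in either map, assumed)."""
    p_plus = (zeta > 0) | (zeta_hat > 0)
    return float(lam3 * np.abs(zeta - zeta_hat)[p_plus].sum())

def reg_backward(f, y_hat, s, lam4=1.0, thrs=0.8, eps=1e-7):
    """Cross-entropy restricted to pixels where the boundary map s exceeds thrs."""
    _, H, W = f.shape
    rows, cols = np.indices((H, W))
    log_p = np.log(np.clip(f[y_hat, rows, cols], eps, 1.0))
    return float(-lam4 * log_p[s > thrs].sum())   # minus sign is an assumption

# Toy prediction: class 0 on the left half, class 1 on the right half.
f = np.zeros((2, 8, 8))
f[0, :, :4], f[1, :, :4] = 0.9, 0.1
f[0, :, 4:], f[1, :, 4:] = 0.1, 0.9
y_hat = f.argmax(axis=0)
zeta = boundary_potential(f)
print(zeta.round(2)[0])
```

In the toy example \(ζ\) is zero far from the vertical class border and positive in the columns around it, which is the behavior the potential is meant to capture.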
Atrous (Dilated) Conv and
Atrous Spatial Pyramid Pooling
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6438509/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6438521/pasted-from-clipboard.png)
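A minimal NumPy sketch of a dilated convolution may help here: the kernel taps are spread `rate` pixels apart, so the receptive field grows without adding weights. The `aspp` helper only stacks parallel branches at different rates; the names, the rates, and the omission of the other parts of a full ASPP module are simplifications assumed for this example.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate=1):
    """Same-size 2D dilated cross-correlation with zero padding.
    x: (H, W), kernel: (kh, kw) with odd kh and kw."""
    kh, kw = kernel.shape
    ph, pw = rate * (kh // 2), rate * (kw // 2)
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Tap (i, j) reads the input shifted by (i * rate, j * rate).
            out += kernel[i, j] * padded[i * rate:i * rate + H, j * rate:j * rate + W]
    return out

def aspp(x, kernels, rates):
    """Apply one atrous branch per rate and stack the results: (len(rates), H, W)."""
    return np.stack([atrous_conv2d(x, k, r) for k, r in zip(kernels, rates)])

x = np.zeros((9, 9))
x[4, 4] = 1.0                               # single impulse in the center
ones = np.ones((3, 3))
branches = aspp(x, [ones, ones, ones], rates=[1, 2, 4])
print(branches.shape)
```

With the impulse input, each branch lights up a 3x3 grid of points whose spacing equals the rate, which makes the growing receptive field easy to see.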
Results
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6438522/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6438523/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6438525/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6438526/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6438529/pasted-from-clipboard.png)
Benchmark
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1111078/images/6439044/pasted-from-clipboard.png)
Questions?
https://t.me/NikitaDetkov
Gated Shape CNN
By Nikita