Gated Shape CNNs
for Semantic Segmentation
12 Jul 2019
Problem:
In current methods, color, shape and texture information are all processed together, i.e. we are losing information
Solution:
Let's simply* process shape in a parallel stream
* with novel Gated Conv Layer, Atrous Spatial Pyramid Pooling, Dual Task Regularizer and other cool stuff
Input image $I \in \mathbb{R}^{3 \times H \times W}$
Output feature representation
$r \in \mathbb{R}^{C \times \frac{H}{m} \times \frac{W}{m}}$, where $m$ is the stride
Output boundary map
$s \in \mathbb{R}^{H \times W}$
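[Figure: architecture diagram]
Regular stream: features $r_0, \dots, r_{m-2}, r_{m-1}, r_m$
Shape stream: gated conv layers, each concat → conv 1x1 → σ → ∘ → ⨁ → conv, with outputs $\hat{s}_1, \dots, \hat{s}_{m-2}, \hat{s}_{m-1}, \hat{s}_m$ (∘ is an element-wise product)
Shape stream head: conv 1x1 → Edge BCE Loss; concat with image gradients ∇I → conv 1x1
Fusion: r and s → Atrous Spatial Pyramid Pooling → Semantic CE Loss, Dualtask Loss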
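A minimal NumPy sketch of one gated conv layer from the diagram (concat → conv 1x1 → σ → ∘ → ⨁ → conv). Assumptions, not from the slides: r is already resized to the resolution of s, the gate has a single output channel, the last conv is taken to be 1x1, and the names `conv1x1`, `gated_conv_layer`, `w_gate` and `w_out` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def gated_conv_layer(s, r, w_gate, w_out):
    """s: (C_s, H, W) shape-stream features, r: (C_r, H, W) regular-stream features.

    Assumes r has already been resized to the spatial size of s.
    """
    x = np.concatenate([s, r], axis=0)    # concat along channels
    alpha = sigmoid(conv1x1(x, w_gate))   # conv 1x1 + sigma: (1, H, W) gate in (0, 1)
    gated = s * alpha                     # element-wise product, broadcast over channels
    residual = gated + s                  # residual sum
    return conv1x1(residual, w_out), alpha  # final conv (assumed 1x1 here)

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 8, 8))
r = rng.normal(size=(6, 8, 8))
w_gate = rng.normal(size=(1, 10))   # 10 = 4 shape channels + 6 regular channels
w_out = rng.normal(size=(4, 4))
s_hat, alpha = gated_conv_layer(s, r, w_gate, w_out)
print(s_hat.shape, alpha.shape)
```

With all-zero gate weights the gate is 0.5 everywhere, so the layer reduces to a plain conv applied to 1.5 · s; this is a quick sanity check that the gate only rescales the shape features.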
$f = p(y \mid s, r) = F_\gamma(s, r) \in \mathbb{R}^{C \times H \times W}$
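$\mathcal{L}^{\theta\phi,\gamma} = \lambda_1 \mathcal{L}_{BCE}^{\theta,\phi}(s, \hat{s}) + \lambda_2 \mathcal{L}_{CE}^{\theta\phi,\gamma}(\hat{y}, f)$, where $\lambda_1 = 10$, $\lambda_2 = 1$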
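The joint loss above can be sketched in NumPy as follows. This is an illustration under assumptions: $\hat{s}$ is read as the ground-truth boundary map and $\hat{y}$ as integer ground-truth labels, both losses are averaged over pixels, and the function names are hypothetical.

```python
import numpy as np

def bce_loss(s, s_hat, eps=1e-7):
    """Mean binary cross-entropy; s is the predicted boundary map, s_hat the GT, both (H, W)."""
    s = np.clip(s, eps, 1.0 - eps)
    return float(-np.mean(s_hat * np.log(s) + (1.0 - s_hat) * np.log(1.0 - s)))

def ce_loss(f, y_hat, eps=1e-7):
    """Mean cross-entropy; f is (K, H, W) class probabilities, y_hat is (H, W) integer labels."""
    p_true = np.take_along_axis(f, y_hat[None], axis=0)[0]  # probability of the GT class
    return float(-np.mean(np.log(np.clip(p_true, eps, 1.0))))

def joint_loss(s, s_hat, f, y_hat, lambda1=10.0, lambda2=1.0):
    """lambda1 * BCE(s, s_hat) + lambda2 * CE(y_hat, f), with the weights from the slide."""
    return lambda1 * bce_loss(s, s_hat) + lambda2 * ce_loss(f, y_hat)

# Toy inputs: maximally uncertain predictions on a 4x4 image with K = 4 classes.
s_pred = np.full((4, 4), 0.5)
s_gt = np.zeros((4, 4))
f_pred = np.full((4, 4, 4), 0.25)
y_gt = np.zeros((4, 4), dtype=int)
loss = joint_loss(s_pred, s_gt, f_pred, y_gt)
print(loss)
```

For these uniform predictions the BCE term is log 2 and the CE term is log 4, so the total is 10 · log 2 + log 4 = 12 · log 2.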
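Let $\zeta \in \mathbb{R}^{H \times W}$ be a potential that represents whether a particular pixel belongs to a semantic boundary in the input image $I$, computed as
$\zeta = \frac{1}{2} \left\| \nabla \left( G * \arg\max_k p(y^k \mid r, s) \right) \right\|$, where $G$ denotes a Gaussian filter. Also, $\hat{\zeta}$ is a GT binary mask.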
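A NumPy sketch that evaluates the formula for ζ on a toy prediction. Assumptions: the Gaussian filter uses sigma = 1 with a radius of 3 pixels and edge padding, the gradient is a finite difference, and the helper names are hypothetical. The hard argmax here is not differentiable, so this only illustrates the computation, not how it would be trained.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma=1.0, radius=3):
    """Separable Gaussian filter G with edge padding; img is (H, W)."""
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def boundary_potential(probs, sigma=1.0):
    """zeta = 1/2 * || grad(G * argmax_k p(y^k | r, s)) ||; probs is (K, H, W)."""
    labels = np.argmax(probs, axis=0).astype(float)  # hard label map
    smoothed = gaussian_blur(labels, sigma)          # G * argmax
    gy, gx = np.gradient(smoothed)                   # finite-difference gradient
    return 0.5 * np.sqrt(gx ** 2 + gy ** 2)

# Toy prediction: class 0 on the left half, class 1 on the right half.
probs = np.zeros((2, 16, 16))
probs[0, :, :8] = 1.0
probs[1, :, 8:] = 1.0
zeta = boundary_potential(probs)
print(zeta.shape, zeta.max())
```

The potential is non-zero only in a narrow band around the vertical class boundary and vanishes inside each region.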
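$\mathcal{L}_{reg\rightarrow}^{\theta\phi,\gamma} = \lambda_3 \sum_{p^+} |\zeta(p^+) - \hat{\zeta}(p^+)|$
$\mathcal{L}_{reg\leftarrow}^{\theta\phi,\gamma} = \lambda_4 \sum_{k,p} \mathbb{1}_{s_p} [\hat{y}_p^k \log p(y_p^k \mid r, s)]$
$\mathcal{L}^{\theta\phi,\gamma} = \mathcal{L}_{reg\rightarrow}^{\theta\phi,\gamma} + \mathcal{L}_{reg\leftarrow}^{\theta\phi,\gamma}$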
Where $p^+$ contains the set of all non-zero pixel coordinates and $\mathbb{1}_s = \{1 : s > thrs\}$ is an indicator, where $thrs$ is a confidence threshold $= 0.8$ and $\lambda_3 = 1$, $\lambda_4 = 1$
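A NumPy sketch of the two regularizer terms. Assumptions: $p^+$ is taken as the pixels where either ζ or $\hat{\zeta}$ is non-zero, $\hat{y}$ is given as integer labels (equivalent to a one-hot $\hat{y}_p^k$), and the second term is evaluated exactly as written on the slide, without a leading minus, so a training loss would typically negate it. Function names are hypothetical.

```python
import numpy as np

def reg_forward(zeta, zeta_hat, lambda3=1.0):
    """lambda3 * sum over p+ of |zeta(p+) - zeta_hat(p+)|."""
    p_plus = (zeta != 0) | (zeta_hat != 0)  # assumed reading of "non-zero pixel coordinates"
    return float(lambda3 * np.abs(zeta[p_plus] - zeta_hat[p_plus]).sum())

def reg_backward(s, f, y_hat, thrs=0.8, lambda4=1.0, eps=1e-7):
    """lambda4 * sum_{k,p} 1_{s_p} [y_hat_p^k log p(y_p^k | r, s)], as written (no minus sign)."""
    mask = s > thrs                                          # indicator 1_{s_p}
    p_true = np.take_along_axis(f, y_hat[None], axis=0)[0]   # p of the GT class per pixel
    log_p = np.log(np.clip(p_true, eps, 1.0))
    return float(lambda4 * (mask * log_p).sum())

# Toy 2x2 example.
zeta = np.array([[0.0, 0.5], [0.0, 0.0]])
zeta_hat = np.array([[0.0, 1.0], [1.0, 0.0]])
s_map = np.array([[0.9, 0.1], [0.95, 0.5]])   # two pixels exceed thrs = 0.8
f_prob = np.full((2, 2, 2), 0.5)              # uniform over 2 classes
y_gt = np.zeros((2, 2), dtype=int)

fwd = reg_forward(zeta, zeta_hat)
bwd = reg_backward(s_map, f_prob, y_gt)
print(fwd, bwd, fwd + bwd)
```

In the toy example the first term is |0.5 − 1| + |0 − 1| = 1.5, and the second term sums log 0.5 over the two pixels where the boundary confidence exceeds the threshold.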
https://t.me/NikitaDetkov