Outline
How Convolution Networks Save Parameters
Dense Network
Weights : N x N
Operations : N x N
Conv Network (kernel size 3)
Weights : 3
Operations : N x 3
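A minimal PyTorch sketch of this comparison for a length-N 1-D signal (N = 100 and the 3-tap kernel are illustrative choices, not from the slides):

import torch.nn as nn

N = 100                                        # signal length
dense = nn.Linear(N, N, bias=False)            # fully connected: N x N weights
conv = nn.Conv1d(1, 1, kernel_size=3,          # one shared 3-tap kernel
                 padding=1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense))                            # 10000 = N x N
print(count(conv))                             # 3     = kernel size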
Complexity of Convolution Network
Feature Map : M x N x C
A Filter : K x K x C (+1 : bias), shifted s pixels at a time (stride s)
Filter 1 -> A New Channel : (M/s) x (N/s) x 1
Filters 1, 2, ..., D -> New Feature Map : (M/s) x (N/s) x D
Parameters:
A Filter : K x K x C
A Conv Layer : K x K x C x D
Complexity :
A new pixel : K x K x C
A new channel : K x K x C x (M/s) x (N/s)
A new feature map : K x K x C x (M/s) x (N/s) x D
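A minimal sketch checking these counts against PyTorch (the M, N, C, K, D, s values are illustrative):

import torch
import torch.nn as nn

M, N, C, K, D, s = 32, 32, 16, 3, 64, 2
layer = nn.Conv2d(C, D, kernel_size=K, stride=s, padding=K // 2)

print(sum(p.numel() for p in layer.parameters()))  # 9280 = (K x K x C + 1) x D
out = layer(torch.randn(1, C, M, N))
print(out.shape)                                   # (1, D, M/s, N/s) = (1, 64, 16, 16)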
Using cheaper methods to get information.
Saving on parameter count or computation cost.
https://www.facebook.com/Lowcostcosplay/
Generate feature maps with a cheaper method.
(NxN) : Original
(NxM+MxN = 2xNxM) : Bottleneck
2M/N : Ratio (less than 1 if M < 0.5N)
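A sketch of the bottleneck idea with linear layers (N = 512, M = 128 are illustrative values):

import torch.nn as nn

N, M = 512, 128                                # M < 0.5N, so the ratio is < 1
original = nn.Linear(N, N, bias=False)         # N x N weights
bottleneck = nn.Sequential(
    nn.Linear(N, M, bias=False),               # N x M
    nn.Linear(M, N, bias=False),               # M x N
)                                              # total: 2 x N x M

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(bottleneck) / count(original))     # 2M/N = 0.5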
(Input Channels, Output Channels, Kernel Size)
(C, N, (K x K))
Parameters:
K x K x C x N : Traditional Convolution
K x K x (C/M) x (N/M) x M : M-Groups Group Convolution (=KxKxCxN/M)
1/M : Ratio
https://www.cnblogs.com/shine-lee/p/10243114.html
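A minimal sketch of M-group group convolution using PyTorch's groups argument (the C, N, K, M values are illustrative):

import torch.nn as nn

C, N, K, M = 64, 128, 3, 4
traditional = nn.Conv2d(C, N, K, padding=1, bias=False)        # K x K x C x N
grouped = nn.Conv2d(C, N, K, padding=1, groups=M, bias=False)  # K x K x C x N / M

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(grouped) / count(traditional))                     # 1/M = 0.25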
Issue
Different groups do not communicate
https://medium.com/@zurister/depth-wise-convolution-and-depth-wise-separable-convolution-37346565d4ec
(Input Channels, Output Channels, Kernel Size):
(C, C, (K x K))
Parameters:
Traditional Convolution
K x K x C x N
============================
Depthwise Separable Convolution
(C-Groups Group Convolution)
K x K x (C/C) x (C/C) x C (= K x K x C)
1 x 1 Convolution
1 x 1 x C x N
============================
Ratio (depthwise conv cost ratio + 1x1 conv cost ratio)
1/N + 1/(K x K)
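A minimal sketch of a depthwise separable convolution that checks this ratio (the C, N, K values are illustrative):

import torch.nn as nn

C, N, K = 64, 128, 3
depthwise = nn.Conv2d(C, C, K, padding=1, groups=C, bias=False)  # K x K x C
pointwise = nn.Conv2d(C, N, 1, bias=False)                       # 1 x 1 x C x N
traditional = nn.Conv2d(C, N, K, padding=1, bias=False)          # K x K x C x N

count = lambda m: sum(p.numel() for p in m.parameters())
ratio = (count(depthwise) + count(pointwise)) / count(traditional)
print(ratio, 1 / N + 1 / (K * K))                # both ~0.119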
Use dilation to skip some nodes. This gives a larger receptive field than traditional convolution with the same kernel size.
And each node contributes an equal number of times.
Oord, Aaron van den, et al. "Wavenet: A generative model for raw audio." arXiv preprint arXiv:1609.03499 (2016).
(Figure: standard convolution vs. dilated convolution)
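A minimal sketch contrasting the two (sizes are illustrative): with dilation=2, a 3 x 3 kernel covers a 5 x 5 receptive field using the same 9 weights.

import torch.nn as nn

conv = nn.Conv2d(1, 1, 3, padding=1, bias=False)                 # 3 x 3 field
dilated = nn.Conv2d(1, 1, 3, padding=2, dilation=2, bias=False)  # 5 x 5 field

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv), count(dilated))               # 9 and 9: same cost, larger field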
Transform existing feature maps/filters to get new feature maps/filters.
Apply rotations and flips to the filters to get more feature maps: 0°, 90°, 180°, 270°, each with and without a flip (see the sketch below).
8 times more feature maps without adding parameters
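A sketch of generating the 8 variants from one weight tensor with torch.rot90 and flip (the filter shape is illustrative):

import torch

w = torch.randn(16, 3, 3, 3)                     # (out, in, K, K) filters
variants = []
for flip in (False, True):
    base = w.flip(-1) if flip else w             # with / without horizontal flip
    for k in range(4):                           # 0°, 90°, 180°, 270°
        variants.append(torch.rot90(base, k, dims=(-2, -1)))
weights = torch.cat(variants, dim=0)             # 8x the output channels
print(weights.shape)                             # torch.Size([128, 3, 3, 3])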
Use a very cheap transform to generate a new feature map based on old features.
Parameters to get a new feature map
Traditional Conv : K x K x C+1
Depth Sep Conv : K x K +1
Ghost Module : 1+1
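A minimal GhostModule-style sketch (shapes are illustrative; GhostNet's cheap operation is a depthwise conv, here a per-channel 1 x 1 conv so each extra map costs 1+1 parameters):

import torch
import torch.nn as nn

class Ghost(nn.Module):
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        c_primary = c_out // 2
        self.primary = nn.Conv2d(c_in, c_primary, k, padding=k // 2)
        # cheap per-channel transform: 1 weight + 1 bias per new map
        self.cheap = nn.Conv2d(c_primary, c_primary, 1, groups=c_primary)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

m = Ghost(128, 128)
print(m(torch.randn(1, 128, 32, 32)).shape)      # (1, 128, 32, 32)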
(Figure: Fire Module: an S_1x1 squeeze layer feeding E_1x1 and E_3x3 expand branches, whose outputs are concatenated)
Take an example
input channels : 128
output channels : 128
S_1x1 : 16 ((1x1x128+1)x16)
E_1x1 : 64 ((1x1x16+1)x64)
E_3x3 : 64 ((3x3x16+1)x64)
Block's Params
2064+1088+9280=12432
Traditional Conv
147584 ((3x3x128+1)x128)
Ratio : 0.084
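A sketch of the Fire module with the sizes from this example; the parameter total reproduces the 12432 figure:

import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, c_in=128, s=16, e=64):
        super().__init__()
        self.s_1x1 = nn.Conv2d(c_in, s, 1)           # squeeze
        self.e_1x1 = nn.Conv2d(s, e, 1)              # expand 1x1
        self.e_3x3 = nn.Conv2d(s, e, 3, padding=1)   # expand 3x3

    def forward(self, x):
        x = torch.relu(self.s_1x1(x))
        return torch.cat([torch.relu(self.e_1x1(x)),
                          torch.relu(self.e_3x3(x))], dim=1)

print(sum(p.numel() for p in Fire().parameters()))  # 12432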
Take an example
input channels : 128
output channels : 128
3x3 DWConv : (3x3+1)x128
1x1 Conv : (1x1x128+1)x128
Block's Params
1280+16512=17792
Traditional Conv
147584 ((3x3x128+1)x128)
Ratio : 0.12
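A quick check of this example in PyTorch; the depthwise + pointwise pair reproduces the 17792 figure:

import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(128, 128, 3, padding=1, groups=128),  # 3x3 DWConv: 1280
    nn.Conv2d(128, 128, 1),                         # 1x1 Conv: 16512
)
print(sum(p.numel() for p in block.parameters()))   # 17792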
Issue(GConv)
Different groups do not communicate
ShuffleNet
Shuffle feature maps so that a feature map can influence other groups' feature maps in later layers (see the sketch below).
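A sketch of the shuffle itself: reshape the channels to (groups, channels/groups), transpose, and flatten back (the channel_shuffle helper is illustrative):

import torch

def channel_shuffle(x, groups):
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)    # split channels into groups
    x = x.transpose(1, 2).contiguous()          # interleave the groups
    return x.view(b, c, h, w)

x = torch.arange(8.0).view(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten().tolist()) # [0, 4, 1, 5, 2, 6, 3, 7]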
Take an example
input channels : 240
output channels : 240
Bottleneck channels : 240/4=80
Groups : 2
1x1 GConv : (1x1x(240/2)+1)x(80/2)x2 = 9680
3x3 DWConv : (3x3+1)x80 = 800
1x1 GConv : (1x1x(80/2)+1)x(240/2)x2 = 9840
Block's Params
9680+800+9840=20320
Traditional Conv
518640 ((3x3x240+1)x240)
Ratio : 0.04
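A quick check of this unit in PyTorch (shuffle and shortcut omitted; one bias per output filter, matching the totals above):

import torch.nn as nn

unit = nn.Sequential(
    nn.Conv2d(240, 80, 1, groups=2),            # 1x1 GConv: 9680
    nn.Conv2d(80, 80, 3, padding=1, groups=80), # 3x3 DWConv: 800
    nn.Conv2d(80, 240, 1, groups=2),            # 1x1 GConv: 9840
)
print(sum(p.numel() for p in unit.parameters()))  # 20320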
Different from using a K x K kernel to increase the receptive field, this work shifts feature maps in different directions so that features at different locations can communicate with each other.
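A minimal sketch of the shift idea with torch.roll (the real shift op zero-pads instead of wrapping, and the four-direction grouping in this shift helper is illustrative); a 1x1 conv afterwards would mix the shifted features:

import torch

def shift(x):
    g = x.shape[1] // 4                          # four direction groups
    out = torch.empty_like(x)
    out[:, 0*g:1*g] = torch.roll(x[:, 0*g:1*g],  1, dims=2)  # down
    out[:, 1*g:2*g] = torch.roll(x[:, 1*g:2*g], -1, dims=2)  # up
    out[:, 2*g:3*g] = torch.roll(x[:, 2*g:3*g],  1, dims=3)  # right
    out[:, 3*g:]    = torch.roll(x[:, 3*g:],    -1, dims=3)  # left
    return out

print(shift(torch.randn(1, 8, 5, 5)).shape)      # unchanged: (1, 8, 5, 5)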
# see code (squeezenet, mobilenet, shufflenet, ghostnet)
git clone https://gitlab.aiacademy.tw/yidar/CompactModelDesign.git
Survey Paper
Cheng, Yu, et al. "A survey of model compression and acceleration for deep neural networks." arXiv preprint arXiv:1710.09282 (2017).