How do we reduce the computational complexity of CNN while without loss in accuracy while running on Mobile and Edge Devices?
Introduces two new operations:
https://tinyurl.com/shufflenet
- ResNeXt only 3 × 3 layers are equipped with group convolutions. As a result, for each residual unit in ResNeXt the pointwise convolutions occupy 93.4% multiplication-adds
- if multiple group convolutions stack together, there is one side effect: outputs from a certain channel are only derived from a small fraction of input channels
- This property blocks information flow between channel groups and weakens representation
Channel Shuffle for Group Convolutions
If we allow group convolution to obtain input data from different groups, the input and output channels will be fully related
Scaling Factor means scaling the number of filters in ShuffleNet 1× by s times thus overall complexity will be roughly s^2 times of ShuffleNet 1×