ShuffleNet

How do we reduce the computational complexity of CNNs without losing accuracy when running on mobile and edge devices?

Introduces two new operations:

Pointwise Group Convolutions
Channel Shuffle

https://tinyurl.com/shufflenet

Current Solutions

Increase depth at low complexity (GoogLeNet)
Residual Networks (ResNet)
Depthwise Separable Convolutions (Xception, MobileNet)
Grouped Convolutions (AlexNet, ResNeXt)
Reinforcement-learning architecture search (NASNet)
Pruning, Quantisation, DFT/FFT

Problem With Group Convolutions

In ResNeXt, only the 3 × 3 layers use group convolutions. As a result, in each ResNeXt residual unit the pointwise (1 × 1) convolutions account for 93.4% of the multiply-adds (cardinality = 32).
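The 93.4% figure can be reproduced with a few lines of arithmetic. The sketch below assumes a ResNeXt bottleneck with channel widths 256 -> 128 -> 128 -> 256 and 32 groups in the 3 × 3 layer. These widths are an assumption taken from the usual 32×4d configuration and are not stated on this slide. The helper name conv_madds is hypothetical.

```python
def conv_madds(c_in, c_out, kernel=1, groups=1):
    """Multiply-adds per output spatial position for a (grouped) convolution."""
    return kernel * kernel * c_in * c_out // groups

# Assumed bottleneck widths: 256 -> 128 -> 128 -> 256, cardinality 32.
reduce_1x1 = conv_madds(256, 128)                         # dense pointwise
grouped_3x3 = conv_madds(128, 128, kernel=3, groups=32)   # group convolution
expand_1x1 = conv_madds(128, 256)                         # dense pointwise

pointwise = reduce_1x1 + expand_1x1
total = pointwise + grouped_3x3
pointwise_share = pointwise / total

print(f"pointwise share of multiply-adds: {pointwise_share:.1%}")  # 93.4%
```

Because the 3 × 3 layer is already cheap, grouping the 1 × 1 layers as well is where the remaining savings are.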

If multiple group convolutions are stacked, there is a side effect: the outputs of a given channel are derived from only a small fraction of the input channels.

This blocks information flow between channel groups and weakens the representation.
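The side effect can be made concrete by tracking which input channels each output channel depends on. This is a minimal sketch with illustrative sizes (12 channels, 3 groups). The function group_conv_deps is a hypothetical helper that models only connectivity, not the actual convolution arithmetic.

```python
def group_conv_deps(input_deps, groups):
    """Propagate dependency sets through one group convolution.

    input_deps[i] is the set of original input channels that channel i
    depends on. Each output channel sees only the channels of its own group.
    The channel count is kept unchanged for simplicity.
    """
    per_group = len(input_deps) // groups
    out = []
    for g in range(groups):
        block = input_deps[g * per_group:(g + 1) * per_group]
        merged = set().union(*block)
        out.extend(set(merged) for _ in range(per_group))
    return out

channels, groups = 12, 3
deps = [{c} for c in range(channels)]   # each channel starts as itself
for _ in range(4):                      # stack four group convolutions
    deps = group_conv_deps(deps, groups)

print(sorted(deps[0]))   # [0, 1, 2, 3]
print(sorted(deps[11]))  # [8, 9, 10, 11]
```

However many layers are stacked, every output channel still depends on only 1/3 of the inputs.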

Solution

Channel Shuffle for Group Convolutions
- If a group convolution can take its input from different groups, the input and output channels become fully related.
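Channel shuffle is usually described as: reshape the channel dimension to (groups, n), transpose to (n, groups), and flatten. Below is a minimal pure-Python sketch on a list of channel labels. The labels and sizes are illustrative assumptions, and a real implementation would apply the same permutation to a tensor's channel axis.

```python
def channel_shuffle(channels, groups):
    """Permute channels: reshape to (groups, n), transpose, flatten."""
    n = len(channels) // groups
    rows = [channels[g * n:(g + 1) * n] for g in range(groups)]   # (groups, n)
    return [rows[g][i] for i in range(n) for g in range(groups)]  # (n, groups), flattened

groups, per_group = 3, 4
labels = [f"g{g}c{i}" for g in range(groups) for i in range(per_group)]
shuffled = channel_shuffle(labels, groups)

# Split the shuffled channels into the groups the next layer would see.
next_groups = [shuffled[k * per_group:(k + 1) * per_group] for k in range(groups)]
for grp in next_groups:
    print(grp)
```

Each group fed to the next group convolution now contains channels from all three source groups, so information can cross group boundaries.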

Architecture

Result

The scaling factor s multiplies the number of filters in ShuffleNet 1× by s, so the overall complexity is roughly s^2 times that of ShuffleNet 1×.
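The s^2 relationship follows from convolution cost being proportional to input channels times output channels. The sketch below checks this for a single grouped pointwise layer. The width of 240 channels and 3 groups are illustrative assumptions, and conv_madds is a hypothetical helper.

```python
def conv_madds(c_in, c_out, kernel=1, groups=1):
    """Multiply-adds per output spatial position for a (grouped) convolution."""
    return kernel * kernel * c_in * c_out // groups

base_width, groups = 240, 3   # assumed layer width for the 1x model
base = conv_madds(base_width, base_width, groups=groups)

ratios = {}
for s in (0.5, 1.5, 2.0):
    width = int(base_width * s)   # scale the number of filters by s
    ratios[s] = conv_madds(width, width, groups=groups) / base
    print(f"s = {s}: cost ratio = {ratios[s]}, s^2 = {s ** 2}")
```

The match is exact for a layer whose input and output widths both scale. For the whole network it is only approximate, since the image input channels and the number of classes do not scale with s.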