Yu-Hsiu Huang
Regular Block
Residual Block
Andrew Ng
He et al. 2015
D2L
non-ideal
ideal
Is x fitted well by the previous layer?
Yes
almost zero weighting on f(x)-x
(preserve accuracy)
No
nonzero weighting on f(x)-x
(optimize residual)
output from previous layer
As we go deeper in a ResNet, the architecture ensures that the deeper network can at least reproduce the predictions of the shallower network.
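The branching logic above (keep the identity when the previous layer already fits x well, learn the residual f(x) - x otherwise) can be sketched as a minimal fully connected residual block. The function, tanh activation, and weight values here are illustrative assumptions, not the convolutional blocks of He et al. 2015.

```python
import numpy as np

def residual_block(x, W):
    """Sketch of y = f(x) + x with f(x) = tanh(x @ W).
    W is a hypothetical weight matrix for the residual branch."""
    return np.tanh(x @ W) + x

x = np.array([1.0, -0.5, 2.0])

# If x is already fitted well, training can drive W toward zero,
# so f(x) ~ 0 and the block reduces to the identity (accuracy preserved).
identity_out = residual_block(x, np.zeros((3, 3)))

# Otherwise nonzero weights let the block learn the residual f(x) - x.
residual_out = residual_block(x, 0.1 * np.ones((3, 3)))
```

With zero weights the block passes x through unchanged, which is why stacking more residual blocks cannot make the fit worse than the shallower network.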
ResNet
Plain
ResNet
Plain
18 layers
34 layers
To reduce the number of channels
D2L
He et al. 2015
25 refereed papers on ResNet or ResNet-like networks
Huang et al. 2017
credit: Sik-Ho Tsang
Concatenating features increases the size of the feature map, so in practice a transition layer is added between two dense blocks to reduce the number of channels.
transition layers
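A minimal NumPy sketch of the channel growth inside a dense block and the channel reduction performed by a transition layer. The growth rate, layer count, and the averaging matrices standing in for the 1x1 convolutions are all illustrative assumptions.

```python
import numpy as np

GROWTH_RATE = 4  # channels each dense layer adds (illustrative value)

def dense_layer(feats):
    # Stand-in for BN-ReLU-Conv: maps all accumulated channels
    # to GROWTH_RATE new channels.
    W = np.ones((feats.shape[0], GROWTH_RATE)) / feats.shape[0]
    return np.tanh(feats @ W)

def dense_block(x, num_layers):
    # Each layer receives the concatenation of the input and all
    # earlier outputs, so the channel count grows every layer.
    feats = x
    for _ in range(num_layers):
        feats = np.concatenate([feats, dense_layer(feats)])
    return feats

def transition_layer(feats):
    # Stand-in for the 1x1 conv that halves the channel count
    # between two dense blocks.
    half = feats.shape[0] // 2
    W = np.ones((feats.shape[0], half)) / feats.shape[0]
    return feats @ W

x = np.ones(8)                         # 8 input channels
block_out = dense_block(x, 3)          # 8 + 3 * GROWTH_RATE = 20 channels
reduced = transition_layer(block_out)  # back down to 10 channels
```

The shapes make the motivation concrete: without the transition layer, every dense block would hand the next block an ever-growing stack of channels.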
Huang et al. 2017
Sik-Ho Tsang, Huang et al 2017
each individual layer in the dense block receives additional supervision
DSN improves accuracy
Lee et al. 2014
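The deep-supervision idea can be sketched as a total loss that adds weighted companion losses on intermediate outputs to the final-layer loss. The squared-error loss and the 0.3 companion weight are illustrative assumptions, not the exact formulation of Lee et al. 2014.

```python
import numpy as np

def deeply_supervised_loss(layer_outputs, target, companion_weight=0.3):
    """Final-layer loss plus weighted companion losses on every
    intermediate output, so each layer gets direct supervision."""
    losses = [np.mean((out - target) ** 2) for out in layer_outputs]
    return losses[-1] + companion_weight * sum(losses[:-1])

target = np.zeros(4)
# Outputs of layer 1, layer 2, and the final layer (illustrative values).
outputs = [np.full(4, 0.5), np.full(4, 0.2), np.full(4, 0.1)]
total = deeply_supervised_loss(outputs, target)
```

In a dense block the short paths from every layer to the loss give a similar effect implicitly, which is one argument for why DenseNet trains well.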
DenseNet achieves better or similar accuracy compared with other nets
1 refereed paper
Use DenseNet to pre-train a model on ImageNet, then reuse this model as a galaxy classifier.
Issue of DenseNet: