ResNet & DenseNet
Yu-Hsiu Huang
Quick Facts
- Residual Network (ResNet)
  - first proposed by He et al. 2015
  - won the ImageNet Large Scale Visual Recognition Challenge in 2015
- Dense Convolutional Network (DenseNet)
  - originally from Huang et al. 2017
  - Best Paper Award at the Conference on Computer Vision and Pattern Recognition 2017
ResNet

Regular Block
Residual Block
Deeper (more layers), better?
Err...

Andrew Ng
Problem of Plain Networks

He et al. 2015
- Deeper networks have higher training errors.
- He et al. do not fully explain why this degradation problem happens, but since the training error itself rises, it is not overfitting; it appears to be an optimization difficulty.
He et al.'s Idea

D2L
- Ideally, adding more layers to a network should bring its best fit closer to the true answer.
- A deeper network should at least be able to reproduce the predictions of a shallower network.
non-ideal
ideal
Learn Residual Mapping

Is x (the output from the previous layer) already a good fit?
- Yes: almost zero weighting on f(x)-x (preserve accuracy)
- No: nonzero weighting on f(x)-x (optimize residual)
Learn Residual Mapping
When we go deeper in ResNet, the architecture ensures that the fit can at least reproduce the predictions of the shallower networks.
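The idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the block from He et al.: it uses dense matrices instead of convolutions and omits batch normalization and the final activation. The names `residual_block`, `w1` and `w2` are hypothetical. Here `g(x)` plays the role of the residual f(x)-x from the slide.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Return x + g(x), where g is a small two-layer residual branch."""
    return x + w2 @ relu(w1 @ x)

rng = np.random.default_rng(0)
x = rng.normal(size=4)  # output from the previous layer

# Residual branch with zero weights: the block is exactly the identity,
# so the deeper network reproduces the shallower one.
w_zero = np.zeros((4, 4))
y_identity = residual_block(x, w_zero, w_zero)

# Residual branch with nonzero weights: the block adds a correction to x.
w1 = rng.normal(size=(4, 4))
w2 = rng.normal(size=(4, 4))
y_corrected = residual_block(x, w1, w2)
```

With all residual weights at zero, `y_identity` equals `x`, which is the "preserve accuracy" case; the optimizer only has to move the weights away from zero when a correction helps.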

ResNet
Plain
Accuracy Comparison

[Figure: plain vs. ResNet, 18 layers and 34 layers]
Usual design: 1x1 Conv layer
To reduce number of channels
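As a rough illustration of why a 1x1 convolution changes only the channel count, the sketch below treats it as a per-pixel linear map over channels. The function name `conv1x1` and the sizes (64 channels reduced to 16 on an 8x8 map) are assumptions chosen for the example, and no bias term is included.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: x has shape (C_in, H, W), weight has shape (C_out, C_in)."""
    # Each output pixel is a linear combination of the input channels at that pixel.
    return np.einsum("oc,chw->ohw", weight, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8, 8))     # 64 channels, 8x8 feature map
weight = rng.normal(size=(16, 64))  # maps 64 channels to 16
y = conv1x1(x, weight)
print(x.shape, "->", y.shape)       # (64, 8, 8) -> (16, 8, 8)
```

The spatial size is untouched; only the channel dimension shrinks, which is also what lets a shortcut branch match the shape of the main branch before the addition.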

D2L
Accuracy Comparison on ImageNet
He et al. 2015

ResNet Application in Astronomy


25 refereed papers for ResNet or ResNet-like networks

DenseNet

Huang et al. 2017
- Inherit the idea of creating short paths from earlier layers to later layers
- To ensure maximum information flow between layers, DenseNet connects every layer directly to all later layers.
Key feature: Concatenation
- In ResNet, element-wise addition is used when implementing identity mapping.

- In DenseNet, concatenation is used. Each layer receives the collective knowledge (feature maps) of all preceding layers.
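The difference between the two ways of combining features shows up directly in the tensor shapes. The following is a minimal sketch with made-up sizes (16 channels, 8x8 maps); `x` stands for a layer input and `fx` for that layer's output.

```python
import numpy as np

x = np.ones((16, 8, 8))        # input feature map: 16 channels
fx = np.full((16, 8, 8), 2.0)  # layer output with the same shape as x

resnet_out = x + fx                             # element-wise addition
densenet_out = np.concatenate([x, fx], axis=0)  # channel-wise concatenation

print(resnet_out.shape)    # (16, 8, 8): channel count unchanged
print(densenet_out.shape)  # (32, 8, 8): channels accumulate
```

Addition merges the two signals into one, while concatenation keeps the earlier features intact so later layers can still read them.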

credit: Sik-Ho Tsang
Architecture:
Dense Block + Transition Layer

Concatenation increases the number of feature-map channels with every layer, so in practice a transition layer is added between two dense blocks to reduce the number of channels.
transition layers
Huang et al. 2017
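The channel growth inside a dense block, and the reduction at a transition layer, can be followed with simple bookkeeping. This sketch assumes an input of 64 channels, 6 layers, a growth rate of 32 and a compression factor of 0.5; these numbers and the function names are illustrative assumptions, not values taken from the slides.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Channel count seen at the input of each layer, plus the block output."""
    channels = in_channels
    history = [channels]
    for _ in range(num_layers):
        channels += growth_rate  # each layer appends growth_rate new channels
        history.append(channels)
    return history

def transition_channels(in_channels, compression=0.5):
    """Channels left after the transition layer (compression is an assumption)."""
    return int(in_channels * compression)

history = dense_block_channels(64, 6, 32)
after_transition = transition_channels(history[-1])
print(history)           # [64, 96, 128, 160, 192, 224, 256]
print(after_transition)  # 128
```

Without the transition layer, the next dense block would start from 256 channels and keep growing from there.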
Advantage of DenseNet
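- Increase parameter efficiency: fewer parameters are needed
  - e.g. at the l-th layer, a ResNet layer has C input and C output channels, so its parameter count scales as C×C; a DenseNet layer has l×k input and k output channels, so it scales as l×k×k. Usually, k << C.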
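A quick count makes the comparison concrete. The sketch assumes 3x3 convolutions without biases and picks C = 256, k = 32 and l = 4 purely for illustration; none of these values come from the slides.

```python
def conv_params(in_channels, out_channels, kernel_size=3):
    """Number of weights in one convolutional layer, ignoring biases."""
    return in_channels * out_channels * kernel_size * kernel_size

C = 256          # hypothetical ResNet layer width
k = 32           # hypothetical DenseNet growth rate
layer_index = 4  # the l-th layer inside a dense block

resnet_params = conv_params(C, C)                  # C inputs, C outputs
densenet_params = conv_params(layer_index * k, k)  # l*k inputs, k outputs

print(resnet_params)    # 589824
print(densenet_params)  # 36864
```

Under these assumptions the DenseNet layer uses 16 times fewer weights, because every layer only has to produce k new channels instead of a full C-channel output.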


Sik-Ho Tsang, Huang et al 2017
Advantage of DenseNet
- Increase parameter efficiency: fewer parameters are needed
- Implicit deep supervision through the short connections
  - Deep supervision can improve accuracy (DSN; Lee et al. 2014)

Each individual layer in the dense block receives additional supervision.

DSN improves accuracy
Lee et al. 2014
Accuracy Comparison

DenseNet achieves accuracy better than or similar to that of other networks
DenseNet Application in Astronomy
1 refereed paper

A DenseNet model is pre-trained on ImageNet and later reused as a galaxy classifier.
Issues of DenseNet:
- high GPU memory usage
- needs at least 1000 samples per class
Summary
- ResNet
  - A network that solves the degradation problem with residual blocks (identity connections).
  - It has had a large influence on later deep neural networks.
  - It is commonly used for image classification nowadays, even in astronomy.
- DenseNet
  - A network that extends the logic of short connections. It addresses the degradation problem as ResNet does.
  - It achieves higher accuracy with less computation. However, it requires a lot of GPU memory.
ResNet and DenseNet
By Yvonne Huang