Zhen Huang, Xu Shen, Jun Xing, Tongliang Liu, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xian-Sheng Hua
University of Science and Technology of China, Alibaba Group, University of Southern California, University of Sydney
[Figure: standard knowledge distillation. A pretrained Teacher Net and a Student Net (usually smaller than or equal to the Teacher Net) both take the Dataset as input; the Student Net is trained with a hard loss between S's result and GT and a soft loss between S's result and T's result.]
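A minimal sketch of that combined objective (the temperature T and mixing weight alpha are illustrative values, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, target, T=4.0, alpha=0.9):
    """Hard loss against GT plus soft loss against the teacher's softened output."""
    hard = F.cross_entropy(student_logits, target)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 10)                    # student logits
t = torch.randn(8, 10)                    # pretrained teacher's logits (fixed)
y = torch.randint(0, 10, (8,))            # ground-truth labels
loss = kd_loss(s, t, y)
```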
Layer-Wise Relevance Propagation (LRP): a visualization of the attention map on the input image.
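As a reminder of how LRP redistributes a prediction's relevance layer by layer (the ε-rule shown here is one common variant; the notes do not say which rule is used):

$$R_j = \sum_k \frac{a_j\, w_{jk}}{\epsilon + \sum_{j'} a_{j'}\, w_{j'k}}\, R_k$$

where $a_j$ are the activations of the current layer, $w_{jk}$ the weights to the next layer, and $R_k$ the relevances already assigned to the next layer's neurons; propagating back to the pixels yields the attention map over the input image.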
(Inheritance and Exploration KD Framework)
[Figure: the Student Net's feature is split into an inheritance part, which SHOULD be similar to the Teacher Net's feature, and an exploration part, which SHOULD NOT be similar to it.]
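A minimal PyTorch sketch of that split, assuming the student's feature channels are halved into the two parts and using a plain L2 distance on normalized features (the paper's concrete per-method losses are listed in the table further below); the exploration term is simply the negated distance, pushing those features away from the teacher's:

```python
import torch
import torch.nn.functional as F

def ie_kd_losses(student_feat, teacher_feat):
    """Split the student's channels: first half inherits, second half explores."""
    c = student_feat.shape[1] // 2
    inh_part, exp_part = student_feat[:, :c], student_feat[:, c:]

    def dist(a, b):
        # L2 distance between L2-normalized, flattened features
        a = F.normalize(a.flatten(1), dim=1)
        b = F.normalize(b.flatten(1), dim=1)
        return (a - b).pow(2).sum(dim=1).mean()

    loss_inh = dist(inh_part, teacher_feat)    # inheritance part SHOULD be similar
    loss_exp = -dist(exp_part, teacher_feat)   # exploration part SHOULD NOT be similar
    return loss_inh, loss_exp

# toy shapes: the student carries twice the teacher's channels so each half matches
t_feat = torch.randn(4, 64, 8, 8)
s_feat = torch.randn(4, 128, 8, 8)
l_inh, l_exp = ie_kd_losses(s_feat, t_feat)
```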
[Figure: feature-level distillation. The Dataset is fed to the Teacher Net (U) and the Student Net (U); S's feat is passed through a Regressor (S's feat transform) and matched to T's feat with an L2 loss.]
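A sketch of the figure above, assuming a 1x1-convolution regressor that maps S's feat to the teacher's channel count before the L2 loss (layer shapes are illustrative):

```python
import torch
import torch.nn as nn

class Regressor(nn.Module):
    """Transforms S's feat so its shape matches T's feat."""
    def __init__(self, s_channels, t_channels):
        super().__init__()
        self.conv = nn.Conv2d(s_channels, t_channels, kernel_size=1)

    def forward(self, s_feat):
        return self.conv(s_feat)

regressor = Regressor(s_channels=32, t_channels=64)
t_feat = torch.randn(4, 64, 8, 8)        # T's feat (teacher is pretrained and fixed)
s_feat = torch.randn(4, 32, 8, 8)        # S's feat
loss = nn.functional.mse_loss(regressor(s_feat), t_feat.detach())  # L2 loss
```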
[Figure: an auto-encoder on the teacher's feature. T's feat from the Teacher Net (U) is compressed by Encoder T into a compact T's feat and reconstructed by Decoder T.]
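A sketch of that auto-encoder, with assumed layer sizes; the encoder output plays the role of the compact T's feat and the decoder is trained to reconstruct the original feature:

```python
import torch
import torch.nn as nn

class TeacherFeatureAE(nn.Module):
    """Compress T's feat into a compact code, then reconstruct it."""
    def __init__(self, channels=64, compact=16):
        super().__init__()
        self.encoder = nn.Conv2d(channels, compact, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(compact, channels, kernel_size=3, padding=1)

    def forward(self, t_feat):
        code = self.encoder(t_feat)          # compact T's feat
        recon = self.decoder(code)
        return code, recon

ae = TeacherFeatureAE()
t_feat = torch.randn(4, 64, 8, 8)
code, recon = ae(t_feat)
recon_loss = nn.functional.mse_loss(recon, t_feat)  # train the AE to reconstruct T's feat
```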
Method Name | Inheritance loss | Exploration loss
IE-AT | |
IE-FT | |
IE-OD | |
* means attention map of features
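Following the footnote, the attention map of a feature is assumed here to be the spatial map of channel-averaged squared activations (the definition used in Attention Transfer), L2-normalized per sample:

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Collapse a (N, C, H, W) feature into an (N, H*W) spatial attention map."""
    amap = feat.pow(2).mean(dim=1)              # average squared activations over channels
    return F.normalize(amap.flatten(1), dim=1)  # L2-normalize per sample

f = torch.randn(4, 64, 8, 8)
print(attention_map(f).shape)  # torch.Size([4, 64])
```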
Inh part: focuses on the tail, which looks like a crocodile's. Exp part: the ears are also important.
Inh part: focuses on the head, which looks like a seal's. Exp part: the turtle's shell should also be attended to.
Add Gaussian noise at the middle layer and observe how the loss changes.
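A sketch of that probe, assuming the noise is injected through a forward hook on an intermediate module (the layer choice and noise scale are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))
criterion = nn.CrossEntropyLoss()

clean_loss = criterion(net(x), y)

# add Gaussian noise to the middle layer's output and observe the loss change
def add_noise(module, inputs, output):
    return output + 0.5 * torch.randn_like(output)

handle = net[1].register_forward_hook(add_noise)
noisy_loss = criterion(net(x), y)
handle.remove()

print(f"loss change under noise: {(noisy_loss - clean_loss).item():+.4f}")
```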
CKA (Centered Kernel Alignment, ICML'19) similarity: a method to measure the similarity between feature maps.
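A sketch of the linear-kernel form of CKA (Kornblith et al., ICML'19) on two flattened feature matrices:

```python
import torch

def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n, d1) and Y (n, d2)."""
    X = X - X.mean(dim=0, keepdim=True)   # center each feature dimension
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (Y.t() @ X).pow(2).sum()       # ||Y^T X||_F^2
    norm_x = (X.t() @ X).pow(2).sum().sqrt()
    norm_y = (Y.t() @ Y).pow(2).sum().sqrt()
    return hsic / (norm_x * norm_y)

a = torch.randn(100, 64)
print(linear_cka(a, a))                       # ~1.0 for identical features
print(linear_cka(a, torch.randn(100, 32)))    # lower for unrelated features
```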
Dataset: CIFAR-10, Criterion: Error rate
Dataset: ImageNet, ResNet-34 -> ResNet-18, Criterion: Error rate
Dataset: PASCAL VOC 2007 (object detection), ResNet-50 -> ResNet-18, Criterion: mAP
(Inheritance and Exploration DML Framework)
[Figure: Deep Mutual Learning. Student Net 1 and Student Net 2 both take the Dataset as input; each student is trained with a hard loss between its result and GT and a soft loss between its result and the other student's result.]
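A sketch of the mutual-learning objective in the figure: each student combines a hard loss against GT with a soft KL loss toward the other student's prediction (the temperature is omitted here for brevity):

```python
import torch
import torch.nn.functional as F

def dml_losses(logits_s1, logits_s2, target):
    """Deep Mutual Learning: each student also mimics the other's softened output."""
    hard_1 = F.cross_entropy(logits_s1, target)
    hard_2 = F.cross_entropy(logits_s2, target)
    soft_1 = F.kl_div(F.log_softmax(logits_s1, dim=1),
                      F.softmax(logits_s2.detach(), dim=1), reduction="batchmean")
    soft_2 = F.kl_div(F.log_softmax(logits_s2, dim=1),
                      F.softmax(logits_s1.detach(), dim=1), reduction="batchmean")
    return hard_1 + soft_1, hard_2 + soft_2

s1 = torch.randn(8, 10, requires_grad=True)   # Student Net 1's logits
s2 = torch.randn(8, 10, requires_grad=True)   # Student Net 2's logits
y = torch.randint(0, 10, (8,))
loss_1, loss_2 = dml_losses(s1, s2, y)
```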