Intro to Model Compression

4/30 by Arvin Liu

Mixed it !

Knowledge Distillation

Distill Logits - Deep Mutual Learning (2/3)

Logits

y_{1,t}

y_{1,t}

y_{2,t}

y_{2,t}

x

x

Networks2

Networks1

Step 1: Update Net1

y

y

True Label

CE

D_{KL}(y_{2,t}||y_{1,t})

D_{KL}(y_{2,t}||y_{1,t})

Loss_1 = D_{KL}(y_{2,t}||y_{1,t})+ \text{CrossEntropy}(y,y_{1,t})

Loss_1 = D_{KL}(y_{2,t}||y_{1,t})+ \text{CrossEntropy}(y,y_{1,t})

Distill Logits - Deep Mutual Learning (3/3)

Logits

y_{1,t}

y_{1,t}

y_{2,t}

y_{2,t}

x

x

Networks2

Networks1

Step 2: Update Net2

y

y

True Label

CE

D_{KL}(y_{1,t}||y_{2,t})

D_{KL}(y_{1,t}||y_{2,t})

Loss_2 = D_{KL}(y_{1,t}||y_{2,t})+ \text{CrossEntropy}(y,y_{2,t})

Loss_2 = D_{KL}(y_{1,t}||y_{2,t})+ \text{CrossEntropy}(y,y_{2,t})

More Details: https://slides.com/arvinliu/kd_mutual

Distill Relation - Similarity-Preserving KD (1/3)

Mnist Model

circle

vertical line

0

9

1

[1, 0]

[1, 1]

[0, 1]


1	0.7	0
0.7	1	0.7
0	0.7	1

Cosine Similarity

\text{img}_0

\text{img}_0

\text{img}_1

\text{img}_1

\text{img}_2

\text{img}_2

\text{img}_1

\text{img}_1

\text{img}_2

\text{img}_2

\text{img}_0

\text{img}_0

Distill Relation - Similarity-Preserving KD (2/3)

Mnist TeacherNet

circle

vertical line

Relational Information

on Features


1	0.7	0
0.7	1	0.7
0	0.7	1

Teacher's Cosine Similarity Table

\text{img}_0

\text{img}_0

\text{img}_1

\text{img}_1

\text{img}_2

\text{img}_2

\text{img}_1

\text{img}_1

\text{img}_2

\text{img}_2

\text{img}_0

\text{img}_0

Mnist StudentNet

?

Relational Information

on Features

Student's Cosine Similarity Table

imitate to learn relationship between images

No hard copy to features!

Eval by weight - sum of L1 norm

Layer want to pruned

\sum ^{4}_{j=1}||k_{1,j}||

\sum ^{4}_{j=1}||k_{1,j}||

Change to L2 norm?

Feature map 0~3

Conv

Conv Weight:

(3, 4, k, k)

calculate sum of L1-norm

\sum ^{4}_{j=1}||k_{2,j}||

\sum ^{4}_{j=1}||k_{2,j}||

\sum ^{4}_{j=1}||k_{3,j}||

\sum ^{4}_{j=1}||k_{3,j}||

Prune

Intro to Model Compression

Intro to Model Compression

More from Arvin Liu