Temp Space

Teacher Net

Dataset

T's Result

Student Net

S's Result

Pretrained

Knowledge Distillation

Teacher Net

Dataset

T's Result

Student Net

S's Result

Pretrained

Ground Truth

Loss_{student} = CE(y, logits_{s}) + Div(logits_{t}||logits_{s})
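
A minimal sketch of the student loss above, in plain Python. It assumes that Div is the KL divergence between temperature-softened softmax outputs; the formula itself does not fix the divergence, and the helper names and the `temperature` parameter are illustrative only.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(label, logits):
    """CE(y, logits_s) for an integer class label y."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions (q must be positive)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def student_loss(label, logits_s, logits_t, temperature=1.0):
    # Assumption: Div(logits_t || logits_s) is KL on softened probabilities.
    p_t = softmax(logits_t, temperature)
    p_s = softmax(logits_s, temperature)
    return cross_entropy(label, logits_s) + kl_divergence(p_t, p_s)
```

When the student reproduces the teacher's logits exactly, the divergence term vanishes and only the cross-entropy with the ground truth remains.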

x' = \arg \max_{x} \, L(x, y; \theta) \\ GD: x' = x + \epsilon \nabla_x L(x, y; \theta)

\theta' = \arg \min_{\theta} \, L(x, y; \theta) \\ GD: \theta' = \theta - \eta \nabla _{\theta}L(x,y;\theta)

x' = \arg \min_{x} \, L(x, y; \theta) + R_{prior}(x) \\ GD: x' = x - \epsilon \nabla_x \left( L(x,y;\theta) + R_{prior}(x) \right)
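
The three update rules differ only in which variable moves and in which direction. The sketch below uses a toy scalar loss L(x, y; θ) = (θx − y)² and a hypothetical prior R_prior(x) = x²; both are assumptions chosen so that the gradients can be written by hand.

```python
def loss(x, y, theta):
    """Toy loss L(x, y; theta) = (theta * x - y)^2 (an assumption for illustration)."""
    return (theta * x - y) ** 2

def grad_x(x, y, theta):
    return 2.0 * theta * (theta * x - y)

def grad_theta(x, y, theta):
    return 2.0 * x * (theta * x - y)

def prior(x):
    """Hypothetical R_prior(x) = x^2."""
    return x * x

def grad_prior(x):
    return 2.0 * x

def attack_step(x, y, theta, eps):
    # Maximize the loss over the input: gradient ascent on x.
    return x + eps * grad_x(x, y, theta)

def train_step(x, y, theta, eta):
    # Minimize the loss over the parameters: gradient descent on theta.
    return theta - eta * grad_theta(x, y, theta)

def invert_step(x, y, theta, eps):
    # Minimize loss + prior over the input: gradient descent on x.
    return x - eps * (grad_x(x, y, theta) + grad_prior(x))
```

Starting from x = 1, y = 3, θ = 1, one attack step raises the loss, one training step lowers it, and one inversion step lowers the loss plus the prior.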

Teacher Net

T's Result

Student Net

S's Result

Pretrained

Knowledge Distillation

???

Fake Dataset

R_{l1}(x) = |x[:-1] - x[1:]| \\ R_{l2}(x) = \sum_{p_i \in x}\sum_{p_j \in \text{neighbor}(p_i)} |p_i - p_j|

x_{0, 0}

x_{0, 1}

overlapped image

R_{l1}(x) = ||x_{0, 0} - x_{0, 1}||_1 \\ R_{l2}(x) = ||x_{0, 0} - x_{0, 1}||_2 \\ R_{prior} = \alpha_{l1} R_{l1}(x) + \alpha_{l2}R_{l2}(x)
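
A small sketch of R_prior, assuming that x_{0,0} and x_{0,1} are the overlapping parts of the image and of a copy shifted by one pixel along the row axis. The image is a plain list of rows and the default weights are placeholders.

```python
import math

def shifted_pair(image):
    """Overlapping parts of the image and of its copy shifted one pixel sideways."""
    x00 = [row[:-1] for row in image]
    x01 = [row[1:] for row in image]
    return x00, x01

def r_prior(image, alpha_l1=1.0, alpha_l2=1.0):
    """alpha_l1 * ||x00 - x01||_1 + alpha_l2 * ||x00 - x01||_2."""
    x00, x01 = shifted_pair(image)
    diffs = [a - b for ra, rb in zip(x00, x01) for a, b in zip(ra, rb)]
    r_l1 = sum(abs(d) for d in diffs)
    r_l2 = math.sqrt(sum(d * d for d in diffs))
    return alpha_l1 * r_l1 + alpha_l2 * r_l2
```

A constant image gives a prior of zero, while an image with sharp pixel-to-pixel changes is penalized, which pushes synthesized images toward smoothness.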

R_{feat}(x) = \sum_{i\in L} (||\mu_i(x) - \mu_{bn,i}||_2 + ||\sigma_i^2(x) - \sigma^2_{bn,i}||_2 ) \\ R_{DI} = R_{prior} + \alpha_{feat} R_{feat}
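
The feature regularizer can be sketched as follows, assuming each layer's features arrive as a batch of per-channel values and that the stored BN statistics are given as a (mean, variance) pair per layer. The data layout and function names are illustrative, not taken from any library.

```python
import math

def channel_stats(batch):
    """Per-channel mean and (biased) variance of a batch of feature vectors."""
    n = len(batch)
    channels = list(zip(*batch))
    means = [sum(c) / n for c in channels]
    variances = [sum((v - m) ** 2 for v in c) / n
                 for c, m in zip(channels, means)]
    return means, variances

def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def r_feat(layer_features, bn_stats):
    """Sum over layers of ||mu(x) - mu_bn||_2 + ||var(x) - var_bn||_2."""
    total = 0.0
    for batch, (mu_bn, var_bn) in zip(layer_features, bn_stats):
        mu, var = channel_stats(batch)
        total += l2(mu, mu_bn) + l2(var, var_bn)
    return total
```

The term is zero exactly when the synthesized batch reproduces the running statistics stored in every BN layer.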

R_{compete}(x) = 1 - div_{JS}(p_{t}(x), p_{s}(x))
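
A sketch of R_compete for discrete class distributions. It assumes the Jensen-Shannon divergence is taken with base-2 logarithms, so that it lies in [0, 1] and the regularizer stays non-negative; the formula above does not state the base.

```python
import math

def kl(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def r_compete(p_teacher, p_student):
    # Small when teacher and student disagree, so minimizing it
    # favors inputs on which the two networks differ.
    return 1.0 - js_divergence(p_teacher, p_student)
```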

ResNet101

ResNet18

Pretrained

KD

Large Dataset
(ImageNet)

?

ResNet101

ResNet18

Pretrained

KD

Large Dataset
(ImageNet)

Lost / Privacy

L_{CL} = \\ KL(p_o(x'), p_k(x')) + \\ L_{CE}(y_k, p_k(x_k)) + \\ KL(p_o(x_k | y\in C_o), p_k(x_k | y\in C_o))

Teacher Net

Dataset

T1's Result

T2's Result

Pretrained

Constrained, i.e. BN must be the same

Teacher Net'

Untrained

Student Net

S's Result

Knowledge Distillation

Teacher Net

T's Result

Student Net

S's Result

Pretrained

Feature Distillation

Fake Dataset

Generator

Feature Distillation

Logits Distillation

Generator

image x

image x'

Conv 1

Classification

Question

Conv 2

Channel Selecting

(Choose what the network wants to know)

Goal: how to design a loss that lets the model learn about the residual?

Generator

image x'

Conv 1

Feature1

Query1

Conv 2

Feature2

Query2

Classification 1

Classification 2

By addition: Dense(Feature 1 + Feature 2)

Generator

image x'

Conv 1

Query1

Conv 2

Query2

By addition: Dense(Feature 1 + Feature 2)

image x1

image x2

...

Batch Discrimination

Generator

image x'

Classification 1

Question 2

Feature lv 1

Classification 2

Feature lv 2

Identity

Question 3

Classification 3

Feature lv 3

Generator

image x'

Classification 1

Question 2

Feature lv 1

Classification 2

Feature lv 2

Identity

Question 3

Classification 3

Feature lv 3

Triangle : Dataset A / Circle : Dataset B

(Fill) Blue & Orange : Task A

(Border) Red & Green : Task B

Classifier A: Blue / Orange Triangle

Distribution
Dataset A
(Triangle)

Distribution
Dataset B

(Circle)

Classifier B:
Red / Green Circle

Task A

Task B

model_A(X_A)

model_A(X_B)

?

Data A

Data B

model_B(X_A)

model_B(X_B)

Class 1, 5 shots

Feature
Extractor

Latent

Mean

Proto
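
The prototype step above (extract latents for the support shots, then average them) can be sketched as below. The latents are assumed to be plain lists of floats that already came out of the feature extractor, and the nearest-prototype rule with Euclidean distance is one common choice rather than the only one.

```python
import math

def prototype(latents):
    """Element-wise mean of the support latents of one class."""
    n = len(latents)
    return [sum(dim) / n for dim in zip(*latents)]

def nearest_prototype(query, prototypes):
    """Return the class whose prototype is closest to the query latent."""
    return min(prototypes, key=lambda c: math.dist(query, prototypes[c]))
```

For example, five support latents of class 1 are reduced to a single prototype, and a query latent is assigned to whichever class prototype it lies closest to.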

Support

Feature
Extractor

Latent1

Latent2

Latent3

Latent4

Latent5

Query

LatentQ

Attention
Network

Feature
Extractor

attention score

Prototype Latent

Mini-ImageNet

CIFAR-100

Feature
Extractor

(10 epochs)

New

Classifier

New

Classifier

(10 epochs)

(50 epochs)

2

0

3

5

1

4

2

0

3

1

4

DaNN Structure

Feature
Extractor

Source Domain

Target Domain
(No label)

Label
Predictor

[0.1, 0.7, -0.1, ...]

[-1.5, 7.2, 3.1, ...]

Source Feature
Domain

Target Feature
Domain

Logit

Label

Domain
Classifier

Source Score
(want -> 1)

Target
Score
(want -> 0)

Step 1
(train G)

Step 2
(train D)

?

What is GRL?

Gradient Reversal Layer

input

Output

...

Loss

GRL

input

Output

...

Loss

gradient

-1 * gradient

Want loss to be small

Want loss to be big

Want loss to be small

input

GRL

Feature
Extractor

Domain
Classifier

Output

Loss

Want to classify
source or target

Want to obfuscate
source or target

grad

-grad
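
A framework-free sketch of a gradient reversal layer: the forward pass is the identity, and the backward pass flips the sign of the incoming gradient. The class name, the `lam` scaling factor, and the explicit `forward`/`backward` methods are assumptions for illustration; in a deep learning framework this would normally be written as a custom autograd operation.

```python
class GradientReversal:
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    def __init__(self, lam=1.0):
        # lam scales the reversed gradient (hypothetical parameter).
        self.lam = lam

    def forward(self, x):
        # Features pass through unchanged to the domain classifier.
        return list(x)

    def backward(self, grad_output):
        # The domain classifier receives the true gradient; everything
        # before this layer (the feature extractor) receives its negative.
        return [-self.lam * g for g in grad_output]
```

Because the feature extractor sees the negated gradient, a single backward pass trains the domain classifier to separate source from target while training the feature extractor to confuse it.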

DaNN - GRL?

Feature
Extractor

Source Domain

Target Domain

Label
Predictor

[0.1, 0.7, -0.1, ...]

[-1.5, 7.2, 3.1, ...]

Source Feature
Domain

Target Feature
Domain

Logit

Label

GRL + Domain
Classifier

Source Score
(want -> 1)

Target
Score
(want -> 0)

Gradient Reversal Layer

Only 1 Step!

image

text

Secret

Encoder

Decoder

image with secret

add noise
& print

Secret

Real world image

image

Identification

image

text

Secret

Encoder

image with secret

image

Decoder

image

Encoder

\left[0.1, 0.2, 0.9, ... \right]

low-dimension
latent

Decoder

reconstruct image

image

text

Secret

Encoder

image with secret

image

Decoder

image

encoder

\left[0.1, 0.2, 0.9, ... \right]

low-dimension
latent

Decoder

decoder

reconstruct image

image

Encoder

Decoder

reconstructed image

style-transfer

image

Cycle Consistency (L1-loss)

image

text

Secret

Encoder

image with secret

image

Decoder

text

Secret

Consistency (L1-loss)

Cross Entropy

Discriminator

T/F

image

text

Secret

Encoder

residual image

image

image with secret

image

text

Secret

Encoder

residual image

image

image with secret

image

text

Secret

Encoder

Detection

image with secret

add noise
& print

Real world image

image

Identification

Decoder

image

text

Secret

Encoder

Decoder

image with secret

image

text

Secret

Encoder

image'

Repeat 10 times

image

Strategy

Network

LSB

DCT

DWT

IWT
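
Of the strategies listed above, LSB (least significant bit) is the simplest to show in code. The sketch below assumes the cover image is a flat list of 8-bit pixel values and the secret is a byte string; the function names are illustrative.

```python
def embed_lsb(pixels, message):
    """Hide message bytes in the least significant bit of each pixel value."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this cover image")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then write the message bit into it.
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract_lsb(pixels, n_bytes):
    """Read n_bytes back from the least significant bits."""
    out = bytearray()
    for start in range(0, n_bytes * 8, 8):
        byte = 0
        for p in pixels[start:start + 8]:
            byte = (byte << 1) | (p & 1)
        out.append(byte)
    return bytes(out)
```

Each pixel changes by at most 1, which is why the method is nearly invisible, but it is also fragile: noise or printing destroys the low bits.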

\left[0.1, 0.2, 0.9, ... \right]

environment
hyper-parameters

image with secret

image

secret text 1

OWO

Encoder

image with secret

image

Decoder

Decoder A

secret text 2

UMU

secret text 1

OWO

secret text 2

UMU

secret text 1

OWO

secret text 2

UMU

secret text 1

OWO

secret text 2

UMU

Decoder B

image

Strategy

Network

LSB

DCT

DWT

IWT

\left[0.1, 0.2, 0.9, ... \right]

environment
hyper-parameters

image with secret

Find a strategy that affects the original image as little as possible.

image

secrets

OWO

Encoder A

RGB Channel

Encoder C

UMU

Residuals of
RGB Channel

image with secret

QAQ

Encoder B

image

text

Secret

Encoder

image

image with secret

Online System

Query

Download

image with secret

secrets

OWO

UMU

QAQ

image with secret

OWO

UMU

QAQ

Secret Message Pairs

Decoder

Optimize

Encoder

Low-complexity
Decoder

image

watermark

Ah-Neng®

image

watermark

Ah-Neng®

Copyright OK!

Low-complexity
Fake Decoder

watermark

asda???ajta

Encoder

High-complexity
Decoder

image

watermark

Ah-Neng®

image

watermark

Ah-Neng®

?????

High-complexity
Fake Decoder

watermark

127®

High-complexity
Decoder

watermark

Ah-Neng®

by Ah-Neng®

Init
Conv

Conv1

Conv2

Conv3

Conv4

Dec

downsample

ResNet Backbone

feature-wise

Distillation

Dec

Bottleneck

Map

Dec

Bottleneck

Map

Dec

Bottleneck

Map

Depth (L1)
Sem. Seg (softmax)

Init
Conv

Conv1

Conv2

Conv3

Conv4

Dec

downsample

ResNet Backbone

Conv2'

Map

Dec

Conv3'

Conv4'

Conv3'

Conv4'

Map

Dec

Map

Dec

No. 0

No. 1

No. 2

No. 3

No. 4

1. Please pass this on to the cat:
"HI!"

2. HI!

3.

4.

More from Arvin Liu