Arvin Liu @ 北區AIA
Scheduled time | Topics |
---|---|
14:00 ~ 15:00 | GAN Review, DCGAN, GAN problems (1) |
15:00 ~ 15:20 | break |
15:20 ~ 16:20 | Least Square GAN, IS/FID, WGAN |
16:20 ~ 16:40 | break |
16:40 ~ 17:40 | WGAN-GP, conditional GAN, ACGAN, cycle GAN, etc. |
17:40 ~ 18:00 | Q&A |
image resource: Prof. Hung-yi Lee's slides
skill points -G-> character
image resource: Prof. Hung-yi Lee's slides
Given skill points, generate something.
Skill points
image resource: Prof. Hung-yi Lee's slides; anime image powered by http://mattya.github.io/chainer-DCGAN/
Aside: why can a low-dimensional vector represent a high-dimensional image?
dimension of the image space >> dimension of the images that actually appear
image resource: Prof. Hung-yi Lee's slides
Why both?
http://kvfrans.com/variational-autoencoders-explained/
1st epoch
9th epoch
Original
image resource: https://cdn-images-1.medium.com/max/1600/1*ZEvDcg1LP7xvrTSHt0B5-Q@2x.png
Generator
image resource: https://cdn-images-1.medium.com/max/1600/1*ZEvDcg1LP7xvrTSHt0B5-Q@2x.png
image resource: https://www.cnblogs.com/huangshiyu13/p/6209016.html
image resource: Prof. Hung-yi Lee's slides
DCGAN (ICLR '16): https://arxiv.org/abs/1511.06434
BN -> ReLU
tanh
DCGAN deconv: kernel size (5,5), stride (2,2)
BN + LeakyReLU
LeakyReLU
score
* resolution is not high due to the image size (64x64).
DCGAN WITHOUT ANY TIPS
* resolution = 128x128.
I use TensorFlow to train my GAN.
picture resource: https://www.tensorflow.org/
picture resource: my code displayed in Sublime Text 3
Discriminator: 64x64x3 image
-> Convolution 128, f: (5,5), s: (2,2)
-> Convolution 256, f: (5,5), s: (2,2)
-> Convolution 512, f: (5,5), s: (2,2)
-> Convolution 1024, f: (5,5), s: (2,2)
-> Dense 1 -> score
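A minimal TF 1.x sketch of this discriminator (my reconstruction from the slide, not the original code; the function and scope names are my own). The raw score is returned without a sigmoid:

```python
import tensorflow as tf

def discriminator(x, reuse=tf.AUTO_REUSE):
    # x: (batch, 64, 64, 3) images scaled to [-1, 1].
    with tf.variable_scope("discriminator", reuse=reuse):
        h = x
        for n_filters in [128, 256, 512, 1024]:
            h = tf.layers.conv2d(h, n_filters, kernel_size=5, strides=2,
                                 padding="same")         # f: (5,5), s: (2,2)
            h = tf.layers.batch_normalization(h, training=True)  # BN
            h = tf.nn.leaky_relu(h)                              # LeakyReLU
        h = tf.layers.flatten(h)
        return tf.layers.dense(h, 1)  # Dense 1 -> raw score (no sigmoid)
```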
picture resource: my code displayed in Sublime Text 3
Generator: latent 100 ~ N(0,1)
-> Dense 2^14
-> Conv^T 512, f: (5,5), s: (2,2)
-> Conv^T 256, f: (5,5), s: (2,2)
-> Conv^T 128, f: (5,5), s: (2,2)
-> Conv^T 3, f: (5,5), s: (2,2)
-> 64x64x3 image
picture resource: my code displayed in Sublime Text 3;
note that the latent in the original paper is sampled from a uniform distribution.
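A matching generator sketch, continuing the TF 1.x code above; again the names are mine, and the latent is drawn from N(0, 1) as in my code (the paper uses uniform):

```python
def generator(z, reuse=tf.AUTO_REUSE):
    # z: (batch, 100) latent vector, sampled from N(0, 1).
    with tf.variable_scope("generator", reuse=reuse):
        h = tf.layers.dense(z, 2 ** 14)      # Dense 2^14 = 4 * 4 * 1024
        h = tf.reshape(h, [-1, 4, 4, 1024])
        for n_filters in [512, 256, 128]:
            h = tf.layers.conv2d_transpose(h, n_filters, kernel_size=5,
                                           strides=2, padding="same")
            h = tf.layers.batch_normalization(h, training=True)  # BN
            h = tf.nn.relu(h)                                    # ReLU
        h = tf.layers.conv2d_transpose(h, 3, kernel_size=5, strides=2,
                                       padding="same")
        return tf.nn.tanh(h)  # tanh -> 64x64x3 image in [-1, 1]
```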
picture resource: my code displayed on sublime text3
picture resource: my code displayed on sublime text3
fake_score = discriminator(fake_data)
real_score = discriminator(real_data)
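From these two scores, a hedged sketch of the original GAN loss (sigmoid cross-entropy on the raw scores, with the non-saturating objective for G; only `fake_score` / `real_score` come from the slide):

```python
# D wants real_score -> 1 and fake_score -> 0.
d_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
             logits=real_score, labels=tf.ones_like(real_score))) + \
         tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
             logits=fake_score, labels=tf.zeros_like(fake_score)))
# Non-saturating G loss: maximize log D(G(z)).
g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
             logits=fake_score, labels=tf.ones_like(fake_score)))
```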
picture resource: Prof. Hung-yi Lee's slides
The student gets brainwashed; the teacher scolds the student.
The teacher gets brainwashed; the student fools the teacher.
picture resource: Prof. Hung-yi Lee's slides
Reminder:
tf.variable_scope(reuse=tf.AUTO_REUSE)
This loss function will be explained later.
picture resource: my code displayed in Sublime Text 3
1 step = 5 discriminator iters + 1 generator iter.
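A sketch of one such step; `sess`, `d_train_op`, `g_train_op`, `next_batch`, and `sample_latent` are hypothetical names for the session, the two optimizer ops, and the batching helpers:

```python
num_steps = 5500  # total number of steps (arbitrary here)
for step in range(num_steps):
    # 5 discriminator iterations...
    for _ in range(5):
        sess.run(d_train_op, feed_dict={real_data: next_batch(),
                                        latent: sample_latent()})
    # ...then 1 generator iteration.
    sess.run(g_train_op, feed_dict={latent: sample_latent()})
```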
picture resource: GAN results by Arvin Liu (myself).
Student
Teacher
img created by myself.
image resource: Prof. Hung-yi Lee's slides
Distribution (Mode) Collapse!
image resource: wiki
Student
Teacher
Question: generate code for sorting
So whose fault is it?
image resource: Prof. Hung-yi Lee's slides
Discriminator score
All possible data
image resource: https://www.offconvex.org/2016/03/22/saddlepoints/
Early stopping + validation
HUMAN!
G's parameters
D's parameters
G's loss
Save your model every time,
save your whole lifetime.
Diminished Gradient
Mode Collapse or Gradient Explode
Non-convergence
Solve Diminished Gradient
https://arxiv.org/abs/1611.04076
image resource: Prof. Hung-yi Lee's slides
(with sigmoid)
(without sigmoid)
Discriminator score
score before Sigmoid: 20 / -20, then doubled to 40 / -40
score after Sigmoid: ~1 / ~0 in both cases
Sigmoid is not troubled by this problem.
score before Linear: 20 / -20, then doubled to 40 / -40
score after Linear: also doubled to 40 / -40
For Linear, simply scaling the parameters up makes the loss drop!
image resource: Prof. Hung-yi Lee's slides
Discriminator score
picture resource: my code displayed in Sublime Text 3
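A minimal LSGAN loss sketch under the common label choice a=0 (fake), b=1 (real), c=1 (G's target); the scores are linear outputs with no sigmoid:

```python
# Least-squares loss instead of sigmoid cross-entropy.
d_loss = 0.5 * (tf.reduce_mean(tf.square(real_score - 1.0)) +
                tf.reduce_mean(tf.square(fake_score)))
g_loss = 0.5 * tf.reduce_mean(tf.square(fake_score - 1.0))
```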
Solve Mode Collapse
You're so awesome!
Mode Collapse
Mode Collapse
You're so awesome!
Nice Choice!
img resource: https://9gag.com/gag/aeexzeO/drake-meme
??????
Solve Non-convergence
When it's doing badly, tell the Discriminator what the loss should look like.
CVPR '18: PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup
Tell the Discriminator:
which parts you think are important (by stacking extra losses).
WGAN: https://arxiv.org/pdf/1701.07875.pdf
if D is 1-Lipschitz -> it must be smooth
1-Lipschitz: |f(x) - f(y)| <= |x - y|
image resource: Prof. Hung-yi Lee's slides
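A hedged WGAN sketch: linear scores, the Wasserstein loss, and the paper's weight clipping to [-0.01, 0.01] as a crude way to keep D roughly 1-Lipschitz (the scope name matches the discriminator sketch earlier):

```python
d_loss = tf.reduce_mean(fake_score) - tf.reduce_mean(real_score)
g_loss = -tf.reduce_mean(fake_score)
# Run these assign ops after every D update to clip its weights.
d_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                           scope="discriminator")
clip_d = [v.assign(tf.clip_by_value(v, -0.01, 0.01)) for v in d_vars]
```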
They don't move.
Why?
Gradient too steep!
(slope is too steep.)
Discriminator score
All possible data
Diminished Gradient
Explode Score
Solve Diminished Gradient & Mode Collapse
Improved WGAN (NIPS '17): https://arxiv.org/abs/1704.00028
(because of the 1-Lipschitz constraint)
image resource: 李宏毅老師的投影片
Q: How to smooth the gradient?
A: Regularization
(give it penalty)
Too Steep.
Flatter.
image resource: Prof. Hung-yi Lee's slides
Discriminator score
All possible data
The target is between the real data and the fake data.
(figure: points interpolated between real and fake data, e.g. at α = 0.1 and α = 0.9)
Note that the penalty uses the 2-norm of the gradient.
image resource: Prof. Hung-yi Lee's slides
Personal comment: the gradient can't be too large because of the loss,
and can't be too small because of the diminished gradient, so 1 is just right.
(The 1 comes from the paper's experiments; it coincidentally matches the 1 in 1-Lipschitz.)
Improved WGAN (NIPS '17): https://arxiv.org/abs/1704.00028
* lambda = 10 in paper
(Not mentioned but very powerful)
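A sketch of the gradient penalty on those interpolated points, reusing the `discriminator()` and `batch_size` names assumed earlier, with lambda = 10 as in the paper:

```python
# Sample a point on the segment between each real and fake example.
alpha = tf.random_uniform([batch_size, 1, 1, 1], 0.0, 1.0)
interp = alpha * real_data + (1.0 - alpha) * fake_data
grads = tf.gradients(discriminator(interp), [interp])[0]
# 2-norm of D's gradient at each interpolated point.
grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-8)
gradient_penalty = tf.reduce_mean(tf.square(grad_norm - 1.0))
d_loss = (tf.reduce_mean(fake_score) - tf.reduce_mean(real_score)
          + 10.0 * gradient_penalty)  # lambda = 10
```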
* resolution is not high due to the image size (64x64).
D:G = 4:1, 5500 iterations
picture resource: GAN result by Arvin Liu (myself).
If real data & G distribution....
(figure, built up across several slides: the real data has three modes A, B, C)
The Discriminator does well on classes B and C.
It is easy to minimize G_loss for G's data.
-> Mode Collapse!
DCGAN structure, image size 128x128
picture resource: GAN result by Arvin Liu (myself).
Solve Mode Collapse
Ian Goodfellow: https://arxiv.org/pdf/1606.03498.pdf
img resource: https://arxiv.org/pdf/1606.03498.pdf
F: raw feature from each data point
T: projection
c: calculate difference
o: calculate diversity
picture resource: my code displayed in Sublime Text 3
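A hedged sketch of minibatch discrimination in the F/T/c/o notation above; B and C are hyperparameters, and a dense layer plays the role of the projection tensor T:

```python
def minibatch_discrimination(f, B=32, C=8):
    # f: (N, A) raw features from each data point (F).
    M = tf.layers.dense(f, B * C, use_bias=False)        # T: projection
    M = tf.reshape(M, [-1, B, C])                        # (N, B, C)
    # c: exp(-L1 difference) between every pair of samples, per row b.
    diff = tf.expand_dims(M, 0) - tf.expand_dims(M, 1)   # (N, N, B, C)
    c = tf.exp(-tf.reduce_sum(tf.abs(diff), axis=3))     # (N, N, B)
    # o: how similar each sample is to the rest of the batch (diversity).
    o = tf.reduce_sum(c, axis=0)                         # (N, B)
    return tf.concat([f, o], axis=1)  # append o to the features for D
```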
picture resource: GAN result by Arvin Liu (myself).
https://arxiv.org/abs/1411.1784
image resource: Prof. Hung-yi Lee's slides
Whatever makes you happy :)
For example, one score for classification and one score for real/fake?
https://arxiv.org/abs/1610.09585
image resource: https://www.cnblogs.com/punkcure/p/7873566.html
D outputs C/S.
Thus, we have both a Classification error and a Synthesis error.
In this task, I changed the loss function a bit.
(For G, +log(z) -> -log(1-z))
Ls: the loss in the original GAN.
Lc: binary cross-entropy
(tag loss, not softmax.)
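A sketch of how the two losses might combine, with hypothetical names (`real_tag_logits`, `fake_tag_logits`, `real_tags`, `cond_tags` are D's tag outputs and the target tags; `d_loss` / `g_loss` are the Ls terms from before):

```python
# Lc: per-tag sigmoid cross-entropy (multi-hot tags, not one softmax class).
Lc_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=real_tag_logits, labels=real_tags))
Lc_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=fake_tag_logits, labels=cond_tags))
d_total = d_loss + Lc_real   # D: judge real/fake and predict the tags
g_total = g_loss + Lc_fake   # G: fool D and match the conditioned tags
```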
For example, painting style transfer / voice conversion, etc.
data A -> [A->B Generator] -> fake data B
fake data B vs. data B -> [A->B Discriminator]
paired data A -> [A->B Generator] -> paired fake data B
paired fake data B vs. paired data B -> [A->B Discriminator]
condition: embedded A
data A -> [A->B Generator] -> fake data B
fake data B vs. data B -> [B Discriminator]
Could G just generate some arbitrary B?
From the earlier Mode Collapse discussion, we know this can happen.
Add a loss: computing it directly or with a pretrained model both work.
data A -> [A->B Generator] -> fake data B (vs. data B -> [B Discriminator])
fake data B -> [B->A Generator] -> fake data A -> [A Discriminator]
data A -> [A->B Generator] -> fake data B -> [B->A Generator] -> fake data A
L1_loss between data A and fake data A
This L1_loss is called the
Cycle Consistency Loss
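A sketch of that L1 loss for the A -> B -> A direction, with hypothetical `generator_AB` / `generator_BA` functions and a `data_A` tensor:

```python
fake_B = generator_AB(data_A)   # A -> B
rec_A = generator_BA(fake_B)    # B -> A: should reconstruct the input
cycle_loss_A = tf.reduce_mean(tf.abs(rec_A - data_A))  # L1_loss
# The symmetric B -> A -> B term is added in the same way.
```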
data A -> [A->B Generator] -> fake data B -> [B->A Generator] -> fake data A (Cycle Consistency)
data B -> [B->A Generator] -> fake data A -> [A->B Generator] -> fake data B (Cycle Consistency)
fake data B vs. data B -> [B Discriminator]
fake data A vs. data A -> [A Discriminator]
image resource: Prof. Hung-yi Lee's slides
Magic
Cycle Consistency
image resource: Prof. Hung-yi Lee's slides
The goal of Cycle Consistency is to make Y resemble X. But if G can hide the information inside its output,
then the cycle consistency loss becomes meaningless.
https://affinelayer.com/pixsrv/
Embedding
Generator
U-net
Memes Generation: https://arxiv.org/pdf/1806.04510.pdf
Condition: Encoder(Picture)
Use a language model or something similar as the Discriminator
https://arxiv.org/pdf/1711.08447.pdf
(Part from original model)
https://github.com/Aixile/chainer-cyclegan
https://arxiv.org/pdf/1801.07892.pdf
Take these with a grain of salt; they're tricks for cheating.
Use STRONGER D to filter G's data.
"Stronger" means train D lonely in the end.
When you think G is good enough, lower the learning rate and you will get brighter images.
In classification problem...
image resource: https://chtseng.wordpress.com/2017/11/11/data-augmentation-%E8%B3%87%E6%96%99%E5%A2%9E%E5%BC%B7/
class A
class B
class C
real data
GAN generator
simple augmentation
???
???
???
[Lab talk]
Dataset A | Dataset B | is_pair? |
---|---|---|
exist | exist | Yes |
not exist | exist | No |
exist | not exist | No |
not exist | not exist | ??? |
https://arxiv.org/pdf/1611.07004v1.pdf
e.g. Maxpooling, ReLU (but G can use them.)
Maybe ReLU is suitable for "edge"?
ICLR'2017 (rejected)
https://openreview.net/forum?id=SypU81Ole
https://en.wikipedia.org/wiki/Linear_interpolation
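A sketch of such a linear interpolation in latent space, reusing the `sess` / `latent` / `fake_data` names assumed in the training sketch, to check that G generalizes rather than memorizes:

```python
import numpy as np

z0 = np.random.randn(100)
z1 = np.random.randn(100)
# 10 points on the segment: (1 - t) * z0 + t * z1 for t in [0, 1].
zs = np.stack([(1 - t) * z0 + t * z1 for t in np.linspace(0.0, 1.0, 10)])
images = sess.run(fake_data, feed_dict={latent: zs})
```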
D:G = 4:1, 5500 iterations
picture resource: GAN result by Arvin Liu (myself).
image resource: Prof. Hung-yi Lee's slides
picture resource: GAN result by Arvin Liu (myself).
conditional GAN
ACGAN (Auxiliary Classifier GAN)
Cycle GAN
Code:
goo.gl/yRvUk1