2021 Sep
Yi-Dar, Tang
Contents
Motivation :
Matching Images' Statistic
Motivation :
Matching Images' Statistic
Naive Approache (Adjust each channel's statistic):
Color Affine Transform
\( f(c) = (\frac{c - \mu_{source}}{\sigma_{\text{source}}}) \cdot \sigma_{\text{target}} + \mu_{\text{target}} \)
Neural Style Transfer
Advertisement
Neural Style Transfer
Use a good feature extration
Find an image \(\overrightarrow x\) has feature map which
Close to \(\overrightarrow p\) content image's deep feature respect to pixel
Close to \(\overrightarrow a\) style image's deep feature respect to statistic
Neural Style Transfer
\(\text{content loss : }L_{\text{content}}^{layer}(y, CI) = \sum_{i,j,c} (F_{layer}(y)(i,j,c)-F_{layer}(CI)(i,j,c))^{2}\)
\(\text{style loss : } L_{\text{style}}^{layer}(y, SI) = \sum_{c_1,c_2}(G_y^{layer}(c_1, c_2)-G_{SI}^{layer}(c_1, c_2))^{2}\)
\(\text{gram matrix : } G^{layer}(I,c1,c2) = \sum_{i,j}F_{layer}(I)(i,j,c1)\times F_{layer}(I)(i,j,c2)\)
Objective
Instance Normalization
Conditional Instance Normalization
Different style with
Note : Since the computation architecture, we do not need bias term in ConvLayer
"""Conditional Instance Normalization in Pytorch"""
import torch
from torch import nn
from functools import partial
class CIN(nn.Module):
def __init__(self, n_channels, n_conditionals):
self.norm = nn.InstanceNorm2d(n_channels)
self.scale = nn.Parameter(
1 + 0.02 * (
torch.randn(n_conditionals, n_channels)
).view(n_conditionals, n_channels, 1, 1)
)
self.bias = nn.Parameter(torch.zeros_like(self.scale))
def forward(self, x, style_idx):
assert style_idx.shape[0] == x.shape[0], "batch size should equal"
x = self.norm(x)
_get = partial(torch.index_select, dim=0, index=style_idx)
x = x * _get(self.scale) + _get(self.bias)
return x
Adaptive Instance Normalization
exploring the structure of a real-time, arbitrary neural artistic stylization network
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
StyleGAN Official Branch
Tero Karras
StyleGAN Un-Official Bush
Before StyleGAN
A drawback for traditional generator was pointed out at Sec 3.2 in StyleGAN:
For traditional generator
All variaition should be embedded in Z.
However, there are plenty of natural variation in data
For human face as example, somethings are stochastic : such as the exact placement of hairs, stubble, freckles, or skin pores.
It will decrease the model capacity if we want to embed all this things in the input noise.
StyleGAN
Why named as "Style"GAN
Following slides are focus on the genearator and some tricks, will not discuss about discriminator.
Lots of things are omitted in this slide.
Please read the original papers if you are intersting to this branch.
Generator
Some Notes
Result (Metrics)
(B)~(E) is straight forward
Mixing Regularizaition during Training
Sample 2 Latent \(Z_1, Z_2\)
Get \(W_1, W_2\)
Random sample a layer, use \(W_1 / W_2\) before/after that layer
Result (Metrics)
Mapping
Style Mixing Result
Coarse Styles : pose, hair, face shape
Middle Styles : facial features, eyes
Mind Styles : Color scheme
Noise Result
Coarse Noise : large-scale curling of hair
Fine Noise : finer detail
No Noise: featureless "painterly" look
More About Noise
Style Truncation Result
StyleGAN2
StyleGAN has sevreal characteristic artifact.
In StyleGAN2, they modify the model architecture and training methods to address them.
StyleGAN Generator Revisited
StyleGAN Generator Revisited
Notes In StyleGAN
StyleGAN2
Notes In StyleGAN / StyleGAN2
StyleGAN2 Weight Demodulation
Assume the statastic, do not modify the input statastic explicitly. Direct apply conditional scale to the conv kernel, and normalize the kernel parameter.
Notation :
i/j ↔ in/out channel
k ↔ spatial footprint
StyleGAN2 Architecture Tries
More experiment for shortcut, residual
Perceptual path length(PPL)
Described in StyleGAN 1
\(d\) is VGG16 perceptual distance
slerp is spherical interpolation
lerp is normal linear interpolation
Regularization
StyleGAN2-ADA
Training Generative Adversarial Networks with Limited Data
ADA :adaptive discriminator augmentation
Designing augmentations that do not leak
Aug Prob
G Prob
Real Image Dataset
0.25
0.25
0.25
0.25
Aug G Prob
0.25
0.25
0.25
0.25
a
b
c
d
(a+b+c+d) = 1
0.25(a+b+c+d) = 0.25
Arbritrary a, b, c, d can work
Designing augmentations that do not leak
Aug Prob
G Prob
Real Image Dataset
1-p+.25p
0.25p
0.25p
0.25p
Aug G Prob
a
b
c
d
only $$(a,b,c,d) = (1,0,0,0)$$ can match target distribution for \[\forall p \in [0,1[\]
1-p+.25p
0.25p
0.25p
0.25p
Adaptive Discriminator Augmentation
Rise p with fix step if think D is overfit to training data
The heuristic in the paper is
raise \(p\) while \(r_t >= 0.6\)
decrease \(p\) while \(r_t < 0.6\)
Result
All of them in
2020 June
COOL
Alias-Free GAN
Texture should move while pose is moving
But StyleGAN2 is not
Learning Transferable Visual Models From Natural Language Supervision
CLIP (Contrastive Language-Image Pre-Training)
Pseudocode
Maybe a New Star : Or Patashnik
Stats : 2021/09/09
Before StyleCLIP
There are some "forward" methods can convert real world image to StyleGAN2's latent
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Zero data feature transform.
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Recall Recall
Objective
Three Approaches In StyleCLIP
Objective
Where 3 mappers are just simple 4 layer dense.
Settings
Notation:
Motivation :
Find the collinearity between
Assume :
Channelwise relevance
\(\Delta i_{c} = E_{s\in S}[I_{CLIP}(G(s+\alpha\Delta s_c)) -I_{CLIP}(G(s+\alpha\Delta s_c))]\)
Inference Time
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
(NADA) Non-Adversarial Domain Adaptation
Zero data domain transform.
Latent Mapper Regularizer
Ulike StyleCLIP.
\(M\) is not learning residual here
Layer-Freezing
To overcome mode collapse or overfitting for domain changing task
Find top k layers with change fastest in style \(W+\)
Then optimize the \(\mathcal{L}_{direction}\) for top k evident conv block.
Latent-Mapper ‘Mining’
Some times, the domain trainsfer is not complete
Such as the generated result contain both "dog" and "cat"
To over come this issue, they training a mapper.
Some Off-Topic Things
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Dataset : 1 Stroke Image
No Pretrain
Advance ML/DL University Lectures
MLSS 2021 : Machine Learning Summer School
Cost-sensitive Classification: Techniques and Stories | National Taiwan University | MLSS 2021
Fundamentals and Application of Deep Reinforcement Learning | National Taiwan University | MLSS 2021
Machine Learning in Practice | National Taiwan University | MLSS 2021
Deep Learning for Speech Processing | National Taiwan University | MLSS 2021
Machine Learning as a Service: Challenges and Opportunities | National Taiwan University | MLSS 2021
Google Efforts on AI Research and Talent Development | Google Taiwan | MLSS 2021
Interpretable machine learning | Google Brain | MLSS 2021
Machine Learning Summer School | MLSS 2021
Large Scale Learning and Planning in Reinforcement Learning | University of Alberta | MLSS 2021
Neural Architecture Search and AutoML | University of California, Los Angeles | MLSS 2021
Privacy and Machine Learning/Data Analytics | Princeton University | MLSS 2021
Optimal transport | Google Brain | MLSS 2021
Geometric Deep Learning | Imperial College London | MLSS 2021
Trustworthy Machine Learning: Challenges and Opportunities | Panel Discussion | MLSS 2021
Theory of deep learning | IBM Research | Princeton University | MLSS 2021
Transform the Beauty Industry through AI + AR | CTO of Perfect Corp | MLSS 2021
Pre-training for Natural Language Processing | Google Research | MLSS 2021
Bias and Fairness in NLP | University of California, Los Angeles | MLSS 2021
Developing a World-Class AI Facial Recognition Solution | CyberLink Corp | MLSS 2021
Holistic Adversarial Robustness for Deep Learning | IBM Research | MLSS 2021
TinyML and Efficient Deep Learning | Massachusetts Institute of Technology | MLSS 2021
Computer Vision | University of Texas at Austin | MLSS 2021
Neuro-Symbolic Systems and the History of AI | University of Rochester | MLSS 2021
Continual Visual Learning | Inria, France | MLSS 2021
Overview of learning quantum states | IBM Research | MLSS 2021
An introduction to Statistical Learning Theory and PAC-Bayes Analysis | College London | MLSS 2021
Style Transfer
StyleGAN