From Style Transfer to StyleGAN-NADA

Sep 2021

Yi-Dar, Tang

Contents

  • Recall: Style Transfer / CIN / AdaIN / FiLM
  • StyleGAN (2018)
  • StyleGAN2 (2019)
  • StyleGAN2-ADA (2020)
  • Alias-Free GAN (2021)
  • CLIP (2021)
  • Before StyleCLIP
  • StyleCLIP (2021)
  • StyleGAN-NADA (2021)
  • Some off-topic things
  • References and Citation Counts (as of Sep 2021)

Motivation:

Matching Images' Statistics


Naive approach (adjust each channel's statistics):

Color Affine Transform 

\( f(c) = \frac{c - \mu_{\text{source}}}{\sigma_{\text{source}}} \cdot \sigma_{\text{target}} + \mu_{\text{target}} \)
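A minimal PyTorch sketch of this per-channel transform (the function name and the (C, H, W) tensor shapes are my assumptions):

"""Channel-wise statistic matching (sketch)"""
import torch

def match_channel_stats(source, target):
  # Per-channel mean and std over the spatial dimensions
  mu_s, std_s = source.mean(dim=(1, 2), keepdim=True), source.std(dim=(1, 2), keepdim=True)
  mu_t, std_t = target.mean(dim=(1, 2), keepdim=True), target.std(dim=(1, 2), keepdim=True)
  # f(c) = (c - mu_source) / sigma_source * sigma_target + mu_target
  return (source - mu_s) / (std_s + 1e-8) * std_t + mu_t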

Neural Style Transfer

Advertisement

Neural Style Transfer

Use a good feature extractor.

Find an image \(\overrightarrow x\) whose feature maps are:

close to the content image \(\overrightarrow p\)'s deep features with respect to pixels, and

close to the style image \(\overrightarrow a\)'s deep features with respect to statistics.

Neural Style Transfer

\(\text{content loss : }L_{\text{content}}^{layer}(y, CI) = \sum_{i,j,c} (F_{layer}(y)(i,j,c)-F_{layer}(CI)(i,j,c))^{2}\)

\(\text{style loss : } L_{\text{style}}^{layer}(y, SI) = \sum_{c_1,c_2}(G_y^{layer}(c_1, c_2)-G_{SI}^{layer}(c_1, c_2))^{2}\)

\(\text{Gram matrix : } G_I^{layer}(c_1, c_2) = \sum_{i,j}F_{layer}(I)(i,j,c_1)\times F_{layer}(I)(i,j,c_2)\)
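A sketch of these three quantities in PyTorch, assuming single-image feature maps of shape (C, H, W) extracted by some \(F_{layer}\) (e.g. a VGG layer); the function names are mine:

"""Content loss, style loss, and Gram matrix (sketch)"""
import torch

def gram_matrix(feat):
  # G(c1, c2) = sum_{i,j} F(i,j,c1) * F(i,j,c2)
  c, h, w = feat.shape
  f = feat.reshape(c, h * w)
  return f @ f.t()

def content_loss(feat_y, feat_ci):
  # Squared error between deep features, pixel by pixel
  return ((feat_y - feat_ci) ** 2).sum()

def style_loss(feat_y, feat_si):
  # Squared error between Gram matrices (feature statistics)
  return ((gram_matrix(feat_y) - gram_matrix(feat_si)) ** 2).sum()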

Objective

  1. Image-Optimisation-Based
    \(argmin_x \mathcal{L}(x,CI,SI)\)
  2. Per-Style-Per-Model Neural Methods
    \(argmin_{M_{SI}} \mathbb{E}_{CI}[\mathcal{L}(M_{SI}(CI),CI,SI)]\)
  3.  Multiple-Style-Per-Model Neural Methods
    \(argmin_{M} \mathbb{E}_{CI, SI}[\mathcal{L}(M(CI,SI),CI,SI)]\)

Instance Normalization

Conditional Instance Normalization

Different styles share:

  • Same: convolution parameters
  • Different: scale & bias
    (one style costs about 0.2% of the conv parameters)

Note: with this architecture the conv layers need no bias term, since instance normalization subtracts the per-channel mean anyway.

"""Conditional Instance Normalization in Pytorch"""
import torch
from torch import nn
from functools import partial

class CIN(nn.Module):
  def __init__(self, n_channels, n_conditionals):
    self.norm  = nn.InstanceNorm2d(n_channels)
    self.scale = nn.Parameter(
      1 + 0.02 * (
        torch.randn(n_conditionals, n_channels)
      ).view(n_conditionals, n_channels, 1, 1)
    )
    self.bias  = nn.Parameter(torch.zeros_like(self.scale))
  
  def forward(self, x, style_idx):
    assert style_idx.shape[0] == x.shape[0], "batch size should equal"
    
    x = self.norm(x)
    _get = partial(torch.index_select, dim=0, index=style_idx)
    x = x * _get(self.scale) + _get(self.bias)
	return x
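A hypothetical usage example (shapes and style count chosen arbitrarily):

# Hypothetical usage: 4 styles, 64 channels, batch of 2
cin = CIN(n_channels=64, n_conditionals=4)
x = torch.randn(2, 64, 32, 32)
style_idx = torch.tensor([0, 3])  # one style index per batch element
y = cin(x, style_idx)             # -> (2, 64, 32, 32)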

Adaptive Instance Normalization

Exploring the Structure of a Real-Time, Arbitrary Neural Artistic Stylization Network

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
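A sketch of the AdaIN operation from the second paper (the function wrapper is mine; content/style are (N, C, H, W) feature tensors):

"""Adaptive Instance Normalization (sketch)"""
import torch

def adain(content, style, eps=1e-5):
  # Per-instance, per-channel statistics over the spatial dims
  mu_c, std_c = content.mean(dim=(2, 3), keepdim=True), content.std(dim=(2, 3), keepdim=True)
  mu_s, std_s = style.mean(dim=(2, 3), keepdim=True),   style.std(dim=(2, 3), keepdim=True)
  # Replace the content's channel statistics with the style's
  return (content - mu_c) / (std_c + eps) * std_s + mu_s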

StyleGAN Official Branch

Tero Karras

StyleGAN Un-Official Bush

Before StyleGAN

A drawback of the traditional generator design is pointed out in Sec. 3.2 of the StyleGAN paper:

For a traditional generator, all variation must be embedded in \(Z\). However, there is plenty of natural variation in data.

Take human faces as an example: some things are stochastic, such as the exact placement of hairs, stubble, freckles, or skin pores.

Embedding all of these things in the input noise would eat into the model's capacity.

StyleGAN

Why named as "Style"GAN

The following slides focus on the generator and a few tricks; the discriminator is not discussed.

Many details are omitted from these slides.

Please read the original papers if you are interested in this branch.

Generator

Some Notes

  • All convs are 3×3 with the same channel count
  • The last layer is a 1×1 conv with 3 output channels
  • The input to \(g\) is a learnable constant tensor
  • Bilinear upsampling
  • The latent \(Z\) is normalized to \(\|Z\|_2 = 1\)
  • Gaussian \((0, 1)\) noise is applied before each AdaIN layer
  • In the default setting, every layer's AdaIN scale & bias are derived from the same \(W\)
  • AdaIN only adjusts global statistics; it carries no localized information

Result (Metrics)

(B)–(E) are straightforward

Mixing Regularization during Training

Sample two latents \(Z_1, Z_2\)

Get \(W_1, W_2\)

Randomly sample a layer; use \(W_1\) before that layer and \(W_2\) after it (a sketch follows)
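A sketch of one mixed forward pass (`mapping`, `synthesis_layers`, and `const_input` are hypothetical stand-ins for the StyleGAN mapping network, the per-layer synthesis blocks, and the learned constant):

"""Mixing regularization (sketch)"""
import torch

def mixed_forward(mapping, synthesis_layers, const_input, z_dim=512):
  z1, z2 = torch.randn(1, z_dim), torch.randn(1, z_dim)
  w1, w2 = mapping(z1), mapping(z2)
  # Random crossover point: w1 drives earlier layers, w2 the rest
  crossover = torch.randint(len(synthesis_layers), (1,)).item()
  x = const_input
  for i, layer in enumerate(synthesis_layers):
    x = layer(x, w1 if i < crossover else w2)
  return x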

Result (Metrics)

Mapping

Style Mixing Result

Coarse styles: pose, hair, face shape

Middle styles: facial features, eyes

Fine styles: color scheme

Noise Result

Coarse Noise : large-scale curling of hair

Fine Noise : finer detail

No Noise: featureless "painterly" look

More About Noise

Style Truncation Result

StyleGAN2

StyleGAN has several characteristic artifacts.

In StyleGAN2, the authors modify the model architecture and training methods to address them.

StyleGAN Generator Revisited

Notes In StyleGAN

  • The magnitude of the original feature map is dropped by the AdaIN operator
  • The scale of the bias & noise is affected by the current style scale and the conv layer
  • AdaIN's standard deviation depends explicitly on the input
  • It is just a feedforward model (no shortcut, no residual)

StyleGAN2

Notes In StyleGAN / StyleGAN2

  • The magnitude of the original feature map is dropped by the AdaIN operator
    → StyleGAN2 does not adjust the mean
  • The scale of the bias & noise is affected by the current style scale and the conv layer
    → Move them out, after the std normalization
  • AdaIN's standard deviation depends explicitly on the input
    → Assume the statistics instead of modifying the input statistics explicitly: apply the conditional scale directly to the conv kernel and normalize the kernel parameters (d)
  • It is just a feedforward model (no shortcut, no residual)
    → More experiments with shortcuts and residuals
    (not in these slides)

StyleGAN2 Weight Demodulation

Assume the statistics rather than modifying the input statistics explicitly: apply the conditional scale directly to the conv kernel, then normalize the kernel parameters.

Notation :
i/j ↔ in/out channel

k ↔ spatial footprint

StyleGAN2 Architecture Experiments

More experiments with shortcuts and residuals

Perceptual path length(PPL)

Described in StyleGAN 1

\(d\) is VGG16 perceptual distance

slerp is spherical interpolation

lerp is normal linear interpolation
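In the paper, PPL is \(\mathbb{E}\!\left[\frac{1}{\epsilon^{2}}\, d\!\left(G(\mathrm{slerp}(z_1, z_2; t)),\ G(\mathrm{slerp}(z_1, z_2; t+\epsilon))\right)\right]\) with \(\epsilon = 10^{-4}\) (lerp when interpolating in \(W\)). A sketch with stand-ins `G` and `vgg_distance`:

"""Perceptual path length, one Monte Carlo sample (sketch)"""
import torch

def slerp(a, b, t):
  # Spherical interpolation between (normalized) latents
  a, b = a / a.norm(), b / b.norm()
  omega = torch.acos((a * b).sum().clamp(-1 + 1e-7, 1 - 1e-7))
  return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

def ppl_sample(G, vgg_distance, z_dim=512, eps=1e-4):
  z1, z2 = torch.randn(z_dim), torch.randn(z_dim)
  t = torch.rand(())
  img_a = G(slerp(z1, z2, t))
  img_b = G(slerp(z1, z2, t + eps))
  return vgg_distance(img_a, img_b) / eps ** 2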

Regularization

  1. Lazy Regularization
  2. Path length regularization

StyleGAN2-ADA

Training Generative Adversarial Networks with Limited Data

ADA: adaptive discriminator augmentation

Designing augmentations that do not leak

Example: augment every image with one of the four 90° rotations, chosen uniformly (probability 0.25 each).

The augmented real dataset then shows each orientation with probability 0.25. If the generator produces the four orientations with probabilities \((a, b, c, d)\), where \(a + b + c + d = 1\), the augmented generator output also shows each orientation with probability \(0.25(a + b + c + d) = 0.25\).

Arbitrary \((a, b, c, d)\) can match the target distribution, so the augmentation leaks: the discriminator cannot penalize wrong orientations.

Designing augmentations that do not leak

Now execute the rotation only with probability \(p\) (with probability \(1 - p\) the image passes through unchanged). The augmented real dataset shows orientations with probabilities \((1 - p + 0.25p,\ 0.25p,\ 0.25p,\ 0.25p)\).

For a generator with orientation probabilities \((a, b, c, d)\), only \((a, b, c, d) = (1, 0, 0, 0)\) can match the target distribution for all \(p \in [0, 1)\), so the augmentation no longer leaks.

Adaptive Discriminator Augmentation

Raise \(p\) by a fixed step whenever D seems to be overfitting the training data.

The heuristic in the paper (with overfitting measure \(r_t\)) is:

raise \(p\) while \(r_t \ge 0.6\)

decrease \(p\) while \(r_t < 0.6\)
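In the paper, \(r_t = \mathbb{E}[\mathrm{sign}(D_{\text{train}})]\): the fraction of real training images the discriminator still scores positive. A sketch of the adjustment (the step size here is my assumption; the paper sizes it so \(p\) can traverse \(0 \to 1\) over a fixed number of images):

"""Adaptive discriminator augmentation: p update (sketch)"""
import torch

def update_p(p, d_real_logits, step=0.005, target=0.6):
  # r_t: portion of real logits the discriminator scores positive
  r_t = torch.sign(d_real_logits).mean().item()
  p = p + step if r_t >= target else p - step
  return min(max(p, 0.0), 1.0)  # augmentation probability stays in [0, 1]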

Result

All of them in

2020 June

COOL

Alias-Free GAN

Texture should move along with the pose,

but in StyleGAN2 it does not: texture tends to stick to image coordinates.

Learning Transferable Visual Models From Natural Language Supervision

CLIP (Contrastive Language-Image Pre-Training)

Pseudocode
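The figure here was the paper's numpy-style pseudocode; below is a PyTorch-flavored sketch of the same symmetric contrastive objective (`image_encoder`, `text_encoder`, the projections `W_i`, `W_t`, and the learned temperature follow the paper's naming; the wrapper function is mine):

"""CLIP training objective (sketch)"""
import torch
import torch.nn.functional as F

def clip_loss(image_encoder, text_encoder, W_i, W_t, logit_scale, images, texts):
  # Project both modalities into a joint embedding space, L2-normalized
  i_emb = F.normalize(image_encoder(images) @ W_i, dim=-1)
  t_emb = F.normalize(text_encoder(texts) @ W_t, dim=-1)
  # Pairwise cosine similarities, scaled by a learned temperature
  logits = logit_scale * i_emb @ t_emb.t()
  labels = torch.arange(logits.shape[0])  # matching pairs lie on the diagonal
  # Symmetric cross-entropy: image->text and text->image
  return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2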

Maybe a New Star: Or Patashnik

Stats: 2021/09/09

Before StyleCLIP

There are some "forward" (encoder-based) methods that can convert a real-world image into StyleGAN2's latent space.

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Zero-data feature transform.

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

Recall Recall

Objective

  1. Image-Optimisation-Based
    \(argmin_x \mathcal{L}(x,CI,SI)\)
  2. Per-Style-Per-Model Neural Methods
    \(argmin_{M_{SI}} \mathbb{E}_{CI}[\mathcal{L}(M_{SI}(CI),CI,SI)]\)
  3.  Multiple-Style-Per-Model Neural Methods
    \(argmin_{M} \mathbb{E}_{CI, SI}[\mathcal{L}(M(CI,SI),CI,SI)]\)

Three Approaches In StyleCLIP

Objective

  1. Image-Optimisation-Based
    \(argmin_x \mathcal{L}(x,CI,SI)\)
  2. Per-Style-Per-Model Neural Methods
    \(argmin_{M_{SI}} \mathbb{E}_{CI}[\mathcal{L}(M_{SI}(CI),CI,SI)]\)
  3.  Multiple-Style-Per-Model Neural Methods
    \(argmin_{M} \mathbb{E}_{CI, SI}[\mathcal{L}(M(CI,SI),CI,SI)]\)
  1. Latent Optimization
  2. Latent Mapper
  3. Global Directions

The three mappers are just simple 4-layer dense networks.

Settings

  • Default: \(\lambda_{L2} = 0.8,\ \lambda_{ID} = 0.1\)
  • Trump: \(\lambda_{L2} = 2,\ \lambda_{ID} = 0\)
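A sketch of the latent-optimization objective with these weights (`G`, `clip_distance`, `id_loss`, and `w_source` stand in for the pretrained generator, the CLIP distance \(D_{CLIP}(G(w), t)\), the identity-preservation loss, and the inverted source latent; step count and learning rate are my assumptions):

"""StyleCLIP latent optimization (sketch)"""
import torch

def optimize_latent(G, clip_distance, id_loss, w_source, text,
                    lam_l2=0.8, lam_id=0.1, steps=200, lr=0.1):
  src_img = G(w_source).detach()
  w = w_source.clone().requires_grad_(True)
  opt = torch.optim.Adam([w], lr=lr)
  for _ in range(steps):
    img = G(w)
    loss = (clip_distance(img, text)                # match the text prompt
            + lam_l2 * (w - w_source).pow(2).sum()  # stay near the source latent
            + lam_id * id_loss(img, src_img))       # preserve identity
    opt.zero_grad()
    loss.backward()
    opt.step()
  return w.detach()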

Notation:

  • \(s\) : Style Code
  • \(i\) : CLIP Image Encode
  • \(t\) : CLIP Text Encode

Motivation:

Find the collinearity between

  • Style Code Direction \(\Delta s\)
  • Text Encode Direction \(\Delta t\)

Assumption:

  • Image Encode Direction \(\Delta i = \Delta t\)

Channelwise relevance

\(\Delta i_{c} = \mathbb{E}_{s\in S}[I_{\text{CLIP}}(G(s+\alpha\Delta s_c)) - I_{\text{CLIP}}(G(s-\alpha\Delta s_c))]\)
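A sketch of estimating this relevance (treating the style space as one flat vector per sample; `G` and `clip_image_encode` are stand-ins for the generator and \(I_{CLIP}\)):

"""Channelwise relevance of a style channel (sketch)"""
import torch

def channel_relevance(G, clip_image_encode, styles, channel, alpha=5.0):
  # Perturb a single style-space channel in both directions
  delta = torch.zeros_like(styles)
  delta[:, channel] = alpha
  i_pos = clip_image_encode(G(styles + delta))
  i_neg = clip_image_encode(G(styles - delta))
  # Average CLIP-space image direction over the sampled style codes
  return (i_pos - i_neg).mean(dim=0)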

Inference Time

StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators

NADA: Non-Adversarial Domain Adaptation

Zero-data domain transform.

Latent Mapper Regularizer

Unlike StyleCLIP,

\(M\) does not learn a residual here.

Layer-Freezing

To overcome mode collapse and overfitting in the domain-changing task:

Find the top-\(k\) layers whose styles change fastest in \(W+\)

Then optimize \(\mathcal{L}_{direction}\) only for those top-\(k\) most affected conv blocks (a sketch of \(\mathcal{L}_{direction}\) follows).
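\(\mathcal{L}_{direction}\) aligns a CLIP-space image direction (between the frozen and trainable generators) with the text direction between source and target prompts. A sketch (`encode_image`, `encode_text`, `G_train`, `G_frozen` are stand-ins):

"""StyleGAN-NADA directional CLIP loss (sketch)"""
import torch
import torch.nn.functional as F

def direction_loss(encode_image, encode_text, G_train, G_frozen, w,
                   source_text, target_text):
  delta_t = encode_text(target_text) - encode_text(source_text)
  delta_i = encode_image(G_train(w)) - encode_image(G_frozen(w))
  # 1 - cosine similarity between the image and text directions
  return 1 - F.cosine_similarity(delta_i, delta_t, dim=-1).mean()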

Latent-Mapper 'Mining'

Sometimes the domain transfer is incomplete:

for example, the generated results contain both "dog" and "cat" features.

To overcome this issue, they train a mapper.

Some Off-Topic Things

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

Dataset: a single stroke image
No pretraining

Advanced ML/DL University Lectures

MLSS 2021 : Machine Learning Summer School

Cost-sensitive Classification: Techniques and Stories | National Taiwan University | MLSS 2021
Fundamentals and Application of Deep Reinforcement Learning | National Taiwan University | MLSS 2021
Machine Learning in Practice | National Taiwan University | MLSS 2021
Deep Learning for Speech Processing | National Taiwan University | MLSS 2021
Machine Learning as a Service: Challenges and Opportunities | National Taiwan University ​| MLSS 2021
Google Efforts on AI Research and Talent Development | Google Taiwan | MLSS 2021
Interpretable machine learning | Google Brain | MLSS 2021
Machine Learning Summer School | MLSS 2021
Large Scale Learning and Planning in Reinforcement Learning | University of Alberta | MLSS 2021
Neural Architecture Search and AutoML | University of California, Los Angeles | MLSS 2021
Privacy and Machine Learning/Data Analytics | Princeton University | MLSS 2021
Optimal transport | Google Brain | MLSS 2021
Geometric Deep Learning | Imperial College London | MLSS 2021
Trustworthy Machine Learning: Challenges and Opportunities | Panel Discussion | MLSS 2021
Theory of deep learning | IBM Research | Princeton University | MLSS 2021
Transform the Beauty Industry through AI + AR | CTO of Perfect Corp ​| MLSS 2021
Pre-training for Natural Language Processing | Google Research | MLSS 2021
Bias and Fairness in NLP | University of California, Los Angeles | MLSS 2021
Developing a World-Class AI Facial Recognition Solution | CyberLink Corp | MLSS 2021
Holistic Adversarial Robustness for Deep Learning | IBM Research | MLSS 2021
TinyML and Efficient Deep Learning | Massachusetts Institute of Technology | MLSS 2021
Computer Vision | University of Texas at Austin | MLSS 2021
Neuro-Symbolic Systems and the History of AI | University of Rochester | MLSS 2021
Continual Visual Learning | Inria, France | MLSS 2021
Overview of learning quantum states | IBM Research | MLSS 2021
An introduction to Statistical Learning Theory and PAC-Bayes Analysis | University College London | MLSS 2021

References and Citation Counts (as of Sep 2021)

Style Transfer

  • (1843)
    Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A neural algorithm of artistic style." arXiv preprint arXiv:1508.06576 (2015).
  • (317)
    Jing, Yongcheng, et al. "Neural Style Transfer: A Review." arXiv preprint arXiv:1705.04058 (2017).
  • (1747)
    Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).
  • (680)
    Dumoulin, Vincent, Jonathon Shlens, and Manjunath Kudlur. "A learned representation for artistic style."
    arXiv preprint arXiv:1610.07629 (2016).
  • (1513)
    Huang, Xun, and Serge Belongie. "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization." arXiv preprint arXiv:1703.06868 (2017).
  • (166)
    Ghiasi, Golnaz, et al. "Exploring the structure of a real-time, arbitrary neural artistic stylization network." arXiv preprint arXiv:1705.06830 (2017).
  • (45)
    Dumoulin, Vincent, et al. "Feature-wise transformations."
    Distill 3.7 (2018): e11.

StyleGAN

  • (2199)
    Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." arXiv preprint arXiv:1812.04948 (2018).
  • (869)
    Karras, Tero, et al. "Analyzing and Improving the Image Quality of StyleGAN." arXiv e-prints (2019): arXiv-1912.
  • (158)
    Karras, Tero, et al. "Training generative adversarial networks with limited data." arXiv preprint arXiv:2006.06676 (2020).
  • (2)
    Karras, Tero, et al. "Alias-Free Generative Adversarial Networks." arXiv preprint arXiv:2106.12423 (2021).
  • (159)
    Radford, Alec, et al. "Learning transferable visual models from natural language supervision." arXiv preprint arXiv:2103.00020 (2021).
  • (13)
    Patashnik, Or, et al. "Styleclip: Text-driven manipulation of stylegan imagery." arXiv preprint arXiv:2103.17249 (2021).
  • (0)
    Gal, Rinon, et al. "StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators." arXiv preprint arXiv:2108.00946 (2021).
