From Style Transfer to Text-Driven Image Manipulation
湯沂達
Taiwan AI Academy, Technical Division
2021/12/10
Replay
Speaker


湯沂達
Email
changethewhat@gmail.com / yidar@aiacademy.tw

Did you see these?
(A) https://nightcafe.studio/blogs/blog/top-20-ai-generated-artworks
(B) https://twitter.com/CitizenPlain/status/1316760510709338112/photo/1
(C) https://www.ettoday.net/news/20210616/2007703.htm
(D) https://github.com/orpatashnik/StyleCLIP


(D) was made with only text & an input image
Contents
- Style Transfer
- GAN & StyleGAN
- Image Manipulation with StyleGAN
- Text Driven Image Manipulation/Generation
- Related Topics
- Applications/Resource List
- Paper List
This Talk
The spirit of some famous methods
Prerequisites
- Mean / Std
- Convolution & Activation
- Loss Function
- Gradient Descent
Warning
The following pages contain some math equations.
However, I will explain the ideas behind the algorithms rather than the equations themselves.
Style Transfer
Before Style Transfer

How to summarize texture?
Define some handmade feature representation
(like color, gradient, frequency...),
then use statistics.
Image => Feature Extract => Summarize => Distance
https://paperswithcode.com/dataset/psu-near-regular-texture-database

Gabor Filter Bank
Feature Extract : (H,W,1) → (H,W,f⋅θ)
Summarize : use μ,σ : (H,W,f⋅θ) → (2⋅f⋅θ)

Rotation Invariant Local Binary Pattern
Feature Extract : threshold (<, ≥) each pixel against its neighbors
(e.g. the binary pattern 11000000), in a rotation-invariant way :
(H,W,1) ∈ {0,1,...,255} → (H,W,1) ∈ {0,1,...,9}
Summarize : use a histogram : (H,W,1) ∈ {0,1,...,9} → ℝ¹⁰

Distance : a distance measure for vectors/distributions

Manjunath, B. S., & Ma, W. Y. (1996). Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 837-842.
Ojala, Timo, Matti Pietikainen, and Topi Maenpaa. "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.7 (2002): 971-987.
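The whole pipeline fits in a few lines. A minimal sketch in Python, assuming scikit-image and NumPy (the LBP radius and the plain L2 distance are illustrative choices):

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_img):
    # Feature Extract : 'uniform' LBP with P=8 neighbors is rotation invariant
    # and maps (H,W) ∈ {0,...,255} to (H,W) ∈ {0,...,9}, as on the slide
    codes = local_binary_pattern(gray_img, P=8, R=1.0, method="uniform")
    # Summarize : a 10-bin histogram turns the whole image into a vector in ℝ¹⁰
    hist, _ = np.histogram(codes, bins=10, range=(0, 10), density=True)
    return hist

def texture_distance(img_a, img_b):
    # Distance : any vector/distribution distance works; plain L2 here
    return np.linalg.norm(lbp_histogram(img_a) - lbp_histogram(img_b))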
How to summarize data?
Data => Feature Extract => Describe => Distance
If we have a good feature extractor...
- Use another task's pretrained weights
- Create it by yourself
Describe : describe data with or without statistics
(e.g. as text : "A Cute Dog Staring at You")
Distance : a distance measure for vectors/distributions
Style Transfer
Lin, Tianwei, et al. "Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer." arXiv preprint arXiv:2104.05376 (2021).

Content / Style / Stylized

Objective
Find a stylized image which has
- the content image's content
- the style image's style
Style Transfer
Jing, Yongcheng, et al. "Neural Style Transfer: A Review." arXiv preprint arXiv:1705.04058 (2017).
- Image Optimization (inference ≡ training : minutes)
  Find an image
- Model Optimization (inference : real time; training : hours)
  Find a model that can transfer images
  - Per-Style-Per-Model (PSPM) : the model contains 1 style
  - Multiple-Style-Per-Model (MSPM) : the model contains n styles
  - Arbitrary-Style-Per-Model (ASPM) : the model handles any style
Image style transfer using convolutional neural networks
Notes
- arXiv : 1508.06576
- First paper on "neural" style transfer
- Gets the result by optimizing the image
- Plenty of later papers use their loss function
- Takes minutes to generate an image
- Cited 3267 times as of Nov 2021
Data => Feature Extract => Describe => Distance
If we have a good feature extractor...
- Use another task's pretrained weights (here : VGG)
- Create it by yourself


Î => VGG (A Pretrained Model) => features
L_content : Î's feature tensor should be close to the content image's feature tensor
L_content = ( F(Î) − F(content) )²
L_style : stat(Î's features) should be close to stat(style image's features)
L_style = ( G(F(Î)) − G(F(style)) )²
L_total = L_content + λ L_style

G : Gram matrix
F ∈ ℝ^(C×X×Y), G(F) ∈ ℝ^(C×C)
G(F)_(c1,c2) = (1 / (X⋅Y)) Σ_(x,y) [ F_(c1,x,y) ⋅ F_(c2,x,y) ]

result ← argmin_Î L_total(Î)
Î ← Î − α ∂L_total/∂Î
You get a more abstract result when a deeper layer is used for the content loss.
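A minimal sketch of this image-optimization loop in PyTorch (the layer indices, weights, and the use of Adam instead of the slide's plain gradient step are illustrative, not the paper's exact choices):

import torch
import torch.nn.functional as F
from torchvision.models import vgg16

vgg = vgg16(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(img, layers=(3, 8, 15)):
    # collect activations at a few VGG layers (illustrative indices)
    feats, h = [], img
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in layers:
            feats.append(h)
    return feats

def gram(feat):
    # F ∈ ℝ^(C×X×Y) → G(F) ∈ ℝ^(C×C), averaged over the X⋅Y positions
    c, x, y = feat.shape[1:]
    f = feat.flatten(2)
    return f @ f.transpose(1, 2) / (x * y)

def stylize(content, style, steps=300, lam=1e3, lr=0.05):
    # content, style : (1,3,H,W) ImageNet-normalized tensors
    img = content.clone().requires_grad_(True)     # Î, initialized as content
    opt = torch.optim.Adam([img], lr=lr)
    with torch.no_grad():
        c_feats = features(content)
        s_grams = [gram(f) for f in features(style)]
    for _ in range(steps):
        opt.zero_grad()
        f = features(img)
        l_content = F.mse_loss(f[-1], c_feats[-1])
        l_style = sum(F.mse_loss(gram(a), g) for a, g in zip(f, s_grams))
        (l_content + lam * l_style).backward()     # L_total = L_content + λ L_style
        opt.step()
    return img.detach()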


Their Result
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. "Perceptual Losses for Real-Time Style Transfer and Super-Resolution." arXiv preprint arXiv:1603.08155 (2016).
Notes
- arXiv : 1603.08155
- Gets the result by optimizing a model
- Their research has 2 branches :
  - style transfer
  - super resolution
- Plenty of later papers use the term "perceptual"
- Per-Style-Per-Model (PSPM)
- Real-time inference
- Cited 5962 times as of Nov 2021


Perceptual
prev : result ← argmin_Î L_total(Î)
this : f_W ← argmin_(f_W) Σ_(I∈dataset) L_total(f_W(I))
Model f_W : I => f_W(I)
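A minimal sketch of this model-optimization setting, reusing features and gram from the previous sketch (f_W is any image-to-image network and the hyperparameters are illustrative):

import torch
import torch.nn.functional as F

def train_style_model(f_W, dataloader, style, steps=40000, lam=1e3, lr=1e-3):
    # one model per style : Per-Style-Per-Model (PSPM)
    opt = torch.optim.Adam(f_W.parameters(), lr=lr)
    with torch.no_grad():
        s_grams = [gram(f) for f in features(style)]
    for _, content in zip(range(steps), dataloader):
        opt.zero_grad()
        out = f_W(content)                 # a single forward pass at inference
        f_out = features(out)
        with torch.no_grad():
            f_c = features(content)
        l_total = F.mse_loss(f_out[-1], f_c[-1]) + lam * sum(
            F.mse_loss(gram(a), g) for a, g in zip(f_out, s_grams))
        l_total.backward()
        opt.step()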
A learned representation for artistic style
Notes
- arXiv : 1610.07629
- Gets the result by optimizing a model
- Multiple-Style-Per-Model (MSPM)
- Uses conditional instance normalization (CIN) for multiple-style transfer
- The standard setting contains 32 styles; each style takes about 0.2% of the total parameters.
- Real-time inference
- Cited 727 times as of Nov 2021
Before A Learned Representation For Artistic Style

Color Distribution Matching
Source : Stat(R), Stat(G), Stat(B)
Normalize each channel to (μ=0, σ=1),
then match the Target's Stat(R), Stat(G), Stat(B).

A Learned Representation For Artistic Style
Each style uses a (γ, β) pair :
normalize each channel to (μ=0, σ=1), then scale by γ and shift by β.

prev : Model f_W = n× [Conv => Act] : I => f_W(I)
this : Model f_W = n× [Conv => CIN => Act],
where CIN applies the (γ, β) pair of the chosen style S (see the sketch below)

Interpolate between styles :
S = αS₁ + (1−α)S₂
(and likewise among more styles, e.g. S₁...S₄)
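A minimal sketch of conditional instance normalization (CIN) in PyTorch (the module layout and initialization are illustrative):

import torch
import torch.nn as nn

class ConditionalInstanceNorm2d(nn.Module):
    def __init__(self, num_channels, num_styles=32):
        super().__init__()
        # normalize each channel to (μ=0, σ=1), as on the slides above
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        # one (γ, β) pair per style : the only per-style parameters
        self.gamma = nn.Embedding(num_styles, num_channels)
        self.beta = nn.Embedding(num_styles, num_channels)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, style_id):
        g = self.gamma(style_id)[:, :, None, None]
        b = self.beta(style_id)[:, :, None, None]
        return g * self.norm(x) + b

Interpolating styles then amounts to blending their (γ, β) pairs with the same α as in S = αS₁ + (1−α)S₂.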
Exploring the structure of a real-time, arbitrary neural artistic stylization network
Notes
- arXiv : 1705.06830
- Gets the result by optimizing a model
- Arbitrary-Style-Per-Model (ASPM)
- Generalizes CIN to adapt to arbitrary styles
- Real-time image generation on a GPU
- Cited 180 times as of Nov 2021
Prev Work : each style stores its own (γ, β) pair
This Work : a Style Prediction Network predicts (γ, β) directly from the style image
Architecture
Small Recap
Papers :
- 1508.06576 Image Optimization
- 1603.08155 Per-Style-Per-Model (PSPM)
- 1610.07629 Multiple-Style-Per-Model (MSPM)
- 1705.06830 Arbitrary-Style-Per-Model (ASPM)
Roughly one big improvement every half year
Not Enough?
(Methods up to March 2018; the review is cited 335 times as of Nov 2021)
Jing, Yongcheng, et al. "Neural Style Transfer: A Review." arXiv preprint arXiv:1705.04058 (2017).
My Medium
類神經影像藝術風格轉換系列筆記-基礎 (Notes on Neural Style Transfer: Fundamentals)
GAN & StyleGAN
StyleGAN

Why is it named StyleGAN?
Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." arXiv preprint arXiv:1812.04948 (2018).
GAN
Data => Feature Extract => Describe => Distance
If we have a good feature extractor...
- Use another task's pretrained weights
- Create it by yourself (a GAN learns its own : the discriminator)
(e.g. as text : "A Man With Curly Hair")
GAN & StyleGAN

Notes :
- The input to the conv stack is a constant tensor
- AdaIN is applied
- Random noise is added during both training & inference
  (stochastic details : hair, freckles, skin pores)
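A minimal sketch of AdaIN as used in the StyleGAN generator (omitting StyleGAN's exact initialization and noise details): a learned affine map turns the latent w into a per-channel (γ, β) for each block, which is CIN with predicted instead of stored parameters.

import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, latent_dim, num_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.affine = nn.Linear(latent_dim, num_channels * 2)  # w → (γ, β)

    def forward(self, x, w):
        gamma, beta = self.affine(w).chunk(2, dim=1)
        return gamma[:, :, None, None] * self.norm(x) + beta[:, :, None, None]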
StyleGAN
w ∈ W
Style Mixing / Interpolation
(video 1:10~2:07)
StyleGAN
Cost
StyleGAN2 : solves artifacts (video 00:30~1:30)
StyleGAN3 : solves interpolation artifacts
Tero Karras
Unofficial Forest
Image Manipulation with StyleGAN
Methods shamelessly taken from this video
Warning : We skip a lot
Data => Feature Extract => Describe => Distance
If we have a good feature extractor...
- Use another task's pretrained weights
- Create it by yourself
(e.g. as text : "A Man With Curly Hair")

Image Manipulation with StyleGAN
Modify
pretrained weights / hidden outputs
with a smart, specific measure
Image Manipulation with StyleGAN
Contents
- GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
  Add/remove semantics in a GAN's output
- Semantic Photo Manipulation with a Generative Image Prior
  Edit your own photo
- Rewriting a Deep Generative Model
  Edit the generative model (e.g. roof => tree)
GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Image generated by the GAN vs. output after zeroing some activations

Step 1 : Find which hidden channels correlate strongly with the segmentation map
Step 2 : Edit those channels (set them to a constant, or to 0; see the sketch below)

Notes
It needs a segmentation model or manual labels
Official GIFs
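A minimal sketch of the Step 2 edit in PyTorch (G, G.layer4, tree_units, and z are hypothetical names): a forward hook zeroes the channels found in Step 1.

import torch

def ablate_channels(layer, channel_ids):
    def hook(module, inputs, output):
        out = output.clone()
        out[:, channel_ids] = 0.0   # remove the concept those units encode
        return out
    return layer.register_forward_hook(hook)

# hypothetical usage :
# handle = ablate_channels(G.layer4, tree_units)
# img = G(z)          # e.g. trees removed from the generated scene
# handle.remove()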



Semantic Photo Manipulation with a Generative Image Prior

Find the best-matching latents in the GAN.
Bad result :(
Find the best-matching latents in the GAN,
and allow slight weight modifications.
Nice :)
Then apply the previous work's editing skills
(video 00:27~00:55)



Rewriting a Deep Generative Model
W : weight of layer L
k : normal input at layer L
k∗ : selected input at layer L
v∗ : desired output for k∗ at layer L
Normal outputs should not change : W′k ≈ Wk
Change source to target : W′k∗ ≈ v∗
(go to "Example Results")
Text Driven Image Manipulation/Generation
Contents
- OpenAI : CLIP
- StyleCLIP
- CLIPDraw & StyleCLIPDraw
- My Method : StyleTransferCLIP
- OpenAI : DALL·E
CLIP
Connecting Text and Images
Radford, Alec, et al. "Learning transferable visual models from natural language supervision." arXiv preprint arXiv:2103.00020 (2021).
Traditional Classification : a fixed label set (dog / cat / hen / bee)
CLIP (Contrastive Language–Image Pre-training) : matches images against free-form text
# https://github.com/openai/CLIP#usage
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.9927937 0.00421068 0.00299572]]

StyleCLIP
StyleCLIPDraw
Image Manipulation/Generation with
1 image & 1 text
Both use the CLIP encoders
L_CLIP = −CLIP_I(img) ⋅ CLIP_T(text)
"...."

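A minimal sketch of L_CLIP, assuming the official clip package (resizing/normalizing the image for CLIP's input is omitted):

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def clip_loss(image_batch, text):
    # image_batch : (B,3,224,224), already preprocessed for CLIP
    tokens = clip.tokenize([text]).to(image_batch.device)
    img_emb = model.encode_image(image_batch)
    txt_emb = model.encode_text(tokens)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    # L_CLIP = −CLIP_I(img) ⋅ CLIP_T(text) : minimizing it pulls the image
    # embedding toward the text embedding
    return -(img_emb * txt_emb).sum(dim=-1).mean()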
(Tweets : 2021 Sep 09 / 2021 Nov 12 / 2021 Dec 08)
StyleCLIP author : a new star
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery (2021 Mar)
Patashnik, Or, et al. "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery." arXiv preprint arXiv:2103.17249 (2021).
In Style Transfer / In StyleCLIP
- Image Optimization => Latent Optimization
- Model Optimization
  - Per-Style-Per-Model => Latent Mapper (skip)
  - Multiple-Style-Per-Model => No
  - Arbitrary-Style-Per-Model => Global Directions (skip)


StyleCLIP (GAN inversion with e4e + the official mapper)

Latent optimization
- Get the reconstructed latent w_s for the input image
- StyleGAN (G) generates G(w) from the latent w
- Same description ("Curly Hair")? L_CLIP
- Same person (face recognition)? L_ID
w∗ = argmin_w L_CLIP + λ_L2 ‖w − w_s‖² + λ_ID L_ID
(a sketch follows below)
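A minimal sketch of this latent optimization, reusing clip_loss from above (G is a pretrained StyleGAN wrapper, id_loss stands for a face-identity loss from a pretrained recognition network, and the λ values are illustrative):

import torch

def latent_optimization(G, w_s, text, steps=200,
                        lam_l2=0.008, lam_id=0.005, lr=0.1):
    w = w_s.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    target = G(w_s).detach()                 # the reconstructed input face
    for _ in range(steps):
        opt.zero_grad()
        img = G(w)
        loss = (clip_loss(img, text)                 # same description?
                + lam_l2 * (w - w_s).pow(2).sum()    # stay close to w_s
                + lam_id * id_loss(img, target))     # same person?
        loss.backward()
        opt.step()
    return w.detach()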
CLIPDraw (2021 Jun) & StyleCLIPDraw (2021 Nov)
Schaldenbrand, Peter, Zhixuan Liu, and Jean Oh. "StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis." arXiv preprint arXiv:2111.03133 (2021).
L_total = L_content + βL_style
CLIPDraw : β = 0
StyleCLIPDraw : β > 0
In Style Transfer / In CLIPDraw
- Image Optimization => Line Parameter Optimization
- Model Optimization
  - Per-Style-Per-Model => No
  - Multiple-Style-Per-Model => No
  - Arbitrary-Style-Per-Model => No
Before CLIPDraw
Gradient descent from the loss to the curves' parameters is possible, i.e.
∂L/∂P_i can be computed.
Parameters for the control points :
position, rgba, thickness

StyleCLIPDraw
Without augmentation, the result is bad (a sketch of the augmentation follows below).
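A minimal sketch of that augmentation trick, reusing clip_loss from above (the transform parameters and number of views are illustrative): score several random views so the optimizer cannot satisfy CLIP with one adversarial rendering.

import torch
import torchvision.transforms as T

augment = T.Compose([
    T.RandomPerspective(distortion_scale=0.5, p=1.0),
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
])

def augmented_clip_loss(image, text, n_views=8):
    # average L_CLIP over several randomly augmented copies of the drawing
    views = torch.cat([augment(image) for _ in range(n_views)], dim=0)
    return clip_loss(views, text)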



CLIPDraw Results
"The Eiffel Tower"

StyleCLIPDraw Results
My Method : StyleTransferCLIP
Edit the style embedding E_initial = SP(S) with L_CLIP
(SP : the style prediction network from 1705.06830)

Input Image (C)
Style Image (S)
Output Image : NST(C, E_initial)
CLIP result : next pages
E ← argmin_E L_CLIP(NST(C, E), Text)

My Experiment on Neural Style Transfer
My Experiment on Neural Style Transfer with Augmentation
You can play with my method on replicate.ai

Tokenize text

# https://github.com/huggingface/tokenizers
output = tokenizer.encode("Hello, y'all! How are you 😁 ?")
print(output.tokens)
# ["Hello", ",", "y", "'", "all", "!", "How", "are", "you", "[UNK]", "?"]
# string => tokens
# token => idx => embedding

Autoregressive Model (next-token prediction)
P_θ(x) = Π_(i=1)^n P_θ(x_i ∣ x_1, x_2, …, x_(i−1))
P_θ("sunny" ∣ "The weather is")
P_θ("cookie" ∣ "The weather is")
P_θ("furry" ∣ "The weather is")

Image Tokenization
A VQ-VAE can tokenize an image into n×n tokens

DALL·E
"a dog is watching you" => text tokens x_t1, x_t2, …, x_tn
image => image tokens x_i1, x_i2, …, x_im
Autoregressive model (next-token prediction) over the concatenated sequence :
P_θ(x_t, x_i) = Π_(p=1)^m P_θ(x_ip ∣ x_t1, x_t2, …, x_tn, x_i1, x_i2, …, x_i(p−1))
Core Concept
- Image => image tokens with a VQ-VAE
- Text => text tokens
- Concatenate them and turn this into a next-token prediction problem
  (a minimal sketch follows below)
Sad Things
- 12 billion parameters (≈ 2264 × EfficientNet-B0)
- 250 million (image, text) pairs (≈ 18 × ImageNet)
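A minimal sketch of that training objective (lm stands for any autoregressive transformer over the joint text+image vocabulary; the real DALL·E's tokenizers and vocabulary offsets are omitted):

import torch
import torch.nn.functional as F

def dalle_training_step(lm, text_tokens, image_tokens):
    # [ text tokens | image tokens ] : one sequence, one next-token problem
    seq = torch.cat([text_tokens, image_tokens], dim=1)
    logits = lm(seq[:, :-1])     # predict token p from all tokens before p
    targets = seq[:, 1:]
    return F.cross_entropy(logits.transpose(1, 2), targets)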
An Explanation
Not Enough?
Paper & Code
Takeaway
- Style Transfer
  - Loss function
  - Image Optimization
  - Model Optimization
  - CIN
- StyleGAN
  - Borrows from style transfer
  - Adds noise
  - Official branch : StyleGAN, StyleGAN2, StyleGAN-ADA, StyleGAN3
- Image Manipulation with StyleGAN
  - Modify weights / activations in a smart way
- Text Driven Image Manipulation/Generation
  - CLIP method & CLIP loss
  - DALL·E : text & image next-token prediction
Related Topics

2021 Jan

Edit the model without additional images
StyleGAN-NADA (2021 Aug)
Follow-up work to StyleCLIP

2021 Dec
It can train a feed-forward model in about 1 min

An old method : StarGAN (2017)
Choi, Yunjey, et al. "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." arXiv preprint arXiv:1711.09020 (2017).

VQ-GAN (2020 Dec)
A method parallel to DALL·E
(2020 Apr, 2021 Nov)

Novel view synthesis
Semantic photo manipulation (this slide deck's topic)
Facial and body reenactment
Relighting
Free-viewpoint video
Photo-realistic avatars for AR/VR

2021 Dec
Applications
Resource List
Paper List
Style Transfer
- Manjunath, B. S., & Ma, W. Y. (1996). Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), 837-842.
- Ojala, Timo, Matti Pietikainen, and Topi Maenpaa. "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns." IEEE Transactions on Pattern Analysis and Machine Intelligence 24.7 (2002): 971-987.
- Lin, Tianwei, et al. "Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer." arXiv preprint arXiv:2104.05376 (2021).
- Jing, Yongcheng, et al. "Neural Style Transfer: A Review." arXiv preprint arXiv:1705.04058 (2017).
- Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
- Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. "Perceptual Losses for Real-Time Style Transfer and Super-Resolution." arXiv preprint arXiv:1603.08155 (2016).
- Dumoulin, Vincent, Jonathon Shlens, and Manjunath Kudlur. "A learned representation for artistic style." arXiv preprint arXiv:1610.07629 (2016).
- Ghiasi, Golnaz, et al. "Exploring the structure of a real-time, arbitrary neural artistic stylization network." arXiv preprint arXiv:1705.06830 (2017).
GAN & StyleGAN & StyleGAN Manipulation
- Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." arXiv preprint arXiv:1812.04948 (2018).
- Karras, Tero, et al. "Analyzing and Improving the Image Quality of StyleGAN." arXiv preprint arXiv:1912.04958 (2019).
- Karras, Tero, et al. "Alias-Free Generative Adversarial Networks." arXiv preprint arXiv:2106.12423 (2021).
- Bau, David, et al. "GAN Dissection: Visualizing and understanding generative adversarial networks." arXiv preprint arXiv:1811.10597 (2018).
- Bau, David, et al. "Semantic photo manipulation with a generative image prior." arXiv preprint arXiv:2005.07727 (2020).
- Bau, David, et al. "Rewriting a deep generative model." European Conference on Computer Vision. Springer, Cham, 2020.
Text Driven Image Manipulation/Generation
- Radford, Alec, et al. "Learning transferable visual models from natural language supervision." arXiv preprint arXiv:2103.00020 (2021).
- Patashnik, Or, et al. "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery." arXiv preprint arXiv:2103.17249 (2021).
- Frans, Kevin, L. B. Soros, and Olaf Witkowski. "CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders." arXiv preprint arXiv:2106.14843 (2021).
- Schaldenbrand, Peter, Zhixuan Liu, and Jean Oh. "StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis." arXiv preprint arXiv:2111.03133 (2021).
- Ramesh, Aditya, et al. "Zero-shot text-to-image generation." arXiv preprint arXiv:2102.12092 (2021).
Thanks
If you have any feedback, please contact me :
changethewhat+NST@gmail.com
yidar+NST@aiacademy.tw