Generative Adversarial Nets

Cristóbal Silva

Why GANs

Simulate Future
Work with missing data
Multi-modal outputs
Realistic generation tasks

Maximum Likelihood

in Deep Learning

sample without density

GANs

Latent code
Asymptotically consistent
No Markov Chains needed
(Subjective) Produces bests samples

Generator vs Discriminator

D(x) = \text{discriminator}

D(x) = \text{discriminator}

G(z) = \text{generator}

G(z) = \text{generator}

z = \text{noise input}

z = \text{noise input}

X = \text{training data}

X = \text{training data}

p_{generator}(x)

p_{generator}(x)

p_{data}(x)

p_{data}(x)

0: real

1: fake

Nash Equilibrium

p_{data}(x) = p_{generator}(x)

p_{data}(x) = p_{generator}(x)

D(x) = \frac{1}{2}

D(x) = \frac{1}{2}

In other words, discriminator can't discriminate

Loss Function

Minimax Game

J^{(D)} = -\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim p_{data}} \log{D(\mathbf{x})} - \frac{1}{2}\mathbb{E}_{\mathbf{z}} \log{(1 - D(G(\mathbf{z}))}

J^{(D)} = -\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim p_{data}} \log{D(\mathbf{x})} - \frac{1}{2}\mathbb{E}_{\mathbf{z}} \log{(1 - D(G(\mathbf{z}))}

J^{(G)} = -J^{(D)}

J^{(G)} = -J^{(D)}

Non-Saturating Game

J^{(D)} = -\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim p_{data}} \log{D(\mathbf{x})} - \frac{1}{2}\mathbb{E}_{\mathbf{z}} \log{(1 - D(G(\mathbf{z}))}

J^{(D)} = -\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim p_{data}} \log{D(\mathbf{x})} - \frac{1}{2}\mathbb{E}_{\mathbf{z}} \log{(1 - D(G(\mathbf{z}))}

J^{(G)} = -\frac{1}{2}\mathbb{E}_{\mathbf{z}} \log{D(G(\mathbf{z}))}

J^{(G)} = -\frac{1}{2}\mathbb{E}_{\mathbf{z}} \log{D(G(\mathbf{z}))}

Vanishing gradient

GANs + PyTorch,

Let's model and sample from a Gaussian!

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

PyTorch is a new DL framework based on dynamic graphs

Comes with high level API for network design

No static-graph required, easier to debug

What we need

Input data

Input noise

our ground truth

our latent code

def real_distribution_sampler(mu, sigma, n):
    samples = np.random.normal(mu, sigma, (1, n))
    return torch.Tensor(samples)

def noise_distribution_sampler(m, n):
    return torch.rand(m, n)

Generator

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 100)
        self.fc2 = nn.Linear(100, 100)
        self.fc3 = nn.Linear(100, 1)
    
    def forward(self, x):
        x = F.elu(self.fc1(x))
        x = F.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(100, 50)
        self.fc2 = nn.Linear(50, 50)
        self.fc3 = nn.Linear(50, 1)
    
    def forward(self, x):
        x = F.elu(self.fc1(x))
        x = F.elu(self.fc2(x))
        x = F.sigmoid(self.fc3(x))
        return x

Discriminator

100

Training

Parameters

Optimization

n_epochs = 30000

# binary cross entropy loss
criterion = nn.BCELoss()

# discriminator params
D = Discriminator()
d_steps = 1
d_optimizer = optim.SGD(
               D.parameters(), 
               lr=2e-4
              )


# generator params
G = Generator()
g_steps = 1
g_optimizer = optim.SGD(
               G.parameters(), 
               lr=2e-4
              )
batch_size = 100

for epoch in range(n_epochs)
    for i in range(d_steps):
        D.zero_grad()
        # train D on real data
        d_real_data   = Variable(real_distribution_sampler(mu, sigma, 100))
        d_real_output = D(d_real_data)
        d_real_loss   = criterion(d_real_output, Variable(torch.ones(1)))
        d_real_loss.backward()  # compute/store gradients, but don't update

        # train D on fake data
        d_gen_input   = Variable(noise_distribution_sampler(batch_size, 1))
        d_fake_data   = G(d_gen_input).detach()  # important to avoid training G
        d_fake_output = D(d_fake_data.t())
        d_fake_loss   = criterion(d_fake_output, Variable(torch.zeros(1)))
        d_fake_loss.backward()
        
        d_optimizer.step()

    for j in range(g_steps):
        # train G based on D output
        G.zero_grad()
        g_gen_input   = Variable(noise_distribution_sampler(batch_size, 1))
        g_fake_data   = G(g_gen_input)
        g_fake_output = D(g_fake_data.t())
        g_fake_loss   = criterion(g_fake_output, Variable(torch.ones(1)))
        g_fake_loss.backward()
        g_optimizer.step()

Experiment

mean	4.00
var	1.25

SGD

Results 1

mean	4.00
var	1.25

mean	4.21
var	0.03

... the dreaded mode-collapse!

SGD

Results 2

mean	4.00
var	1.25

mean	3.82
var	1.13

SGD with Momentum

Results 3

mean	4.00
var	1.25

mean	4.26
var	1.09

ADAM

GAN Zoo

What we saw was 2 years ago, what now?

Many, many GANs

ose-GAN



enderGAN

AD-GAN

riple-GAN

nrolled GAN

AE-GAN

aterGAN

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

U

V

W

X

Y

Z

daGAN

ayesian GAN

atGAN

CGAN

BGAN

-GAN

eneGAN

yperGAN

GAN





AGAN

cGAN

More than 90 papers in 2017

DCGAN

GANs can encode concepts

pix2pix

DiscoGAN

Cycle-GANs

Problems

Classical GANs generate from input noise. This means you can't select the features you want the sample to have
If you wanted to generate a picture with specific features, there's no way of determining which initial noise values would produce that picture, other than searching over the entire distribution.

Generative Adversarial Nets

By crsilva

Generative Adversarial Nets

Generative Adversarial Nets

Why GANs

Maximum Likelihood

GANs

Generator vs Discriminator

Generator vs Discriminator

Nash Equilibrium

Loss Function

Minimax Game

Non-Saturating Game

GANs + PyTorch,

What we need

Input data

Input noise

Generator

Discriminator

Training

Experiment

Results 1

Results 2

Results 3

GAN Zoo

Many, many GANs

DCGAN

pix2pix

DiscoGAN

Cycle-GANs

Problems

Generative Adversarial Nets

More from crsilva