UNSUPERVISED REPRESENTATION LEARNING
WITH DEEP CONVOLUTIONAL
GENERATIVE ADVERSARIAL NETWORKS

Gideon Rosenthal

Shay Ben-Sasson

Examples of DCGAN

Examples of DCGAN

Examples of DCGAN

Examples of DCGAN

https://github.com/mattya/chainer-DCGAN

Examples of DCGAN extensions

Ledig, C., Theis, L., Huszar, F., Caballero, J., Aitken, A. P., Tejani, A., Totz, J., Wang, Z., and Shi, W. (2016).

Examples of DCGAN extensions

Examples of DCGAN extensions

Generative

Model the distribution of individual classes

 

p(x,y)

Discriminative

Learn the (hard or soft) boundary between classes

 

p(y|x)

Discriminative Models

Classification:

"The process of classification discards most of the information in the input and produces a single output"

(from Goodfelow book)

An example of

Generative Models

Sampling:

"The model generates new samples from the distribution p(x). This requires a good model of the entire input"

(from Goodfelow book)

An example of

What are generative networks?

  • Parameterized computational procedures for generating samples
  • The architecture provides the family of possible distributions to sample from
  • The parameters select a distribution from within that family

(from Goodfelow book)

Generative models - Landscape

  • Restricted Boltzman Machines (Somelnsky, 1986):
    • Modeling high order interactions between visible and hidden (unobserved) variables
    • Going Deep: DBMs (Hinton, 2006), DBNs (Hinton, 2006)
    • Problems: Intractable partition and inference function
    • Solution: Approximation (CD, Gibbs sampling)
    • Practice: Greedy layer-wise pre-training.

Generative models - Landscape

  • Variational autoencoder (Kingma, 2013; Rezende, 2014):
    • Another explicit method of approximating  the intractable partition function.
    • Variational inference is used to analytically solve the Lower Bound of the KL divergence.

Generative Adversarial Nets

George

Danielle

x~Pdata

Generative Adversarial Nets

George

Danielle

x~Pdata

x'=G(z)

z~uniform(0,1)

maximize D1, minimize D2

maximize D2

* D1 and D2 outputs is [0,1]

Generative Adversarial Nets

George

Danielle

x~Pdata

* D1 and D2 outputs is [0,1]

obj_d=tf.reduce_mean(tf.log(D1)+tf.log(1-D2))
opt_d=tf.train.GradientDescentOptimizer(0.01)
              .minimize(1-obj_d,global_step=batch,var_list=theta_d)

obj_g=tf.reduce_mean(tf.log(D2))
opt_g=tf.train.GradientDescentOptimizer(0.01)
              .minimize(1-obj_g,global_step=batch,var_list=theta_g)

GANs in action

  • z is the set of latent variables used by the generator to produce samples

GANs are difficult to train

  • GAN images suffer from being noisy and incomprehensible
  • Scaling up GANs using CNNs to model images have been unsuccessful.
  • GANs are unstable

DCGAN

DCGAN - a bag of tricks

  • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
  • Use batchnorm in both the generator and the discriminator.
  • Remove fully connected hidden layers for deeper architectures.
  • Use ReLU activation in generator for all layers except for the output, which uses Tanh.
  • Use LeakyReLU activation in the discriminator for all layers.

Strided convolutions (all)

Fractionally strided convolutions

Subtitle

All convolutional except 

  • layer 1->2 : matrix multiplication (fully connected)
  • Z is reshaped into a 3-dimensional tensor and used as the start of the convolution stack
  • For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output

Batchnorm

  • Deals with poor initialization and helps gradient flow in deeper models
  • Prevents the generator from collapsing all samples to a single point which is a common failure mode observed in GANs.
  • Directly applying batchnorm to all layers however, results in sample oscillation and model instability.
  • This is avoided by not applying batchnorm to the generator output layer and the discriminator input layer

Remove fully connected hidden layers for deeper architectures

Use ReLU activation in generator for all layers except for the output, which uses Tanh

Use LeakyReLU activation in the discriminator for all layers

Tested dataset

  • Large-scale Scene Understanding (LSUN) (Yu et al., 2015)
  • Imagenet-1k (Deng et al., 2009)
  • Newly assembled Faces dataset (350000 cropped faces)

LSUN - is it overfitting?

  • Generated bedrooms after one training pass through the dataset.
  • Overfitting unlikely as we train with a small learning rate and minibatch SGD.
  • We are aware of no prior empirical evidence demonstrating memorization with SGD and a small learning rate ;0

LSUN - is it overfitting?

  • Generated bedrooms after five epochs of training.
  • There appears to be evidence of visual under-fitting via repeated noise textures across multiple samples such as the base boards of some of the beds

Interpolation between a series of 9 random points in Z

Walking in the latent space

GAN as feature extractors

  • Semi-Supervised learning:
    • Evaluating the quality of unsupervised representation learning algorithms by applying it as a feature extractor on supervised datasets 

CIFAR 10 

  • Trained the GAN on Imagenet-1k and then use the discriminator’s convolutional features from all layers, maxpooling each layers representation.
  • These features are then flattened and concatenated to form a 28672 dimensional vector and a regularized linear L2-SVM classifier is trained on top of them. 

SVHN DIGITS USING GANS AS A FEATURE EXTRACTOR

Shay - I think we can let this go (boring, we got the point) -

StreetView House Numbers dataset (SVHN)(Netzer et al., 2011)

Same feature extraction pipeline used for CIFAR-10.

Supervised CNN with the same architecture on the same data achieves a signficantly higher 28.87% validation error.

The internals of the networks

Discriminator

  • Unsupervised DCGAN trained on a large image dataset can learn a hierarchy of features which can be interesting

Guided Backprop (Springenberg et al., 2014)

Guided Backprop

Guided backpropagation

Generator representation

  • The generator might learn specific object representations for major scene components such as beds, windows, lamps, doors, and miscellaneous furniture.
  • To explore the form that these representations take, they tried to remove windows from the generator completely

Generator representation

  • On 150 samples, 52 window bounding boxes were drawn manually.
  • On the second highest convolution layer features, logistic regression was fit to predict whether a feature activation was on a window (or not).

Face vectors arithmetic

  •  For each column, the Z vectors of samples are averaged.
  • Arithmetic was then performed on the mean vectors creating a new vector Y .
  • The center sample on the right-hand side is produced by feeding Y as input to the generator.
  • To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale +-0.25 was added to Y to produce the 8 other samples

Face vectors arithmetic

  • A ”turn” vector was created from four averaged samples of faces looking left vs looking right
  • Reliably transform pose of random images using this vector

Generative Adversarial Nets

By shaybensasson

Generative Adversarial Nets

  • 1,835