Xiaolong Wang, Abhinav Gupta
Robotics Institute, Carnegie Mellon University
Presenter: Wei YANG, CUHK
Ph.D. Student @ RI, CMU
Machine Learning Dept.
PAMI Young Researcher Award 2016
[Diagram: random noise → Generator (G) → generated samples; Discriminator (D) classifies generated vs. real samples]
Loss for D network
L_D = L(D(X), 1) + L(D(G(z)), 0)
where L is the binary cross-entropy loss, X denotes real samples, and z is random noise sampled from a uniform distribution
Loss for G network
L_G = L(D(G(z)), 1)
where X denotes real samples and z is random noise sampled from a uniform distribution; G tries to fool the discriminator D into classifying generated samples as real
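A minimal NumPy sketch of the two losses above, with scalar probabilities standing in for the discriminator's network outputs (the values `d_real` and `d_fake` are illustrative, not from the paper):

```python
import numpy as np

def bce(p, y):
    # binary cross-entropy between predicted probability p and label y
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy stand-ins for D(X) on real samples and D(G(z)) on generated ones
d_real = 0.9   # discriminator's probability that a real sample is real
d_fake = 0.2   # discriminator's probability that a generated sample is real

# Discriminator loss: push real samples toward label 1, generated toward 0
loss_D = bce(d_real, 1.0) + bce(d_fake, 0.0)

# Generator loss: fool D into outputting 1 on generated samples
loss_G = bce(d_fake, 1.0)
```

Note that a poorly fooled discriminator (`d_fake` near 0) makes `loss_G` large, which is exactly the gradient signal the generator trains on.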
Image = Structure + Style (texture)
Dataset: NYU v2 RGBD dataset
Structure-GAN + Style-GAN
[Pipeline: sampled noise → Structure-GAN → surface normals → Style-GAN → generated image; Style-GAN can also take ground truth normals as input]
Structure-GAN generates surface normals from sampled noise
Generator
Input: 100-d vector sampled from uniform distribution
Output: 72×72×3 surface normal map
Discriminator
Input: surface normal maps (1/2 generated, 1/2 ground truth)
Output: binary classification (generated or ground truth)
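The generator's input/output contract above can be sketched as follows; the random linear map `W` is a hypothetical stand-in for the trained generator network, but the shapes and the unit-normalization of each 3-vector (surface normals are unit vectors) match the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the Structure-GAN generator: a single random
# linear map from 100-d uniform noise to a 72x72x3 surface normal map.
W = rng.normal(size=(100, 72 * 72 * 3))

def generate_normals(z):
    x = (z @ W).reshape(72, 72, 3)
    # surface normals are unit vectors, so normalize each pixel's 3-vector
    x /= np.linalg.norm(x, axis=-1, keepdims=True)
    return x

z = rng.uniform(-1, 1, size=100)   # 100-d vector from uniform distribution
normals = generate_normals(z)      # shape (72, 72, 3)
```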
Conditional GAN
Generator is conditioned on additional information
Input:
- 100-d vector sampled from uniform distribution
- Ground truth surface normal map
Output:
- RGB image: 128×128×3 scene image
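One common way to condition a convolutional generator on a spatial signal is to tile the noise vector over the spatial grid and concatenate it with the condition along the channel axis. This is an illustrative wiring choice, not necessarily the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

z = rng.uniform(-1, 1, size=100)            # 100-d noise
normals = rng.normal(size=(128, 128, 3))    # conditioning input (stand-in
                                            # for a ground truth normal map)

# Tile the noise spatially, then stack it with the condition channel-wise,
# forming the conditional generator's input tensor.
z_map = np.broadcast_to(z, (128, 128, 100))
g_input = np.concatenate([normals, z_map], axis=-1)  # (128, 128, 103)
```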
Drawbacks
- The generated images are noisy
- Edges are not well aligned with the surface normals
Multi-task learning with pixel-wise constraints
Assumption: if the generated image is realistic enough, it can be used to reconstruct the surface normal map
FCN loss: a fully convolutional network re-predicts surface normals from the generated image, and its pixel-wise prediction error against the input normals is penalized
Full loss: adversarial (GAN) loss + FCN pixel-wise loss
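A sketch of combining the two terms. The FCN output here is faked with additive noise, and mean squared error stands in for the FCN's pixel-wise loss; the weighting `lam` is illustrative:

```python
import numpy as np

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)

# Adversarial term: D's probability that the generated image is real.
d_fake = 0.3                      # toy value standing in for D(G(z))
loss_gan = bce(d_fake, 1.0)

# Pixel-wise constraint: an FCN re-predicts surface normals from the
# generated image; penalize per-pixel disagreement with the input normals.
normals_in = rng.normal(size=(72, 72, 3))
normals_fcn = normals_in + 0.1 * rng.normal(size=(72, 72, 3))  # fake FCN output
loss_fcn = np.mean((normals_fcn - normals_in) ** 2)

lam = 1.0                          # illustrative weighting
loss_full = loss_gan + lam * loss_fcn
```

A realistic image that also reconstructs the correct normals minimizes both terms, which is how the pixel-wise constraint enforces edge alignment.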
EM-like training algorithm
Visualized results (ground truth from NYU v2 test set)
with/without pixel-wise constraints
Inputs are 3D model annotations.
DCGAN: Radford et al. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
LAPGAN: Denton et al. "Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks." NIPS, 2015.
Interpret the model by manipulating the latent space
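Latent-space manipulation typically works by interpolating between latent codes and decoding each intermediate point; the smooth change in outputs gives effects like gradually "growing" an object. A minimal sketch (the generator call is omitted, each `z` in `steps` would be fed through G):

```python
import numpy as np

rng = np.random.default_rng(0)
z0 = rng.uniform(-1, 1, size=100)   # latent code of a starting scene
z1 = rng.uniform(-1, 1, size=100)   # latent code of a target scene

# Linear interpolation in latent space: 8 evenly spaced points from z0 to z1.
steps = [z0 + t * (z1 - z0) for t in np.linspace(0, 1, 8)]
# decoding each step with the generator, G(z), visualizes the transition
```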
"Growing" 3D cubic
Shutdown window
https://github.com/xiaolonw/ss-gan