Wayne State University
University of Massachusetts, Amherst
Proceedings of the IEEE International Conference on Computer Vision, pp. 945-953, 2015.
Recognizing 3D shapes from a collection of their rendered views on 2D images
[1] Wu, Zhirong, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. "3d shapenets: A deep representation for volumetric shapes." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912-1920. 2015.
[1]
Trade-off
Increasing the amount of explicit depth information (3D models)
Increasing spatial resolution (projected 2D models)
2D 3D
Learn a good deal about generic features for 2D image categorization
Fine-tune to specifics about 3D model projections
Multi-view Representation
Given an input 3D object
Generate multiple views of the 3D object using a rendering engine
view 1
view 2
view 3
view N
Method 1:
Generate a 2D image descriptor for each view
Use the individual descriptors directly for recognition tasks based on some voting or alignment scheme
Method 2:
An aggregated representation combining features from multiple views
Concatenate the 2D descriptors of all the views
Average the individual descriptors, treating all the views as equally important
Combine information from multiple views using a unified CNN architecture that includes a view-pooling layer
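The two simple aggregation options (averaging and concatenation) can be sketched as follows. This is a minimal illustration in plain Python; the function names and the toy descriptors are assumptions made for the example, not part of the original method.

```python
def average_descriptors(views):
    """Element-wise mean of the per-view descriptors (every view weighted equally)."""
    n = len(views)
    return [sum(vals) / n for vals in zip(*views)]


def concatenate_descriptors(views):
    """Concatenate per-view descriptors; only meaningful if the view order is consistent."""
    return [value for view in views for value in view]


# Two hypothetical 3-dimensional view descriptors of one shape.
views = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
avg = average_descriptors(views)      # same length as one descriptor
cat = concatenate_descriptors(views)  # length grows with the number of views
```

Averaging keeps the descriptor size fixed but cannot emphasize informative views, while concatenation preserves everything but depends on a reproducible view order.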
Multi-view Representation (Input Generation)
Phong reflection model [1]
[1] B. T. Phong. Illumination for computer generated pictures. Commun. ACM, 18(6), 1975.
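As a reminder of what the Phong reflection model computes, here is a minimal sketch for a single surface point and a single light. The coefficient values (k_a, k_d, k_s, shininess) are illustrative assumptions, not the settings used to render the views.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def phong_intensity(normal, to_light, to_viewer,
                    k_a=0.1, k_d=0.6, k_s=0.3, shininess=10.0,
                    i_ambient=1.0, i_light=1.0):
    """Ambient + diffuse + specular intensity at one surface point.

    All coefficient defaults are assumed values chosen for illustration.
    """
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_viewer)
    diffuse = max(dot(l, n), 0.0)
    # Mirror reflection of the light direction about the normal: R = 2(L.N)N - L
    r = [2.0 * dot(l, n) * ni - li for li, ni in zip(l, n)]
    # No specular highlight when the light is behind the surface.
    specular = max(dot(r, v), 0.0) ** shininess if diffuse > 0.0 else 0.0
    return k_a * i_ambient + i_light * (k_d * diffuse + k_s * specular)


head_on = phong_intensity([0, 0, 1], [0, 0, 1], [0, 0, 1])
from_behind = phong_intensity([0, 0, 1], [0, 0, -1], [0, 0, 1])
```

With the light and viewer both along the normal, all three terms contribute fully; with the light behind the surface, only the ambient term remains.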
Set up viewpoints (virtual cameras) for rendering each mesh:
1st camera setup
2nd camera setup
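The slide text names the camera setups without giving their parameters. One possible arrangement, shown here only as a sketch, places the virtual cameras on a ring around the mesh at a fixed elevation, all looking at the centroid (taken to be the origin). The function name and the defaults of 12 views, 30 degrees of elevation, and unit radius are assumptions for illustration.

```python
import math


def camera_ring(num_views=12, elevation_deg=30.0, radius=1.0):
    """Camera positions evenly spaced in azimuth at a fixed elevation.

    The mesh centroid is assumed to sit at the origin with z pointing up;
    every default value here is an assumed, illustrative setting.
    """
    elevation = math.radians(elevation_deg)
    cameras = []
    for k in range(num_views):
        azimuth = 2.0 * math.pi * k / num_views
        x = radius * math.cos(elevation) * math.cos(azimuth)
        y = radius * math.cos(elevation) * math.sin(azimuth)
        z = radius * math.sin(elevation)
        cameras.append((x, y, z))
    return cameras


cameras = camera_ring()
```

Each returned position can be passed to a renderer as the camera location, with the view direction pointing back toward the origin.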
Multi-view Representation (Recognition)
The most straightforward approach to utilizing the multi-view representation is to compute a 2D image descriptor for each view
This results in multiple 2D image descriptors per 3D shape, one per view
These descriptors need to be integrated for recognition tasks
[1]
[2]
[1] J. Sanchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the Fisher vector: Theory and practice. 2013.
[2] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. CoRR, abs/1310.1531, 2013.
[1]
[1] A. Vedaldi and B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/, 2008.
[1]
[1] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In Proc. BMVC, 2014.
A distance or similarity measure is required for retrieval tasks
The distance between shapes x and y is defined over their image descriptors
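The formula itself did not survive in the slide text. The sketch below assumes a symmetric average-minimum distance between the two sets of per-view descriptors: for each descriptor of one shape, take the L2 distance to its nearest descriptor of the other shape, average in both directions, and sum the two halves. The function names are hypothetical, and the exact form should be checked against the published paper.

```python
import math


def l2(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def shape_distance(x, y):
    """Assumed form:
    d(x, y) = sum_j min_i ||x_i - y_j|| / (2 * n_y)
            + sum_i min_j ||x_i - y_j|| / (2 * n_x)
    where x and y are lists of per-view descriptors.
    """
    n_x, n_y = len(x), len(y)
    y_to_x = sum(min(l2(xi, yj) for xi in x) for yj in y) / (2.0 * n_y)
    x_to_y = sum(min(l2(xi, yj) for yj in y) for xi in x) / (2.0 * n_x)
    return y_to_x + x_to_y


shape_a = [[0.0, 0.0], [1.0, 0.0]]  # two view descriptors
shape_b = [[0.0, 0.0]]              # one view descriptor
d_ab = shape_distance(shape_a, shape_b)
```

Because the measure compares nearest descriptors rather than descriptors at matching indices, it does not require the two shapes to have the same number of views or a shared view order.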
Multi-view CNN: Learning to Aggregate Views
The per-view CNN branches share the same parameters
The view-pooling layer performs element-wise maximum pooling across the views
The pooled features pass through the rest of the network to produce shape descriptors
The network is fine-tuned using stochastic gradient descent with back-propagation
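The shared-parameter branches and the view-pooling step can be sketched as follows. The single linear layer with ReLU stands in for the real convolutional branch, and the names `shared_branch`, `view_pool`, and `shape_descriptor` are hypothetical; only the element-wise maximum across views is taken from the slides.

```python
def shared_branch(view, weights):
    """Toy stand-in for the per-view CNN: one linear layer followed by ReLU.

    The same `weights` are used for every view (shared parameters).
    """
    return [max(0.0, sum(w * p for w, p in zip(row, view))) for row in weights]


def view_pool(view_features):
    """Element-wise maximum across the views."""
    return [max(values) for values in zip(*view_features)]


def shape_descriptor(views, weights):
    """Run every view through the shared branch, then pool across views."""
    return view_pool([shared_branch(view, weights) for view in views])


weights = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # assumed toy parameters
views = [[1.0, 2.0], [3.0, 1.0]]                 # two toy "images"
descriptor = shape_descriptor(views, weights)
```

The output has the same length as one view's feature vector regardless of how many views are supplied, which is what lets the rest of the network stay unchanged.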
[1] The Princeton ModelNet. http://modelnet.cs.princeton.edu/.
[1]
3D Shape Classification and Retrieval
Classification and retrieval results on the ModelNet40 dataset
Given a 3D shape represented by a set of K 2D views, rank the pixels in the views w.r.t. their influence on the output score F_c of the network for the class c
The saliency maps are computed by back-propagating the gradients of the class score onto the image via the view-pooling layer.
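To see how gradients route through the view-pooling layer, consider a deliberately simplified toy model: each view's features are just its pixel values, pooling is an element-wise maximum across views, and the class score is a weighted sum of the pooled features. Under these assumptions (which are not the real network) the gradient of the score reaches only the view that attained the maximum at each position.

```python
def saliency_through_view_pool(views, class_weights):
    """Saliency |dF_c / d pixel| for a toy model F_c = sum_j w_j * max_k views[k][j].

    Max pooling passes the gradient only to the winning view at each position;
    ties go to the first view. This toy model is an assumption for illustration.
    """
    num_views, dim = len(views), len(views[0])
    saliency = [[0.0] * dim for _ in range(num_views)]
    for j in range(dim):
        winner = max(range(num_views), key=lambda k: views[k][j])
        saliency[winner][j] = abs(class_weights[j])
    return saliency


views = [[0.2, 0.9], [0.8, 0.1]]
class_weights = [1.5, -2.0]
saliency = saliency_through_view_pool(views, class_weights)
```

Ranking views by their total saliency then gives a simple way to pick the most influential views for a class.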
Examples of saliency maps
Top three views with the highest saliency
Sketch Recognition
Does aggregating multiple views of a 2D image also improve performance?
Sketch Recognition (Jittering revisited)
Data jittering, or data augmentation: the process of perturbing an image by transformations that change its appearance while leaving the high-level information (class, label, attributes, etc.) intact
It generates extra samples from a given image
Data jittering improves the performance of deep representations on 2D image classification tasks
Training: jittering includes random image translations (implemented as random crops), horizontal reflections, and color perturbations
Test: jittering only includes a few crops (e.g., four at the corners, one at the center, and their horizontal reflections)
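The test-time jittering described above (four corner crops, one center crop, and their horizontal reflections) can be sketched on a nested-list image. The helper names and the tiny 4x4 image are assumptions for the example.

```python
def crop(image, top, left, size):
    """Square crop of side `size` with its top-left corner at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def hflip(image):
    """Horizontal reflection."""
    return [row[::-1] for row in image]


def ten_crop(image, size):
    """Four corner crops, one center crop, and the reflection of each."""
    h, w = len(image), len(image[0])
    offsets = [
        (0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
        ((h - size) // 2, (w - size) // 2),
    ]
    crops = [crop(image, top, left, size) for top, left in offsets]
    return crops + [hflip(c) for c in crops]


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
jittered = ten_crop(image, 2)
```

Each of the resulting crops can be treated as one "view" of the 2D image, so the same aggregation used for rendered 3D views applies.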
Human sketch dataset [1]: 20,000 hand-drawn sketches of 250 object categories such as airplanes, apples, and bridges
The cleaned dataset (SketchClean) contains 160 categories
[1] M. Eitz, J. Hays, and M. Alexa. How do humans sketch objects? ACM Trans. Graph., 31(4):44:1–44:10, 2012.
To get multiple views from 2D images, jittering is used to mimic the effect of views.
Sketch-based 3D Shape Retrieval
Most online repositories provide only text-based search engines or hierarchical catalogs for 3D shape retrieval.
Sketch-based shape retrieval has been proposed as an alternative that lets users retrieve shapes using an approximate sketch of the desired 3D shape.
Sketch-based retrieval involves two heterogeneous data domains: hand-drawn sketches and 3D shapes.
Sketches can be highly abstract and visually different from the target 3D shapes.
Dataset
193 sketches and 790 CAD models from 10 categories existing in both SketchClean and ModelNet40
Renderings of 3D shapes with a style similar to hand-drawn sketches