Virtual Cafe

3D Object Recognition Using the Google Tango Project and Creating a Virtual World

Members

Chatchawan Yoojuie (Wan)

Natthakul Boonmee (Benz)

Kanin Kunapermsiri (Top)


Advisor

Dr. Kwankamol Nongpong

 

Senior Project

Semester 2/2016

Introduction


What Tango can do

  • Create an indoor navigator (without using GPS)

  • Create accurate measurement tools

  • Create augmented reality games


What Tango cannot do

  • 3D object detection

  • Create realistic augmented reality applications


Goals of the Project

  • Learn the surrounding environment and transform the physical world into a virtual world
  • Recognize 3D objects and display them inside the virtual world
  • Provide basic interaction with the objects inside the virtual world

Note: The environment and the detected objects will be immovable.


Software

  • Google Tango Platform: for capturing images and position
  • Point Cloud Library: for 3D image processing
  • Unity: for rendering the virtual world and virtual objects


Hardware

  • Lenovo Phab 2 Pro (supports Google Tango)


Google Tango Platform

  • A computer vision platform that provides Area Learning, Motion Tracking, and Depth Perception

Point Cloud Library

  • A computer vision library for 3D images, written in C++, which is used for processing the data and recognizing objects

Point Clouds

A point cloud is a set of points in a 3D coordinate system. It represents a 3D image, and each point in the image contains x, y, and z values.
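To make this concrete, here is a minimal PCL sketch (the file name is made up) that builds a tiny cloud of (x, y, z) points and saves it in the .PCD format used for the snapshots later in this project:

```cpp
// Minimal sketch: a point cloud as a set of (x, y, z) points, saved in
// the .PCD format. PCL is assumed to be installed; "snapshot.pcd" is a
// made-up file name.
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>

int main() {
    pcl::PointCloud<pcl::PointXYZ> cloud;
    cloud.push_back(pcl::PointXYZ(0.00f, 0.00f, 1.00f));  // x, y, z in metres
    cloud.push_back(pcl::PointXYZ(0.05f, 0.00f, 1.00f));
    cloud.push_back(pcl::PointXYZ(0.00f, 0.05f, 1.02f));
    pcl::io::savePCDFileASCII("snapshot.pcd", cloud);
    return 0;
}
```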

Unity

  • The game engine that provides all the necessary tools
  • Coding is done in C#
  • Used for rendering 3D objects and the virtual world
  • Google provides a Tango API for Unity

Lenovo Phab 2 Pro

  • An Android phone that supports the Google Tango platform
  • Equipped with an IR sensor for capturing point clouds

Framework

The design framework can be divided into two parts:

1. Area Mapping (creating a room)
2. Training Dataset and Object Recognition

Area Learning

  • Make the application remember the room by scanning around it
  • Save the result into an ADF file

Area Mapping

  • Measure the actual room size by marking the corners and finding the distances between them
  • Save the result into an XML file

Area Rendering

  • Load the ADF file saved in the area learning step, along with the XML file that contains the vertices representing the corners of the room
  • Use Unity to render the saved data into a virtual room

Part 2: Training Dataset and Object Recognition

Training Dataset

  • Create the datasets that will be used in the matching step
  • Recognize the object against the dataset
  • Estimate the 6DoF pose of the detected object
  • Display the object model in Unity

What is a Centroid?

It is the "center of mass": the centroid is the point obtained by computing the mean of all the points in the cloud.

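A minimal sketch of this computation with PCL (the sample points are made up):

```cpp
// Minimal sketch: the centroid as the mean of all points in the cloud.
#include <pcl/point_types.h>
#include <pcl/common/centroid.h>
#include <iostream>

int main() {
    pcl::PointCloud<pcl::PointXYZ> cloud;
    cloud.push_back(pcl::PointXYZ(0.0f, 0.0f, 0.0f));
    cloud.push_back(pcl::PointXYZ(2.0f, 0.0f, 0.0f));
    cloud.push_back(pcl::PointXYZ(1.0f, 3.0f, 0.0f));

    Eigen::Vector4f centroid;                 // homogeneous (x, y, z, 1)
    pcl::compute3DCentroid(cloud, centroid);  // mean of all points
    std::cout << centroid.head<3>().transpose() << std::endl;  // prints: 1 1 0
    return 0;
}
```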

What is 6DoF?

Six degrees of freedom (6DoF) refers to the pose of an object in 3D space, described by a translation and a rotation.

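A minimal sketch of how such a pose can be represented with the Eigen types PCL uses (the struct and function names are made up):

```cpp
// Minimal sketch: a 6DoF pose is a translation (3 DoF) plus a rotation
// (3 DoF, stored here as a quaternion). Pose6DoF is a made-up name.
#include <Eigen/Geometry>

struct Pose6DoF {
    Eigen::Vector3f    translation;  // x, y, z
    Eigen::Quaternionf rotation;     // w, x, y, z
};

// Compose the pose into a single 4x4 transformation.
Eigen::Affine3f toTransform(const Pose6DoF& p) {
    return Eigen::Translation3f(p.translation) * p.rotation;
}
```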

How can we find the 6DoF of the object?

With the ground-truth information of the object in the Unity coordinate system. It contains the following information:
1. Translation of the device (vector format)
2. Rotation of the device (quaternion format)
3. Translation of the object (vector format)
4. Rotation of the object (quaternion format)

A sample ground-truth file (device translation, device rotation, object translation, object rotation):

0       0       0
-0.259  0.001   -0.004  -0.966
-0.027  -0.082  0.0754
0       0.966   -0.259  0

What is a descriptor?

A descriptor is a feature extracted from the cloud that encodes information about the point cloud. Basically, there are two types of descriptors in PCL:
1. Local: computed for individual points
2. Global: computed for a whole cluster that represents an object

VFH (Viewpoint Feature Histogram) is the global descriptor used in this project.
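A minimal sketch of computing a VFH descriptor for one cluster with PCL; the normal-estimation radius is an assumed value, not the project's:

```cpp
// Minimal sketch: computing a VFH (global) descriptor for one cluster.
// VFH needs surface normals, so they are estimated first.
#include <pcl/point_types.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/vfh.h>

pcl::PointCloud<pcl::VFHSignature308>::Ptr
computeVFH(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cluster) {
    // Estimate normals for the cluster.
    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(cluster);
    ne.setRadiusSearch(0.03);          // 3 cm neighbourhood (assumed value)
    ne.compute(*normals);

    // One VFH histogram (308 bins) describes the whole cluster.
    pcl::PointCloud<pcl::VFHSignature308>::Ptr vfh(new pcl::PointCloud<pcl::VFHSignature308>);
    pcl::VFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::VFHSignature308> est;
    est.setInputCloud(cluster);
    est.setInputNormals(normals);
    est.compute(*vfh);                 // vfh contains exactly one histogram
    return vfh;
}
```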

Setting Up

[Photos: the tripod and pan-tilt mount used for capturing the dataset]

Collecting Dataset

  • Capture snapshots of the object, along with its ground-truth (pose) information, at every 40 degrees
  • As a result, we have a total of 9 different snapshots
  • We then use these 9 snapshots as reference frames

Collecting Dataset

  • We can improve and extend the dataset using these reference frames
  • The descriptors must also be computed for every snapshot in the dataset

Structure of the dataset (a loading sketch follows):

1. Object snapshot (.PCD)
2. Descriptor (.PCD)
3. Ground truth (.TXT)
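For illustration, a minimal sketch of loading one entry of this dataset with PCL (file names are made up, and the descriptor is assumed to be a VFH histogram):

```cpp
// Minimal sketch: loading one dataset entry — the snapshot and its VFH
// descriptor — from .PCD files. File names are made up.
#include <pcl/point_types.h>
#include <pcl/io/pcd_io.h>

int main() {
    pcl::PointCloud<pcl::PointXYZ>::Ptr snapshot(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::PointCloud<pcl::VFHSignature308>::Ptr descriptor(new pcl::PointCloud<pcl::VFHSignature308>);
    pcl::io::loadPCDFile("object_000.pcd", *snapshot);        // 1. object snapshot
    pcl::io::loadPCDFile("object_000_vfh.pcd", *descriptor);  // 2. descriptor
    // 3. the ground-truth poses live in a separate .TXT file
    return 0;
}
```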

 


Part 2 (continued): Object Recognition

Capturing Scene

  • Capture a point cloud and send it to the server via a socket
  • Then follow the steps of the global pipeline

Global Pipeline

The global pipeline contains four steps: segmentation, descriptor, matching, and alignment & ICP.

Segmentation

Perform segmentation on the cloud to retrieve all possible clusters sitting on the plane surface, as sketched below.
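A minimal sketch of this step using the usual PCL recipe (plane removal with RANSAC, then Euclidean clustering); the thresholds are assumed values, not the project's:

```cpp
// Minimal sketch: remove the dominant plane, then cluster what remains.
#include <vector>
#include <pcl/point_types.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/segmentation/extract_clusters.h>
#include <pcl/filters/extract_indices.h>

std::vector<pcl::PointIndices>
segment(const pcl::PointCloud<pcl::PointXYZ>::Ptr& scene) {
    // 1. Find the dominant plane (e.g., the table surface) with RANSAC.
    pcl::ModelCoefficients coeffs;
    pcl::PointIndices::Ptr plane(new pcl::PointIndices);
    pcl::SACSegmentation<pcl::PointXYZ> seg;
    seg.setModelType(pcl::SACMODEL_PLANE);
    seg.setMethodType(pcl::SAC_RANSAC);
    seg.setDistanceThreshold(0.01);           // 1 cm (assumed value)
    seg.setInputCloud(scene);
    seg.segment(*plane, coeffs);

    // 2. Remove the plane, keeping the objects on top of it.
    pcl::PointCloud<pcl::PointXYZ>::Ptr objects(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::ExtractIndices<pcl::PointXYZ> extract;
    extract.setInputCloud(scene);
    extract.setIndices(plane);
    extract.setNegative(true);                // keep everything but the plane
    extract.filter(*objects);

    // 3. Group the remaining points into clusters, one per candidate object.
    std::vector<pcl::PointIndices> clusters;
    pcl::EuclideanClusterExtraction<pcl::PointXYZ> ec;
    ec.setClusterTolerance(0.02);             // 2 cm (assumed value)
    ec.setMinClusterSize(100);
    ec.setInputCloud(objects);
    ec.extract(clusters);
    return clusters;
}
```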

Descriptor

For every cluster that survives the segmentation step, a global descriptor must be computed (see the VFH sketch earlier).

Matching

Use the descriptors to search for their nearest neighbors in the database, as sketched below.
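A minimal sketch of this search, assuming VFH histograms and a FLANN k-d tree (a common choice for this step, not necessarily the project's exact code):

```cpp
// Minimal sketch: nearest-neighbour matching of a scene descriptor
// against the trained descriptors.
#include <vector>
#include <pcl/point_types.h>
#include <pcl/kdtree/kdtree_flann.h>

int nearestModel(const pcl::PointCloud<pcl::VFHSignature308>::Ptr& database,
                 const pcl::VFHSignature308& sceneDescriptor,
                 float& distance) {
    pcl::KdTreeFLANN<pcl::VFHSignature308> tree;
    tree.setInputCloud(database);              // one histogram per snapshot

    std::vector<int> index(1);
    std::vector<float> sqrDist(1);
    tree.nearestKSearch(sceneDescriptor, 1, index, sqrDist);
    distance = sqrDist[0];                     // compare against a threshold
    return index[0];                           // best-matching snapshot
}
```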

Alignment & ICP

  • Use the ground truth that was saved along with the dataset
  • Determine the translation of the object by computing and aligning the centroids of the clusters
  • For the rotation, use ICP to find the best transformation from the source (the dataset snapshot given by the matching step) to the target (the current cluster), as sketched below
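A minimal sketch of this step with PCL (the function name and parameter values are assumed):

```cpp
// Minimal sketch: translation from aligned centroids, then ICP from the
// matched snapshot (source) to the current cluster (target).
#include <pcl/point_types.h>
#include <pcl/common/centroid.h>
#include <pcl/common/transforms.h>
#include <pcl/registration/icp.h>

Eigen::Matrix4f alignToCluster(const pcl::PointCloud<pcl::PointXYZ>::Ptr& source,
                               const pcl::PointCloud<pcl::PointXYZ>::Ptr& target) {
    // Translation: shift the snapshot so the two centroids coincide.
    Eigen::Vector4f cs, ct;
    pcl::compute3DCentroid(*source, cs);
    pcl::compute3DCentroid(*target, ct);
    Eigen::Affine3f shift(Eigen::Translation3f((ct - cs).head<3>()));

    pcl::PointCloud<pcl::PointXYZ>::Ptr shifted(new pcl::PointCloud<pcl::PointXYZ>);
    pcl::transformPointCloud(*source, *shifted, shift);

    // Rotation: ICP finds the best remaining transformation.
    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(shifted);
    icp.setInputTarget(target);
    icp.setMaximumIterations(50);             // assumed value
    pcl::PointCloud<pcl::PointXYZ> aligned;
    icp.align(aligned);

    // Full source-to-target transform: centroid shift, then ICP refinement.
    return icp.getFinalTransformation() * shift.matrix();
}
```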

 


Object Pose Estimation

  • The output of the global pipeline is sent back to the device
  • The output is in XML format
  • Some calculation is needed to extract the pose estimation of the object and display it in Unity

Object Pose Estimation

These are the 5 pieces of information extracted from the output:

(1) DR = Unity ground truth of D: device rotation, in quaternion format
(2) OR = Unity ground truth of D: object rotation, in quaternion format
(3) ICP = the ICP transformation from S to D, in matrix format
(4) SC = the centroid of S, in vector format
(5) DCO = the database centroid offset: the offset between SC (4) and the Unity ground truth of D's object centroid (6), in vector format

[Diagram: quantities (1)-(6) annotated on D and S, mapped between the Unity coordinate system and the PCL coordinate system]

Object Rendering

Use Unity to render the detected object according to the data extracted in the previous step.

Flow of the Application

[Diagram: flow of the application]

Evaluation

  • Test the precision, recall, and F-measure of the object recognition (standard definitions are given below)
  • Test how well the system can estimate the correct pose
  • Use a single white rectangular box for both tests
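For reference, the standard definitions of these metrics, where TP, FP, and FN count the true positives, false positives, and false negatives in the detection results:

precision = TP / (TP + FP)
recall = TP / (TP + FN)
F-measure = 2 × precision × recall / (precision + recall)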

Preparing Dataset

Training dataset:

  • There are 2 datasets on which the object is trained
  • The first dataset has 34 scenes, trained at a range of 0.9 metres
  • The second dataset has 16 scenes, trained at a range of 1.5 metres

Preparing Dataset

Testing dataset:

  • Use 3 sets in which the object is placed at ranges of 0.5, 1.0, and 1.5 metres
  • Each distance has 10 scenes containing the object at a different viewpoint (rotated every 40 degrees)
  • For each distance, an additional 5 scenes contain no object

Preparing Dataset

Total number of tested scenes:

  • With the object: 30 scenes
  • Without the object: 15 scenes

Sample Point Clouds Captured at Training

[Images: sample point clouds from the training stage]

Environment

  • A room with no sunlight passing through
  • No mirrors
  • A tripod holds the device steady

Result - Detection Accuracy

[Charts: detection results for dataset 1 at 0.9 metres and dataset 2 at 1.5 metres]

  • The performance of dataset 2 drops significantly compared to dataset 1
  • The threshold value used in the matching step is too large
  • The quality and detail of the point cloud change with distance

Result - Pose Estimation

  • The distance of the object at the training stage has a huge impact on the accuracy of the recognition system
  • Performance at 0.5 metres is slightly lower than at 1.0 metre

DEMO

Challenges

  • Limited access to the Tango API
  • Difficult to control the variables
  • The sensor is of poor quality, so the distance can affect the detail of the point cloud
  • Compiling the Point Cloud Library for Android
  • Small community
  • Few examples
  • Few guidelines

Improvement

  • The training dataset can be improved by using a pan-tilt that can rotate to almost any angle, i.e., around the x, y, and z axes
  • Do further research on how to improve the quality of the point cloud

 

- Q&A -

Thank you
