Deep Perception (for manipulation), Part 1

MIT 6.800/6.843: Robotic Manipulation

Fall 2021, Lecture 11

Follow live at https://slides.com/d/uXws4NM/live

(or later at https://slides.com/russtedrake/fall21-lec11)

Limitations of using geometry only

  • No understanding of what an object is.
    • "Double picks"
    • Might pick up a heavy object from one corner
  • Partial views
  • Depth returns don't work for transparent objects
  • ...

A sample annotated image from the COCO dataset

What object categories/labels are in COCO?
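
One way to check, as a minimal sketch (assumes pycocotools and a local copy of the 2017 annotation file; the path below is a placeholder), is to read the category list straight from the annotations:

```python
from pycocotools.coco import COCO

# Placeholder path to a local copy of the COCO annotations.
coco = COCO("annotations/instances_val2017.json")

cats = coco.loadCats(coco.getCatIds())
print(len(cats))                                        # 80 object ("thing") categories
print(sorted(cat["name"] for cat in cats)[:5])          # ['airplane', 'apple', 'backpack', ...]
print(sorted({cat["supercategory"] for cat in cats}))   # e.g. 'animal', 'food', 'vehicle', ...
```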

Fine-tuning

source: https://d2l.ai/chapter_computer-vision/fine-tuning.html
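
The recipe in a nutshell, as a minimal torchvision sketch (not the d2l.ai code; num_target_classes is a placeholder): take a network pretrained on a large source dataset, replace its output head with a freshly initialized one sized for the target task, and continue training with a small learning rate so the pretrained features are only gently adjusted.

```python
import torch
import torchvision

num_target_classes = 10  # placeholder for the downstream task

# Start from ImageNet-pretrained weights (older torchvision: pretrained=True)...
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
# ...and replace the classification head with a new, randomly initialized one.
model.fc = torch.nn.Linear(model.fc.in_features, num_target_classes)

# Small learning rate: the pretrained body is adjusted gently; the fresh head
# could even be given a larger rate.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()
# ...then run a standard training loop on the (much smaller) target dataset.
```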

R-CNN (Regions with CNN features)

source: https://towardsdatascience.com/understanding-regions-with-cnn-features-r-cnn-ec69c15f8ea7
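
The idea, roughly, as a conceptual sketch (not the original implementation; rcnn_features and the proposal format are my own illustration): an external method such as selective search proposes on the order of 2000 candidate boxes, each crop is warped to a fixed size, and the CNN runs once per crop to produce features for a per-class classifier. Running the CNN separately on every crop is exactly the cost that later variants remove.

```python
import torch
import torchvision
import torch.nn.functional as F

# Use a pretrained CNN as a per-crop feature extractor.
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()
backbone.eval()

def rcnn_features(image, proposals):
    """image: (3, H, W) float tensor; proposals: list of integer (x1, y1, x2, y2)
    boxes from an external proposal method (e.g. selective search)."""
    crops = []
    for (x1, y1, x2, y2) in proposals:
        crop = image[:, y1:y2, x1:x2]                     # crop the proposal
        crop = F.interpolate(crop.unsqueeze(0), size=(224, 224),
                             mode="bilinear", align_corners=False)  # warp to fixed size
        crops.append(crop)
    with torch.no_grad():
        return backbone(torch.cat(crops))                 # one feature vector per proposal
```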

Faster R-CNN adds a "region proposal network"

source: https://www.analyticsvidhya.com/blog/2018/07/building-mask-r-cnn-model-detecting-damage-cars-python/
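
Because the proposal network is folded into the detector, a single forward pass returns boxes, labels, and scores. A minimal sketch with torchvision's pretrained model (the random image is only a placeholder; older torchvision uses pretrained=True instead of weights):

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)   # placeholder for an RGB image in [0, 1]
with torch.no_grad():
    (prediction,) = model([image])

# Region proposals, classification, and box refinement all happen inside the model.
print(prediction["boxes"].shape)   # (N, 4) boxes as (x1, y1, x2, y2)
print(prediction["labels"])        # COCO category indices
print(prediction["scores"])        # confidences, sorted in decreasing order
```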

Mask R-CNN notebook fine-tuned on clutter-clearing data:

(data-generation and training notebooks are also linked from the chapter)
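
For reference, the standard torchvision recipe for pointing a pretrained Mask R-CNN at new categories looks roughly like this (a sketch, not the chapter's notebook; num_classes is a placeholder for background plus the clutter-clearing object classes):

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 7  # placeholder: background + the object categories in the scene

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the box-classification head for one with the right number of classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Swap the mask-prediction head as well.
in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden_layer = 256
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, hidden_layer,
                                                   num_classes)
# The rest is a standard training loop over the (simulated) labeled images.
```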

Deep Pose Estimation

Bingham distribution (over unit quaternions)

from Jared Glover's PhD thesis, 2014
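
For reference, the density over unit quaternions q ∈ S³ has the standard form (consistent with Glover's thesis up to notation):

```latex
p(q;\,\Lambda,V) \;=\; \frac{1}{F(\Lambda)}\,
  \exp\!\left( \sum_{i=1}^{3} \lambda_i \left( v_i^\top q \right)^2 \right),
  \qquad \|q\| = 1,
```

where the v_i are orthogonal unit directions, the λ_i ≤ 0 are concentration parameters, and F(Λ) is the normalization constant. Because q enters only through squared inner products, q and -q receive the same density, matching the fact that antipodal unit quaternions represent the same rotation.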

(figure: the analogous distribution in 2D)

Quick experiments using CLIP "out of the box", by Kevin Zakka
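
In the same spirit, a minimal zero-shot sketch with OpenAI's released CLIP package (my own example, not Kevin Zakka's notebook; the image path and label prompts are placeholders): embed one image and a handful of candidate text prompts, then rank the prompts by similarity.

```python
import torch
import clip                       # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("mug.jpg")).unsqueeze(0).to(device)   # placeholder image
prompts = ["a photo of a mug", "a photo of a screwdriver", "a photo of a banana"]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Cosine similarity between the image and each prompt, normalized over prompts.
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.2f}  {prompt}")
```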

MIT Robotic Manipulation Fall 2021 http://manipulation.csail.mit.edu
