Toward “Category-Level” Dexterous Manipulation

Russ Tedrake, Lucas Manuelli, Pete Florence, Wei Gao

 

russt@mit.edu

groups.csail.mit.edu/locomotion

Follow along at https://slides.com/russtedrake/rss2020-visual/live

kPAM: KeyPoint Affordances for Manipulation

Lucas Manuelli*, Wei Gao*, Peter R. Florence and Russ Tedrake. kPAM: KeyPoint Affordances for Category Level Manipulation. ISRR 2019


Problem Statement

Manipulate potentially unknown rigid objects from a category (e.g. mugs, shoes) into desired target configurations.

SE(3) pose is difficult to generalize across a category

So how do we even specify the task?

What's the cost function?

(Images of mugs on the rack?)

3D Keypoints provide rich, class-general semantics

Constraints & Cost on Keypoints

... and robust performance in practice
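As a concrete illustration (not from the talk; the keypoint names, rack-slot location, and numbers are made up), here is a minimal sketch of writing a manipulation target as costs and constraints directly on 3D keypoints, rather than on an SE(3) pose:

```python
import numpy as np

# Hypothetical detected 3D keypoints for a mug, in the world frame.
keypoints = {
    "bottom_center": np.array([0.52, 0.10, 0.02]),
    "top_center":    np.array([0.52, 0.10, 0.12]),
    "handle_center": np.array([0.58, 0.10, 0.07]),
}

def placement_cost(transform, kp):
    """Cost: the transformed bottom keypoint lands on an (assumed) rack slot."""
    target = np.array([0.0, 0.40, 0.0])
    return np.sum((transform(kp["bottom_center"]) - target) ** 2)

def upright_constraint(transform, kp, tol=0.05):
    """Constraint: the bottom->top axis stays near vertical after the motion."""
    axis = transform(kp["top_center"]) - transform(kp["bottom_center"])
    axis = axis / np.linalg.norm(axis)
    return axis @ np.array([0.0, 0.0, 1.0]) >= 1.0 - tol

# e.g., evaluate both under the identity transform:
identity = lambda p: p
print(placement_cost(identity, keypoints), upright_constraint(identity, keypoints))
```

Because the cost and constraints only reference keypoints, the same specification applies to every instance in the category.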

Inverse kinematics as an optimization

\min_q \| q - q_{desired} \|

subject to:

  • rich end-effector constraints
  • joint limits
  • collision avoidance
  • "gaze constraints"
  • "feet stay put"
  • balance (center of mass)
  • ...
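A minimal, self-contained sketch of the formulation above, using a planar two-link arm and scipy (an illustration, not the talk's solver; link lengths, targets, and limits are made up):

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

L1, L2 = 1.0, 1.0                     # link lengths (assumed)
q_desired = np.array([0.3, 0.3])      # nominal posture
p_target = np.array([1.2, 0.8])       # desired end-effector position

def fk(q):
    """Forward kinematics: joint angles -> end-effector (x, y)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

# min_q ||q - q_desired||^2, subject to an end-effector constraint and
# joint limits (stand-ins for the richer constraints on the slide).
ee_constraint = NonlinearConstraint(lambda q: fk(q) - p_target, 0.0, 0.0)
result = minimize(lambda q: np.sum((q - q_desired) ** 2),
                  x0=q_desired,
                  bounds=[(-np.pi, np.pi)] * 2,
                  constraints=[ee_constraint])
print("q* =", result.x, " end-effector:", fk(result.x))
```

Keypoint targets from the previous slide slot in naturally as additional end-effector constraints in this optimization.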

Keypoints are not a sufficient representation

Relative to Sonia's approach, this separates the semantics from the dense geometry

kPAM pipeline

No template model or object pose appears in this pipeline.

(Pipeline diagram: RGBD image w/ instance segmentation → 3D keypoint detection network → grasp planner → inverse kinematics planner; training keypoints labeled with a custom annotation tool.)

Keypoint network

Architecture based on Sun, Xiao, et al. "Integral human pose regression." ECCV, 2018
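The key idea in integral regression is a soft-argmax: take the spatial expectation over a predicted heatmap, so the keypoint coordinate stays differentiable. A small PyTorch sketch (shapes and names are illustrative, not the kPAM implementation):

```python
import torch

def integral_keypoints(heatmaps):
    """heatmaps: (B, K, H, W) raw network outputs -> (B, K, 2) xy coords."""
    B, K, H, W = heatmaps.shape
    probs = torch.softmax(heatmaps.view(B, K, -1), dim=-1).view(B, K, H, W)
    xs = torch.arange(W, dtype=probs.dtype)
    ys = torch.arange(H, dtype=probs.dtype)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)   # expectation over columns
    y = (probs.sum(dim=3) * ys).sum(dim=-1)   # expectation over rows
    return torch.stack([x, y], dim=-1)
```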

Custom annotation tool

Sample of results

(shoes on rack)

# train objects: 10
# test objects: 20
# trials: 100
placed on shelf: 98%
heel error (cm): 1.09 ± 1.29
toe error (cm): 4.34 ± 3.05

+ shape completion network (kPAM-SC) to include collision-avoidance constraints

So far, keypoints are geometric and semantic (mug handle, front toe of shoe), but require human labels.

If we forgo semantics, can we self-supervise?


Dense Object Nets

Core technology: dense correspondences

(built on Schmidt, Newcombe, Fox, RA-L 2017)

Peter R. Florence*, Lucas Manuelli*, and Russ Tedrake. Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation. CoRL, 2018.

Dense Object Nets

dense 3D reconstruction

+ pixelwise contrastive loss

New loss function sharpens correspondences
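In spirit (simplified, with illustrative tensor names, not the exact loss from the paper), the pixelwise contrastive loss pulls descriptors at corresponding pixels together and pushes non-corresponding ones apart up to a margin:

```python
import torch

def pixelwise_contrastive_loss(d_a, d_b, d_a_neg, d_b_neg, margin=0.5):
    """d_a, d_b: (N, D) descriptors at matched pixels across two views;
    d_a_neg, d_b_neg: (M, D) descriptors at non-matching pixels."""
    # matches: pull corresponding descriptors together
    match_loss = ((d_a - d_b) ** 2).sum(dim=-1).mean()
    # non-matches: hinge loss pushes descriptors apart up to the margin
    dist = ((d_a_neg - d_b_neg) ** 2).sum(dim=-1).sqrt()
    nonmatch_loss = torch.clamp(margin - dist, min=0).pow(2).mean()
    return match_loss + nonmatch_loss
```

The dense 3D reconstruction supplies the pixel correspondences across views, so no human labeling is needed.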

Now 3D correspondences, trained with multi-view data

Dense descriptors as self-supervised keypoints
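Once trained, a descriptor acts as a keypoint via nearest-neighbor lookup in descriptor space. A sketch (shapes assumed):

```python
import torch

def locate_keypoint(ref_descriptor, descriptor_image):
    """ref_descriptor: (D,) descriptor from a reference pixel;
    descriptor_image: (D, H, W) dense descriptors for a new image.
    Returns the (row, col) of the best-matching pixel."""
    D, H, W = descriptor_image.shape
    diffs = descriptor_image - ref_descriptor.view(D, 1, 1)
    dist = diffs.pow(2).sum(dim=0)          # (H, W) squared distances
    idx = torch.argmin(dist.view(-1))
    return divmod(idx.item(), W)
```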

But without semantics, how do we specify the task?

Dense Object Nets in Visuomotor Imitation Learning

Peter R. Florence, Lucas Manuelli, and Russ Tedrake. Self-Supervised Correspondence in Visuomotor Policy Learning. RA-L, April 2020

Visuomotor policies

Levine*, Finn*, Darrell, Abbeel, JMLR 2016

Training the visual front-end (prior work)

  • Pose-based auxiliary loss: estimate object/hand pose (but hard for category-level)
  • Auto-encoding
  • End-to-end

Idea: use a small set of dense descriptors

Imitation learning setup

with demonstrations from hand-coded policies in sim and teleop on the real robot

Standard "behavior-cloning" object + data augmentation

Simulation experiments

"push box"

"flip box"

Policy is a small LSTM network (~100 LSTM units)
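A sketch of that policy structure (sizes are assumptions, not the paper's hyperparameters): descriptor-keypoint observations in, actions out.

```python
import torch
import torch.nn as nn

class KeypointLSTMPolicy(nn.Module):
    def __init__(self, n_keypoints=16, action_dim=6, hidden=100):
        super().__init__()
        # observation: 3D position of each tracked descriptor keypoint
        self.lstm = nn.LSTM(input_size=3 * n_keypoints,
                            hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, action_dim)

    def forward(self, obs_seq, state=None):
        """obs_seq: (B, T, 3*n_keypoints) -> actions: (B, T, action_dim)."""
        h, state = self.lstm(obs_seq, state)
        return self.head(h), state
```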

Fewer demonstrations? Better generalization?

Dense Object Nets for Model-based Control

Late-breaking results:

Learn descriptor keypoint dynamics + trajectory MPC
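One way to read that recipe (an assumed structure, not the actual late-breaking implementation): roll out learned keypoint dynamics z' = f(z, u) inside a simple shooting-based MPC.

```python
import numpy as np

def mpc(f, z0, z_goal, horizon=10, n_samples=256, action_dim=2, rng=None):
    """Random-shooting MPC: return the first action of the best rollout."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_cost, best_u0 = np.inf, None
    for _ in range(n_samples):
        u_seq = rng.normal(size=(horizon, action_dim))
        z, cost = z0, 0.0
        for u in u_seq:
            z = f(z, u)                       # learned keypoint dynamics
            cost += np.sum((z - z_goal) ** 2)
        if cost < best_cost:
            best_cost, best_u0 = cost, u_seq[0]
    return best_u0

# Toy stand-in "learned" dynamics: the keypoint moves with the action.
u0 = mpc(lambda z, u: z + 0.1 * u, z0=np.zeros(2), z_goal=np.ones(2))
print(u0)
```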


For all the details, check out

Lucas' PhD defense next Monday.

 

July 20, 2020 02:00PM Eastern

(ping Lucas for zoom info, or watch youtube just after)

Summary

Wanted: State representations for "category-level" dexterous manipulation

  • Keypoints can represent category-level geometry
    • Combine with local sensing / dense reconstruction
    • Now trying dynamics.
  • For task specification
    • Annotated keypoints with semantics
    • Self-supervised descriptors with demonstrations
  • Keypoints aren't everything. They won't tell you when the eggs are done cooking!

 

This fall: my manipulation class at MIT will be online.

Toward "Category-Level" Dexterous Manipulation

By russtedrake

Toward "Category-Level" Dexterous Manipulation

A presentation at the RSS 2020 workshop on "Visual Learning and Reasoning for Robotic Manipulation" https://sites.google.com/view/rss20vlrrm
