Visuomotor Policies
(via Behavior Cloning)

prelude to Reinforcement Learning

MIT 6.800/6.843:

Robotic Manipulation

Fall 2021, Lecture 17

Follow live at https://slides.com/d/yOA4Yeo/live

(or later at https://slides.com/russtedrake/fall21-lec17)

From planning to policies

The MIT Leg Lab Hopping Robots

http://www.ai.mit.edu/projects/leglab/robots/robots.html

How do you represent your motions?

(continuous time/state/action)

Task-level: Planning

(discrete/symbolic)

From planning to policies

policy needs to know

state of the robot × state of the environment

Levine*, Finn*, Darrell, Abbeel, JMLR 2016 

Visuomotor policies
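Concretely, a visuomotor policy consumes raw camera images (plus robot proprioception) and outputs low-level commands. Below is a minimal PyTorch sketch of such an architecture; the layer sizes, the spatial-softmax feature head, and all names are illustrative assumptions, not the architecture from Levine et al.

import torch
import torch.nn as nn

class VisuomotorPolicy(nn.Module):
    # Maps an RGB image + joint state to a commanded action. Layer sizes,
    # the spatial-softmax head, and all names are illustrative assumptions.
    def __init__(self, num_joints=7, action_dim=7, num_keypoints=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, num_keypoints, kernel_size=5, stride=2), nn.ReLU(),
        )
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_keypoints + num_joints, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def spatial_softmax(self, features):
        # Each feature channel becomes one expected (x, y) image location.
        b, c, h, w = features.shape
        probs = torch.softmax(features.view(b, c, -1), dim=-1).view(b, c, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=features.device)
        ys = torch.linspace(-1.0, 1.0, h, device=features.device)
        ex = (probs.sum(dim=2) * xs).sum(dim=-1)   # expected x per channel
        ey = (probs.sum(dim=3) * ys).sum(dim=-1)   # expected y per channel
        return torch.cat([ex, ey], dim=-1)         # (b, 2 * c)

    def forward(self, image, joint_state):
        keypoints = self.spatial_softmax(self.conv(image))
        return self.mlp(torch.cat([keypoints, joint_state], dim=-1))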

What is a (dynamic) model?

System:

  input:  ..., u_{-1}, u_0, u_1, ...
  output: ..., y_{-1}, y_0, y_1, ...

State-space:

  x_{n+1} = f(n, x_n, u_n, w_n, \theta)
  y_n = g(n, x_n, u_n, w_n, \theta)

Auto-regressive (e.g. ARMAX):

  y_{n+1} = f(n, u_n, u_{n-1}, ..., y_n, y_{n-1}, ..., w_n, w_{n-1}, ..., \theta)

where x is the state, w the noise/disturbances, \theta the parameters, with prior p(\theta, x_0, w_0, w_1, ...).
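To make the distinction concrete, here is a small numerical sketch of the two model classes; the double-integrator dynamics and the ARX coefficients are illustrative assumptions, not tied to any system from the lecture.

import numpy as np

rng = np.random.default_rng(0)

# State-space form: x_{n+1} = f(x_n, u_n, w_n), y_n = g(x_n).
# Illustrative (assumed) dynamics: a discrete-time double integrator.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
C = np.array([[1.0, 0.0]])

def simulate_state_space(u, x0=np.zeros(2), noise=0.01):
    x, ys = x0, []
    for u_n in u:
        ys.append((C @ x).item())
        w_n = noise * rng.standard_normal(2)        # process noise w_n
        x = A @ x + B @ np.atleast_1d(u_n) + w_n    # state update
    return np.array(ys)

# Auto-regressive (ARX) form: y_{n+1} depends directly on past outputs and
# inputs, with no explicit latent state. Coefficients are illustrative.
def simulate_arx(u, a=(1.9, -0.9), b=(0.0, 0.001), noise=0.01):
    ys = [0.0, 0.0]
    for n in range(1, len(u) - 1):
        y_next = (a[0] * ys[n] + a[1] * ys[n - 1]
                  + b[0] * u[n] + b[1] * u[n - 1]
                  + noise * rng.standard_normal())
        ys.append(y_next)
    return np.array(ys)

u = np.ones(50)  # constant input sequence
print(simulate_state_space(u)[:5])
print(simulate_arx(u)[:5])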

Idea: use a small set of dense descriptors as the policy's visual input
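One way to read this idea in code: reduce a dense per-pixel descriptor image to a handful of keypoint locations by matching a few reference descriptors. This is only a sketch under assumed shapes for the descriptor image; it is not the exact pipeline from the lecture.

import numpy as np

def descriptor_keypoints(descriptor_image, reference_descriptors):
    # descriptor_image: (H, W, D) per-pixel descriptors (e.g. from a dense
    # object net); reference_descriptors: (K, D) descriptors selected once on
    # a reference view. Both shapes are assumptions for this sketch.
    h, w, d = descriptor_image.shape
    flat = descriptor_image.reshape(-1, d)                       # (H*W, D)
    # Squared distance from every pixel descriptor to every reference one.
    dists = ((flat[:, None, :] - reference_descriptors[None, :, :]) ** 2).sum(-1)
    best = dists.argmin(axis=0)                                  # (K,) flat pixel index
    # Convert flat indices to (u, v) pixel coordinates.
    return np.stack([best % w, best // w], axis=-1)              # (K, 2)

# The K matched pixel locations form a compact, object-centric observation
# that can replace raw images as the policy input.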

Imitation learning setup

Demonstrations from hand-coded policies in sim and from teleop on the real robot

Standard "behavior-cloning" objective + data augmentation

Simulation experiments

"push box"

"flip box"

Policy is a small LSTM network (~100 LSTM units)
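A sketch of such a recurrent policy in PyTorch; the hidden size of 100 mirrors the slide, while the observation dimension, action dimension, and output head are illustrative assumptions.

import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    # Hidden size of ~100 mirrors the slide; observation/action dimensions
    # and the linear output head are assumptions for illustration.
    def __init__(self, obs_dim=32, action_dim=7, hidden_size=100):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, action_dim)

    def forward(self, obs_sequence, hidden=None):
        # obs_sequence: (batch, time, obs_dim) -> actions: (batch, time, action_dim)
        features, hidden = self.lstm(obs_sequence, hidden)
        return self.head(features), hidden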

https://learning-from-play.github.io/

https://roboturk.stanford.edu/
