MIT 6.800/6.843:
Robotic Manipulation
Fall 2021, Lecture 21
Follow live at https://slides.com/d/YPYIYRA/live
(or later at https://slides.com/russtedrake/fall21-lec21)
As more people started applying RL to robots, we saw a distinct shift from "model-free" RL to "model-based" RL.
(most striking at the 2018 Conference on Robot Learning)
IFRR global panel on "data-driven vs physics-based models"
(caveat: I didn't choose the panel name)
System (input, output, state, noise/disturbances, parameters):
- State-space models — e.g. Lagrangian mechanics, recurrent neural networks (e.g. LSTM), ...
- Auto-regressive models (e.g. ARMAX) — e.g. feed-forward networks (e.g. \(y_n\) = image)
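The two model structures can be written out explicitly; a standard sketch (the symbols \(f\), \(g\), \(\theta\) match those used later in these slides; \(w_n\) denotes noise/disturbances):

```latex
% State-space: internal state x_n, updated by dynamics f, read out by g
x_{n+1} = f(x_n, u_n, w_n; \theta), \qquad y_n = g(x_n, u_n, w_n; \theta)

% Auto-regressive (e.g. ARX): next output predicted directly from a
% finite history of past outputs and inputs, with no internal state
y_{n+1} = f(y_n, y_{n-1}, \ldots, u_n, u_{n-1}, \ldots, w_n; \theta)
```

The state-space form is compact but hides \(x_n\), which must be estimated; the auto-regressive form works directly on observed signals.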
The same System picture, but now the model maps the input directly to a single output: the cost-to-go.
Q-functions are models, too. They try to predict only one output (the cost-to-go).
As you know, people are using Q-functions in practice on non-Markovian state representations.
\[ Q^{\pi}(n, x_n,u_n, \theta) \]
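For reference, the standard definition being evoked here — the Q-function predicts the expected cost-to-go from taking \(u_n\) in state \(x_n\) at time \(n\), then following the policy \(\pi\) (a sketch; \(\ell\) denotes the running cost, which is not named on the slide):

```latex
Q^{\pi}(n, x_n, u_n, \theta) =
  \mathbb{E}\!\left[\, \ell(x_n, u_n) + \sum_{k=n+1}^{N} \ell(x_k, \pi(x_k)) \,\right]
```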
"Deep models vs Physics-based models?" is about model class:
Should we prefer writing \(f\) and \(g\) using physics or deep networks?
Maybe not so different from
Galileo, Kepler, Newton, Hooke, Coulomb, ...
were data scientists.
They fit very simple models to very noisy data.
They gave us a rich class of parametric models that we could fit to new data.
What if Newton had deep learning...?
Galileo's notes on projectile motion
"All models are wrong, but some are useful" -- George Box
What makes a model class useful?
State-space models tend to be more efficient/compact, but require state estimation.
State-space vs. Auto-regressive
Perhaps the biggest philosophical difference between traditional physics models and "universal approximators".
The failings of our physics-based models are mostly due to the unreasonable burden of estimating the "Lagrangian state" and parameters.
(e.g., for onions, laundry, peanut butter, ...)
The failings of our deep models are mostly due to our inability to do efficient/reliable planning, control design and analysis, to make sufficiently accurate long-horizon predictions, and to generalize under distribution shift.
I want the next Newton to come around and to work on onions, laundry, peanut butter...
American Control Conference (ACC), 1991
Learn descriptor keypoint dynamics + trajectory MPC
Some take-aways:
Lucas at his defense: "Perception doesn't feel like the bottleneck anymore; it feels like the bottleneck is control."
H.J. Terry Suh and Russ Tedrake. The surprising effectiveness of linear models for visual foresight in object pile manipulation. To appear in Workshop on the Algorithmic Foundations of Robotics (WAFR), 2020
Target Set
The big question:
Find a single policy (pixels to torques) that is invariant to the number of carrot pieces.
Try images as the state representation.
A control-Lyapunov function in image coordinates
Requires a forward model...
Crazy: Try a linear model (for each discretized action)
Each row of \(A\) is an image: the "receptive field" of the corresponding output pixel.
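A minimal sketch of the idea above: one linear map per discretized action for visual foresight, with a simple image-space Lyapunov candidate used to pick actions greedily. All names (`A`, `predict`, `greedy_action`) and the random toy model are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8          # toy image resolution
n_actions = 4        # size of the discretized action set

# One linear model per discretized action: y_next = A[u] @ y (flattened images).
# Each row of A[u] weights the whole current image to produce one output pixel,
# so a row, reshaped to H x W, is that pixel's "receptive field".
A = [np.eye(H * W) + 0.01 * rng.standard_normal((H * W, H * W))
     for _ in range(n_actions)]

def predict(y, u):
    """One-step visual foresight: predicted next (flattened) image under action u."""
    return A[u] @ y

def lyapunov(y, y_target):
    """Image-space Lyapunov candidate: squared distance to the target image."""
    return float(np.sum((y - y_target) ** 2))

def greedy_action(y, y_target):
    """Pick the discretized action whose predicted next image most decreases V."""
    return min(range(n_actions),
               key=lambda u: lyapunov(predict(y, u), y_target))

y = rng.random(H * W)        # current image (flattened)
y_goal = np.zeros(H * W)     # target image, e.g. "no carrot pieces outside the set"
u = greedy_action(y, y_goal)
```

The per-action linear structure keeps planning trivial: evaluating every action is just `n_actions` matrix-vector products.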
My take-aways:
Learning Invariant Representations for Reinforcement Learning without Reconstruction
Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, Sergey Levine
ICLR 2021
from "Draping an Elephant: Uncovering Children's Reasoning About Cloth-Covered Objects" by Tomer Ullman et al., 2019