Russ Tedrake
Follow along at https://slides.com/russtedrake/rl-2020-bc/live
or view later at https://slides.com/russtedrake/rl-2020-bc/
How broad is the umbrella of RL?
Sequential decision making, but then
Black-box / Derivative-free?
Learning + Control
Is the task difficult because we don't have a model?
for spreading peanut butter, buttoning my shirt, etc.
As more people started applying RL to robots, we saw a distinct shift from "model-free" RL to "model-based" RL.
(most striking at the 2018 Conference on Robot Learning)
Recent IFRR global panel on "data-driven vs physics-based models"
(caveat: I didn't choose the panel name)
Core technology: Deep learning perception module that learns "dense correspondences"
Learn a deep dynamic model of "keypoint" dynamics.
Online: use model-predictive control (MPC)
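A hedged sketch of the online step: the learned keypoint-dynamics model is replaced here by a toy stand-in (the linear `learned_dynamics`, the quadratic cost, the horizon, and the sample count are all assumptions), but the structure of random-shooting MPC over a learned model is as shown:

```python
import random

# Stand-in for the learned keypoint dynamics model: next_state = f(state, action).
# A toy linear map here; in the real system this is a deep network.
def learned_dynamics(state, action):
    return [s + 0.1 * action for s in state]

def cost(state, goal):
    return sum((s - g) ** 2 for s, g in zip(state, goal))

def mpc_action(state, goal, horizon=10, n_samples=100):
    """Random-shooting MPC: sample action sequences, roll each out through
    the learned model, return the first action of the lowest-cost sequence."""
    best_cost, best_action = float("inf"), 0.0
    for _ in range(n_samples):
        seq = [random.uniform(-1, 1) for _ in range(horizon)]
        x, total = state, 0.0
        for u in seq:
            x = learned_dynamics(x, u)
            total += cost(x, goal)
        if total < best_cost:
            best_cost, best_action = total, seq[0]
    return best_action
```

In practice the action sequences would be refined (e.g. by CEM or gradients through the model) rather than sampled once, and only the first action is executed before re-planning.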
System
State-space
Auto-regressive (e.g. ARMAX)
input
output
state
noise/disturbances
parameters
Lagrangian mechanics,
Recurrent neural networks (e.g. LSTM), ...
Feed-forward networks
(e.g. \(y_n\)= image)
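In symbols (using the labels above: \(u\) the input, \(y\) the output, \(x\) the state, \(w\) the noise/disturbances, \(\theta\) the parameters), the two model classes are typically written as:

```latex
% State-space: a compact internal state summarizes the history.
\[ x_{n+1} = f(x_n, u_n, w_n, \theta), \qquad y_n = g(x_n, u_n, w_n, \theta) \]
% Auto-regressive (e.g. ARMAX): predict the next output directly from
% a finite history of past inputs and outputs.
\[ y_{n+1} = f(u_n, u_{n-1}, \ldots, y_n, y_{n-1}, \ldots, w_n, \theta) \]
```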
System
State-space
Auto-regressive (e.g. ARMAX)
input
output
input
cost-to-go
Q-functions are models, too. They try to predict only one output (the cost-to-go).
As you know, people are using Q-functions in practice on non-Markovian state representations.
\[ Q^{\pi}(n, x_n,u_n, \theta) \]
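To make "Q-functions are models" concrete: under policy \(\pi\), the Q-function is the model whose single predicted output is the accumulated cost. Writing \(\ell(x, u)\) for an (assumed) additive per-step cost,

```latex
\[ Q^{\pi}(n, x_n, u_n, \theta) \approx \ell(x_n, u_n) + \sum_{k=n+1}^{N} \ell(x_k, \pi(x_k)), \]
% i.e. it satisfies the one-step (Bellman) recursion
\[ Q^{\pi}(n, x_n, u_n, \theta) \approx \ell(x_n, u_n) + Q^{\pi}(n+1, x_{n+1}, \pi(x_{n+1}), \theta). \]
```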
"Deep models vs Physics-based models?" is about model class:
Should we prefer writing \(f\) and \(g\) using physics or deep networks?
Maybe not so different from
Galileo, Kepler, Newton, Hooke, Coulomb, ...
were data scientists.
They fit very simple models to very noisy data.
Gave us a rich class of parametric models that we could fit to new data.
What if Newton had deep learning...?
Galileo's notes on projectile motion
Our physics models are (and have always been) differentiable.
You don't need neural networks. Just the chain rule.
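A minimal illustration of "just the chain rule": fitting the gravitational constant in the projectile model \(y(t) = v t - \frac{1}{2} g t^2\) by hand-written gradient descent. The synthetic data, initial guess, and learning rate are made up for the sketch; the point is that the gradient of the physics model is written out directly, with no autodiff framework.

```python
# Projectile model: y(t) = v*t - 0.5*g*t^2, with v known and g unknown.
v_true, g_true = 10.0, 9.8
ts = [0.1 * i for i in range(1, 21)]
ys = [v_true * t - 0.5 * g_true * t**2 for t in ts]  # noiseless synthetic data

g = 5.0   # initial guess for the unknown parameter
lr = 0.01
for _ in range(300):
    # Chain rule by hand: dL/dg = sum_i 2*(pred_i - y_i) * d(pred_i)/dg,
    # where d(pred)/dg = -0.5*t^2.
    grad = sum(2 * (v_true * t - 0.5 * g * t**2 - y) * (-0.5 * t**2)
               for t, y in zip(ts, ys))
    g -= lr * grad
```

The loss is quadratic in \(g\), so plain gradient descent recovers the true parameter exactly; with noisy data this becomes ordinary least squares, which is how the classical parametric physics models have always been fit.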
"All models are wrong, but some are useful" -- George Box
e.g., for
What makes a model class useful?
State-space models tend to be more efficient/compact, but require state estimation.
vs.
Auto-regressive
State-space
Perhaps the biggest philosophical difference between traditional physics models and "universal approximators".
The failings of our physics-based models are mostly due to the unreasonable burden of estimating the "Lagrangian state" and parameters.
For e.g. onions, laundry, peanut butter, ...
The failings of our deep models are mostly due to our inability to do efficient/reliable planning, control design, and analysis.
I want the next Newton to come around and to work on onions, laundry, peanut butter...
OpenAI - Learning Dexterity
"PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance."
https://openai.com/blog/openai-baselines-ppo/
A simple counter-example from static output feedback:
http://underactuated.mit.edu/policy_search.html
The set of stabilizing \(k\) is a disconnected set.
| \(k\) | Maximum real closed-loop eigenvalue |
|---|---|
| 0.9 | -0.035 |
| 1.5 | 0.032 |
| 2.1 | -0.009 |
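The kind of sweep behind the table can be sketched as follows. The plant below is a placeholder (a double integrator with position output), not the actual system from the linked chapter, so the numbers will differ; the structure of the check is what matters:

```python
import numpy as np

def max_real_closed_loop_eig(A, B, C, k):
    """Max real part of eig(A - k*B*C): the closed loop under static
    output feedback u = -k*y. Negative means stable."""
    return max(np.linalg.eigvals(A - k * (B @ C)).real)

# Placeholder plant: double integrator with position output (an assumption;
# the example at underactuated.mit.edu/policy_search.html uses another system).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

for k in (0.9, 1.5, 2.1):
    print(k, max_real_closed_loop_eig(A, B, C, k))
```

Because the stable gains can form a disconnected set, a local policy-gradient search over \(k\) can get stuck on the wrong component.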
(for instance: better controller parameterizations)
ADPRL, 2012
http://underactuated.csail.mit.edu/lqr.html
http://www.ai.mit.edu/projects/leglab/robots/robots.html
for black-box, very high-dimensional, complex simulators
Falsification algorithms are not designed for coverage.
find \(x < 20\)
vs
estimate \(p(x<20)\)
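The "find vs. estimate" distinction in miniature: a falsifier only needs a single sample in the failure region, while risk estimation needs the probability mass there. A brute-force Monte Carlo sketch (the Gaussian simulator output and the threshold are made-up stand-ins; for genuinely rare events one would use importance sampling or a ladder of intermediate samplers rather than naive sampling):

```python
import random

random.seed(0)

def simulate():
    # Stand-in for a black-box simulator output; Gaussian is an assumption.
    return random.gauss(25.0, 2.0)

samples = [simulate() for _ in range(100_000)]

# Falsification: any single sample below the threshold is a "failure found".
failure = next((x for x in samples if x < 20.0), None)

# Estimation: the fraction of probability mass below the threshold, p(x < 20).
p_hat = sum(x < 20.0 for x in samples) / len(samples)
```

Note that a falsifier can stop at the first failing sample; the estimator's error instead scales with the number of samples landing in the region of interest, which is why naive sampling breaks down as failures get rare.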
Region of interest
a smooth ladder of samplers
Finds surprisingly rare and diverse failures of the full comma.ai openpilot in the Carla simulator.
The primary authors have now created a startup:
Invites super interesting questions for RL / control.
We have parameterized simulations to study distributional robustness / distribution shift.
Releasing all of these as a part of my fall manipulation class at MIT (which is open and online).
If my robot is out in the world and experiences a scenario, it's hard to put that scenario back into simulation.
Information theoretic MPPI driving an AutoRally platform aggressively at the Georgia Tech Autonomous Racing Facility.
OpenAI dexterous hand
Click here
(on this slide)
Then here
(in the colab window)
or from Ch.1 at http://manipulation.mit.edu