Follow live at https://slides.com/d/1UFnQdo/live (or later at https://slides.com/russtedrake/spring23-lec23)
Image credit: Boston Dynamics
Acrobot LQR w/ Kalman Estimator (from encoders)
True acrobot state
Estimator error \( (\hat{x} - x) \)
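For reference, a minimal sketch of the output-feedback loop these plots presumably come from. It assumes the acrobot is linearized about its upright fixed point; the symbols \(A, B, C, K, L, w, v\) are illustrative and do not appear on the slides.
\[
\begin{aligned}
\dot{x} &= A x + B u + w, \qquad y = C x + v && \text{(linearized plant; \(y\) = encoder readings)} \\
\dot{\hat{x}} &= A \hat{x} + B u + L\,(y - C \hat{x}) && \text{(Kalman estimator with gain \(L\))} \\
u &= -K \hat{x} && \text{(LQR gain \(K\) applied to the estimate)} \\
\frac{d}{dt}(\hat{x} - x) &= (A - LC)(\hat{x} - x) + L v - w && \text{(estimator error dynamics)}
\end{aligned}
\]
Under these assumptions the estimator error evolves independently of the control input, so it decays whenever \(A - LC\) is stable.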
Key advance:
Visuomotor Policies
Levine*, Finn*, Darrell, Abbeel, JMLR 2016
How do we synthesize visuomotor policies??
OpenAI - Learning Dexterity
Reinforcement Learning (RL)?
"And then … BC methods started to get good. Really good. So good that our best manipulation system today mostly uses BC, with a sprinkle of Q learning on top to perform high-level action selection. Today, less than 20% of our research investments is on RL, and the research runway for BC-based methods feels more robust."