(Foundation models for dexterous manipulation)
Russ Tedrake
CSAIL Alliances Meeting
April 4, 2024
large language models
visually-conditioned language models
large behavior models
\(\sim\) VLA (vision-language-action)
\(\sim\) EFM (embodied foundation model)
Why actions (for dexterous manipulation) could be different:
Should we expect similar generalization and scaling laws?
Recent success in (single-task) behavior cloning suggests that these are not blockers,
but we don't have internet-scale action data yet.
Data spectrum: from "big data, big transfer" to "small data, no transfer"
Ego-Exo
robot teleop
(the "transfer learning bet")
Open-X
simulation rollouts
NVIDIA selected Drake and MuJoCo
(for potential inclusion in Omniverse)
"Graphs of Convex Sets" (GCS)
http://manipulation.mit.edu