Large Behavior Models

(Foundation models for dexterous manipulation)

Russ Tedrake

MIT, EECS/CSAIL

russt@mit.edu

DARPA Robotics Challenge, 2015

LLMs \(\Rightarrow\) VLMs \(\Rightarrow\) LBMs

large language models

visually-conditioned language models

large behavior models

\(\sim\) VLA (vision-language-action)

\(\sim\) EFM (embodied foundation model)

vision encoder

language encoder

action decoder

robot joint encoder
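
A minimal sketch of how the four blocks in the diagram might fit together, under loud assumptions: the random linear maps stand in for learned networks, and every size here (image resolution, vocabulary of 100 tokens, 7 joints, an 8-step action horizon) is hypothetical and not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED = 32                    # hypothetical embedding width
VOCAB = 100                   # hypothetical vocabulary size
HORIZON, ACTION_DIM = 8, 7    # hypothetical action chunk: 8 steps x 7 DoF

def linear(in_dim, out_dim):
    """Random weights standing in for a trained layer."""
    return rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)

W_vision = linear(64 * 64 * 3, EMBED)          # vision encoder
W_lang = linear(VOCAB, EMBED)                  # language encoder
W_joint = linear(ACTION_DIM, EMBED)            # robot joint encoder
W_action = linear(3 * EMBED, HORIZON * ACTION_DIM)  # action decoder

def policy(image, token_ids, joints):
    """Encode each modality, fuse by concatenation, decode an action chunk."""
    z_img = np.tanh(image.reshape(-1) @ W_vision)
    bag_of_words = np.bincount(token_ids, minlength=VOCAB).astype(float)
    z_txt = np.tanh(bag_of_words @ W_lang)
    z_q = np.tanh(joints @ W_joint)
    z = np.concatenate([z_img, z_txt, z_q])
    return (z @ W_action).reshape(HORIZON, ACTION_DIM)

actions = policy(rng.random((64, 64, 3)), np.array([3, 17, 42]), np.zeros(7))
print(actions.shape)
```

Concatenation is only the simplest way to fuse the three embeddings; a real model would more likely use attention over tokens from each encoder.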

Q: Is predicting actions fundamentally different?

Why actions (for dexterous manipulation) could be different:

  • Actions are continuous (language tokens are discrete)
  • Have to obey physics, deal with stochasticity
  • Feedback / stability
  • ...

Should we expect similar generalization / scaling laws?
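
To make the first bullet concrete, here is a sketch of one way to hand continuous actions to a token-based model: uniform binning. The bin count (256) and the action range [-1, 1] are assumptions chosen for illustration; the slides do not prescribe this scheme. The point is that discretization always costs resolution, bounded by half a bin width.

```python
import numpy as np

def tokenize(a, num_bins=256, low=-1.0, high=1.0):
    """Map continuous actions to integer bin ids in [0, num_bins - 1]."""
    a = np.clip(a, low, high)
    ids = np.floor((a - low) / (high - low) * num_bins).astype(int)
    return np.minimum(ids, num_bins - 1)  # a == high falls in the last bin

def detokenize(ids, num_bins=256, low=-1.0, high=1.0):
    """Map bin ids back to the center of each bin."""
    width = (high - low) / num_bins
    return low + (ids + 0.5) * width

a = np.linspace(-1.0, 1.0, 1001)
error = np.abs(detokenize(tokenize(a)) - a)
print(error.max())  # never more than half a bin width, (2 / 256) / 2
```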

Robotics: Science and Systems, 2023

Diffusion Policy
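
A rough sketch of the idea behind Diffusion Policy: start from Gaussian noise over a short action sequence and denoise it step by step. The loop below is a standard DDPM reverse process. The noise predictor is an assumption: an analytic "oracle" for a single known target trajectory, standing in for the learned, observation-conditioned network. The schedule values and shapes are also hypothetical.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and its cumulative products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def oracle_eps(x, t, target, alpha_bars):
    """Exact noise for a point-mass data distribution at `target`.
    Stand-in for a trained network eps_theta(x, t, observation)."""
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

def sample_actions(target, num_steps=50, seed=0):
    """DDPM reverse process: pure noise -> action sequence."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(target.shape)
    for t in reversed(range(num_steps)):
        eps = oracle_eps(x, t, target, alpha_bars)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(target.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

target = np.linspace(-1.0, 1.0, 16 * 7).reshape(16, 7)  # 16 steps x 7 DoF
actions = sample_actions(target)
print(np.abs(actions - target).max())
```

With the oracle, the sampler recovers the target trajectory up to floating-point error; with a learned predictor it would instead sample from the distribution of demonstrated action sequences.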

\(\Rightarrow\) Many new startups (some low-cost, some humanoids)

\(\Rightarrow\) Major new investments by tech giants

The opportunity

  • Common-sense for physical intelligence
    • New levels of dexterity (manipulating cloth, liquids, etc.)
    • Programmed via imprecise natural language and/or a few demonstrations
    • "Common-sense robustness"

  • GPT might make mistakes, but it always produces beautiful prose...

One problem: we don't (yet) have internet-scale robot data

The Robot Data Diet

Big data

Big transfer

Small data

No transfer

robot teleop

(the "transfer learning bet")

Open-X

simulation rollouts

novel devices

simulation for manipulation

NVIDIA selected Drake and MuJoCo

(for potential inclusion in Omniverse)

(Establishing faith in)

Studying the (new) fundamentals requires scale

  • Entirely new basic research questions (both theoretical and experimental)
  • Robotics is becoming "big science"
  • MIT (and academia more generally) has an essential role to play
    • need access to compute
    • need access to / strategies for scaling data
    • strong partnerships with industry

Online classes (videos + lecture notes + code)

http://manipulation.mit.edu