Russ Tedrake
MIT–Amazon Science Hub Robotics Research Day 2025
Jensen @ CES 2025
Important roles to play for both industry and academia.
"... to move beyond the state-of-the-art in fast dynamic, contact-rich object manipulation and data-driven adaptation [...] for high-speed manipulation of highly diverse objects in densely occupied spaces."
Robin: https://www.amazon.science/latest-news/amazon-robotics-see-robin-robot-arms-in-action
Cardinal: https://www.aboutamazon.com/news/operations/10-years-of-amazon-robotics-how-robots-help-sort-packages-move-product-and-improve-safety
Sparrow: https://www.aboutamazon.com/new/transportation/amazon-robot-sparrow-streamlines-order-fulfillment-process
2021
2025
Motion Planning (w/ Pablo Parrilo)
"Real2Sim" (w/ Phil Isola)
Theory of Visuomotor Control w/ Generative AI
(w/ Asu and Pablo)
Motion Planning
Claims:
Graphs of Convex Sets (GCS)
(discrete + continuous planning and control)
+ many amazing students and AR advocates/collaborators
Graphs of Convex Sets with Applications to Optimal Control and Motion Planning.
Tobia Marcucci. PhD Thesis, MIT, 2024.
EECS George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision Making
There are two cases that we completely understand:
but robotics problems often have elements of both...
Two stages:
Integrate:
start
goal
The Probabilistic Roadmap (PRM)
from Choset, Howie M., et al. Principles of robot motion: theory, algorithms, and implementation. MIT press, 2005.
Graphs of convex sets (GCS) offers a new relaxation / modeling framework for joint discrete + continuous optimization
\(\varphi_{ij} = 1\) if edge \((i,j)\) is in the shortest path, otherwise \(\varphi_{ij} = 0.\)
\(c_{ij} \) is the (constant) length of edge \((i,j).\)
"flow constraints"
binary relaxation
path length
Note: The blue regions are not obstacles.
Classic shortest path LP
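Written out, the classic LP that these annotations describe takes the following standard form. The symbols \(s\) (start), \(t\) (goal), \(V\), and \(E\) are notation assumed here for illustration; \(\varphi_{ij}\) and \(c_{ij}\) are as defined on the slide.

\[
\begin{aligned}
\min_{\varphi} \quad & \sum_{(i,j) \in E} c_{ij}\, \varphi_{ij} && \text{(path length)} \\
\text{s.t.} \quad & \sum_{j : (i,j) \in E} \varphi_{ij} - \sum_{j : (j,i) \in E} \varphi_{ji} =
\begin{cases} 1 & i = s \\ -1 & i = t \\ 0 & \text{otherwise} \end{cases}
\quad \forall i \in V && \text{(flow constraints)} \\
& 0 \le \varphi_{ij} \le 1 \quad \forall (i,j) \in E && \text{(binary relaxation)}
\end{aligned}
\]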
now w/ Convex Sets
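For comparison, the shortest path problem on a graph of convex sets is usually stated as below in the GCS literature. This is a sketch in assumed notation: \(\mathcal P\) is the set of start-to-goal paths, \(\mathcal E_p\) the edges of path \(p\), \(\mathcal X_v\) the convex set attached to vertex \(v\), and \(\ell_{uv}\) a convex edge length.

\[
\begin{aligned}
\min_{p,\; x_v} \quad & \sum_{(u,v) \in \mathcal E_p} \ell_{uv}(x_u, x_v) \\
\text{s.t.} \quad & p \in \mathcal P, \\
& x_v \in \mathcal X_v \quad \forall v \in p.
\end{aligned}
\]

The discrete choice is the path \(p\); the continuous choice is one point \(x_v\) in each set the path visits.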
Shortest path, Traveling salesperson, minimum spanning tree, bipartite matching, facility location, ...
Ex: Minimum spanning tree
Ex: Minimum-volume sphere collision geometry (as facility location on a GCS)
Finite MDP (e.g. w/ deterministic transitions) is shortest path
When sets are points, GCS transcription yields exactly the well-known linear program (LP) for shortest path.
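The deterministic-MDP claim above can be checked on a toy example. The sketch below uses a hypothetical four-state MDP (the states, actions, and costs are invented for illustration) and confirms that Bellman value iteration and Dijkstra's algorithm return the same cost-to-go. It assumes nonnegative costs and an absorbing goal state.

```python
import heapq

# Hypothetical deterministic MDP: TRANSITIONS[state][action] = (next_state, cost).
TRANSITIONS = {
    "s": {"a": ("u", 1.0), "b": ("v", 4.0)},
    "u": {"a": ("v", 1.0), "b": ("g", 5.0)},
    "v": {"a": ("g", 1.0)},
    "g": {},  # absorbing goal with zero cost-to-go
}
GOAL = "g"


def value_iteration(transitions, goal, sweeps=100):
    """Repeat Bellman backups J(s) = min_a [cost(s, a) + J(next(s, a))]."""
    J = {s: (0.0 if s == goal else float("inf")) for s in transitions}
    for _ in range(sweeps):
        for s, actions in transitions.items():
            if s == goal or not actions:
                continue
            J[s] = min(cost + J[nxt] for nxt, cost in actions.values())
    return J


def dijkstra_to_goal(transitions, goal):
    """Shortest distance from every state to the goal, searched on reversed edges."""
    reverse = {s: [] for s in transitions}
    for s, actions in transitions.items():
        for nxt, cost in actions.values():
            reverse[nxt].append((s, cost))
    dist = {s: float("inf") for s in transitions}
    dist[goal] = 0.0
    queue = [(0.0, goal)]
    while queue:
        d, s = heapq.heappop(queue)
        if d > dist[s]:
            continue  # stale queue entry
        for prev, cost in reverse[s]:
            if d + cost < dist[prev]:
                dist[prev] = d + cost
                heapq.heappush(queue, (dist[prev], prev))
    return dist
```

Calling `value_iteration(TRANSITIONS, GOAL)` and `dijkstra_to_goal(TRANSITIONS, GOAL)` gives identical dictionaries: the optimal cost-to-go of the MDP is the shortest-path distance to the goal.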
\[ \begin{aligned} \min_{x[\cdot],u[\cdot]} \quad & \sum_{n=0}^N x_n^T Q x_n + u_n^T R u_n \\ \text{s.t.} \quad & x_{n+1} = A x_n + B u_n \end{aligned} \]
Sets \( X_n: (x_n, u_n) \)
Edge cost
Edge constraint
[Figure: serial chain of sets for n = 0, 1, 2, ..., N]
For a serial chain, GCS will generate exactly the familiar MPC transcriptions, e.g. quadratic programs (QPs)
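As a small illustration of the optimal control problem underlying this chain, here is a sketch for a scalar system solved by backward Riccati recursion. It is not the GCS transcription itself. It assumes controls at steps 0 to N-1 and a terminal cost \(q x_N^2\), which matches the sum above because the final control \(u_N\) is optimally zero. The function names and numeric values are hypothetical.

```python
def riccati_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for the scalar system x[n+1] = a*x[n] + b*u[n]."""
    p = q  # terminal cost-to-go: J_N(x) = q * x**2
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)  # optimal feedback u[n] = -k * x[n]
        p = q + a * p * a - a * p * b * k  # cost-to-go weight one step earlier
        gains.append(k)
    gains.reverse()  # gains[n] is now the gain applied at step n
    return gains, p


def rollout_cost(a, b, q, r, x0, policy, horizon):
    """Simulate the dynamics under policy(n, x) and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for n in range(horizon):
        u = policy(n, x)
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q * x * x  # terminal state cost
```

With `gains, p0 = riccati_gains(1.2, 1.0, 1.0, 0.5, 10)`, the simulated cost of the policy `lambda n, x: -gains[n] * x` from `x0` agrees with `p0 * x0**2` up to rounding, and any perturbation of the controls increases the cost.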
e.g. for hybrid trajectory optimization
[Figure: sets arranged by time step n = 0, 1, ..., N]
\[ \begin{aligned} \min_{x[\cdot],u[\cdot]} \quad & \sum_{n=0}^N x_n^T Q_i x_n + u_n^T R_i u_n \\ \text{s.t.} \quad & x_{n+1} = A_i x_n + B_i u_n \\ & \text{iff } (x_n,u_n) \in D_i \end{aligned} \]
start
goal
is the convex relaxation. (it's tight!)
Previous formulations were intractable; would have required \( 6.25 \times 10^6\) binaries.
minimum distance
minimum time
Discrete paths + continuous (convex via differential flatness)
Transitioning from basic research to real use cases
on its next-gen ASIN manipulation program
This morning we had talks by the students, including...
Aside: This work uncovered a compelling geometric interpretation of the "standard" SDP relaxation; another math paper coming!
"Real2Sim"
How can we scalably obtain simulation assets?
joint angles
joint torques
Data matrix:
s.t.
pseudo-inertia
+ optimal experiment trajectory design
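The constraint on the pseudo-inertia can be made concrete with a small check. The sketch below uses the standard definition from the inertial-parameter identification literature, which is assumed here rather than taken from the slide: for mass \(m\), center of mass \(c\), and rotational inertia \(I\) about the body-frame origin, the pseudo-inertia is the 4x4 matrix with blocks \(\Sigma = \tfrac{1}{2}\operatorname{tr}(I)\,\mathbb{1} - I\), \(h = m c\), and \(m\), and the parameters are physically consistent when that matrix is positive definite. Function names are hypothetical.

```python
def pseudo_inertia(mass, com, inertia):
    """Build the 4x4 pseudo-inertia J = [[Sigma, h], [h^T, m]].

    inertia is the 3x3 rotational inertia about the body-frame origin,
    h = mass * com, and Sigma = 0.5 * trace(inertia) * I3 - inertia.
    """
    half_trace = 0.5 * sum(inertia[i][i] for i in range(3))
    h = [mass * c for c in com]
    J = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            J[i][j] = (half_trace if i == j else 0.0) - inertia[i][j]
        J[i][3] = J[3][i] = h[i]
    J[3][3] = mass
    return J


def is_positive_definite(M, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only if M is positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot
                L[i][j] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True
```

A solid sphere centered at the origin passes the check, while an inertia such as `diag(1, 1, 3)` fails it because its principal moments violate the triangle inequality. In an identification problem this check would appear as a semidefinite constraint on the estimated parameters.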
We've generated a significant object dataset. Jeremy at Amazon is using it now and really scaling it up.
Theory of Visuomotor Control
Robotics: Science and Systems, 2023
International Journal of Robotics Research, 2024
Diffusion Policy (DP) is "single-task"
LBMs are "multitask"
[Diagram: vision encoder + language encoder + robot joint encoder → action decoder]
[Diagram: vision encoder + robot joint encoder → action decoder]
large language models
visually-conditioned language models
large behavior models
\(\sim\) VLA (vision-language-action)
\(\sim\) EFM (embodied foundation model)
[Diagram: vision encoder + language encoder + robot joint encoder → action decoder]
We need to develop the new theory of (visuomotor) control.
Diffusion Policy / Large Behavior Models from TRI have a context length of 2.
RT-1, RT-2, OpenVLA, \(\pi_0\), ... all have a context length of 1.
Why?
Even with more data, training with a longer context length makes performance worse.
Idea #1: Study diffusion policy for Linear-Quadratic Gaussian control problems.
Idea #2: Construct the simplest possible experiments that do exhibit the problem, and study them empirically.
Cotraining: Use both datasets to train a model that maximizes some real-world performance objective
Notation:
\(\mathcal D\) - dataset
\(O\) - observations
\(A\) - actions
\(R\) - real
\(S\) - sim
Objective:
Success rate on planar pushing from pixels
Limited to single task, but carefully controlled experiments and thorough analysis
Datasets:
Model:
Diffusion Policy
[1] Graesdal et al., "Towards Tight Convex Relaxations for Contact-Rich Manipulation"
Real Data, \(\mathcal D_R\)
Sim Data, \(\mathcal D_S\)
\(\mathcal L_{\mathcal D}\) - denoising loss for dataset \(\mathcal D\)
\(\alpha\) - mixing ratio
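One plausible reading of the mixing ratio is sketched below. It assumes that \(\alpha\) weights the real data, so the training loss is \(\alpha \mathcal L_{\mathcal D_R} + (1-\alpha) \mathcal L_{\mathcal D_S}\), and that each training sample is drawn from \(\mathcal D_R\) with probability \(\alpha\). Both the convention and the function names are assumptions made for illustration.

```python
import random


def mixed_loss(alpha, loss_real, loss_sim):
    """Convex combination of the denoising losses (assumed form of the objective)."""
    return alpha * loss_real + (1.0 - alpha) * loss_sim


def sample_batch(real_data, sim_data, alpha, batch_size, rng):
    """Draw each sample from the real dataset with probability alpha, else from sim."""
    batch = []
    for _ in range(batch_size):
        source = real_data if rng.random() < alpha else sim_data
        batch.append(rng.choice(source))
    return batch
```

Under this reading, sampling with probability \(\alpha\) and averaging per-sample losses gives the weighted loss in expectation, so the ratio can be swept without reweighting the loss explicitly.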
We investigate 6 sim2real gaps:
Key Findings
(for planar pushing...)
On campus, we're feeling very limited with compute.
AWS credits via the Science Hub/research awards are ineffective with non-discounted AWS pricing.
It has become a real bottleneck.
Thank you for the support!
For a living doc with up-to-date references / examples: