Different views on robustness

Robust Control to Foundation Models

Russ Tedrake

November 6, 2023

"Dexterous Manipulation" Team

(founded in 2016)

Distributions over manipulation scenarios

Parameterized procedural mugs, even vegetables!
Parameterized environments (lighting conditions, etc), ...

Finding subtle bugs w/ Monte-Carlo testing
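
As a rough illustration of Monte-Carlo testing over a scenario distribution, the sketch below samples parameterized scenarios and collects the ones where a rollout fails. Everything in it is an assumption made for illustration: the `Scenario` fields, the sampling ranges, and the `rollout_succeeds` stand-in (which replaces a real simulation and contains a deliberately planted corner-case bug).

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    # All fields and ranges are illustrative assumptions, not values from the talk.
    mug_radius_m: float
    mug_height_m: float
    light_intensity: float  # 0 = dark, 1 = bright


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one scenario from a simple product of uniform distributions."""
    return Scenario(
        mug_radius_m=rng.uniform(0.03, 0.06),
        mug_height_m=rng.uniform(0.07, 0.14),
        light_intensity=rng.uniform(0.2, 1.0),
    )


def rollout_succeeds(s: Scenario) -> bool:
    """Stand-in for a full simulated rollout, with a planted corner-case bug:
    tall mugs in dim light are (hypothetically) missed by perception."""
    return not (s.mug_height_m > 0.13 and s.light_intensity < 0.3)


def monte_carlo_failures(num_trials: int, seed: int = 0) -> list:
    """Sample scenarios, run each rollout, and collect the failing scenarios."""
    rng = random.Random(seed)
    scenarios = (sample_scenario(rng) for _ in range(num_trials))
    return [s for s in scenarios if not rollout_succeeds(s)]


failures = monte_carlo_failures(num_trials=5000)
print(f"{len(failures)} failures out of 5000 sampled scenarios")
```

The planted bug only triggers in a small corner of the parameter space, so a handful of hand-written test cases would likely miss it, while random sampling finds it and returns the offending scenarios for inspection.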

For the next challenge:

Good control when we don't have useful models?

Rules out:
- (Multibody) Simulation
- Simulation-based reinforcement learning (RL)
- State estimation / model-based control
My top choices:
- Learn a dynamics model
- Behavior cloning (imitation learning)

I was forced to reflect on my core beliefs...

The value of using RGB (at control rates) as a sensor is undeniable. I must not ignore this going forward.
I don't love imitation learning (decision making $\gg$ mimicry), but it's an awfully clever way to explore the space of policy representations.
- Don't need a model
- Don't need an explicit state representation
  - (Not even to specify the objective!)

We've been exploring, and found something good in...

From Thursday...

Check out the TRI demo on Wednesday afternoon

Denoising diffusion models (generative AI)

Image source: Ho et al. 2020

The denoiser can be conditioned on additional inputs $u$: $p_\theta(x_{t-1} \mid x_t, u)$
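
To make the conditional reverse step concrete, here is a minimal scalar sketch of the DDPM sampling loop from Ho et al. 2020 with a conditioning input $u$. The noise predictor is a closed-form stand-in rather than a trained network: it is exact only for the degenerate case where the data given $u$ is the single point $x_0 = u$. The schedule values and all function names are assumptions for illustration.

```python
import math
import random


def make_schedule(num_steps=50, beta_min=1e-4, beta_max=0.05):
    """Linear beta schedule (illustrative values) with cumulative alpha products."""
    betas = [beta_min + (beta_max - beta_min) * k / (num_steps - 1)
             for k in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def toy_noise_predictor(x_t, alpha_bar_t, u):
    """Stand-in for eps_theta(x_t, t, u); exact when x_0 = u with probability 1."""
    return (x_t - math.sqrt(alpha_bar_t) * u) / math.sqrt(1.0 - alpha_bar_t)


def sample_conditional(u, rng, num_steps=50):
    """Run the reverse process x_T -> x_0, conditioning every step on u."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.gauss(0.0, 1.0)  # start from pure noise
    for t in reversed(range(num_steps)):
        eps_hat = toy_noise_predictor(x, alpha_bars[t], u)
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(alphas[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        x = mean + math.sqrt(betas[t]) * noise
    return x


print(sample_conditional(u=0.7, rng=random.Random(0)))
```

With this stand-in predictor, the sample should land very close to the conditioning value `u`; a trained network would instead produce samples from whatever distribution the data follows given `u`.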

Image backbone: ResNet-18 (pretrained on ImageNet)
Total: 110M-150M Parameters
Training Time: 3-6 GPU Days (USD 150-300)

Why (Denoising) Diffusion Models?

High capacity + great performance
Small number of demonstrations (typically ~50)
Multi-modal (non-expert) demonstrations
Training stability and consistency
- no hyper-parameter tuning
Generates high-dimensional continuous outputs
- vs categorical distributions (e.g. RT-1, RT-2)
- Action-chunking transformers (ACT)
Solid mathematical foundations (score functions)
Reduces nicely to the simple cases (e.g. LQG / Youla)

A deterministic interpretation (manifold hypothesis)

Denoising approximates the projection onto the data manifold, with the denoising direction approximating the gradient of the distance to the manifold.
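
A small numerical check of this interpretation, under assumptions chosen for illustration: additive Gaussian noise $x = x_0 + \sigma \epsilon$, a toy "manifold" of 64 points on the unit circle, and the ideal (posterior-mean) denoiser in place of a trained network. For small $\sigma$, the denoised point should sit near the projection of $x$ onto the circle, and the step $x - \text{denoised}$ should be close to the gradient of half the squared distance.

```python
import math


def ideal_denoiser(x, data, sigma):
    """Posterior mean E[x0 | x] for x = x0 + sigma * noise, with x0 uniform
    over the finite set `data`: a softmax-weighted average of the data."""
    sq_dists = [(x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2 for p in data]
    d_min = min(sq_dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return (
        sum(w * p[0] for w, p in zip(weights, data)) / total,
        sum(w * p[1] for w, p in zip(weights, data)) / total,
    )


# Toy "data manifold": 64 points on the unit circle.
data = [(math.cos(2 * math.pi * k / 64), math.sin(2 * math.pi * k / 64))
        for k in range(64)]

x = (1.2, 0.9)                           # noisy point at distance 0.5 from the circle
norm = math.hypot(*x)
projection = (x[0] / norm, x[1] / norm)  # exact projection onto the circle
denoised = ideal_denoiser(x, data, sigma=0.05)

# Compare the denoising step with dist(x) * grad dist(x), which is the
# gradient of half the squared distance to the circle.
step = (x[0] - denoised[0], x[1] - denoised[1])
grad_half_sq_dist = ((norm - 1.0) * x[0] / norm, (norm - 1.0) * x[1] / norm)

print("distance from denoised point to projection:", math.dist(denoised, projection))
print("denoising step vs. gradient mismatch:", math.dist(step, grad_half_sq_dist))
```

Both printed distances should be small compared with the 0.5 gap between `x` and the circle. The match is approximate because the dataset is finite and $\sigma$ is nonzero.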

Dynamic output feedback

Input: $\ldots, u_{-1}, u_0, u_1, \ldots$

Output: $\ldots, y_{-1}, y_0, y_1, \ldots$

Control Policy
(as a dynamical system)

"Diffusion Policy" is an auto-regressive (ARX) model with forecasting

$$\begin{aligned} [y_{n+1}, \ldots, y_{n+P}] = f_\theta(&u_n, \ldots, u_{n-H}, \\ &y_n, \ldots, y_{n-H})\end{aligned}$$

$H$ is the length of the history,

$P$ is the length of the prediction

Conditional denoiser produces the forecast, conditional on the history
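
The forecasting structure can be sketched as a small stateful policy that keeps the last $H+1$ inputs and outputs and asks a forecaster for the next $P$ outputs. This is only a sketch under assumptions: `toy_forecaster` is a hypothetical deterministic stand-in for the conditional denoiser $f_\theta$, the histories are zero-padded at start-up, and `replan_every` (how many forecast steps are executed before forecasting again) is a parameter introduced here for illustration.

```python
from collections import deque


def toy_forecaster(u_hist, y_hist, prediction_len):
    """Hypothetical stand-in for f_theta: ramp linearly from the latest
    output y_n toward the latest input u_n over the prediction window."""
    u_n, y_n = u_hist[0], y_hist[0]  # histories are ordered newest first
    return [y_n + k * (u_n - y_n) / prediction_len
            for k in range(1, prediction_len + 1)]


class ForecastingPolicy:
    def __init__(self, forecaster, history_len, prediction_len, replan_every):
        self.forecaster = forecaster
        self.P = prediction_len
        self.replan_every = replan_every  # assumption: steps executed per forecast
        # Store u_n, ..., u_{n-H} and y_n, ..., y_{n-H}: H + 1 samples each,
        # zero-padded until real data arrives.
        self.u_hist = deque([0.0] * (history_len + 1), maxlen=history_len + 1)
        self.y_hist = deque([0.0] * (history_len + 1), maxlen=history_len + 1)
        self.plan = deque()

    def step(self, u_n):
        """Consume input u_n and return the next output y_{n+1}."""
        self.u_hist.appendleft(u_n)
        if not self.plan:
            # Forecast P outputs from the history, keep only the first few.
            forecast = self.forecaster(list(self.u_hist), list(self.y_hist), self.P)
            self.plan.extend(forecast[: self.replan_every])
        y_next = self.plan.popleft()
        self.y_hist.appendleft(y_next)
        return y_next


policy = ForecastingPolicy(toy_forecaster, history_len=2, prediction_len=4, replan_every=2)
outputs = [policy.step(1.0) for _ in range(4)]
print(outputs)  # [0.25, 0.5, 0.625, 0.75]
```

Executing only part of each forecast before replanning trades off reactivity to new inputs against temporal consistency of the output sequence.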

Learns a distribution (score function) over actions

e.g. to deal with "multi-modal demonstrations"
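
A toy illustration of why a distribution over actions helps with multi-modal demonstrations. The setup is hypothetical: 50 demonstrations pass an obstacle located at offset 0, half on the left (about -1) and half on the right (about +1). Sampling from the empirical demonstrations stands in for sampling from a trained generative policy.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical demonstrations: lateral offset used to pass an obstacle at 0.
# Half the demonstrators go left (about -1), half go right (about +1).
demos = [side + rng.gauss(0.0, 0.05) for side in (-1.0, 1.0) for _ in range(25)]

# A deterministic regressor trained with squared error converges to the mean
# of the demonstrations, which here points straight at the obstacle.
mean_action = statistics.fmean(demos)

# A generative policy samples from the action distribution instead. Drawing
# from the empirical demonstrations stands in for a trained sampler.
sampled_actions = [rng.choice(demos) for _ in range(5)]

print("mean action:", mean_action)
print("sampled actions:", sampled_actions)
```

The mean action is close to 0 and matches neither mode, while every sampled action commits to one side of the obstacle.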

Enabling technologies

Haptic Teleop Interface

Excellent system identification / robot control

Visuotactile sensing

with TRI's Soft Bubble Gripper

Open source:

https://punyo.tech/

Scaling Up

I've discussed training one skill
Wanted: few-shot generalization to new skills
- multitask, language-conditioned policies
- connects beautifully to internet-scale data