Benchmarking Dexterous Manipulation (in Simulation)

General consensus(?) from previous meetings

"It would be great to have at least a small set of benchmarking simulation environments where we've established sim2real transfer to curated hardware."

A few comments from last meeting (quotes are approximate)

 Lerrel: "We have lots of simulators that are fast, but inaccurate.  What we really need is a simulator that runs closer to real-time but is much higher fidelity."

 Jitendra: "I don't know how to write simulators, but I have friends in computer graphics that do. We should convince them to work on this problem."

 Jim: "Should we be using learned models instead of physics engines?"

A few comments from last meeting (quotes are approximate)

 A common refrain: "How accurate does the simulator need to be?  Isn't domain randomization better?"

I've found (physics-based) simulation incredibly useful for benchmarking dexterous manipulation

Distributions over manipulation scenarios

  • Generative models of mugs, vegetables, etc.
  • Parameterized environments (lighting conditions, etc.), ...

Scenario description files

Parameters, initial conditions, and noise described as exact values or distributions
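To make that concrete, here is a hypothetical sketch of such a description in Python; the field names are illustrative, not Drake's actual scenario-file schema. Every quantity is either an exact value or an elementary distribution, plus a recorded seed.

```python
# Hypothetical scenario description (illustrative field names, not
# Drake's actual schema): each quantity is an exact value or an
# elementary distribution.
scenario = {
    "mug": {
        "mass_kg": 0.32,                                       # exact value
        "initial_x_m": {"uniform": {"min": 0.4, "max": 0.6}},  # distribution
        "initial_yaw_rad": {"gaussian": {"mean": 0.0, "stddev": 0.1}},
    },
    "camera": {
        "depth_noise_m": {"gaussian": {"mean": 0.0, "stddev": 0.002}},
    },
    "random_seed": 42,  # recorded so the sampled scenario can be replayed
}
```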

Switched to a motion planning scheme that’s less sensitive to rack initial position (#2304)

Initial positions of the unmanipulated racks are now drawn from the Monte-Carlo distribution instead of fixed at 0 (#2362)

Finding subtle bugs w/ Monte-Carlo testing

Falsification algorithms

[Diagram: a loop alternating between "improve robustness / fix bugs" and "increase test randomness / scope"]

"Naive" Monte-Carlo can be effective

Potential Sim2Real gaps

Most people would say that

  • Video game rendering is sufficient for training/fine-tuning computer vision (or that we don't need it!);
  • Video game physics is insufficient for fine manipulation.

Potential Sim2Real gaps

  • Contact modeling
    • Rigid bodies; Soft bodies
  • Robot details
    • reflected inertia, transmission dry friction, time delays...
    • Sensor models (e.g. RGB-D)
    • Controller models / gains
  • System ID (most assets flawed; garbage in \(\Rightarrow\) garbage out)
  • Stochastic rollouts + deterministic replay
    • surprisingly rare in robotics!  (e.g. ROS makes it hard)
  • Simulator bugs

Advanced contact modeling in Drake

Rich collision geometries

My claim: Subtle interactions between the collision and physics engines can cause artificial discontinuities/inaccuracy/instability

(sometimes with dramatic results)

 

Understanding this requires a few steps

  1. Numerical methods must deal with overlapping geometry.
  2. Standard approaches summarize the contact forces / constraints at one or more points.
  3. It is effectively impossible to do this without introducing (potentially severe) discontinuities.
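A toy 2D illustration of step 3 (my own sketch, not Drake's actual algorithm): if the contact is summarized by pushing a penetrating point out along its shallowest-penetration axis, an arbitrarily small change in configuration can flip the contact normal by 90 degrees.

```python
import numpy as np

def point_contact_normal(p, half_widths):
    """Toy rule: for a point p inside an axis-aligned box, report a
    contact normal along the axis of shallowest penetration."""
    depths = half_widths - np.abs(p)  # distance to the nearest face per axis
    axis = int(np.argmin(depths))     # shallowest penetration wins
    normal = np.zeros(2)
    normal[axis] = np.sign(p[axis]) or 1.0
    return normal

box = np.array([1.0, 1.0])
# Sliding the point across the box diagonal flips the normal discontinuously:
print(point_contact_normal(np.array([0.90, 0.89]), box))  # [1. 0.]
print(point_contact_normal(np.array([0.89, 0.90]), box))  # [0. 1.]
```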

Rich collision geometries

Green arrow is the force on the red box due to the overlap with the blue box.

"Point contact" as implemented in Drake

"Point contact" as implemented in Drake

Multi-point contact

Many heuristics for using multiple points...

"Hydroelastic contact" as implemented in Drake

Point contact vs hydroelastic

Point contact (discontinuous) vs. hydroelastic (continuous)

Hydroelastic is

  • slightly more expensive than point contact
  • (much) less expensive than finite-element models

 

State-space (for simulation, planning, control) is the original rigid-body state.
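For concreteness, a minimal pydrake sketch of selecting the contact model; API names follow recent Drake releases (check the documentation for your version), and the geometries themselves still need hydroelastic proximity properties, e.g. declared in the model's SDF.

```python
from pydrake.multibody.plant import AddMultibodyPlantSceneGraph, ContactModel
from pydrake.systems.framework import DiagramBuilder

builder = DiagramBuilder()
# A positive time_step selects Drake's discrete contact solver.
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=1e-3)
# Hydroelastic contact where both geometries support it, with point
# contact as a fallback for pairs that don't.
plant.set_contact_model(ContactModel.kHydroelasticWithFallback)
# ... add models (whose SDF/URDF declares hydroelastic properties) ...
plant.Finalize()
diagram = builder.Build()
```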

"Hydroelastic contact" as implemented in Drake

Example: Simulating LEGO® block mating

Manually-curated point contacts

Hydroelastic contact surfaces

Stable and symmetrical hydroelastic forces

Before vs. now

Soft simulation in Drake

Block tower fall in Bullet Physics

Physics and rendering are not sufficient

Simulation also requires (thoughtful) modeling of (a toy example follows the list):

  • Robot controllers/firmware
  • Sensors
  • Sensor noise
  • Perception components
  • Planning components
  • Time delays
  • ...
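As a toy illustration of the sensor-noise and time-delay items above (an illustrative class, not a Drake API):

```python
import collections
import numpy as np

class DelayedNoisyDepthSensor:
    """Toy sensor model (illustrative, not a Drake API): additive
    Gaussian noise plus a fixed measurement latency."""

    def __init__(self, rng, stddev_m=0.002, delay_steps=3):
        self.rng = rng
        self.stddev_m = stddev_m
        # Holds the most recent measurements; the oldest one is returned,
        # so readings lag the true signal by up to `delay_steps` ticks.
        self.buffer = collections.deque(maxlen=delay_steps + 1)

    def measure(self, true_depth_m):
        self.buffer.append(true_depth_m)
        delayed = self.buffer[0]
        return delayed + self.rng.normal(0.0, self.stddev_m)

sensor = DelayedNoisyDepthSensor(np.random.default_rng(0))
for t in range(6):
    print(round(sensor.measure(1.0 + 0.01 * t), 4))  # lags the true depth
```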

Rigorous about randomness

Every source of randomness is declared explicitly, using elementary distributions (a sketch follows the list):

  • Scene (#/type of objects)
  • Parameters/initial conditions
  • Time-varying noise
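A minimal sketch of resolving those declarations (a hypothetical helper, mirroring the scenario sketch earlier): exact values pass through, distributions get sampled, and one seeded generator makes the entire scenario reproducible.

```python
import numpy as np

def resolve(spec, rng):
    """Exact values pass through; elementary distributions are sampled
    from the single scenario generator `rng` (hypothetical helper)."""
    if isinstance(spec, dict) and "uniform" in spec:
        return rng.uniform(spec["uniform"]["min"], spec["uniform"]["max"])
    if isinstance(spec, dict) and "gaussian" in spec:
        return rng.normal(spec["gaussian"]["mean"], spec["gaussian"]["stddev"])
    return spec  # exact value

rng = np.random.default_rng(seed=42)  # this seed fully determines the scenario
x0 = resolve({"uniform": {"min": 0.4, "max": 0.6}}, rng)
mass = resolve(0.32, rng)
```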

Reproducible random scenarios

Scene grammars

Structured parameterization over scenes

  • Some parameters control what gets generated
  • Some control pose and shape of what gets generated.

(work by Greg Izatt)
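A toy sketch of the idea (my own illustration, not the actual system): discrete draws decide what gets generated; continuous draws decide pose and shape.

```python
import numpy as np

def sample_scene(rng):
    """Tiny illustrative scene grammar: the object count and types are
    discrete structure; poses and scales are continuous parameters."""
    scene = []
    for _ in range(rng.poisson(3)):                        # what: how many
        scene.append({
            "type": rng.choice(["mug", "plate", "bowl"]),  # what: which type
            "xy": rng.uniform([-0.3, -0.3], [0.3, 0.3]),   # pose on the table
            "scale": rng.uniform(0.8, 1.2),                # shape variation
        })
    return scene

print(sample_scene(np.random.default_rng(0)))
```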

Scene grammars + variational inference

(1) Explain raw data with the model

(2) Fit model parameters to observed worlds

Companies like Waymo have toolchains for going from a log file back to a simulation scenario.

 

Can we generalize / automate that?

System ID

Code quality / correctness

Example: Sim2Real for Euler's disk

Drake is "production ready"

  • Extremely high code quality / test coverage
  • Monthly releases
  • 3 to 6 month deprecation timelines
  • Aggressive license tracking
  • ...

Already built into the production build systems at Amazon Robotics (and many others).

pip install drake
apt install drake

A few comments from last meeting (quotes are approximate)

 Jim: "Should we be using learned models instead of physics engines?"

 A common refrain: "How accurate does the simulator need to be?  Isn't domain randomization better?"

What is a (dynamic) model?

A system maps an input sequence \( \ldots, u_{-1}, u_0, u_1, \ldots \) to an output sequence \( \ldots, y_{-1}, y_0, y_1, \ldots \); \(x\) is the state, \(w\) the noise/disturbances, \(\theta\) the parameters, with prior \( p(\theta, x_0, w_0, w_1, \ldots) \).

State-space (e.g. Lagrangian mechanics; recurrent neural networks such as LSTMs):

\[ x_{n+1} = f(n, x_n, u_n, w_n, \theta), \qquad y_n = g(n, x_n, u_n, w_n, \theta) \]

Auto-regressive (e.g. ARMAX; feed-forward networks, Transformers):

\[ y_{n+1} = f(n, u_n, u_{n-1}, \ldots, y_n, y_{n-1}, \ldots, w_n, w_{n-1}, \ldots, \theta) \]
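For contrast, here are minimal Python instances of each class; the linear and first-order forms are stand-ins of my own, chosen only to make the structure concrete.

```python
import numpy as np

def state_space_step(n, x, u, w, theta):
    """State-space instance: x_{n+1} = A x_n + B u_n + w_n, theta = (A, B).
    (n is unused: this instance happens to be time-invariant.)"""
    A, B = theta
    return A @ x + B @ u + w

def arx_predict(us, ys, ws, theta):
    """Auto-regressive instance: y_{n+1} = a y_n + b u_n + w_n, theta = (a, b)."""
    a, b = theta
    return a * ys[-1] + b * us[-1] + ws[-1]
```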

Model "class" vs model "instance"

  • \(f\) and \(g\) describe the model class.
  • together with \(\theta\), they describe a model instance.

 

"Deep models vs Physics-based models?" is about model class:

Should we prefer writing \(f\) and \(g\) using physics or deep networks?

Maybe not so different from

  • should we use ReLU or \(\tanh\)?
  • should we use LSTMs or Transformers?

Galileo, Kepler, Newton, Hooke, Coulomb, ... were data scientists.

They fit very simple models to very noisy data.

They gave us a rich class of parametric models that we could fit to new data.

What if Newton had deep learning...?

Galileo's notes on projectile motion

What makes a model class useful?

"All models are wrong, but some are useful" -- George Box

Of course, that depends on your use case...

Use case: Simulation for Benchmarking

What makes a model class useful?

  • Reasonable: \( y_{sim}(u) \in \{ y_{real}(u) \} \)
  • Coverage: \( \forall y_{real}(u), \exists y_{sim}(u) \)
  • Accuracy: \( p(y_{real} | u) \approx p(y_{sim} | u) \)
  • Corollary:  Reliable system identification (data \( \Rightarrow \theta \))
  • Generalizable, efficient, repeatable, interpretable/debuggable, ... 
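The accuracy criterion can be probed empirically with a two-sample test between real and simulated outputs for the same input tape; a sketch with placeholder data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Placeholder rollout data; in practice, collect y_real from hardware and
# y_sim from the simulator under the same input tape u.
y_real = rng.normal(loc=1.00, scale=0.10, size=200)
y_sim = rng.normal(loc=1.02, scale=0.11, size=200)

stat, p_value = ks_2samp(y_real, y_sim)
print(f"KS statistic {stat:.3f}, p={p_value:.3f}")  # small stat suggests p(y_real|u) ~ p(y_sim|u)
```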
