Benchmarking Dexterous Manipulation (in Simulation)

General consensus(?) from previous meetings

"It would be great to have at least a small set of benchmarking simulation environments where we've established sim2real transfer to curated hardware."

A few comments from last meeting (quotes are approximate)

 Lerrel: "We have lots of simulators that are fast, but inaccurate.  What we really need is a simulator that runs closer to real-time but is much higher fidelity."

 Jitendra: "I don't know how to write simulators, but I have friends in computer graphics that do. We should convince them to work on this problem."

 Jim: "Should we be using learned models instead of physics engines?"

A few comments from last meeting (quotes are approximate)

 A common refrain: "How accurate does the simulator need to be?  Isn't domain randomization better?"

I've found (physics-based) simulation incredibly useful for benchmarking dexterous manipulation

Distributions over manipulation scenarios

  • Generative models of mugs, vegetables, etc.
  • Parameterized environments (lighting conditions, etc.), ...

Scenario description files

Parameters, initial conditions, and noise described as exact values or distributions
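To make that concrete, here is a hypothetical sketch of such a description in Python; the field names are illustrative, not Drake's actual scenario-file schema. Every quantity is either an exact value or an elementary distribution, plus a recorded seed.

```python
# Hypothetical scenario description (illustrative field names, not
# Drake's actual schema): each quantity is an exact value or an
# elementary distribution.
scenario = {
    "mug": {
        "mass_kg": 0.32,                                       # exact value
        "initial_x_m": {"uniform": {"min": 0.4, "max": 0.6}},  # distribution
        "initial_yaw_rad": {"gaussian": {"mean": 0.0, "stddev": 0.1}},
    },
    "camera": {
        "depth_noise_m": {"gaussian": {"mean": 0.0, "stddev": 0.002}},
    },
    "random_seed": 42,  # recorded so the sampled scenario can be replayed
}
```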

Switched to a motion planning scheme that’s less sensitive to rack initial position (#2304)

Initial positions of the unmanipulated racks are now drawn from the Monte-Carlo distribution instead of fixed at 0 (#2362)

Finding subtle bugs w/ Monte-Carlo testing

Falsification algorithms

[Diagram: a loop alternating between "improve robustness / fix bugs" and "increase test randomness / scope"]

"Naive" Monte-Carlo can be effective

Potential Sim2Real gaps

Most people would say that

  • Video game rendering is sufficient for training/fine-tuning computer vision (or that we don't need it!);
  • Video game physics is insufficient for fine manipulation.

Potential Sim2Real gaps

  • Contact modeling
    • Rigid bodies; Soft bodies
  • Robot details
    • reflected inertia, transmission dry friction, time delays...
    • Sensor models (e.g. RGB-D)
    • Controller models / gains
  • System ID (most assets flawed; garbage in \(\Rightarrow\) garbage out)
  • Stochastic rollouts + deterministic replay
    • surprisingly rare in robotics!  (e.g. ROS makes it hard)
  • Simulator bugs

Advanced contact modeling in Drake

Rich collision geometries

My claim: Subtle interactions between the collision and physics engines can cause artificial discontinuities/inaccuracy/instability

(sometimes with dramatic results)

 

Understanding this requires a few steps

  1. Numerical methods must deal with overlapping geometry.
  2. Standard approaches summarize the contact forces / constraints at one or more points.
  3. It is effectively impossible to do this without introducing (potentially severe) discontinuities.
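A toy 2D illustration of step 3 (my own sketch, not Drake's actual algorithm): if the contact is summarized by pushing a penetrating point out along its shallowest-penetration axis, an arbitrarily small change in configuration can flip the contact normal by 90 degrees.

```python
import numpy as np

def point_contact_normal(p, half_widths):
    """Toy rule: for a point p inside an axis-aligned box, report a
    contact normal along the axis of shallowest penetration."""
    depths = half_widths - np.abs(p)  # distance to the nearest face per axis
    axis = int(np.argmin(depths))     # shallowest penetration wins
    normal = np.zeros(2)
    normal[axis] = np.sign(p[axis]) or 1.0
    return normal

box = np.array([1.0, 1.0])
# Sliding the point across the box diagonal flips the normal discontinuously:
print(point_contact_normal(np.array([0.90, 0.89]), box))  # [1. 0.]
print(point_contact_normal(np.array([0.89, 0.90]), box))  # [0. 1.]
```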

Rich collision geometries

Green arrow is the force on the red box due to the overlap with the blue box.

"Point contact" as implemented in Drake

"Point contact" as implemented in Drake

Multi-point contact

Many heuristics for using multiple points...

"Hydroelastic contact" as implemented in Drake

Point contact vs hydroelastic

Point contact (discontinuous) vs. hydroelastic (continuous)

Hydroelastic is

  • slightly more expensive than point contact
  • (much) less expensive than finite-element models

 

State-space (for simulation, planning, control) is the original rigid-body state.
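For concreteness, a minimal pydrake sketch of selecting the contact model; API names follow recent Drake releases (check the documentation for your version), and the geometries themselves still need hydroelastic proximity properties, e.g. declared in the model's SDF.

```python
from pydrake.multibody.plant import AddMultibodyPlantSceneGraph, ContactModel
from pydrake.systems.framework import DiagramBuilder

builder = DiagramBuilder()
# A positive time_step selects Drake's discrete contact solver.
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=1e-3)
# Hydroelastic contact where both geometries support it, with point
# contact as a fallback for pairs that don't.
plant.set_contact_model(ContactModel.kHydroelasticWithFallback)
# ... add models (whose SDF/URDF declares hydroelastic properties) ...
plant.Finalize()
diagram = builder.Build()
```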

"Hydroelastic contact" as implemented in Drake

Example: Simulating LEGO® block mating

Manually-curated point contacts

Hydroelastic contact surfaces

Stable and symmetrical hydroelastic forces

Before vs. now

Soft simulation in Drake

Block tower fall in Bullet Physics

Physics and rendering are not sufficient

Simulation also requires (thoughtful) modeling of (a toy example follows the list):

  • Robot controllers/firmware
  • Sensors
  • Sensor noise
  • Perception components
  • Planning components
  • Time delays
  • ...
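As a toy illustration of the sensor-noise and time-delay items above (an illustrative class, not a Drake API):

```python
import collections
import numpy as np

class DelayedNoisyDepthSensor:
    """Toy sensor model (illustrative, not a Drake API): additive
    Gaussian noise plus a fixed measurement latency."""

    def __init__(self, rng, stddev_m=0.002, delay_steps=3):
        self.rng = rng
        self.stddev_m = stddev_m
        # Holds the most recent measurements; the oldest one is returned,
        # so readings lag the true signal by up to `delay_steps` ticks.
        self.buffer = collections.deque(maxlen=delay_steps + 1)

    def measure(self, true_depth_m):
        self.buffer.append(true_depth_m)
        delayed = self.buffer[0]
        return delayed + self.rng.normal(0.0, self.stddev_m)

sensor = DelayedNoisyDepthSensor(np.random.default_rng(0))
for t in range(6):
    print(round(sensor.measure(1.0 + 0.01 * t), 4))  # lags the true depth
```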

Rigorous about randomness

Every source of randomness is declared explicitly, using elementary distributions (a sketch follows the list):

  • Scene (#/type of objects)
  • Parameters/initial conditions
  • Time-varying noise
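A minimal sketch of resolving those declarations (a hypothetical helper, mirroring the scenario sketch earlier): exact values pass through, distributions get sampled, and one seeded generator makes the entire scenario reproducible.

```python
import numpy as np

def resolve(spec, rng):
    """Exact values pass through; elementary distributions are sampled
    from the single scenario generator `rng` (hypothetical helper)."""
    if isinstance(spec, dict) and "uniform" in spec:
        return rng.uniform(spec["uniform"]["min"], spec["uniform"]["max"])
    if isinstance(spec, dict) and "gaussian" in spec:
        return rng.normal(spec["gaussian"]["mean"], spec["gaussian"]["stddev"])
    return spec  # exact value

rng = np.random.default_rng(seed=42)  # this seed fully determines the scenario
x0 = resolve({"uniform": {"min": 0.4, "max": 0.6}}, rng)
mass = resolve(0.32, rng)
```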

Reproducible random scenarios

Scene grammars

Structured parameterization over scenes

  • Some parameters control what gets generated
  • Some control pose and shape of what gets generated.

(work by Greg Izatt)
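A toy sketch of the idea (my own illustration, not the actual system): discrete draws decide what gets generated; continuous draws decide pose and shape.

```python
import numpy as np

def sample_scene(rng):
    """Tiny illustrative scene grammar: the object count and types are
    discrete structure; poses and scales are continuous parameters."""
    scene = []
    for _ in range(rng.poisson(3)):                        # what: how many
        scene.append({
            "type": rng.choice(["mug", "plate", "bowl"]),  # what: which type
            "xy": rng.uniform([-0.3, -0.3], [0.3, 0.3]),   # pose on the table
            "scale": rng.uniform(0.8, 1.2),                # shape variation
        })
    return scene

print(sample_scene(np.random.default_rng(0)))
```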

Scene grammars + variational inference

(1) Explain raw data with the model

(2) Fit model parameters to observed worlds

Companies like Waymo have toolchains for going from a log file back to a simulation scenario.

 

Can we generalize / automate that?

System ID

Code quality / correctness

Example: Sim2Real for Euler's disk

Drake is "production ready"

  • Extremely high code quality / test coverage
  • Monthly releases
  • 3 to 6 month deprecation timelines
  • Aggressive license tracking
  • ...

Already built into the production build systems at Amazon Robotics (and many others).

pip install drake
apt install drake

A few comments from last meeting (quotes are approximate)

 Jim: "Should we be using learned models instead of physics engines?"

 A common refrain: "How accurate does the simulator need to be?  Isn't domain randomization better?"

What is a (dynamic) model?

A system maps an input sequence \( \ldots, u_{-1}, u_0, u_1, \ldots \) to an output sequence \( \ldots, y_{-1}, y_0, y_1, \ldots \); \(x\) is the state, \(w\) the noise/disturbances, \(\theta\) the parameters, with prior \( p(\theta, x_0, w_0, w_1, \ldots) \).

State-space (e.g. Lagrangian mechanics; recurrent neural networks such as LSTMs):

\[ x_{n+1} = f(n, x_n, u_n, w_n, \theta), \qquad y_n = g(n, x_n, u_n, w_n, \theta) \]

Auto-regressive (e.g. ARMAX; feed-forward networks, Transformers):

\[ y_{n+1} = f(n, u_n, u_{n-1}, \ldots, y_n, y_{n-1}, \ldots, w_n, w_{n-1}, \ldots, \theta) \]
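For contrast, here are minimal Python instances of each class; the linear and first-order forms are stand-ins of my own, chosen only to make the structure concrete.

```python
import numpy as np

def state_space_step(n, x, u, w, theta):
    """State-space instance: x_{n+1} = A x_n + B u_n + w_n, theta = (A, B).
    (n is unused: this instance happens to be time-invariant.)"""
    A, B = theta
    return A @ x + B @ u + w

def arx_predict(us, ys, ws, theta):
    """Auto-regressive instance: y_{n+1} = a y_n + b u_n + w_n, theta = (a, b)."""
    a, b = theta
    return a * ys[-1] + b * us[-1] + ws[-1]
```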

Model "class" vs model "instance"

  • \(f\) and \(g\) describe the model class.
  • together with \(\theta\), they describe a model instance.

 

"Deep models vs Physics-based models?" is about model class:

Should we prefer writing \(f\) and \(g\) using physics or deep networks?

Maybe not so different from

  • should we use ReLU or \(\tanh\)?
  • should we use LSTMs or Transformers?

Galileo, Kepler, Newton, Hooke, Coulomb, ... were data scientists.

They fit very simple models to very noisy data.

They gave us a rich class of parametric models that we could fit to new data.

What if Newton had deep learning...?

Galileo's notes on projectile motion

What makes a model class useful?

"All models are wrong, but some are useful" -- George Box

Of course, that depends on your use case...

Use case: Simulation for Benchmarking

What makes a model class useful?

  • Reasonable: \( y_{sim}(u) \in \{ y_{real}(u) \} \)
  • Coverage: \( \forall y_{real}(u), \exists y_{sim}(u) \)
  • Accuracy: \( p(y_{real} | u) \approx p(y_{sim} | u) \)
  • Corollary:  Reliable system identification (data \( \Rightarrow \theta \))
  • Generalizable, efficient, repeatable, interpretable/debuggable, ... 
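The accuracy criterion can be probed empirically with a two-sample test between real and simulated outputs for the same input tape; a sketch with placeholder data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Placeholder rollout data; in practice, collect y_real from hardware and
# y_sim from the simulator under the same input tape u.
y_real = rng.normal(loc=1.00, scale=0.10, size=200)
y_sim = rng.normal(loc=1.02, scale=0.11, size=200)

stat, p_value = ks_2samp(y_real, y_sim)
print(f"KS statistic {stat:.3f}, p={p_value:.3f}")  # small stat suggests p(y_real|u) ~ p(y_sim|u)
```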
