Roboticist at MIT and TRI

Russ Tedrake

RSS 2022 Workshop on Differentiable Physics for Robotics

Follow **live** at https://slides.com/d/PcFMXLM/live

(or later at https://slides.com/russtedrake/rss-differentiable)

*Do Differentiable Simulators Give Better Policy Gradients?*

H. J. Terry Suh, Max Simchowitz, Kaiqing Zhang, and Russ Tedrake

ICML 2022

Available at: https://arxiv.org/abs/2202.00817

Before we take gradients, let's discuss the *optimization landscape*...

Contact dynamics can lead to **discontinuous** landscapes, but mostly in the *corner cases*.

A key question for the success of gradient-based optimization

Use initial conditions here as a surrogate for dependence on policy parameters, etc.; final conditions as surrogate for reward.

For the mathematical model... (ignoring numerical issues)

we *do* expect \(q(t_f) = F\left(q(t_0)\right)\) to be continuous.

- Contact time and pre-/post-contact positions and velocities *all vary continuously*.
- Simulators will have artifacts from making discrete-time approximations; these *can* be made small (but often aren't).

point contact on half-plane

We have "real" discontinuities at the corner cases

- making contact w/ a different face
- transitions to/from contact and no contact

Soft/compliant contact can replace discontinuities with stiff approximations

\[ \min_x f(x) \]

For gradient descent, discontinuities / non-smoothness can

- introduce local minima
- destroy convergence (e.g. \(l_1\)-minimization)

- A natural idea: can we smooth the objective?

- Probabilistic formulation, for small \(\Sigma\): \[ \min_x f(x) \approx \min_\mu E \left[ f(x) \right], x \sim \mathcal{N}(\mu, \Sigma) \]

- A low-pass filter in parameter space with a Gaussian kernel.

- Smooth local minima
- Alleviate flat regions
- Encode robustness
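As a minimal sketch of the smoothing idea above, the snippet below applies a Gaussian kernel to a Heaviside step (a stand-in for a contact discontinuity) by Monte Carlo sampling. The function names, the scalar \(\sigma\), and the sample count are illustrative assumptions, not taken from the talk. For this toy objective the smoothed value has a closed form, the Gaussian CDF \(\Phi(\mu/\sigma)\), which the code prints next to the sampled estimate.

```python
import math
import random

def heaviside(x):
    """Discontinuous toy objective: 0 for x < 0, 1 otherwise."""
    return 1.0 if x >= 0.0 else 0.0

def smoothed(f, mu, sigma, num_samples, rng):
    """Monte Carlo estimate of E[f(x)] with x ~ N(mu, sigma^2)."""
    total = 0.0
    for _ in range(num_samples):
        total += f(rng.gauss(mu, sigma))
    return total / num_samples

def normal_cdf(z):
    """Standard normal CDF, used as the closed-form reference."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 0.5             # hypothetical smoothing width
for mu in (-1.0, -0.25, 0.0, 0.25, 1.0):
    estimate = smoothed(heaviside, mu, sigma, 20000, rng)
    exact = normal_cdf(mu / sigma)
    print(f"mu={mu:+.2f}  sampled={estimate:.3f}  exact={exact:.3f}")
```

The step itself is flat on both sides of the jump, while the smoothed version has a nonzero slope everywhere, which is what makes the flat regions usable for gradient descent.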

\[ \min_\theta f(\theta) \]

vs

\[
\begin{gathered}
\min_\theta E_w\left[ f(\theta, w) \right] \\
w \sim N(0, \Sigma)
\end{gathered}
\]

In reinforcement learning (RL) and "deep" model-predictive control, we add stochasticity via

- Stochastic policies
- Random initial conditions
- "Domain randomization"

then optimize a *stochastic optimal control* objective (e.g. maximize expected reward)

These can all *smooth* the optimization landscape.

The answer is subtle; the Heaviside example might shed some light.

\[ \min_\theta f(\theta) \]

vs

\[
\begin{gathered}
\min_\theta E_w\left[ f(\theta, w) \right] \\
w \sim N(0, \Sigma)
\end{gathered}
\]

Differentiable simulators give \(\frac{\partial f}{\partial \theta}\), but we want \(\frac{\partial}{\partial \theta} E_w[f(\theta, w)]\).

J. Burke, F. E. Curtis, A. Lewis, M. Overton, and L. Simoes, *Gradient Sampling Methods for Nonsmooth Optimization*, Feb. 2020, pp. 201–225.

- Approximate smoothed objective via Monte Carlo: \[ E_\mu \left[ f(x) \right] \approx \frac{1}{K} \sum_{i=1}^K f(x_i), \quad x_i \sim \mathcal{N}(\mu, \Sigma) \]
- First-order gradient estimate \[ \frac{\partial}{\partial \mu} E_\mu \left[ f(x) \right] \approx \frac{1}{K} \sum_{i=1}^K \frac{\partial f(\mu + w_i)}{\partial \mu}, \quad w_i \sim \mathcal{N}(0, \Sigma) \]

- Zero-order gradient estimate (aka REINFORCE) \[ \frac{\partial}{\partial \mu} E_\mu \left[ f(x) \right] \approx \frac{1}{K} \sum_{i=1}^K \left[f(\mu + w_i) - f(\mu)\right] \Sigma^{-1} w_i, \quad w_i \sim \mathcal{N}(0, \Sigma) \]

- The two gradient estimates converge to the same quantity under sufficient regularity conditions.

- Convergence rate scales directly with the variance of the estimators; the zero-order estimator often has higher variance.
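A small sketch of the two estimators on a smooth objective may help here. It assumes a scalar problem with \(f(x) = x^2\), for which the gradient of the smoothed objective is exactly \(2\mu\). The zero-order terms are scaled by \(1/\sigma^2\) (the scalar case of \(\Sigma^{-1}\)), the normalization needed for Gaussian noise with variance \(\sigma^2\). All names and parameter values are illustrative.

```python
import random
import statistics

def f(x):
    return x * x

def df(x):
    return 2.0 * x

def first_order_terms(mu, sigma, k, rng):
    """Per-sample terms of the first-order estimate: df(mu + w_i)."""
    return [df(mu + rng.gauss(0.0, sigma)) for _ in range(k)]

def zero_order_terms(mu, sigma, k, rng):
    """Per-sample terms of the zero-order estimate with baseline f(mu)."""
    terms = []
    for _ in range(k):
        w = rng.gauss(0.0, sigma)
        terms.append((f(mu + w) - f(mu)) * w / sigma**2)
    return terms

rng = random.Random(0)
mu, sigma, k = 1.0, 0.1, 5000
fo = first_order_terms(mu, sigma, k, rng)
zo = zero_order_terms(mu, sigma, k, rng)

print("true gradient      :", 2.0 * mu)
print("first-order mean   :", statistics.fmean(fo))
print("zero-order mean    :", statistics.fmean(zo))
print("first-order sample variance:", statistics.variance(fo))
print("zero-order sample variance :", statistics.variance(zo))
```

Both means should land near \(2\mu\), with the zero-order terms showing a much larger per-sample variance on this smooth example.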

*But the regularity conditions aren't met in contact discontinuities, leading to a biased first-order estimator.*

*Often, but not always.*

\(\frac{\partial f(x)}{\partial x} = 0\) almost everywhere!

\( \Rightarrow \frac{1}{K} \sum_{i=1}^K \frac{\partial f(\mu + w_i)}{\partial \mu} = 0 \)

First-order estimator is biased

\( \not\approx \frac{\partial}{\partial \mu} E_\mu [f(x)] \)

Zero-order estimator is (still) unbiased
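The Heaviside case can be checked numerically. The sketch below assumes a scalar step function and Gaussian noise with standard deviation \(\sigma\); the exact gradient of the smoothed objective is then the Gaussian density \(\phi(\mu/\sigma)/\sigma\). Function names and parameter values are illustrative.

```python
import math
import random

def heaviside(x):
    return 1.0 if x >= 0.0 else 0.0

def heaviside_grad(x):
    """The pointwise derivative is zero wherever it is defined."""
    return 0.0

def first_order(mu, sigma, k, rng):
    """Average of pointwise derivatives at the sampled points."""
    return sum(heaviside_grad(mu + rng.gauss(0.0, sigma)) for _ in range(k)) / k

def zero_order(mu, sigma, k, rng):
    """Average of function differences weighted by the scaled noise."""
    total = 0.0
    for _ in range(k):
        w = rng.gauss(0.0, sigma)
        total += (heaviside(mu + w) - heaviside(mu)) * w / sigma**2
    return total / k

def exact_smoothed_gradient(mu, sigma):
    """d/dmu of Phi(mu / sigma): the Gaussian density at mu."""
    return math.exp(-0.5 * (mu / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

rng = random.Random(0)
mu, sigma, k = 0.1, 0.5, 20000
print("first-order:", first_order(mu, sigma, k, rng))
print("zero-order :", zero_order(mu, sigma, k, rng))
print("exact      :", exact_smoothed_gradient(mu, sigma))
```

The first-order estimate is exactly zero no matter how many samples are drawn, while the zero-order estimate approaches the exact value.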

- Continuous yet stiff approximations look like strict discontinuities in the finite-sample regime.
- In the paper, we formalize "*empirical bias*" to capture this.

e.g. with stiff contact models (large gradient \(\Rightarrow\) high variance)
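To illustrate empirical bias, the sketch below replaces the step with a continuous but very stiff ramp of width `EPS` (a hypothetical stand-in for a stiff contact model). The first-order estimator is unbiased in principle, but with a small batch almost no sample lands inside the ramp, so most batches return exactly zero; the rare hit contributes a term of size `1/EPS`. All names and values are illustrative assumptions.

```python
import random

EPS = 1e-4  # hypothetical width of the stiff transition

def stiff_ramp(x):
    """Continuous but stiff: rises from 0 to 1 over a window of width EPS."""
    return min(max(x / EPS, 0.0), 1.0)

def stiff_ramp_grad(x):
    """Derivative: 1/EPS inside the window, 0 outside."""
    return 1.0 / EPS if 0.0 < x < EPS else 0.0

def batch_estimates(mu, sigma, k, rng):
    """Return (first-order, zero-order) estimates from one batch of k samples."""
    first = 0.0
    zero = 0.0
    for _ in range(k):
        w = rng.gauss(0.0, sigma)
        first += stiff_ramp_grad(mu + w)
        zero += (stiff_ramp(mu + w) - stiff_ramp(mu)) * w / sigma**2
    return first / k, zero / k

rng = random.Random(0)
mu, sigma, k, num_batches = 0.0, 0.5, 100, 200
results = [batch_estimates(mu, sigma, k, rng) for _ in range(num_batches)]

# Fraction of batches whose first-order estimate is exactly zero.
zero_fraction = sum(1 for first, _ in results if first == 0.0) / num_batches
# Zero-order estimate averaged over all batches.
zero_order_mean = sum(zero for _, zero in results) / num_batches

print("batches with first-order estimate == 0:", zero_fraction)
print("zero-order mean over batches          :", zero_order_mean)
```

For these values the gradient of the smoothed objective is close to \(\phi(0)/\sigma \approx 0.8\), which the zero-order average tracks while the typical first-order batch reports zero.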

First-order estimators are *often* lower variance than zero-order estimators. But they have some pathologies:

- Bias/empirical bias around (near) discontinuities
- High variance from stiffness

Zero-order estimators are robust in these regimes.

This may explain the experimental success of zero-order methods in contact-rich RL.

Define *\(\alpha\)-order* gradient estimate as

\[ \bar\nabla^\alpha F(x) = \alpha \underbrace{\bar\nabla^1 F(x)}_{\text{first-order estimate}} + (1-\alpha) \underbrace{\bar\nabla^0 F(x)}_{\text{zero-order estimate}} , \qquad 0 \le \alpha \le 1 \]

We give an algorithm to choose \(\alpha\) automatically based on the *empirical variance* (+ a trust region using empirical bias).
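One way such a rule could look is sketched below. It is a simplified assumption, not the algorithm from the paper: \(\alpha\) is set by inverse-variance weighting of the two sample means (treating them as independent), and then capped so that \(\alpha\) times the gap between the two estimates stays below a hypothetical threshold `gamma`, which plays the role of the trust region.

```python
import random
import statistics

def alpha_order(first_terms, zero_terms, gamma):
    """Blend first- and zero-order estimates; returns (estimate, alpha)."""
    k = len(first_terms)
    g1 = statistics.fmean(first_terms)
    g0 = statistics.fmean(zero_terms)
    v1 = statistics.variance(first_terms) / k  # variance of the first-order mean
    v0 = statistics.variance(zero_terms) / k   # variance of the zero-order mean
    # Inverse-variance weighting (assumes independent, unbiased estimates).
    alpha = 1.0 if v0 + v1 == 0.0 else v0 / (v0 + v1)
    # Trust region: limit reliance on the first-order estimate when the two disagree.
    gap = abs(g1 - g0)
    if alpha * gap > gamma:
        alpha = gamma / gap
    return alpha * g1 + (1.0 - alpha) * g0, alpha

def sample_terms(f, df, mu, sigma, k, rng):
    """Per-sample first-order and zero-order terms using shared noise samples."""
    first, zero = [], []
    for _ in range(k):
        w = rng.gauss(0.0, sigma)
        first.append(df(mu + w))
        zero.append((f(mu + w) - f(mu)) * w / sigma**2)
    return first, zero

rng = random.Random(0)
gamma = 0.1  # hypothetical trust-region size

# Smooth objective: the low-variance first-order estimate should dominate.
smooth = sample_terms(lambda x: x * x, lambda x: 2.0 * x, 1.0, 0.1, 20000, rng)
print("smooth:", alpha_order(*smooth, gamma))

# Heaviside step: the first-order terms are all zero, so the cap on alpha matters.
step = sample_terms(lambda x: 1.0 if x >= 0.0 else 0.0, lambda x: 0.0, 0.1, 0.5, 20000, rng)
print("step  :", alpha_order(*step, gamma))
```

On the step, the first-order terms have zero empirical variance, so variance alone would pick \(\alpha = 1\) and return a zero gradient; the disagreement check is what pushes the blend back toward the zero-order estimate.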

Smoothing of time-stepping contact model

*Global Planning for Contact-Rich Manipulation via Local Smoothing of Quasi-dynamic Contact Models*

Tao Pang, H. J. Terry Suh, Lujie Yang, and Russ Tedrake

Available at: https://arxiv.org/abs/2206.10787

Establish equivalence between randomized smoothing and a (deterministic/differentiable) force-at-a-distance contact model.

Gradients

\[ x_{n+1} = f(x_n, u_n) \approx A(x_n-x_0) + B(u_n-u_0) + c \]
\[ A = \frac{\partial f}{\partial x},\quad B = \frac{\partial f}{\partial u},\quad c = f(x_0, u_0) \]

Smoothed gradients (under distribution \(\rho\))

\[ E_{w \sim \rho}[f(x_n+w_x, u_n+w_u)] \approx A_\rho(x_n-x_0) + B_\rho(u_n-u_0) + c_\rho \]
\[ A_\rho = \frac{\partial}{\partial x} E_{w \sim \rho}[f(x+w_x, u+w_u)],\quad B_\rho = \frac{\partial}{\partial u}E_{w \sim \rho}[f(x+w_x, u+w_u)],\quad c_\rho = E_{w \sim \rho}[f(x_0+w_x, u_0+w_u)] \]
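As a toy illustration of the smoothed quantities \(B_\rho\) and \(c_\rho\), the sketch below uses a hypothetical one-dimensional pusher: a box at position \(x\) moves only when the commanded pusher position \(u\) reaches past it, so \(f(x, u) = \max(x, u)\). Noise is added to \(u\) only (\(w_x = 0\)) and drawn from a Gaussian with standard deviation \(\sigma\). These modeling choices are assumptions made for illustration and are not the quasi-dynamic model from the paper.

```python
import math
import random

def step(x, u):
    """Toy dynamics: the box stays at x unless the pusher command u passes it."""
    return max(x, u)

def b_exact(x, u):
    """Unsmoothed df/du: zero whenever the pusher is not touching the box."""
    return 1.0 if u > x else 0.0

def b_smoothed(x, u, sigma, k, rng):
    """Zero-order Monte Carlo estimate of B_rho with w_u ~ N(0, sigma^2)."""
    base = step(x, u)
    total = 0.0
    for _ in range(k):
        w = rng.gauss(0.0, sigma)
        total += (step(x, u + w) - base) * w / sigma**2
    return total / k

def c_smoothed(x, u, sigma, k, rng):
    """Monte Carlo estimate of c_rho = E[f(x, u + w_u)]."""
    return sum(step(x, u + rng.gauss(0.0, sigma)) for _ in range(k)) / k

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = random.Random(0)
x, u, sigma, k = 0.0, -0.1, 0.2, 20000  # pusher starts short of the box

print("B (unsmoothed)        :", b_exact(x, u))
print("B_rho (sampled)       :", b_smoothed(x, u, sigma, k, rng))
print("B_rho (closed form)   :", normal_cdf((u - x) / sigma))
print("c_rho (sampled)       :", c_smoothed(x, u, sigma, k, rng))
```

With the pusher out of contact, the unsmoothed \(B\) is zero and gives a planner no information, while \(B_\rho\) is positive: the smoothed model behaves as if the pusher already exerts some influence on the box before touching it.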

- \( x_{n+1} = f(x_n, u_n) \) : the solution of an optimization under contact complementarity constraints.

- Relaxed problem: move hard complementarity constraints into objective (e.g. via log barrier penalty term)
- Results in force at a distance
- For simple problems, we show that each barrier function corresponds with a choice of \( \rho \) and vice versa.
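A textbook analogue of this kind of correspondence (not the derivation from the paper) is that smoothing the hard "contact or nothing" function \(\max(0, x)\) with logistic noise of scale \(s\) yields exactly the softplus function \(s \log(1 + e^{x/s})\), a deterministic, differentiable function that is nonzero before contact. The sketch below checks this numerically; the choice of logistic noise and all names are illustrative assumptions.

```python
import math
import random

def relu(x):
    """Hard model: no response until x > 0."""
    return max(0.0, x)

def softplus(x, s):
    """Deterministic smooth model with length scale s."""
    return s * math.log1p(math.exp(x / s))

def sample_logistic(s, rng):
    """Draw from a zero-mean logistic distribution with scale s (inverse CDF)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return s * math.log(u / (1.0 - u))

def smoothed_relu(x, s, k, rng):
    """Monte Carlo estimate of E[relu(x + w)], w ~ Logistic(0, s)."""
    return sum(relu(x + sample_logistic(s, rng)) for _ in range(k)) / k

rng = random.Random(0)
s, k = 0.1, 20000
for x in (-0.2, 0.0, 0.2):
    print(f"x={x:+.1f}  sampled={smoothed_relu(x, s, k, rng):.4f}  softplus={softplus(x, s):.4f}")
```

Different noise distributions produce different smooth surrogates, which is the sense in which a choice of \(\rho\) and a choice of relaxation can be matched in simple cases.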

- RRT distance metric / Trajectory optimization using (differentiable) quasi-dynamic model.

- RL uses stochastic optimal control / smooths discontinuities
- Here we need \( \frac{\partial}{\partial \theta} E_w[f(\theta, w)] \), not just \(\frac{\partial f}{\partial \theta}\)
- First-order estimators have some pathologies with stiffness/discontinuities; zero-order is robust.
- \(\alpha\)-order estimator can achieve faster convergence + robust performance.

- Examining smoothing for simple systems reveals a deterministic equivalent (e.g. force at a distance)
- Now \(\frac{\partial f}{\partial \theta}\) is all you need
- Enabled RRT / trajectory opt. for dexterous hands

*Blog post on "hydroelastic" contact modeling in Drake*

Available on Medium.

*My claim*: Subtle interactions between the collision and physics engines can cause artificial discontinuities

(sometimes with dramatic results)

Understanding this requires a few steps

- Numerical methods must deal with overlapping geometry.
- Standard approaches summarize the contact forces / constraints at one or more points.
- It is effectively impossible to do this without introducing (potentially severe) discontinuities.

Green arrow is the force on the red box due to the overlap with the blue box.

Many heuristics for using multiple points...

major contributions from Damrong Guoy, Sean Curtis, Rick Cory, Alejandro Castro, ...

Red box is rigid, blue box is soft.

Both boxes are soft.

Point contact (discontinuous) vs hydroelastic (continuous)

Hydroelastic is

- more expensive than point contact
- (much) less expensive than finite-element models

State-space (for simulation, planning, control) is the original rigid-body state.

Point contact and multi-point contact can produce qualitatively wrong behavior.

Hydroelastic often resolves it.

Manually-curated point contacts

Hydroelastic contact surfaces

Stable and symmetrical hydroelastic forces

Before

Now

Point contact

Hydroelastic contact

the frictionless case

Point contact (no friction)

Hydroelastic (no friction)
