iRS-RRT

Pang, Terry, Lu

The Story

Q: How do we solve long-horizon planning-through-contact problems?

Deep learning is impressive, but learned policies do not generalize to different tasks.
So far, methods based on physics models have suffered from the combinatorially large number of contact modes.
Contact modes manifest as complementarity constraints in trajectory optimization. These can be mitigated with numerical tricks, but the resulting methods are generally sensitive to initial guesses and cost tuning.
For simpler systems, contact modes can be enumerated, and discrete mode planning can be interleaved with continuous state planning in a sampling-based planning algorithm. However, this does not scale to tasks such as 3D dexterous manipulation.

In this work, we propose to replace explicit reasoning about contact modes with a statistical summary of them in the form of locally linear systems, which we call bundled dynamics.
We show the power of bundled dynamics through a few simple modifications to the standard kinodynamic RRT.

Reachable Set on Bundled Dynamics

Let's motivate the Gaussian from a different angle.

Recall the linearization of the bundled dynamics around a nominal point (\bar{x}_t, \bar{u}_t):

\begin{aligned} x_{t+1} & = \hat{\mathbf{A}}x_t + \hat{\mathbf{B}} u_t + (\hat{f}(x_t,u_t) - \hat{\mathbf{A}}x_t - \hat{\mathbf{B}}u_t) \\ & = \hat{\mathbf{A}}x_t + \hat{\mathbf{B}} u_t + \hat{c}_t \end{aligned}

with

\begin{aligned} \hat{\mathbf{B}} = \mathbb{E}\bigg[\frac{\partial f}{\partial u}(\bar{x}_t, \bar{u}_t + w)\bigg] \end{aligned}

where w is a zero-mean Gaussian perturbation of the nominal input.
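To illustrate what the expectation in \hat{\mathbf{B}} buys us, here is a minimal Monte Carlo sketch on a hypothetical 1-D pusher f(x, u) = x + max(u, 0), assuming w ~ N(0, σ²). The exact gradient jumps between 0 (no contact) and 1 (contact), while the bundled gradient at \bar{u}_t = 0 is roughly 0.5, the probability of being in contact. The system, σ, and sample count are illustrative assumptions, not the setup used in iRS-RRT.

```python
import random


def f(x: float, u: float) -> float:
    """Hypothetical 1-D pusher: the object moves only while the finger pushes (u > 0)."""
    return x + max(u, 0.0)


def df_du(x: float, u: float) -> float:
    """Exact derivative of f w.r.t. u: 1 in contact, 0 out of contact (0 chosen at u == 0)."""
    return 1.0 if u > 0.0 else 0.0


def bundled_B(x_bar: float, u_bar: float, sigma: float,
              num_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of B_hat = E_w[df/du(x_bar, u_bar + w)] with w ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += df_du(x_bar, u_bar + rng.gauss(0.0, sigma))
    return total / num_samples


print(df_du(0.0, 0.0))           # exact gradient at the contact boundary is 0
print(bundled_B(0.0, 0.0, 0.1))  # roughly 0.5: the probability of being in contact
```

Shifting the nominal input to \bar{u}_t = σ pushes the estimate toward P(w > -σ) ≈ 0.84, so the bundled gradient varies smoothly where the exact one is piecewise constant.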

Fixing the state \bar{x}_t, we reason about which next states are reachable under the bundled dynamics with a norm-bounded input.

(Figure: image of the input norm ball under the bundled dynamics, assuming \bar{u}_t is zero.)

\begin{aligned} \mathcal{S} & = \{\hat{x}_{t+1} | \hat{x}_{t+1} = \hat{\mathbf{B}} u_t + \hat{f}(\bar{x}_t, 0), \|u_t\| \leq \varepsilon\} \\ & = \{\hat{x}_{t+1} | (\hat{x}_{t+1} - \hat{f}(\bar{x}_t, 0))^\intercal \big[\hat{\mathbf{B}}\hat{\mathbf{B}}^\intercal\big]^{-1} (\hat{x}_{t+1} - \hat{f}(\bar{x}_t, 0)) \leq \varepsilon^2\} \end{aligned}

The latter expression is the \varepsilon-ball under the Mahalanobis metric with covariance \hat{\mathbf{B}}\hat{\mathbf{B}}^\intercal (assuming this matrix is invertible).
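A quick numerical check of the set identity, for a hypothetical 2-state, 3-input \hat{\mathbf{B}} and offset \hat{f}(\bar{x}_t, 0): sampled inputs with \|u_t\| \leq \varepsilon land inside the ellipsoid, and the minimum-norm input reaching a boundary point has norm exactly \varepsilon, so the ellipsoid is tight. The matrix values are assumptions, and small pure-Python 2x2 helpers keep the sketch dependency-free.

```python
import math
import random


def gram(B):
    """B B^T for a 2 x m matrix B stored as a list of two rows."""
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in B] for ri in B]


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(y, M_inv):
    """y^T M_inv y for a 2-vector y."""
    return sum(y[i] * M_inv[i][j] * y[j] for i in range(2) for j in range(2))


def sample_ball(m, eps, rng):
    """Uniform sample from the m-dimensional Euclidean ball of radius eps."""
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(v * v for v in g))
    radius = eps * rng.random() ** (1.0 / m)
    return [radius * v / norm for v in g]


# Hypothetical bundled actuation matrix (2 states, 3 inputs) and offset f_hat(x_bar, 0).
B_hat = [[1.0, 0.5, 0.0],
         [0.0, 0.2, 0.1]]
f0 = [0.3, -0.1]
eps = 0.5
M_inv = inv2(gram(B_hat))

# 1) Containment: every x_next = B_hat u + f0 with ||u|| <= eps lies in the ellipsoid.
rng = random.Random(0)
worst = 0.0
for _ in range(10_000):
    u = sample_ball(3, eps, rng)
    x_next = [f0[i] + sum(B_hat[i][k] * u[k] for k in range(3)) for i in range(2)]
    worst = max(worst, mahalanobis_sq([x_next[i] - f0[i] for i in range(2)], M_inv))

# 2) Tightness: scale a direction onto the boundary, then reach it with the
#    minimum-norm input u* = B_hat^T [B_hat B_hat^T]^{-1} dx; its norm is exactly eps.
dx = [0.2, 0.1]
scale = eps / math.sqrt(mahalanobis_sq(dx, M_inv))
dx = [scale * v for v in dx]
w = [sum(M_inv[i][j] * dx[j] for j in range(2)) for i in range(2)]
u_star = [sum(B_hat[i][k] * w[i] for i in range(2)) for k in range(3)]
u_star_norm = math.sqrt(sum(v * v for v in u_star))

print(worst <= eps ** 2 + 1e-12, u_star_norm)
```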

Distance Metric Based on Local Actuation Matrix

Consider the distance metric:

\begin{aligned} \|x\|_{\Sigma^{-1}}^2 & = x^\intercal \big[\mathbf{B}\mathbf{B}^\intercal\big]^{-1} x, \quad \Sigma = \mathbf{B}\mathbf{B}^\intercal \end{aligned}

This naturally gives us the distance from one point in state space to another, as informed by the actuation (i.e., 1-step controllability) matrix.

NOTE: For nonlinear dynamical systems this is NOT a symmetric metric, because \mathbf{B} depends on the point it is linearized about (even better!).

(Figure: rows of the \mathbf{B} matrix, with two points A and B.)

A is harder to reach (has a higher distance) than B, even if both are the same Euclidean distance away.
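A small sketch of both properties, under loudly stated assumptions: a hypothetical state-dependent actuation matrix B(x) that makes the second coordinate easier to move as |x_0| grows, and \hat{f}(x, 0) = x (the state stays put under zero input). From the origin, points A = (0, 1) and B = (1, 0) are equally far in Euclidean terms but about 25 vs 1 under the metric; and because \mathbf{B} is evaluated at the starting point, the distance from p to q differs from the distance from q to p.

```python
def gram(B):
    """B B^T for a 2 x m matrix B stored as a list of two rows."""
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in B] for ri in B]


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(y, M_inv):
    """y^T M_inv y for a 2-vector y."""
    return sum(y[i] * M_inv[i][j] * y[j] for i in range(2) for j in range(2))


def B_of(x):
    """Hypothetical state-dependent actuation matrix (2 states, 2 inputs).

    The second state coordinate becomes easier to actuate as |x[0]| grows.
    """
    return [[1.0, 0.0], [0.0, 0.2 + abs(x[0])]]


def dist(x_from, x_to):
    """Squared distance from x_from to x_to under [B B^T]^{-1}, with B evaluated at x_from.

    Assumes f_hat(x, 0) = x, i.e. the state stays put under zero input.
    """
    M_inv = inv2(gram(B_of(x_from)))
    dx = [x_to[0] - x_from[0], x_to[1] - x_from[1]]
    return mahalanobis_sq(dx, M_inv)


origin = [0.0, 0.0]
A, B_pt = [0.0, 1.0], [1.0, 0.0]            # same Euclidean distance (1) from the origin
print(dist(origin, A), dist(origin, B_pt))  # about 25 vs 1: A is harder to reach

p, q = [0.0, 0.0], [1.0, 1.0]
print(dist(p, q), dist(q, p))               # about 26 vs 1.69: not symmetric
```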

Handling Singular Cases

What if \mathbf{B}\mathbf{B}^\intercal is singular, i.e., for an n-dimensional state,

\begin{aligned} \text{rank}\big[\mathbf{B}\mathbf{B}^\intercal\big] < n \end{aligned}

A point with a component in the null space of \mathbf{B}\mathbf{B}^\intercal is not reachable under the current linearization, so its distance is infinite!

Numerically, we can choose to "cap" infinity at some finite value by introducing regularization.

(Figure: rows of the \mathbf{B} matrix, with a point A.)

There is a loss of 1-step controllability, and A is unreachable: infinite distance.

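Adding \lambda \mathbf{I} gives a finite cap:

\begin{aligned} x^\intercal \big[\mathbf{B}\mathbf{B}^\intercal + \lambda \mathbf{I}\big]^{-1} x \leq \min \big\{x^\intercal \big[\mathbf{B}\mathbf{B}^\intercal\big]^{-1} x, \lambda^{-1}\|x\|^2\big\} \end{aligned}

where the first term is taken as +\infty when x has a component in the null space of \mathbf{B}\mathbf{B}^\intercal.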
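A small check of the regularized distance, using hypothetical matrices and \lambda = 0.01 chosen for illustration: for a rank-deficient \mathbf{B}, a point in the null space of \mathbf{B}\mathbf{B}^\intercal gets distance \|x\|^2/\lambda instead of infinity, while the unregularized inverse fails outright. For a full-rank \mathbf{B}, the regularized distance stays below both the unregularized distance and \|x\|^2/\lambda, which follows from each eigen-direction contributing (v_i^\intercal x)^2 / (s_i + \lambda).

```python
import random


def gram(B):
    """B B^T for a 2 x m matrix B stored as a list of two rows."""
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in B] for ri in B]


def inv2(M):
    """Inverse of a 2x2 matrix; raises ZeroDivisionError if it is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(y, M_inv):
    """y^T M_inv y for a 2-vector y."""
    return sum(y[i] * M_inv[i][j] * y[j] for i in range(2) for j in range(2))


def reg_dist(x, B, lam):
    """x^T [B B^T + lam I]^{-1} x: finite even when B B^T is singular."""
    G = gram(B)
    G[0][0] += lam
    G[1][1] += lam
    return mahalanobis_sq(x, inv2(G))


lam = 0.01
B_singular = [[1.0, 0.0],
              [0.0, 0.0]]  # hypothetical: the second state direction cannot be actuated
A = [0.0, 1.0]             # lies in the null space of B B^T
print(reg_dist(A, B_singular, lam))  # about 100 = ||A||^2 / lam instead of infinity

# For a hypothetical full-rank B, check the bound on random points.
B_full = [[1.0, 0.5],
          [0.0, 0.2]]
M_inv = inv2(gram(B_full))
rng = random.Random(0)
bound_ok = True
for _ in range(1_000):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    cap = min(mahalanobis_sq(x, M_inv), (x[0] ** 2 + x[1] ** 2) / lam)
    bound_ok = bound_ok and reg_dist(x, B_full, lam) <= cap * (1.0 + 1e-9)
print(bound_ok)
```

Larger \lambda lowers the cap and makes weakly actuated directions look closer; smaller \lambda keeps the metric faithful but lets near-singular directions dominate.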

Connection to Gaussian Covariance Estimation

We can obtain the covariance matrix of the ellipsoid directly from gradient information, but we can also consider the zeroth-order version, which solves the least-squares problem:

\begin{aligned} \hat{\mathbf{B}} & = \underset{\mathbf{B}}{\text{argmin}} \sum_i \|\hat{f}(\bar{x}, u_t^i) - \mathbf{B} u_t^i\|^2 \end{aligned}
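A minimal sketch of the zeroth-order route, assuming a hypothetical 1-D pusher f(\bar{x}, u) = \bar{x} + \max(u, 0) evaluated at \bar{x} = 0 and inputs drawn from N(0, σ²). Fitting a slope to function values alone gives roughly 0.5, matching the expected gradient \mathbb{E}[\partial f / \partial u] at \bar{u} = 0; for Gaussian inputs this agreement follows from Stein's lemma, \mathbb{E}[f(u)u] = σ^2 \mathbb{E}[f'(u)].

```python
import random


def f_at_x_bar(u: float) -> float:
    """Hypothetical 1-D pusher f(x_bar, u) = x_bar + max(u, 0), evaluated at x_bar = 0."""
    return max(u, 0.0)


def zeroth_order_B(sigma: float, num_samples: int = 100_000, seed: int = 0) -> float:
    """Scalar least-squares fit of f(x_bar, u_i) ~ B u_i with u_i ~ N(0, sigma^2).

    Uses only function values, no gradients. The minimizer of
    sum_i (f_i - B u_i)^2 is sum_i f_i u_i / sum_i u_i^2.
    """
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(num_samples):
        u = rng.gauss(0.0, sigma)
        num += f_at_x_bar(u) * u
        den += u * u
    return num / den


print(zeroth_order_B(0.1))  # roughly 0.5, the expected gradient E[df/du] at u_bar = 0
```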

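Under the least-squares solution, one can show a connection between \hat{\mathbf{B}}\hat{\mathbf{B}}^\intercal and directly estimating the covariance of the samples.

Consider putting the above into data-matrix form:

\begin{aligned} \hat{\mathbf{B}} & = \underset{\mathbf{B}}{\text{argmin}} \|\mathbf{F} - \mathbf{B}\mathbf{X}\|^2_F \\ & = \mathbf{F}\mathbf{V}\Sigma^{-1}\mathbf{U}^\intercal \end{aligned}

where the columns of \mathbf{X} are the input samples u_t^i, drawn from a zero-mean Gaussian with diagonal covariance \sigma^2 \mathbf{I}, \mathbf{X} = \mathbf{U}\Sigma\mathbf{V}^\intercal is its (thin) SVD, and the columns of \mathbf{F} are the corresponding outputs \hat{f}(\bar{x}, u_t^i).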

\begin{aligned} \mathbb{E}[\hat{\mathbf{B}}\hat{\mathbf{B}}^\intercal] & = \frac{1}{\sigma^2 (N-1)} \mathbb{E}[\mathbf{F}\mathbf{F}^\intercal] \end{aligned}
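To see the covariance connection numerically, here is a sketch with a hypothetical linear map B_true (values chosen for illustration). For noise-free data \mathbf{F} = \mathbf{B}_{true}\mathbf{X}, the least-squares estimate \mathbf{F}\mathbf{X}^\intercal(\mathbf{X}\mathbf{X}^\intercal)^{-1} recovers B_true, and \hat{\mathbf{B}}\hat{\mathbf{B}}^\intercal matches \mathbf{F}\mathbf{F}^\intercal / (\sigma^2 (N-1)) up to sampling error of order 1/\sqrt{N}.

```python
import random

# Hypothetical actuation matrix generating the data (2 states, 2 inputs).
B_true = [[1.0, 0.5],
          [0.0, 0.2]]
sigma, N = 0.1, 50_000
rng = random.Random(0)

# Columns of X are inputs u^i ~ N(0, sigma^2 I); columns of F are outputs B_true u^i.
X = [[rng.gauss(0.0, sigma) for _ in range(N)] for _ in range(2)]
F = [[B_true[r][0] * X[0][i] + B_true[r][1] * X[1][i] for i in range(N)] for r in range(2)]


def outer(P, Q):
    """P Q^T for two 2 x N matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(P[r], Q[c])) for c in range(2)] for r in range(2)]


def matmul2(P, Q):
    """Product of two 2x2 matrices."""
    return [[sum(P[r][k] * Q[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transpose2(M):
    """Transpose of a 2x2 matrix."""
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]


# Least-squares estimate B_hat = F X^T (X X^T)^{-1}, equivalent to F V Sigma^{-1} U^T.
B_hat = matmul2(outer(F, X), inv2(outer(X, X)))

# Compare B_hat B_hat^T with the scaled second-moment matrix F F^T / (sigma^2 (N - 1)).
BBt = matmul2(B_hat, transpose2(B_hat))
FFt_scaled = [[v / (sigma ** 2 * (N - 1)) for v in row] for row in outer(F, F)]
print(BBt)
print(FFt_scaled)  # close to BBt, up to O(1/sqrt(N)) sampling error
```

With nonlinear or contact-rich dynamics the fit leaves a residual \mathbf{R} with \mathbf{R}\mathbf{X}^\intercal = 0, so \mathbf{F}\mathbf{F}^\intercal picks up an extra \mathbf{R}\mathbf{R}^\intercal term and the match is only approximate.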

How bundled dynamics helps in kinodynamic RRT

RG-RRT uses local reachability information to guide subgoal sampling toward the more reachable parts of the state space.
- Bundled dynamics provides good reachability information for systems with contact.
Bundled dynamics also helps with 1-step or T-step trajectory optimization in the extend step (the same story as iRS-MPC).
- There is a trade-off between the number of nodes and the planning horizon T.
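A minimal sketch of how these two ideas could enter an RRT loop: nearest-node selection with the regularized actuation metric, and a 1-step extension solved as min \|dx - \mathbf{B}u\|^2 subject to \|u\| \leq \varepsilon. The node layout (dicts with hypothetical keys `f0` for \hat{f}(x, 0) and `B` for the bundled actuation matrix), the numbers, and the bisection solver are illustrative assumptions, not the iRS-RRT implementation; a real planner would roll out the actual dynamics to get the new node.

```python
import math


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def reg_gram_inv(B, lam):
    """[B B^T + lam I]^{-1} for a 2 x m matrix B stored as a list of two rows."""
    G = [[sum(p * q for p, q in zip(ri, rj)) for rj in B] for ri in B]
    G[0][0] += lam
    G[1][1] += lam
    return inv2(G)


def distance(node, x_goal, lam=1e-3):
    """Regularized Mahalanobis distance from a tree node to x_goal (not symmetric)."""
    dx = [x_goal[i] - node["f0"][i] for i in range(2)]
    M = reg_gram_inv(node["B"], lam)
    return sum(dx[i] * M[i][j] * dx[j] for i in range(2) for j in range(2))


def nearest(nodes, x_goal):
    """Node from which x_goal is closest under the local actuation metric."""
    return min(nodes, key=lambda n: distance(n, x_goal))


def ridge_input(B, dx, mu):
    """u = B^T [B B^T + mu I]^{-1} dx, the minimizer of ||dx - B u||^2 + mu ||u||^2."""
    M = reg_gram_inv(B, mu)
    w = [sum(M[i][j] * dx[j] for j in range(2)) for i in range(2)]
    return [sum(B[i][k] * w[i] for i in range(2)) for k in range(len(B[0]))]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def extend(node, x_goal, eps, lam=1e-3):
    """One-step extension: min ||dx - B u||^2 s.t. ||u|| <= eps, with x_next = f0 + B u.

    ||ridge_input(mu)|| decreases in mu, so when the bound is active we bisect on mu
    until the input norm reaches eps. lam is a small floor on mu for numerical safety.
    """
    B, f0 = node["B"], node["f0"]
    dx = [x_goal[i] - f0[i] for i in range(2)]
    u = ridge_input(B, dx, lam)
    if norm(u) > eps:
        lo, hi = lam, 1.0
        while norm(ridge_input(B, dx, hi)) > eps:
            hi *= 2.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if norm(ridge_input(B, dx, mid)) > eps:
                lo = mid
            else:
                hi = mid
        u = ridge_input(B, dx, hi)
    x_next = [f0[i] + sum(B[i][k] * u[k] for k in range(len(u))) for i in range(2)]
    return u, x_next


# Hypothetical tree nodes: "f0" is f_hat(x, 0) and "B" the bundled actuation matrix there.
n1 = {"f0": [0.0, 0.0], "B": [[1.0, 0.0], [0.0, 0.05]]}  # barely actuates the 2nd coordinate
n2 = {"f0": [0.5, 0.0], "B": [[0.1, 0.0], [0.0, 1.0]]}   # easily actuates the 2nd coordinate
x_goal = [0.1, 0.6]

best = nearest([n1, n2], x_goal)  # n2, although n1 is closer in Euclidean distance
u, x_next = extend(best, x_goal, eps=0.3)
print(best is n2, norm(u), x_next)
```

Replacing the single step with a T-step rollout of the same linearization gives longer extensions per node, which is one side of the trade-off between the number of nodes and the horizon T.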

Rejection Sampling / Contact Sampling