Thesis Defense (XX)

H.J. Terry Suh, MIT

Leveraging Structure for Efficient and Dexterous Contact-Rich Manipulation

[TRO 2023, TRO 2025]

Part 3. Global Planning for Contact-Rich Manipulation

Part 2. Local Planning / Control via Dynamic Smoothing

Part 1. Understanding RL with Randomized Smoothing

Introduction

[TRO 2023, TRO 2025]

Introduction

Why contact-rich manipulation?

What has been done, and what's lacking?

Problems with Mode Enumeration

System

Number of Modes

\begin{aligned} N = 2 \end{aligned}

\begin{aligned} N = 2 \end{aligned}

\begin{aligned} N = 3^{\binom{9}{2}} \end{aligned}

\begin{aligned} N = 3^{\binom{9}{2}} \end{aligned}

No Contact

Sticking Contact

Sliding Contact

Number of potential active contacts

Problems with Mode Enumeration

System

Number of Modes

\begin{aligned} N = 3^{\binom{20}{2}} \approx 4.5 \times 10^{90} \end{aligned}

\begin{aligned} N = 3^{\binom{20}{2}} \approx 4.5 \times 10^{90} \end{aligned}

The number of modes scales terribly with system complexity

\begin{aligned} N = 2 \end{aligned}

\begin{aligned} N = 2 \end{aligned}

\begin{aligned} N = 3^{\binom{9}{2}} \end{aligned}

\begin{aligned} N = 3^{\binom{9}{2}} \end{aligned}

No Contact

Sticking Contact

Sliding Contact

Number of potential active contacts

How does RL search through these difficult problems?

Original Problem

\begin{aligned} \min_x F(x) \end{aligned}

\begin{aligned} \min_x F(x) \end{aligned}

[ICML 2022]

Cost

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

How does RL search through these difficult problems?

Randomized Smoothing

\begin{aligned} \min_x \mathbb{E}_w F(x+w) \end{aligned}

\begin{aligned} \min_x \mathbb{E}_w F(x+w) \end{aligned}

[ICML 2022]

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

How does RL search through these difficult problems?

Randomized Smoothing

\begin{aligned} \min_x \mathbb{E}_w F(x+w) \end{aligned}

\begin{aligned} \min_x \mathbb{E}_w F(x+w) \end{aligned}

[ICML 2022]

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

Noise regularizes difficult landscapes,

allevates flatness and stiffness,

abstracts contact modes.

How does RL search through these difficult problems?

Randomized Smoothing

\begin{aligned} \min_x \mathbb{E}_w F(x+w) \end{aligned}

\begin{aligned} \min_x \mathbb{E}_w F(x+w) \end{aligned}

[ICML 2022]

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

But how do we take gradients of a stochastic function?

\nabla_x \mathbb{E}_w F(x+w)

\nabla_x \mathbb{E}_w F(x+w)

Estimation of Gradients with Monte Carlo

\begin{aligned} & \nabla_x\mathbb{E}_{w\sim\rho}F(x+{w}) \\ = & \min_{\mathbf{J},\mu}\mathbb{E}_{w\sim\rho}\|F(x + w) - (\mathbf{J}w +\mu)\|^2 \\ \approx & \frac{1}{N}\sum^N_{i=1} \|F(x + w_i) - (\mathbf{J}w_i +\mu)\|^2 \end{aligned}

\begin{aligned} & \nabla_x\mathbb{E}_{w\sim\rho}F(x+{w}) \\ = & \min_{\mathbf{J},\mu}\mathbb{E}_{w\sim\rho}\|F(x + w) - (\mathbf{J}w +\mu)\|^2 \\ \approx & \frac{1}{N}\sum^N_{i=1} \|F(x + w_i) - (\mathbf{J}w_i +\mu)\|^2 \end{aligned}

Zeroth-Order Gradient Estimator

Stein Gradient Estimator
REINFORCE
Score Function / Likelihood Ratio Estimator

Estimation of Gradients with Monte Carlo

\begin{aligned} & \nabla_x\mathbb{E}_{w\sim\rho}F(x+{w}) \\ = & \min_{\mathbf{J},\mu}\mathbb{E}_{w\sim\rho}\|F(x + w) - (\mathbf{J}w +\mu)\|^2 \\ \approx & \frac{1}{N}\sum^N_{i=1} \|F(x + w_i) - (\mathbf{J}w_i +\mu)\|^2 \end{aligned}

Zeroth-Order Gradient Estimator

Stein Gradient Estimator
REINFORCE
Score Function / Likelihood Ratio Estimator

Estimation of Gradients with Monte Carlo

\begin{aligned} & \nabla_x\mathbb{E}_{w\sim\rho}F(x+{w}) \\ = & \min_{\mathbf{J},\mu}\mathbb{E}_{w\sim\rho}\|F(x + w) - (\mathbf{J}w +\mu)\|^2 \\ \approx & \frac{1}{N}\sum^N_{i=1} \|F(x + w_i) - (\mathbf{J}w_i +\mu)\|^2 \end{aligned}

Zeroth-Order Gradient Estimator

Stein Gradient Estimator
REINFORCE
Score Function / Likelihood Ratio Estimator

Estimation of Gradients with Monte Carlo

\begin{aligned} & \nabla_x\mathbb{E}_{w\sim\rho}F(x+{w}) \\ = & \min_{\mathbf{J},\mu}\mathbb{E}_{w\sim\rho}\|F(x + w) - (\mathbf{J}w +\mu)\|^2 \\ \approx & \frac{1}{N}\sum^N_{i=1} \|F(x + w_i) - (\mathbf{J}w_i +\mu)\|^2 \end{aligned}

Zeroth-Order Gradient Estimator

Stein Gradient Estimator
REINFORCE
Score Function / Likelihood Ratio Estimator

But what if we had access to gradients?

Leveraging Differentiable Physics

First-Order Randomized Smoothing

\begin{aligned} & \frac{\partial}{\partial x}\mathbb{E}_{w\sim\rho}F(x+{w}) \\ = & \mathbb{E}_{w\sim\rho}\frac{\partial F}{\partial x}(x + w) \\ \approx & \frac{1}{N}\sum^N_{i=1} \frac{\partial F}{\partial x}(x + w_i) \end{aligned}

\begin{aligned} & \frac{\partial}{\partial x}\mathbb{E}_{w\sim\rho}F(x+{w}) \\ = & \mathbb{E}_{w\sim\rho}\frac{\partial F}{\partial x}(x + w) \\ \approx & \frac{1}{N}\sum^N_{i=1} \frac{\partial F}{\partial x}(x + w_i) \end{aligned}

Gradient Sampling Algorithm

From John Duchi's Slides on Randomized Smoothing, 2014

Leveraging Differentiable Physics

First-Order Randomized Smoothing

\begin{aligned} & \frac{\partial}{\partial x}\mathbb{E}_{w\sim\rho}F(x+{w}) \\ = & \mathbb{E}_{w\sim\rho}\frac{\partial F}{\partial x}(x + w) \\ \approx & \frac{1}{N}\sum^N_{i=1} \frac{\partial F}{\partial x}(x + w_i) \end{aligned}

Gradient Sampling Algorithm

From John Duchi's Slides on Randomized Smoothing, 2014

So which one should we use?

Comparison of Efficiency

Analytic Expression

First-Order Gradient Estimator

Zeroth-Order Gradient Estimator

Requires differentiability over dynamics
Generally lower variance.

Least requirements (blackbox)
High variance.

Possible for only few cases

Structure

Efficiency

\nabla_x \mathbb{E}_w F(x+w)

\nabla_x \mathbb{E}_w F(x+w)

Comparison of Efficiency

Analytic Expression

First-Order Gradient Estimator

Zeroth-Order Gradient Estimator

Requires differentiability over dynamics
Generally lower variance.

Least requirements (blackbox)
High variance.

Possible for only few cases

Structure

Efficiency

\nabla_x \mathbb{E}_w F(x+w)

\nabla_x \mathbb{E}_w F(x+w)

Can we transfer this promise to RL?

\begin{aligned} \nabla_x \mathbb{E}_w F(x + w) & = \mathbb{E}_w F(x + w) = \frac{1}{N}\sum^N_{i=1}\nabla_x F(x + w_i) \end{aligned}

\begin{aligned} \nabla_x \mathbb{E}_w F(x + w) & = \mathbb{E}_w F(x + w) = \frac{1}{N}\sum^N_{i=1}\nabla_x F(x + w_i) \end{aligned}

Bias of First-Order Estimation

Utilizing More Structure

\lambda = k\min(\phi,0) + d\min(\phi,0)v

\lambda = k\min(\phi,0) + d\min(\phi,0)v

\phi

\phi

- No force when not in contact

- Spring-damper behavior when in contact

Tackling stiffness requires us to rethink about the models we use to simulate contact

CQDC: A Quasi-dynamic Simulator for Manipulation

Dr. Tao Pang

Robot commands as

State only comprises of configurations

- Actuated configurations (Robot)

- Unactuated configurations (Object)

q^\mathrm{u}

q^\mathrm{u}

q^\mathrm{a}

q^\mathrm{a}

- Position command to an stiffness controller

u

Convex

Quasidynamic

Differentiable

Contact Model

CQDC: A Quasi-dynamic Simulator for Manipulation

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u}) \\ \text{s.t.} & \;\; \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \geq 0 \end{aligned}

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u}) \\ \text{s.t.} & \;\; \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \geq 0 \end{aligned}

\begin{aligned} q^\mathrm{u} \end{aligned}

\begin{aligned} q^\mathrm{u} \end{aligned}

\begin{aligned} {\color{red}q^\mathrm{u}_+} \end{aligned}

\begin{aligned} {\color{red}q^\mathrm{u}_+} \end{aligned}

Non-Penetration

Minimum Energy Principle

CQDC: A Quasi-dynamic Simulator for Manipulation

\begin{aligned} \min_{q+} \quad \frac{1}{2}q_+^\top \mathbf{P} q_+ + b^\top q \\ \text{subject to} \quad \mathbf{J}_i q_+ + c_i \in \mathcal{K}_i \end{aligned}

\begin{aligned} \min_{q+} \quad \frac{1}{2}q_+^\top \mathbf{P} q_+ + b^\top q \\ \text{subject to} \quad \mathbf{J}_i q_+ + c_i \in \mathcal{K}_i \end{aligned}

Second-Order Cone Program (SOCP)

We can use standard SOCP solvers to solve this program

Sensitivity Analysis

Spring-damper modeling

Quasi-static Dynamics

\begin{aligned} \frac{\partial x_T}{\partial x_0}=\frac{\partial x_T}{\partial x_{T-1}}\cdots \frac{\partial x_1}{\partial x_0} \end{aligned}

\begin{aligned} \frac{\partial x_T}{\partial x_0}=\frac{\partial x_T}{\partial x_{T-1}}\cdots \frac{\partial x_1}{\partial x_0} \end{aligned}

\begin{aligned} \frac{\partial x_T}{\partial x_0} \end{aligned}

\begin{aligned} \frac{\partial x_T}{\partial x_0} \end{aligned}

Directly obtained by sensitivity analysis

Jumping from one equilibrium to another lets us compute long horizon gradients temporally, allowing us to be less stiff.

Figures from Tao Pang's Defense, 2023

Constrained Optimization

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})\;\;\text{s.t.} \;\; \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \geq 0 \end{aligned}

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})\;\;\text{s.t.} \;\; \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \geq 0 \end{aligned}

Constrained Optimization

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})\;\;\text{s.t.} \;\; \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \geq 0 \end{aligned}

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})\;\;\text{s.t.} \;\; \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \geq 0 \end{aligned}

=

+

\infty

\infty

\infty

\infty

0

Log-Barrier Relaxation

=

+

\infty

\infty

\infty

\infty

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u}) -\log \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \end{aligned}

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u}) -\log \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \end{aligned}

Log-Barrier Relaxation

=

+

\infty

\infty

\infty

\infty

Constraints have an inversely proportional effect to distance.

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u}) -\log \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \end{aligned}

\begin{aligned} \min_{{\color{red}q^\mathrm{u}_+}} &\;\;\frac{1}{2}({\color{red}q^\mathrm{u}_+} - q^\mathrm{u})^\top \mathbf{M}_\mathrm{u} ({\color{red}q^\mathrm{u}_+} - q^\mathrm{u}) -\log \phi(q^\mathrm{u},{\color{blue}q^\mathrm{a}_\text{cmd}}) \end{aligned}

Benefits of Smoothing

[ICML 2022]

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} F(x) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

\begin{aligned} \mathbb{E}_wF(x+w) \end{aligned}

Recall smoothing achieves abstraction of contact modes

Equivalence of Smoothing Schemes

\rho(w) = \sqrt{\frac{4 c}{(w^\top c w + 4)^3}}

\rho(w) = \sqrt{\frac{4 c}{(w^\top c w + 4)^3}}

The two smoothing schemes are equivalent!

(There is a distribution that corresponds to barrier smoothing)

Randomized Smoothing

Barrier Smoothing

Back to Dynamics

\begin{aligned} q^\mathrm{u}_{next} = f(q,u) \end{aligned}

\begin{aligned} q^\mathrm{u}_{next} = f(q,u) \end{aligned}

True Dynamics

How do we build a computationally tractable local model for iterative optimization?

True Dynamics

Back to Dynamics

\begin{aligned} q^\mathrm{u}_{next} = \frac{\partial f}{\partial u}\delta u + f(q,u) \end{aligned}

\begin{aligned} q^\mathrm{u}_{next} = \frac{\partial f}{\partial u}\delta u + f(q,u) \end{aligned}

Is this a good local model?

Local Model

First-Order Taylor Approximation

Gradients too myopic, suffers from flatness

Back to Dynamics

Local Model

Is this a good local model?

Gradients are more useful due to smoothing

First-Order Taylor Approximation

on Smoothed Dynamics

\begin{aligned} q^\mathrm{u}_{next} = \frac{\partial f_{\color{red}\rho}}{\partial u}\delta u + f_{\color{red}\rho}(q,u) \end{aligned}

\begin{aligned} q^\mathrm{u}_{next} = \frac{\partial f_{\color{red}\rho}}{\partial u}\delta u + f_{\color{red}\rho}(q,u) \end{aligned}

Back to Dynamics

Local Model

Is this a good local model?

First-Order Taylor Approximation

on Smoothed Dynamics

Gradients are more useful due to smoothing

Still violates some fundamental characteristics of contact!

\begin{aligned} q^\mathrm{u}_{next} = \frac{\partial f_{\color{red}\rho}}{\partial u}\delta u + f_{\color{red}\rho}(q,u) \end{aligned}

\begin{aligned} q^\mathrm{u}_{next} = \frac{\partial f_{\color{red}\rho}}{\partial u}\delta u + f_{\color{red}\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = f(q, u + \delta u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = f(q, u + \delta u) \end{aligned}

Dynamics

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

What does this imply about reachable sets?

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

Linear Map

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

What does this imply about reachable sets?

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

Linear Map

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} \end{aligned}

We can make ellipsoidal approximations of the reachable set

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

We can make ellipsoidal approximations of the reachable set

True Samples

Samples from Local Model

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

What does this imply about reachable sets?

Linear Map

True Samples

Samples from Local Model

What does this imply about reachable sets?

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

Linear Map

What are we missing here?

True Samples

Samples from Local Model

What does this imply about reachable sets?

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

Linear Map

Qualification on Contact Impulses

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

Dynamics (Primal)

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

Contact Impulses (Dual)

\begin{aligned} {\color{magenta}\lambda} \end{aligned}

\begin{aligned} {\color{magenta}\lambda} \end{aligned}

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

Qualification on Contact Impulses

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

Dynamics (Primal)

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

Contact Impulses (Dual)

\begin{aligned} {\color{magenta}\lambda} \end{aligned}

\begin{aligned} {\color{magenta}\lambda} \end{aligned}

*image taken from Stephane Caron's blog

\begin{aligned} {\color{magenta}\lambda}\in \{\mu\lambda_n \geq \|\lambda_t\|_2\} \end{aligned}

\begin{aligned} {\color{magenta}\lambda}\in \{\mu\lambda_n \geq \|\lambda_t\|_2\} \end{aligned}

Contact is unilateral

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \|{\color{green}\delta u}\|_2\leq \varepsilon \end{aligned}

Feasible Trust Region

Dynamics (Primal)

Contact Impulses (Dual)

\begin{aligned} {\color{magenta}\lambda}\in \{\mu\lambda_n \geq \|\lambda_t\|_2\} \end{aligned}

\begin{aligned} {\color{magenta}\lambda}\in \{\mu\lambda_n \geq \|\lambda_t\|_2\} \end{aligned}

Trust Region Size

\begin{aligned} \|{\color{green}(\delta q, \delta u)}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \|{\color{green}(\delta q, \delta u)}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \color{green} \mathcal{T}(q, u) =\{(\delta q, \delta u)\} \end{aligned}

\begin{aligned} \color{green} \mathcal{T}(q, u) =\{(\delta q, \delta u)\} \end{aligned}

Feasible Trust Region

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

Motion Set

\begin{aligned} \color{green} \mathcal{T}(q, u) =\{(\delta q, \delta u)\} \end{aligned}

\begin{aligned} \color{green} \mathcal{T}(q, u) =\{(\delta q, \delta u)\} \end{aligned}

Feasible Trust Region

\begin{aligned} \color{blue} \mathcal{M}(q,u)=\{q^\mathrm{u}_{next}\} \end{aligned}

\begin{aligned} \color{blue} \mathcal{M}(q,u)=\{q^\mathrm{u}_{next}\} \end{aligned}

Motion Set

Dynamics (Primal)

Contact Impulses (Dual)

\begin{aligned} {\color{magenta}\lambda}\in \{\mu\lambda_n \geq \|\lambda_t\|_2\} \end{aligned}

\begin{aligned} {\color{magenta}\lambda}\in \{\mu\lambda_n \geq \|\lambda_t\|_2\} \end{aligned}

Trust Region Size

\begin{aligned} \|{\color{green}(\delta q, \delta u)}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \|{\color{green}(\delta q, \delta u)}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

Contact Impulse Set

\begin{aligned} \color{green} \mathcal{T}(q, u) =\{(\delta q, \delta u)\} \end{aligned}

\begin{aligned} \color{green} \mathcal{T}(q, u) =\{(\delta q, \delta u)\} \end{aligned}

Feasible Trust Region

\begin{aligned} \color{magenta} \mathcal{C}(q,u)=\{\lambda\} \end{aligned}

\begin{aligned} \color{magenta} \mathcal{C}(q,u)=\{\lambda\} \end{aligned}

Contact Impulse Set

Dynamics (Primal)

Contact Impulses (Dual)

\begin{aligned} {\color{magenta}\lambda}\in \{\mu\lambda_n \geq \|\lambda_t\|_2\} \end{aligned}

\begin{aligned} {\color{magenta}\lambda}\in \{\mu\lambda_n \geq \|\lambda_t\|_2\} \end{aligned}

Trust Region Size

\begin{aligned} \|{\color{green}(\delta q, \delta u)}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} \|{\color{green}(\delta q, \delta u)}\|_2\leq \varepsilon \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{blue}q^\mathrm{u}_{next}} = \frac{\partial f_{\rho}}{\partial u}{\color{green}\delta u} + f_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

\begin{aligned} {\color{magenta}\lambda} = \frac{\partial \lambda_{\rho}}{\partial u}{\color{green}\delta u} + \lambda_{\rho}(q,u) \end{aligned}

Finding Optimal Action: One Step

\begin{aligned} \min_{\delta u} \quad & \|q^\mathrm{u}_{goal} - q^\mathrm{u}_+\|^2 + \|\delta u\|^2 \\ \text{s.t.} \quad & q_+ \in \mathcal{M}_\varepsilon(q,u) \end{aligned}

\begin{aligned} \min_{\delta u} \quad & \|q^\mathrm{u}_{goal} - q^\mathrm{u}_+\|^2 + \|\delta u\|^2 \\ \text{s.t.} \quad & q_+ \in \mathcal{M}_\varepsilon(q,u) \end{aligned}

Get to the goal

Minimize effort

Motion Set Constraint

\begin{aligned} \color{blue} \mathcal{M}(q,u)=\{q^\mathrm{u}_{next}\} \end{aligned}

\begin{aligned} \color{blue} \mathcal{M}(q,u)=\{q^\mathrm{u}_{next}\} \end{aligned}

\begin{aligned} \color{red} q^\mathrm{u}_{goal} \end{aligned}

\begin{aligned} \color{red} q^\mathrm{u}_{goal} \end{aligned}

\begin{aligned} \color{blue} q \end{aligned}

\begin{aligned} \color{blue} q \end{aligned}

\begin{aligned} \delta u \end{aligned}

\begin{aligned} \delta u \end{aligned}

\begin{aligned} \min_{\delta u_{0:T-1},\delta q_{0:T}} \quad & \|q^\mathrm{u}_{goal} - q^\mathrm{u}_T\|^2 + \sum^{T-1}_{t=0}\|u_{t+1} - u_t\|^2 \\ \text{s.t.} \quad & q_{t+1}\in \mathcal{M}_\varepsilon(q_t,u_t) \\ & q_0 = q_{initial} \\ & u_t = \bar{u}_t + \delta u_t \quad\quad\forall t\in\{0,\cdots,T-1\} \end{aligned}

\begin{aligned} \min_{\delta u_{0:T-1},\delta q_{0:T}} \quad & \|q^\mathrm{u}_{goal} - q^\mathrm{u}_T\|^2 + \sum^{T-1}_{t=0}\|u_{t+1} - u_t\|^2 \\ \text{s.t.} \quad & q_{t+1}\in \mathcal{M}_\varepsilon(q_t,u_t) \\ & q_0 = q_{initial} \\ & u_t = \bar{u}_t + \delta u_t \quad\quad\forall t\in\{0,\cdots,T-1\} \end{aligned}

Get to the goal

Minimize effort

Multi-Horizon Optimization

Motion Set Constraint

Trajectory Optimization: Step 1

q_0

q_0

q_1

q_1

q_2

q_2

q_3

q_3

q_4

q_4

q_5

q_5

q_6

q_6

u_0

u_0

u_1

u_1

u_2

u_2

u_3

u_3

u_4

u_4

u_5

u_5

Roll out an input trajectory guess to obtain an initial trajectory.

u_{0:T-1}

u_{0:T-1}

q_\mathrm{goal}

q_\mathrm{goal}

Trajectory Optimization: Step 2

q_0

q_0

q_1

q_1

q_2

q_2

q_3

q_3

q_4

q_4

q_5

q_5

q_6

q_6

u_0

u_0

u_1

u_1

u_2

u_2

u_3

u_3

u_4

u_4

u_5

u_5

Linearize around each point to obtain a local dynamics model around the trajectory

(q_t,u_t)

(q_t,u_t)

q_\mathrm{goal}

q_\mathrm{goal}

Trajectory Optimization: Step 3

q_0

q_0

q_1

q_1

q_2

q_2

q_3

q_3

q_4

q_4

q_5

q_5

q_6

q_6

u_0

u_0

u_1

u_1

u_2

u_2

u_3

u_3

u_4

u_4

u_5

u_5

q_\mathrm{goal}

q_\mathrm{goal}

\delta q_1^\star

\delta q_1^\star

u_0 + \delta u_0^\star

u_0 + \delta u_0^\star

u_1 + \delta u_1^\star

u_1 + \delta u_1^\star

q_1 + \delta q_1^\star

q_1 + \delta q_1^\star

\delta q_2^\star

\delta q_2^\star

\delta q_3^\star

\delta q_3^\star

\delta q_4^\star

\delta q_4^\star

\delta q_5^\star

\delta q_5^\star

u_2 + \delta u_2^\star

u_2 + \delta u_2^\star

u_3 + \delta u_3^\star

u_3 + \delta u_3^\star

u_4 + \delta u_4^\star

u_4 + \delta u_4^\star

u_5 + \delta u_5^\star

u_5 + \delta u_5^\star

q_2 + \delta q_2^\star

q_2 + \delta q_2^\star

q_3 + \delta q_3^\star

q_3 + \delta q_3^\star

q_4 + \delta q_4^\star

q_4 + \delta q_4^\star

q_5 + \delta q_5^\star

q_5 + \delta q_5^\star

q_6 + \delta q_6^\star

q_6 + \delta q_6^\star

\delta q_6^\star

\delta q_6^\star

Use our subproblem to solve for a new input trajectory

u_{0:T-1}^\star = u_{0:T-1} + \delta u^\star_{0:T-1}

u_{0:T-1}^\star = u_{0:T-1} + \delta u^\star_{0:T-1}

Trajectory Optimization: Iteration

q_0

q_0

q_1

q_1

q_2

q_2

q_3

q_3

q_4

q_4

q_5

q_5

q_6

q_6

u_0

u_0

u_1

u_1

u_2

u_2

u_3

u_3

u_4

u_4

u_5

u_5

q_\mathrm{goal}

q_\mathrm{goal}

Roll out the new input trajectory guess and repeat til convergence

u_{0:T-1}^\star

u_{0:T-1}^\star

Model Predictive Control (MPC)

q_0

q_0

q_1

q_1

q_2

q_2

q_3

q_3

q_4

q_4

q_5

q_5

q_6

q_6

u_0

u_0

u_1

u_1

u_2

u_2

u_3

u_3

u_4

u_4

u_5

u_5

Plan a trajectory towards the goal in an open-loop manner

Model Predictive Control (MPC)

q_0

q_0

q_1^\mathrm{planned}

q_1^\mathrm{planned}

u_0

u_0

Replan from the observed state,

execute the first action,

and repeat.

q_1^\mathrm{real}

q_1^\mathrm{real}

q_2

q_2

q_3

q_3

q_4

q_4

q_5

q_5

q_6

q_6

u_1

u_1

u_2

u_2

u_3

u_3

u_4

u_4

u_5

u_5

Force Control

Having a model of contact impulses allows us to perform force control

\begin{aligned} \min_{\delta u_{0:T-1},\delta q_{0:T}} \quad & \|\lambda_{goal} - \lambda\|^2 + \sum^{T-1}_{t=0}\|u_{t+1} - u_t\|^2 \\ \text{s.t.} \quad & \lambda\in \mathcal{C}_\varepsilon(q_t,u_t) \\ & q_0 = q_{initial} \\ & u_t = \bar{u}_t + \delta u_t \quad\quad\forall t\in\{0,\cdots,T-1\} \end{aligned}

\begin{aligned} \min_{\delta u_{0:T-1},\delta q_{0:T}} \quad & \|\lambda_{goal} - \lambda\|^2 + \sum^{T-1}_{t=0}\|u_{t+1} - u_t\|^2 \\ \text{s.t.} \quad & \lambda\in \mathcal{C}_\varepsilon(q_t,u_t) \\ & q_0 = q_{initial} \\ & u_t = \bar{u}_t + \delta u_t \quad\quad\forall t\in\{0,\cdots,T-1\} \end{aligned}

Applied desired forces

Minimize effort

Contact Impulse Set

Utilizing the MPC Cost

Our Local Controller

\begin{aligned} {\color{red}V(q^\mathrm{a}_\mathrm{init}; q^\mathrm{u}_\mathrm{init};q^\mathrm{u}_{goal})\coloneqq} \end{aligned}

\begin{aligned} {\color{red}V(q^\mathrm{a}_\mathrm{init}; q^\mathrm{u}_\mathrm{init};q^\mathrm{u}_{goal})\coloneqq} \end{aligned}

How well does the policy perform when this controller is run closed-loop?

\begin{aligned} \min_{\delta u_{0:T-1},\delta q_{0:T}} \quad & \|{\color{red}q^\mathrm{u}_{goal}} - q^\mathrm{u}_T\|^2 + \sum^{T-1}_{t=0}\|u_{t+1} - u_t\|^2 \\ \text{s.t.} \quad & q_{t+1}\in \mathcal{M}_\varepsilon(q_t,u_t) \\ & q_0^\mathrm{u} = {\color{red}q^\mathrm{u}_{init}} \\ & q_0^\mathrm{a} = {\color{red}q^\mathrm{a}_{init}} \\ & u_t = \bar{u}_t + \delta u_t \quad\quad\forall t\in\{0,\cdots,T-1\} \end{aligned}

\begin{aligned} \min_{\delta u_{0:T-1},\delta q_{0:T}} \quad & \|{\color{red}q^\mathrm{u}_{goal}} - q^\mathrm{u}_T\|^2 + \sum^{T-1}_{t=0}\|u_{t+1} - u_t\|^2 \\ \text{s.t.} \quad & q_{t+1}\in \mathcal{M}_\varepsilon(q_t,u_t) \\ & q_0^\mathrm{u} = {\color{red}q^\mathrm{u}_{init}} \\ & q_0^\mathrm{a} = {\color{red}q^\mathrm{a}_{init}} \\ & u_t = \bar{u}_t + \delta u_t \quad\quad\forall t\in\{0,\cdots,T-1\} \end{aligned}

Get to the goal

Minimize effort

Motion Set Constraint

The Actuator Relocation Problem

\begin{aligned} \min_{q^\mathrm{a}_\mathrm{init}} \quad & V^\pi(q^\mathrm{a}_\mathrm{init}; q^\mathrm{u}_\mathrm{init};q^\mathrm{u}_{goal}) \\ \text{s.t.} \quad & \phi_i(q^\mathrm{a}_\mathrm{init}, q^\mathrm{u}_\mathrm{init})\geq 0 \end{aligned}

\begin{aligned} \min_{q^\mathrm{a}_\mathrm{init}} \quad & V^\pi(q^\mathrm{a}_\mathrm{init}; q^\mathrm{u}_\mathrm{init};q^\mathrm{u}_{goal}) \\ \text{s.t.} \quad & \phi_i(q^\mathrm{a}_\mathrm{init}, q^\mathrm{u}_\mathrm{init})\geq 0 \end{aligned}

Choose best initial actuator configuration for MPC

Non-penetration constraints

Where should I place my finger if I want to make progress towards the goal?

Global Search for Initial Configurations

Regularizing with Robustness

\begin{aligned} \min_{q^\mathrm{a}_\mathrm{init}} \quad & V^\pi(q^\mathrm{a}_\mathrm{init}; q^\mathrm{u}_\mathrm{init};q^\mathrm{u}_{goal}) + {\color{red} \alpha r(q^\mathrm{a}_{init};q^\mathrm{u}_\mathrm{init})^2} \\ \text{s.t.} \quad & \phi_i(q^\mathrm{a}_\mathrm{init}, q^\mathrm{u}_\mathrm{init})\geq 0\\ & q^\mathrm{a}_\mathrm{lb} \leq q^\mathrm{a}_\mathrm{init} \leq q^\mathrm{a}_\mathrm{ub} \end{aligned}

\begin{aligned} \min_{q^\mathrm{a}_\mathrm{init}} \quad & V^\pi(q^\mathrm{a}_\mathrm{init}; q^\mathrm{u}_\mathrm{init};q^\mathrm{u}_{goal}) + {\color{red} \alpha r(q^\mathrm{a}_{init};q^\mathrm{u}_\mathrm{init})^2} \\ \text{s.t.} \quad & \phi_i(q^\mathrm{a}_\mathrm{init}, q^\mathrm{u}_\mathrm{init})\geq 0\\ & q^\mathrm{a}_\mathrm{lb} \leq q^\mathrm{a}_\mathrm{init} \leq q^\mathrm{a}_\mathrm{ub} \end{aligned}

How do we efficiently find an answer to this problem?