H.J. Terry Suh
MIT
Motivation: Why should we care?
Agile & Autonomous Locomotion
Dexterous Manipulation
Whole-Body Loco-Manipulation
"The art of robots making contact where they are not supposed to make contact"
Motivation: What makes this problem difficult?
The Non-Smooth Nature of Contact makes tools from smooth optimization difficult to use.
Making & Breaking Contact
Non-smoothness of Friction
Non-smoothness of Geometry
Motivation: Hybrid Dynamics
Hybrid Dynamics
Motivation: The Fallacy of Hybrid Dynamics
Contact is non-smooth. But is it truly "discrete"?
The core thesis of this talk:
Counting contact modes for these kinds of systems results in trillions of modes.
Are we truly thinking of whether we're making contact or not for all possible fingers?
Smoothing Techniques for Non-Smooth Problems
Some non-smooth problems are successfully tackled by smooth approximations without incurring much bias.
Is contact one of these problems?
*Figures taken from Yuxin Chen's slides on "Smoothing for Non-smooth Optimization"
Smoothing in Optimization
We can formally define smoothing as a process of convolution with a smooth kernel,
In addition, for purposes of optimization, we are interested in methods that provide easy access to the derivative of the smooth surrogate.
Original Function
Smooth Surrogate
Derivative of the Smooth Surrogate:
These provide linearization Jacobians in the setting when f is dynamics, and policy gradients in the setting when f is a value function.
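As a minimal sketch of the convolution definition above (the function |x| and the Gaussian kernel are my own illustrative choices, not from the talk): smoothing a non-smooth function by convolving with a Gaussian yields a surrogate that is differentiable even at the kink, and the derivative of the surrogate is the expectation of the (almost-everywhere) derivative.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.abs(x)  # non-smooth at x = 0

def smooth_surrogate(x, sigma=0.5, n=200_000):
    # f_rho(x) = E_w[f(x + w)], w ~ N(0, sigma^2): convolution with a Gaussian kernel
    w = rng.normal(0.0, sigma, size=n)
    return np.mean(f(x + w))

def smooth_derivative(x, sigma=0.5, n=200_000):
    # d/dx E[f(x + w)] = E[f'(x + w)], using f'(x) = sign(x) almost everywhere
    w = rng.normal(0.0, sigma, size=n)
    return np.mean(np.sign(x + w))

# At the kink x = 0 the surrogate is differentiable with slope 0.
print(smooth_surrogate(0.0))   # ~ sigma * sqrt(2/pi) ≈ 0.399
print(smooth_derivative(0.0))  # ~ 0.0
```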
Taxonomy of Smoothing
Case 1. Analytic Smoothing
Taxonomy of Smoothing
Case 2. Randomized Smoothing, First Order
*Figures taken from John Duchi's slides on Randomized Smoothing
Taxonomy of Smoothing
Case 2. Randomized Smoothing, Zeroth-Order
This seems to come out of nowhere. How can it be true?
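A quick numerical sanity check (the test function sin(x) and constants are my own): the zeroth-order estimator, which uses only function values, agrees with the first-order (pathwise) estimator of the smoothed gradient. Subtracting f(x) as a baseline is a standard variance-reduction trick.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.sin(x)

x0, sigma, n = 0.7, 0.1, 500_000
w = rng.normal(0.0, sigma, size=n)

# Zeroth-order: d/dx E[f(x + w)] = E[(f(x + w) - f(x)) * w] / sigma^2
# (Stein's identity for Gaussian perturbations; baseline f(x) reduces variance.)
g_zeroth = np.mean((f(x0 + w) - f(x0)) * w) / sigma**2

# First-order (pathwise) estimator for comparison: E[f'(x + w)]
g_first = np.mean(np.cos(x0 + w))

print(g_zeroth, g_first)  # both ≈ cos(0.7) ≈ 0.76
```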
Taxonomy of Smoothing
Rethinking Linearization as a Minimizer.
Also provides a convenient way to compute the gradient in zeroth-order. Just sample and run least-squares!
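Concretely, "sample and run least-squares" can look like this sketch (the toy function and sample sizes are assumptions): fit a linear model to sampled function-value differences; the least-squares coefficients recover the gradient of the smoothed function.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return np.sin(x[0]) + x[1] ** 2  # toy smooth function

x0 = np.array([0.5, 1.0])
sigma, n = 0.05, 2000
W = rng.normal(0.0, sigma, size=(n, 2))          # sampled perturbations
df = np.array([f(x0 + w) - f(x0) for w in W])    # function-value differences

# Least-squares fit of df ≈ W @ g recovers the (smoothed) gradient.
g, *_ = np.linalg.lstsq(W, df, rcond=None)
print(g)  # ≈ [cos(0.5), 2.0]
```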
Tradeoffs between structure and performance.
The generally accepted wisdom: more structure gives more performance.
From most to least structure: Analytic Smoothing → Randomized Smoothing (First-Order) → Randomized Smoothing (Zeroth-Order)
Axes: structure requirements vs. performance / efficiency.
Smoothing of Contact Dynamics
Without going too much into the details of multibody contact dynamics, we will use a time-stepping, quasidynamic formulation of contact.
Equations of Motion (KKT Conditions)
Non-penetration
(Primal feasibility)
Complementary slackness
Dual feasibility
Force Balance
(Stationarity)
Quasistatic QP Dynamics
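The KKT conditions listed above can be written schematically for a single contact. The notation below (mass matrix M, contact Jacobian J, signed distance φ, contact impulse λ, input τ, time step h, configuration step δq) is my own sketch, up to scaling conventions, rather than the talk's exact formulation:

```latex
\begin{aligned}
&\text{Force balance (stationarity):} && \tfrac{1}{h} M\,\delta q = \tau + J^{\top}\lambda \\
&\text{Non-penetration (primal feasibility):} && \phi(q) + J\,\delta q \ge 0 \\
&\text{Dual feasibility:} && \lambda \ge 0 \\
&\text{Complementary slackness:} && \lambda\,\bigl(\phi(q) + J\,\delta q\bigr) = 0
\end{aligned}
```

The last three lines say the impulse is non-negative, penetration is forbidden, and the impulse can only be nonzero exactly at contact; this is the complementarity structure that smoothing will relax.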
We can smooth this with first-order randomized smoothing using sensitivity analysis, or with zeroth-order randomized smoothing.
But can we smooth this analytically?
Barrier (Interior-Point) Smoothing
Quasistatic QP Dynamics
Equations of Motion (KKT Conditions)
Interior-Point Relaxation of the QP
Equations of Motion (Stationarity)
Impulse
Relaxation of complementarity
"Force from a distance"
What does smoothing do to contact dynamics?
Is barrier smoothing a form of convolution?
Equivalence of Randomized and Barrier Smoothing.
A later result shows that such a kernel always exists for Linear Complementarity Systems (LCS).
Optimal Control with Dynamics Smoothing
Replace linearization in iLQR with smoothed linearization
Exact
Smoothed
Smoothing of Value Functions.
Optimal Control through Non-smooth Dynamics
Policy Optimization
Cumulative Cost
Dynamics
Policy (can be open-loop)
Dynamics Smoothing
Value Smoothing
Recall that smoothing turns the non-smooth dynamics f into a smooth surrogate f_ρ.
Why not just smooth the value function directly and run policy optimization?
Smoothing of Value Functions.
Original Problem
Long-horizon problems involving contact can have terrible landscapes.
Smoothing of Value Functions.
Smooth Surrogate
The benefits of smoothing are much more pronounced in the value smoothing case.
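A toy illustration of why (the threshold cost is my own construction, standing in for a contact event): a long-horizon cost that is flat until a command crosses a threshold has zero exact gradient almost everywhere, while the smoothed value has an informative gradient pointing toward the threshold.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "contact" value: cost drops only once the command u crosses a threshold,
# so the exact gradient is zero almost everywhere -- a terrible landscape.
def cost(u):
    return np.where(u > 1.0, 0.0, 1.0)

def smoothed_cost_grad(u, sigma=0.3, n=200_000):
    # Zeroth-order gradient of the smoothed value E_w[cost(u + w)].
    w = rng.normal(0.0, sigma, size=n)
    c = cost(u + w)
    return np.mean((c - c.mean()) * w) / sigma**2

# Exact gradient at u = 0.9 is 0; the smoothed gradient is negative,
# pulling u toward the threshold.
print(float(cost(0.9)), smoothed_cost_grad(0.9))
```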
A beautiful story: noise sometimes regularizes the problem, turning into a helpful bias.
How do we take gradients of smoothed value function?
Analytic Smoothing: pretty much not possible.
Randomized Smoothing (First-Order)
Randomized Smoothing (Zeroth-Order)
Axes: structure requirements vs. performance / efficiency.
How do we take gradients of smoothed value function?
First-Order Policy Search with Differentiable Simulation vs. Policy Gradient Methods in RL (REINFORCE / TRPO / PPO)
Axes: structure requirements vs. performance / efficiency.
Turns out there is an important question hidden here regarding the utility of differentiable simulators.
Do Differentiable Simulators Give Better Policy Gradients?
Very important question for RL, as it promises lower variance, faster convergence rates, and more sample efficiency.
What do we mean by "better"?
Consider a simple stochastic optimization problem. Then, we can define two different gradient estimators:
First-Order Gradient Estimator
Zeroth-Order Gradient Estimator
What do we mean by "better"?
First-Order Gradient Estimator
Zeroth-Order Gradient Estimator
Bias
Variance
Common lesson from stochastic optimization:
1. Both are unbiased under sufficient regularity conditions.
2. First-order generally has less variance than zeroth-order.
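This common lesson is easy to verify numerically on a smooth test function (the function and constants are my own): both estimators agree in expectation, but the per-sample variance of the zeroth-order estimator is orders of magnitude larger.

```python
import numpy as np

rng = np.random.default_rng(4)
x0, sigma, n = 0.7, 0.1, 100_000
w = rng.normal(0.0, sigma, size=n)

# Per-sample estimates of d/dx E[sin(x + w)] at x0.
first = np.cos(x0 + w)                  # first-order: pathwise derivative
zeroth = np.sin(x0 + w) * w / sigma**2  # zeroth-order: function values only

print(first.mean(), zeroth.mean())  # both ≈ cos(0.7)
print(first.var(), zeroth.var())    # first-order variance is far smaller
```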
What happens in Contact-Rich Scenarios?
Bias
Variance
Common lesson from stochastic optimization:
1. Both are unbiased under sufficient regularity conditions.
2. First-order generally has less variance than zeroth-order.
We show two cases where the commonly accepted wisdom is not true.
1st Pathology: First-Order Estimators CAN be biased.
2nd Pathology: First-Order Estimators can have MORE variance than zeroth-order.
Bias from Discontinuities
1st Pathology: First-Order Estimators CAN be biased.
What's worse: the empirical variance is also zero!
(The estimator is absolutely sure about an estimate that is wrong)
Not just a pathology: this can happen quite often in contact.
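The pathology in its simplest form (a step discontinuity as my stand-in for a making/breaking contact event): the pathwise derivative of a step is zero almost everywhere, so the first-order estimator returns exactly zero with zero empirical variance, confidently wrong, while the zeroth-order estimator recovers the gradient of the smoothed step.

```python
import numpy as np

rng = np.random.default_rng(5)

def f(x):
    return (x > 0.0).astype(float)  # discontinuous "contact" event

def df(x):
    return np.zeros_like(x)  # pathwise derivative is 0 almost everywhere

x0, sigma, n = 0.0, 0.1, 100_000
w = rng.normal(0.0, sigma, size=n)

first = df(x0 + w)                 # mean 0, empirical variance 0 (!)
zeroth = f(x0 + w) * w / sigma**2  # recovers grad of the Gaussian-smoothed step

print(first.mean(), first.var())  # 0.0 0.0 -- absolutely sure, and wrong
print(zeroth.mean())              # ≈ 1 / (sigma * sqrt(2 * pi)) ≈ 3.99
```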
Empirical Bias leads to High Variance
Perhaps it's a modeling artifact? Contact can be softened.
Variance of First-Order Estimators
2nd Pathology: First-order Estimators CAN have more variance than zeroth-order ones.
Scales with Gradient
Scales with Function Value
Scales with dimension of decision variables.
High-Variance Events
Case 1. Persistent Stiffness
Case 2. Chaotic Dynamics
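A toy illustration of the persistent-stiffness case (the stiff sinusoid is my own stand-in for stiff contact dynamics): first-order samples scale with the huge gradient, while zeroth-order samples scale only with the bounded function value, so here the first-order estimator has far more variance.

```python
import numpy as np

rng = np.random.default_rng(6)

k = 1000.0  # stiffness: the function stays bounded but its derivative is huge
x0, sigma, n = 0.3, 0.1, 100_000
w = rng.normal(0.0, sigma, size=n)

# First-order samples scale with the stiff gradient k * cos(...);
# zeroth-order samples scale with the bounded value sin(...) / sigma.
first = k * np.cos(k * (x0 + w))
zeroth = np.sin(k * (x0 + w)) * w / sigma**2

print(first.var(), zeroth.var())  # first-order variance is orders of magnitude larger
```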
Motivating Gradient Interpolation
1st Pathology: First-Order Estimators CAN be biased.
2nd Pathology: First-Order Estimators can have MORE variance than zeroth-order.
Can we automatically decide which of these categories we fall into based on statistical data?
The Alpha-Ordered Gradient Estimator
Perhaps we can interpolate the two gradients based on some criterion.
Previous works attempt to minimize the variance of the interpolated estimator using empirical variance.
Robust Interpolation
Thus, we propose a robust interpolation criterion that also restricts the bias of the interpolated estimator.
Implementation
Confidence Interval on the zeroth-order gradient.
Difference between the gradients.
Key idea: Unit-test the first-order estimate against the unbiased zeroth-order estimate to probabilistically guarantee correctness.
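A hedged sketch of this unit-testing idea (the function name, the z-threshold, and the hard accept/reject rule are my own simplifications of the interpolation): build a confidence interval from the unbiased zeroth-order samples, and coordinate-wise keep the first-order gradient only where it falls inside that interval.

```python
import numpy as np

def interpolated_gradient(g_first, zeroth_samples, z=3.0):
    # Confidence interval on the zeroth-order gradient, coordinate-wise.
    mean = zeroth_samples.mean(axis=0)
    half = z * zeroth_samples.std(axis=0) / np.sqrt(len(zeroth_samples))
    # Unit test: trust the first-order estimate only where it lies in the CI.
    trust_first = np.abs(g_first - mean) <= half
    return np.where(trust_first, g_first, mean)

rng = np.random.default_rng(7)
# Zeroth-order samples around the true gradient [1.0, 2.0].
samples = rng.normal([1.0, 2.0], 0.5, size=(10_000, 2))
# First-order estimate: correct in coordinate 0, wildly biased in coordinate 1
# (e.g., a discontinuity made it confidently wrong).
g1 = np.array([1.0, -5.0])
print(interpolated_gradient(g1, samples))  # ≈ [1.0, 2.0]
```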
Results: Ball Throwing on a Wall
Key idea: Do not commit to zeroth or first order uniformly, but decide coordinate-wise which one to trust more.
Results: Policy Optimization
Able to capitalize on better convergence of first-order methods while being robust to their pitfalls.
Limitations of Smoothing
Contact is non-smooth. But is it truly "discrete"?
The core thesis of this talk:
The local decisions of where to make contact are better modeled as continuous decisions with some smooth approximations.
My viewpoint so far:
The remaining "discrete decisions" come not from contact, but from discrete-level decisions during planning.
Smoothing CANNOT handle these high-level discrete decisions.
Limitations of Smoothing
These reveal true discrete "modes" of the decision making process.
Limitations of Smoothing
Apply negative impulse to stand up.
Apply positive impulse to bounce on the wall.
Limitations of Smoothing
Can we smooth local contact decisions and efficiently search through high-level discrete decisions?
Our ideal solution
Global Search with Smoothing: Contact-Rich RRT
Motivating Contact-Rich RRT
Sampling-based motion planning is a popular solution in robotics for complex non-convex motion planning.
How do we define notions of nearest?
How do we extend (steer)?
Reachability-Consistent Distance Metric
Reachability-based Mahalanobis Distance
How do we come up with a distance metric between q and qbar in a dynamically consistent manner?
Consider a one-step input linearization of the system.
Then we could consider a "reachability ellipsoid" under this linearized dynamics,
Note: For quasidynamic formulations, ubar is a position command, which we set as the actuated part of qbar.
Reachability Ellipsoid
Intuitively, if B lengthens the direction towards q from a uniform ball, q is easier to reach.
On the other hand, if B decreases the direction towards q, q is hard to reach.
Mahalanobis Distance of an Ellipsoid
Reachability Ellipsoid
The B matrix induces a natural quadratic form for an ellipsoid,
Mahalanobis Distance using 1-Step Reachability
Note: if BBᵀ is not invertible, we need to regularize it to properly define a quadratic distance metric numerically.
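A sketch of this regularized metric (the function name, the toy B matrix, and the ε value are assumptions): the quadratic form induced by B Bᵀ + εI makes directions that B lengthens "cheap" and directions that B shrinks "expensive".

```python
import numpy as np

def reachability_distance(q, q_bar, B, eps=1e-6):
    # Mahalanobis distance induced by the one-step reachability ellipsoid:
    # d(q; q_bar) = (q - q_bar)^T (B B^T + eps * I)^{-1} (q - q_bar)
    Sigma = B @ B.T + eps * np.eye(B.shape[0])
    d = q - q_bar
    return float(d @ np.linalg.solve(Sigma, d))

q_bar = np.zeros(2)
B = np.diag([2.0, 0.1])  # easy to move in x, hard to move in y

easy = reachability_distance(np.array([1.0, 0.0]), q_bar, B)
hard = reachability_distance(np.array([0.0, 1.0]), q_bar, B)
print(easy, hard)  # ≈ 0.25 vs ≈ 100: the hard-to-reach direction is "farther"
```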
Smoothed Distance Metric
For Contact:
Don't use the exact linearization, but the smooth linearization.
Global Search with Smoothing
Dynamically consistent extension
Theoretically, it is possible to use long-horizon trajopt algorithms such as iLQR / DDP.
Here we simply do one-step trajopt and solve least-squares.
Importantly, the actuation matrix for the least-squares problem is smoothed, but we roll out the actual dynamics with the found action.
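A minimal 1-D sketch of this extension step (the wall dynamics, names, and the smoothed sensitivity value are my own assumptions): solve a one-step least-squares problem with the smoothed actuation matrix, then roll out the exact non-smooth dynamics with the resulting action.

```python
import numpy as np

def exact_dynamics(q, u):
    # Toy non-smooth dynamics: a hard wall (non-penetration) at q = 0.
    return max(q + u, 0.0)

def extend(q, q_target, B_smooth):
    # One-step least squares against the SMOOTHED linearization:
    # q_target - q ≈ B_smooth @ u
    u, *_ = np.linalg.lstsq(B_smooth, np.atleast_1d(q_target - q), rcond=None)
    # ...but roll out the EXACT dynamics with the found action.
    return exact_dynamics(q, float(u[0]))

B_smooth = np.array([[0.8]])  # assumed smoothed sensitivity (< 1 near the wall)
print(extend(1.0, 2.0, B_smooth))   # moves toward the target
print(extend(0.5, -1.0, B_smooth))  # exact rollout still respects the wall: 0.0
```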
Contact Sampling
With some probability, we execute a regrasp (sample another valid contact configuration) in order to encourage further exploration.
Global Search with Smoothing
Before I go...
Will be hosting an IROS workshop on
Leveraging Models for Contact-Rich Manipulation.
https://sites.google.com/view/iros2023-contactrich/home
Excited to talk to more model-based folks who are interested in manipulation! (Or conversely, manipulation folks who still have an ounce of hope for models).
Thank you!