Updates with Russ

Terry, Pang

Strategy

Tackle known smooth / contact dynamics.

Presentation of randomized smoothing idea for dynamical systems
Presentation of iLQR with randomized smoothing
Applicability to known dynamical systems that are difficult in several respects:
- high dimensionality
- contact dynamics
Demonstrate results competitive with exact-gradient methods using only zero-order information, even for high-dimensional systems.
Demonstrate that the method, combined with quasi-dynamic time stepping, produces strong results for contact dynamics.
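
For reference, one common way to write the randomized smoothing idea for a dynamical system x' = f(x, u). The notation here (Sigma, N, the bar, the sample index i) is assumed for illustration and is not fixed anywhere in these notes:

```latex
% Smoothed dynamics: average f over Gaussian perturbations of state and input.
\bar{f}(x, u) = \mathbb{E}_{w \sim \mathcal{N}(0, \Sigma)}
  \left[ f(x + w_x,\; u + w_u) \right], \qquad w = (w_x, w_u)

% First-order estimate (gradient sampling): average exact Jacobians at N samples.
\bar{A} \approx \frac{1}{N} \sum_{i=1}^{N}
  \frac{\partial f}{\partial x}\left(x + w_x^{i},\; u + w_u^{i}\right), \qquad
\bar{B} \approx \frac{1}{N} \sum_{i=1}^{N}
  \frac{\partial f}{\partial u}\left(x + w_x^{i},\; u + w_u^{i}\right)

% Zero-order estimate: least-squares fit using function values only.
(\bar{A}, \bar{B}, \bar{c}) = \arg\min_{A, B, c} \sum_{i=1}^{N}
  \left\| f(x + w_x^{i},\; u + w_u^{i}) - A w_x^{i} - B w_u^{i} - c \right\|^2
```

In an iLQR-style method, the smoothed (A, B) would take the place of the exact linearization in the backward pass.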

ICRA / RA-L (September)

L4DC (November)

Sell it in the case of Neural Network dynamics

Presentation of randomized smoothing idea for dynamical systems
Presentation of iLQR with randomized smoothing
Turn the trajopt method into a real-time MPC controller (should be doable with pytorch's huge parallelization!).
Rigorous comparison with CEM / MPPI.
Won't count as dual submission?

Output-feedback case: iLQG vs. DRC (SLS)
Online (RL) case: GPS, Sarah Dean's work
Deformable objects: Quasi-dynamic simulation for bubbles, shoelaces, dough
Zero-order Parameter Estimation
Connect it back to vision.

Further directions (maybe next year)

Naming Advice

Went through several variants, but would be good to decide on a name.

Candidates (in increasing order):

Iterative Randomized Smoothing LQR (IRS-LQR)
- Captures the most important aspect of the change we're making
- Also captures the iterative nature of the smoothing we're applying (variance stepping)
- Randomized has many vowels :) "IRaS-LQR", "IReS-LQR", "IRiS-LQR", "IRoS-LQR"
Iterative Least Squares LQR (ILS-LQR)
- Doesn't quite capture the fact that we can also sample first-order gradients instead of doing zero-order least squares.
Should we use "Direct"? (No)
- Distracting because the main point of the work is the smoothing process.
- Using iLQR with Riccati, augmented Lagrangian, etc. also falls under our umbrella.
- But if MPC is the generalization of LQR with constraints, why wouldn't we use it for iLQR with constraints? :)

Update 1: Addressing high dimensionality

Trajectory optimization for Drake's quadrotor (12 states, 4 inputs):

Uses NO gradients (completely zero-order); only 1000 samples are required for each knot point.

Takes less than 10 seconds for 200 timesteps (could be much faster if we can sample in batch).
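
A minimal sketch of what a zero-order linearization at a single knot point could look like, assuming Gaussian perturbations and a plain least-squares fit. The function name `zero_order_linearization`, the value of `sigma`, and the random linear test system are all hypothetical stand-ins; this is not the quadrotor setup itself.

```python
import numpy as np

def zero_order_linearization(f, x_bar, u_bar, sigma=0.1, n_samples=1000, rng=None):
    """Fit f(x_bar + dx, u_bar + du) ~= A dx + B du + c from function values only."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = x_bar.size, u_bar.size
    # Perturb state and input jointly, so every direction is sampled at once.
    dz = sigma * rng.standard_normal((n_samples, n + m))
    # Zero-order information: only evaluations of f, no derivatives.
    y = np.array([f(x_bar + d[:n], u_bar + d[n:]) for d in dz])
    # Regressors [dx, du, 1]; the trailing column of ones absorbs the offset c.
    phi = np.hstack([dz, np.ones((n_samples, 1))])
    theta, *_ = np.linalg.lstsq(phi, y, rcond=None)
    A = theta[:n].T
    B = theta[n:n + m].T
    c = theta[-1]
    return A, B, c

# Hypothetical check on a random linear system with 12 states and 4 inputs,
# where the fit should recover A and B up to numerical precision.
rng = np.random.default_rng(0)
A_true = rng.standard_normal((12, 12))
B_true = rng.standard_normal((12, 4))

def f_lin(x, u):
    return A_true @ x + B_true @ u

A_hat, B_hat, c_hat = zero_order_linearization(
    f_lin, np.zeros(12), np.zeros(4), rng=rng
)
```

The Python loop over samples is the obvious place where batched evaluation of the dynamics would help.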

Why is it the case that only 1000 samples can work so well for a 16-dimensional system?

Short answer: we're simultaneously sampling in all directions.

Long answer: Bellman's curse of dimensionality is not the same as Bakhvalov's (MC integration). But what is it about the dynamics that kills the curse? Can we also use Quasi Monte Carlo / Sparse Grids?
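
For context, the standard Monte Carlo fact behind the MC integration remark, written out with assumed notation (g is any integrand with finite variance, d the dimension, m the number of grid points per axis). This only restates the textbook rate; it says nothing yet about what structure in the dynamics is responsible.

```latex
% Monte Carlo estimate of I = E[g(w)] from N i.i.d. samples w_i:
\hat{I}_N = \frac{1}{N} \sum_{i=1}^{N} g(w_i), \qquad
\mathbb{E}\left[ (\hat{I}_N - I)^2 \right] = \frac{\operatorname{Var}[g(w)]}{N}

% The 1/N rate does not depend on d explicitly; dimension enters only
% through Var[g(w)]. A tensor grid with m points per axis needs
N_{\text{grid}} = m^{d}, \qquad \text{e.g. } m = 2,\ d = 16
  \;\Rightarrow\; N_{\text{grid}} = 65536
```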

Update 2: Trajopt through Quasi-dynamic Contacts

Initial Guess

Exact Gradients (QP's KKT gradients)

Randomized Smoothing (Gradient Sampling)
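
A toy illustration of why gradient sampling can help through contact, assuming a 1D penalty-style contact force as a stand-in (this is NOT the quasi-dynamic QP model; all names and values below are hypothetical). Just before contact the exact gradient is zero and carries no information, while the sampled gradient already "feels" the contact.

```python
import math
import numpy as np

def contact_force(x, k=1.0):
    """Toy contact: zero force until penetration (x > 0), then linear in x."""
    return k * np.maximum(x, 0.0)

def contact_force_grad(x, k=1.0):
    """Exact derivative of the toy force; zero whenever not in contact."""
    return k * np.where(np.asarray(x) > 0.0, 1.0, 0.0)

def sampled_grad(x, sigma, n_samples=1000, rng=None):
    """First-order randomized smoothing: average exact gradients at perturbed points."""
    rng = np.random.default_rng() if rng is None else rng
    w = sigma * rng.standard_normal(n_samples)
    return float(contact_force_grad(x + w).mean())

def smoothed_grad_closed_form(x, sigma):
    """For this toy with k = 1, the smoothed gradient is the Gaussian CDF at x / sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

x0, sigma0 = -0.05, 0.1  # just out of contact
g_exact = float(contact_force_grad(x0))
g_mc = sampled_grad(x0, sigma0, rng=np.random.default_rng(0))
g_closed = smoothed_grad_closed_form(x0, sigma0)
print(g_exact, g_mc, g_closed)
```

Shrinking `sigma` over iterations moves the sampled gradient back toward the exact one, which is one way to read the variance stepping idea.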