Updates with Russ
Terry, Pang
Strategy
Tackle known smooth / contact dynamics.
- Presentation of randomized smoothing idea for dynamical systems
- Presentation of iLQR with randomized smoothing
- Applicability to known dynamical systems that are difficult in several respects:
- high dimensionality
- contact dynamics
- Demonstrate results competitive with exact gradients using only zero-order information, even for high-dimensional systems.
- Demonstrate that quasi-dynamic time stepping, used with the method, produces impressive results for contact dynamics.
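For reference, one common way to write down the randomized smoothing idea behind the bullets above. This is a sketch under assumptions, not necessarily the exact formulation used in this project: the symbols $f$ (dynamics), $w = (w_x, w_u)$ (Gaussian perturbation), $\Sigma$ (its covariance), and $N$ (number of samples) are introduced here for illustration.

```latex
% Smoothed dynamics: average the true dynamics over random perturbations.
\bar{f}(x, u) = \mathbb{E}_{w \sim \mathcal{N}(0, \Sigma)}
  \left[ f(x + w_x,\, u + w_u) \right]

% First-order estimate of the linearization (gradient sampling):
\begin{bmatrix} \bar{A} & \bar{B} \end{bmatrix}
  \approx \frac{1}{N} \sum_{i=1}^{N}
  \frac{\partial f}{\partial (x, u)} \left( x + w_{x,i},\, u + w_{u,i} \right)

% Zero-order estimate of the linearization (least squares on samples):
(\bar{A}, \bar{B}) \approx \arg\min_{A, B} \sum_{i=1}^{N}
  \left\| f(x + w_{x,i},\, u + w_{u,i}) - f(x, u)
  - A\, w_{x,i} - B\, w_{u,i} \right\|_2^2
```

Under this reading, iLQR with randomized smoothing would use $(\bar{A}, \bar{B})$ in the backward pass in place of the exact Jacobians of $f$.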
ICRA / RA-L (September)
L4DC (November)
Sell it in the case of Neural Network dynamics
- Presentation of randomized smoothing idea for dynamical systems
- Presentation of iLQR with randomized smoothing
- Turn the trajopt method into a real-time MPC controller (should be doable with PyTorch's batched parallelism!).
- Rigorous comparison with CEM / MPPI.
- Won't count as dual submission?
- Output-feedback case: iLQG vs. DRC (SLS)
- Online (RL) case: GPS, Sarah Dean's work
- Deformable objects: Quasi-dynamic simulation for bubbles, shoelaces, dough
- Zero-order Parameter Estimation
- Connect it back to vision.
L4DC (November)
Further directions (maybe next year)
Naming Advice
We went through several variants, but it would be good to settle on a name.
Candidates (in increasing order):
- Iterative Randomized Smoothing LQR (IRS-LQR)
- Captures the most important aspect of the change we're making
- Also captures the iterative nature of the smoothing we're applying (variance stepping)
- Randomized has many vowels :) "IRaS-LQR", "IReS-LQR", "IRiS-LQR", "IRoS-LQR"
- Iterative Least Squares LQR (ILS-LQR)
- Doesn't quite capture the fact that we can also sample gradients (first-order) instead of doing least squares (zero-order).
- Should we use "Direct"? (No)
- Distracting because the main point of the work is the smoothing process.
- Using iLQR with Riccati, augmented Lagrangian, etc. also falls under our umbrella.
- But if MPC is the generalization of LQR with constraints, why wouldn't we use it for iLQR with constraints? :)
Update 1: Addressing high dimensionality
Trajectory optimization for Drake's quadrotor (12 states, 4 inputs):
Uses NO gradients (completely zero-order); only 1000 samples are required for each knot point.
Takes less than 10 seconds for 200 timesteps, and could be much faster if we can sample in batch.
Why do only 1000 samples work so well for a 16-dimensional system (12 states + 4 inputs)?
Short answer: we're simultaneously sampling in all directions.
Long answer: Bellman's curse of dimensionality is not the same as Bakhvalov's (MC integration). But what is it about the dynamics that kills the curse? Can we also use Quasi Monte Carlo / Sparse Grids?
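The short answer can be illustrated with a small experiment. The sketch below is not the Drake quadrotor: `toy_dynamics` is a hypothetical 12-state, 4-input system made up for illustration, and `zero_order_linearization`, `sigma`, and `num_samples` are assumed names and values. Every sample perturbs all 16 coordinates at once, and a single least-squares solve recovers all columns of the linearization from function evaluations only.

```python
import numpy as np


def zero_order_linearization(f, x, u, sigma=0.05, num_samples=1000, rng=None):
    """Estimate (A, B) of the smoothed dynamics using only evaluations of f."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = x.size, u.size
    # Each row perturbs all n + m coordinates simultaneously.
    dz = sigma * rng.standard_normal((num_samples, n + m))
    f0 = f(x, u)
    dy = np.stack([f(x + d[:n], u + d[n:]) - f0 for d in dz])
    # Solve min_J sum_i ||dy_i - J dz_i||^2; lstsq returns J transposed.
    J_transposed, *_ = np.linalg.lstsq(dz, dy, rcond=None)
    J = J_transposed.T
    return J[:, :n], J[:, n:]


# Hypothetical smooth system with 12 states and 4 inputs (illustration only).
rng = np.random.default_rng(0)
n, m = 12, 4
A_true = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B_true = rng.standard_normal((n, m))


def toy_dynamics(x, u):
    return A_true @ x + B_true @ u + 0.1 * np.sin(x)


x0 = rng.standard_normal(n)
u0 = rng.standard_normal(m)
A_hat, B_hat = zero_order_linearization(toy_dynamics, x0, u0, rng=rng)

# Exact Jacobians of the toy system, used only to check the estimate.
A_exact = A_true + 0.1 * np.diag(np.cos(x0))
err_A = np.max(np.abs(A_hat - A_exact))
err_B = np.max(np.abs(B_hat - B_true))
print(f"max |A_hat - A_exact| = {err_A:.2e}, max |B_hat - B_true| = {err_B:.2e}")
```

The Python loop over samples is the part that batched evaluation would replace. This toy system is nearly linear, so it says nothing about which properties of real dynamics keep the sample count low; that remains the open question above.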
Update 2: Trajopt through Quasi-dynamic Contacts
Initial Guess
Exact Gradients
(QP's KKT gradients)
Randomized Smoothing (Gradient Sampling)
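To make the contrast between exact gradients and gradient sampling concrete, here is a minimal 1-D sketch. It does not use the quasi-dynamic QP simulator: `contact_force_grad` is a hypothetical penalty-style contact model whose force is `stiffness * max(0, q)`, and `sigma` and `num_samples` are assumed values. Just before contact the exact gradient is zero and gives the optimizer no signal, while the averaged gradient is nonzero and, for this model, approaches the Gaussian CDF.

```python
import math

import numpy as np


def contact_force_grad(q, stiffness=1.0):
    """Exact gradient of the toy force stiffness * max(0, q): a step function."""
    return np.where(np.asarray(q) > 0.0, stiffness, 0.0)


def sampled_gradient(grad_fn, q, sigma, num_samples=1000, rng=None):
    """First-order randomized smoothing: average exact gradients at perturbed points."""
    rng = np.random.default_rng() if rng is None else rng
    w = sigma * rng.standard_normal(num_samples)
    return float(np.mean(grad_fn(q + w)))


rng = np.random.default_rng(0)
q, sigma = -0.05, 0.1  # slightly out of contact (assumed values)

exact = float(contact_force_grad(q))
smoothed = sampled_gradient(contact_force_grad, q, sigma, rng=rng)
# For this toy model the smoothed gradient has a closed form: the Gaussian CDF.
closed_form = 0.5 * (1.0 + math.erf(q / (sigma * math.sqrt(2.0))))

print(f"exact = {exact}, sampled = {smoothed:.3f}, closed form = {closed_form:.3f}")
```

Shrinking `sigma` over iterations moves the sampled gradient back toward the exact one, which is one way to read the "variance stepping" mentioned under Naming Advice.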
Russ_Update_07_23
By Terry Suh