Prof. Sarah Dean
MW 2:55-4:10pm
255 Olin Hall
1. Recap: LQR
2. Local LQR
3. Iterative LQR
4. Differential DP
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)
minimize over \(\pi\): \(\displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
DP Algorithm: initialize \(V^\star_{H}(s)=c_H(s)\), then for \(t=H-1,\dots,0\): $$V_{t}^\star(s) =\min_a\; c(s,a)+V^\star_{t+1}(f(s,a))$$
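The backward recursion above can be run directly on a small discrete problem. A minimal sketch, where the 2-state, 2-action dynamics, stage costs, and terminal costs are all made-up toy numbers for illustration:

```python
import numpy as np

# Backward DP: V*_H(s) = c_H(s),  V*_t(s) = min_a c(s,a) + V*_{t+1}(f(s,a)).
# Toy deterministic problem (all numbers illustrative).
H = 3
f = np.array([[0, 1], [1, 0]])          # f[s, a] = next state
c = np.array([[1.0, 2.0], [0.0, 1.0]])  # c[s, a] = stage cost
c_H = np.array([0.0, 5.0])              # terminal cost

V = c_H.copy()
policy = []
for t in reversed(range(H)):
    Qsa = c + V[f]                      # Qsa[s, a] = c(s,a) + V_{t+1}(f(s,a))
    policy.append(Qsa.argmin(axis=1))   # greedy action per state
    V = Qsa.min(axis=1)
policy = policy[::-1]                   # policy[t][s] = optimal action at time t
```

Each backward step costs \(O(|\mathcal S||\mathcal A|)\), so the whole recursion is \(O(H|\mathcal S||\mathcal A|)\) for deterministic dynamics.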
Special case of linear dynamics & quadratic costs: $$f(s,a) = As+Ba,\quad c(s,a) = s^\top Q s + a^\top R a$$
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear in the state.
\(\pi^\star = (K_0,\dots,K_{H-1}) = \mathsf{LQR}(A,B,Q,R)\)
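A minimal sketch of what \(\mathsf{LQR}(A,B,Q,R)\) computes, via the backward (Riccati) recursion; the double-integrator matrices at the bottom are illustrative, not from the lecture:

```python
import numpy as np

# LQR backward recursion for min sum_t s_t'Q s_t + a_t'R a_t,
# s_{t+1} = A s_t + B a_t, assuming terminal cost s_H' Q s_H.
def lqr(A, B, Q, R, H):
    """Return [K_0, ..., K_{H-1}] with optimal policy a_t = K_t @ s_t."""
    P = Q                                  # V*_H(s) = s' P s
    Ks = []
    for _ in range(H):
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A + B @ K)      # Riccati update, simplified via K
        Ks.append(K)
    return Ks[::-1]                        # reorder so Ks[t] = K_t

# Illustrative double-integrator system.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Ks = lqr(A, B, np.eye(2), np.eye(1), H=50)
```

For a long horizon the early gains \(K_0, K_1, \dots\) approach the stationary infinite-horizon gain, and the closed loop \(A + BK_0\) is stable.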
2. Local LQR
[Figures: pendulum with angle \(\theta\), angular velocity \(\omega\), gravity, and input \(a_t\); cartpole with position \(x\), velocity \(v\), and applied force \(f\) as input \(a_t\)]
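Local LQR in code: linearize the nonlinear dynamics about an equilibrium and hand the resulting \((A, B)\) to LQR. A minimal sketch, assuming an Euler-discretized pendulum with \(\theta\) measured from upright and unit mass/length (an illustrative model, not the lecture's exact one), with Jacobians taken by finite differences:

```python
import numpy as np

# Local LQR: A = df/ds, B = df/da at an equilibrium (s*, a*), then LQR(A, B, Q, R).
dt, g = 0.05, 9.81

def f(s, a):
    # Pendulum, theta from upright, unit mass/length, Euler step (illustrative).
    th, w = s
    return np.array([th + dt * w, w + dt * (g * np.sin(th) + a[0])])

def jacobians(f, s, a, eps=1e-5):
    # Central finite-difference Jacobians of f at (s, a).
    n, m = len(s), len(a)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        A[:, i] = (f(s + e, a) - f(s - e, a)) / (2 * eps)
    for j in range(m):
        e = np.zeros(m); e[j] = eps
        B[:, j] = (f(s, a + e) - f(s, a - e)) / (2 * eps)
    return A, B

s_star, a_star = np.array([0.0, 0.0]), np.array([0.0])  # upright equilibrium
A_lin, B_lin = jacobians(f, s_star, a_star)             # feed to LQR(A_lin, B_lin, Q, R)
```

Around the upright equilibrium \(A_{\text{lin}}\) has an eigenvalue outside the unit circle, which is exactly why local feedback is needed there.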
3. Iterative LQR
Approximate around a trajectory. What trajectory? Iterate!
Black lines: \(\tau_{i-1}\); red arrows: the trajectory the system would follow if the linearization were exact; blue dashed lines: \(\tau_i\)
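The iterate-and-relinearize loop can be sketched on a scalar system. Everything below is an illustrative assumption (dynamics \(s_{t+1}=s_t+\Delta t(\sin s_t + a_t)\), cost \(\sum_t s_t^2+a_t^2\), fixed iteration count); a practical implementation would also add a line search on the forward pass:

```python
import numpy as np

# Iterative LQR sketch. Each iteration: LQR-style backward pass along the
# linearization of tau_{i-1}, then a nonlinear forward rollout -> tau_i.
dt, H = 0.1, 30

def f(s, a):
    return s + dt * (np.sin(s) + a)       # illustrative scalar dynamics

def rollout(s0, a_ref, k=None, K=None, s_ref=None):
    # Roll the true nonlinear dynamics; with (k, K), apply the local controller
    # a_t = a_ref_t + k_t + K_t (s_t - s_ref_t).
    s, ss, aa = s0, [s0], []
    for t in range(H):
        a = a_ref[t] if k is None else a_ref[t] + k[t] + K[t] * (s - s_ref[t])
        aa.append(a)
        s = f(s, a)
        ss.append(s)
    return np.array(ss), np.array(aa)

s0 = 2.0
ss, aa = rollout(s0, np.zeros(H))          # tau_0: zero-control trajectory
for _ in range(20):
    # Backward pass along tau_{i-1}: A_t = df/ds, B_t = df/da at (s_t, a_t).
    Vx, Vxx = 2 * ss[-1], 2.0              # from terminal cost s^2
    k, K = np.zeros(H), np.zeros(H)
    for t in reversed(range(H)):
        A, B = 1 + dt * np.cos(ss[t]), dt
        Qx, Qu = 2 * ss[t] + A * Vx, 2 * aa[t] + B * Vx
        Qxx, Quu, Qux = 2 + A * Vxx * A, 2 + B * Vxx * B, B * Vxx * A
        k[t], K[t] = -Qu / Quu, -Qux / Quu
        Vx, Vxx = Qx - Qux * Qu / Quu, Qxx - Qux * Qux / Quu
    # Forward pass: roll the true dynamics with the updated controller -> tau_i.
    ss, aa = rollout(s0, aa, k, K, ss)
```

The backward pass here keeps only first-order terms of the dynamics (the Gauss-Newton flavor of iLQR); keeping the second-order dynamics terms as well is what distinguishes differential DP, the next topic.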
4. Differential DP