Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
1. Recap: Local LQR
2. Iterative LQR
3. PID Control
4. Limitations to Control
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear: $$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$
where the matrices \(P_t\) and \(K_t\) are defined by a backward recursion starting from \(P_{H} = Q\).
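The backward recursion for \(P_t\) and \(K_t\) can be sketched in code. The version below is the standard discrete-time Riccati recursion under assumptions this section does not spell out: linear dynamics \(s_{t+1}=As_t+Ba_t\), stage cost \(s^\top Q s + a^\top R a\) with \(R\) positive definite, and terminal cost \(s_H^\top Q s_H\). The function name `lqr_backward` is hypothetical.

```python
import numpy as np

def lqr_backward(A, B, Q, R, H):
    """Finite-horizon LQR via the backward Riccati recursion.

    Assumed setup (not stated in the slides): s_{t+1} = A s_t + B a_t,
    stage cost s^T Q s + a^T R a, terminal cost s_H^T Q s_H, so P_H = Q.
    Returns P[0..H] and K[0..H-1] with V_t(s) = s^T P[t] s, pi_t(s) = K[t] s.
    """
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = Q
    for t in range(H - 1, -1, -1):
        P_next = P[t + 1]
        G = R + B.T @ P_next @ B
        # K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
        K[t] = -np.linalg.solve(G, B.T @ P_next @ A)
        # P_t = Q + A^T P_{t+1} A - A^T P_{t+1} B G^{-1} B^T P_{t+1} A
        P[t] = Q + A.T @ P_next @ A + A.T @ P_next @ B @ K[t]
    return P, K
```

A quick sanity check is to roll out \(a_t = K_t s_t\) from some \(s_0\) and compare the accumulated cost with \(s_0^\top P_0 s_0\); the two should agree up to round-off.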
$$\min_{\pi}~~ \sum_{t=0}^{H-1} c(s_t, a_t) \quad\text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$
For a symmetric matrix \(Q\in\mathbb R^{n\times n}\) the eigen-decomposition is $$Q = \sum_{i=1}^n v_iv_i^\top \sigma_i $$
To make this PSD, we replace $$Q\leftarrow \sum_{i=1}^n v_iv_i^\top (\max\{0,\sigma_i\} +\lambda)$$
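The replacement above can be sketched with NumPy's symmetric eigensolver. This is a minimal sketch: the function name `make_psd` and the default value of \(\lambda\) are illustrative assumptions, not values taken from the lecture.

```python
import numpy as np

def make_psd(Q, lam=1e-3):
    """Clip negative eigenvalues of a symmetric matrix to 0, then add lam.

    lam is a small regularizer; its default here is an arbitrary choice.
    """
    Q = 0.5 * (Q + Q.T)                  # symmetrize to guard against round-off
    sigma, V = np.linalg.eigh(Q)         # columns of V are the eigenvectors v_i
    sigma_new = np.maximum(sigma, 0.0) + lam
    # (V * sigma_new) scales column i of V by sigma_new[i],
    # so the product equals sum_i sigma_new[i] * v_i v_i^T.
    return (V * sigma_new) @ V.T
```

With \(\lambda > 0\) every eigenvalue of the result is at least \(\lambda\), so the output is positive definite, not just positive semidefinite.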
\(\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s = \gamma^\mathsf{pos}_t (\mathsf{pos} - x) + \gamma^\mathsf{vel}_t \mathsf{vel} \)
[Figure: \(\gamma^\mathsf{pos}\) and \(\gamma^\mathsf{vel}\) plotted against \(t\), up to \(H\)]
2. Iterative LQR
$$\min_{\pi}~~ \sum_{t=0}^{H-1} c(s_t, a_t) \quad\text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t), ~~s_0\sim\mu_0$$
Linearize around a trajectory. What trajectory? Iterate!
Black lines: \(\tau_{i-1}\); red arrows: the trajectory if the linearization were exact; blue dashed lines: \(\tau_i\)
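The "linearize around a trajectory" step can be sketched with finite differences. This covers only the dynamics linearization, not the full iterative LQR loop (the quadratic cost approximation, the LQR solve, and the rollout that produces \(\tau_i\) are omitted). The names `linearize_along` and `pendulum_step` are hypothetical, and the pendulum-like dynamics are an illustrative assumption.

```python
import numpy as np

def linearize_along(f, states, actions, eps=1e-5):
    """Central-difference Jacobians of f at each (s_t, a_t) on a trajectory.

    Returns lists As, Bs with As[t] ~ df/ds and Bs[t] ~ df/da at (s_t, a_t),
    so that f(s, a) ~ f(s_t, a_t) + As[t] (s - s_t) + Bs[t] (a - a_t).
    """
    As, Bs = [], []
    for s, a in zip(states, actions):
        n, m = len(s), len(a)
        A = np.zeros((n, n))
        B = np.zeros((n, m))
        for j in range(n):
            d = np.zeros(n)
            d[j] = eps
            A[:, j] = (f(s + d, a) - f(s - d, a)) / (2 * eps)
        for j in range(m):
            d = np.zeros(m)
            d[j] = eps
            B[:, j] = (f(s, a + d) - f(s, a - d)) / (2 * eps)
        As.append(A)
        Bs.append(B)
    return As, Bs

def pendulum_step(s, a, dt=0.1):
    """Illustrative nonlinear dynamics (an assumption, not from the slides)."""
    return np.array([s[0] + dt * s[1], s[1] + dt * (np.sin(s[0]) + a[0])])
```

Calling `linearize_along(pendulum_step, states, actions)` on the previous trajectory \(\tau_{i-1}\) gives the time-varying \((A_t, B_t)\) that a local LQR solve would use.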
3. PID Control
[Figure: error plotted against \(t\)]
4. Limitations to Control
[Diagram: two states \(0\) and \(1\) with actions \(a\in\{\)stay, switch\(\}\); edges labeled \(a=\)stay and \(a=\)switch]
Definition:
Theorem: Given finite \(\mathcal S,\mathcal A\) and transition function \(P\), construct a directed graph with vertices \(\mathcal V=\mathcal S\) and an edge from \(s\) to \(s'\) if \(P(s'|s,a)>0\) for some \(a\in\mathcal A\).
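The graph construction can be sketched as follows. The conclusion of the theorem is not stated in this text; the check below assumes the natural reading that every state can be reached from every other state exactly when the graph is strongly connected. The array layout `P[s, a, s_next]` and all function names are assumptions for illustration.

```python
import numpy as np

def transition_graph(P):
    """Adjacency matrix with an edge s -> s' if P[s, a, s'] > 0 for some a.

    Assumes P has shape (num_states, num_actions, num_states).
    """
    return (P > 0).any(axis=1)

def reachable_from(adj, s):
    """States reachable from s along directed edges (depth-first search)."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in np.flatnonzero(adj[u]):
            v = int(v)
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strongly_connected(adj):
    """True if every vertex can reach every other vertex."""
    n = adj.shape[0]
    return all(len(reachable_from(adj, s)) == n for s in range(n))

# Two-state example with actions stay (index 0) and switch (index 1).
P_example = np.zeros((2, 2, 2))
for s in range(2):
    P_example[s, 0, s] = 1.0        # stay keeps the state
    P_example[s, 1, 1 - s] = 1.0    # switch moves to the other state
```

Here `strongly_connected(transition_graph(P_example))` is `True`, since switch connects the two states in both directions. Making state \(1\) absorbing under both actions would remove the edge back to \(0\) and break the property.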
Proof:
Theorem: The linear dynamics \(s_{t+1}=As_t+Ba_t\) are controllable if the controllability matrix \(\mathcal C\) is full rank, where \(n_s\) is the state dimension. $$\mathrm{rank}\Big(\underbrace{\begin{bmatrix}B & AB & A^2 B & \dots & A^{n_s-1}B\end{bmatrix}}_{\mathcal C}\Big) = n_s $$
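For the example \(s_{t+1} = \begin{bmatrix} 2 & 0 \\ 0 & 1\end{bmatrix} s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t\), we get \(\mathcal C = \begin{bmatrix}B & AB\end{bmatrix} = \begin{bmatrix}0 & 0\\ 1 & 1\end{bmatrix}\), which has rank \(1 < n_s = 2\). The rank condition fails: the action never affects the first coordinate.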
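The rank test for this example can be checked numerically. This is a minimal sketch; the helper name `controllability_matrix` is hypothetical.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^{n-1} B] column-wise, n = state dimension."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = controllability_matrix(A, B)      # [[0, 0], [1, 1]]
rank = np.linalg.matrix_rank(C)       # 1, which is less than n_s = 2
```

The first row of `C` is zero because no action ever enters the first state coordinate, so the rank is deficient.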
Proof:
To get from \(s\) to \(s'\) we can simply take the actions:
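The action sequence itself is not reproduced in this text. One standard construction, assumed here, unrolls the dynamics for \(n_s\) steps, \(s_{n_s} = A^{n_s}s + \mathcal C \begin{bmatrix} a_{n_s-1}^\top & \dots & a_0^\top\end{bmatrix}^\top\), and solves the linear system for the stacked actions. The sketch below uses least squares, which returns an exact solution whenever \(\mathcal C\) has full row rank. The function names are hypothetical.

```python
import numpy as np

def steering_actions(A, B, s, s_target):
    """Actions a_0, ..., a_{n-1} driving s to s_target in n steps.

    Assumes the controllability matrix has full row rank.
    """
    n, m = B.shape
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    C = np.hstack(blocks)                              # [B, AB, ..., A^{n-1} B]
    rhs = s_target - np.linalg.matrix_power(A, n) @ s
    u, *_ = np.linalg.lstsq(C, rhs, rcond=None)        # minimum-norm solution
    # u stacks [a_{n-1}; ...; a_0]; reverse the blocks to get time order.
    return u.reshape(n, m)[::-1]

def rollout(A, B, s, actions):
    """Apply the actions in order and return the final state."""
    for a in actions:
        s = A @ s + B @ a
    return s
```

For example, with \(A = \begin{bmatrix}1 & 1\\ 0 & 1\end{bmatrix}\), \(B = \begin{bmatrix}0\\1\end{bmatrix}\), \(s = (1, 0)\), and \(s' = (0, 0)\), the solution is \(a_0 = -1\), \(a_1 = 1\), and the rollout lands exactly on \(s'\).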