Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
1. Recap: Control & LQR
2. Optimal LQR Policy
3. Nonlinear Approximation
4. Local Linear Control
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)
\(\displaystyle\min_{\pi}~ \sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
Dynamic programming gives a general-purpose algorithm for solving the optimal control problem: \(V_t^\star (s) = \min_{a}~ c(s, a)+V_{t+1}^\star (f(s,a))\).
The linear quadratic regulator (LQR) is the special case of the optimal control problem with linear dynamics and quadratic costs:
\(\displaystyle\min_{\pi}~ \sum_{t=0}^{H-1} \left(s_t^\top Qs_t +a_t^\top Ra_t\right)+s_H^\top Q s_H\)
s.t. \(s_{t+1}=As_t+B a_t, ~~a_t=\pi_t(s_t)\)
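For any fixed linear policy \(a_t = K s_t\), this objective can be evaluated by rolling the linear dynamics forward and accumulating the quadratic costs; a minimal numpy sketch (the system is the double-integrator example used later, and the gain \(K\) is an arbitrary, not necessarily optimal, choice):

```python
import numpy as np

def lqr_cost(A, B, Q, R, K, s0, H):
    """Roll out s_{t+1} = A s_t + B a_t under a_t = K s_t and sum the quadratic costs."""
    s, total = s0, 0.0
    for t in range(H):
        a = K @ s
        total += s @ Q @ s + a @ R @ a
        s = A @ s + B @ a
    return total + s @ Q @ s              # terminal cost s_H' Q s_H

# Arbitrary example: double integrator, position-only state cost, stabilizing gain K.
A = np.array([[1., 1.], [0., 1.]])
B = np.array([[0.], [1.]])
Q = np.diag([1., 0.])
R = np.array([[0.5]])
K = np.array([[-0.5, -1.0]])
print(lqr_cost(A, B, Q, R, K, s0=np.array([2., 1.]), H=20))
```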
Important background:
Linear algebra and probability background*
*these references are not necessarily an exact match to the course and they are not required
$$\min_{a_0, a_1}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_1 + s_2^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_2+\lambda a_{0}^2+\lambda a_1^2 $$
$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad \quad s_{2} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{1} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{1} $$
Minimize over the final action \(a_1\): substituting \(s_2 = As_1 + Ba_1\), the penalized position at time 2 is \(\begin{bmatrix}1&1\end{bmatrix}s_1\), which does not depend on \(a_1\), so the last step reduces to
$$\min_{a_1}\quad (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 + \lambda a_1^2$$
with minimizer \(a_1^\star=0\).
Then move back one step and minimize over \(a_0\), with \(a_1^\star=0\) plugged in:
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 +\lambda a_{0}^2 $$
$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} $$
Substituting the constraint gives a quadratic in \(a_0\),
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}3&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + (1 +\lambda)a_0^2, $$
and setting the derivative with respect to \(a_0\) to zero yields
\( a_0^\star = -\frac{\begin{bmatrix}1&2\end{bmatrix}s_0}{1+\lambda} = -\frac{1}{1+\lambda}(\mathsf{pos}_0-x+2\,\mathsf{vel}_0).\)
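The closed-form answer can be checked numerically; a minimal sketch that compares \(a_0^\star\) against a brute-force grid search (the values of \(\lambda\) and \(s_0\) are arbitrary choices):

```python
import numpy as np

# Double-integrator example: state s = [pos - x, vel], only position error is penalized.
A = np.array([[1., 1.], [0., 1.]])
B = np.array([0., 1.])
Q = np.array([[1., 0.], [0., 0.]])
lam = 0.5                        # arbitrary control penalty
s0 = np.array([2.0, 1.0])        # arbitrary initial state

def total_cost(a0, a1):
    s1 = A @ s0 + B * a0
    s2 = A @ s1 + B * a1
    return s0 @ Q @ s0 + s1 @ Q @ s1 + s2 @ Q @ s2 + lam * (a0**2 + a1**2)

a0_star = -np.array([1., 2.]) @ s0 / (1 + lam)      # derived above
grid = np.linspace(-5, 5, 201)
best = min((total_cost(a0, a1), a0, a1) for a0 in grid for a1 in grid)
print(a0_star, best[1:])    # the grid minimizer should be close to (a0*, 0)
```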
1. Recap: Control & LQR
2. Optimal LQR Policy
3. Nonlinear Approximation
4. Local Linear Control
DP: \(V_t^\star (s) = \min_{a} c(s, a)+V_{t+1}^\star (f(s,a))\)
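For finite (or discretized) state and action spaces, this recursion is a backward pass over tables; a minimal sketch with randomly generated, purely illustrative cost and transition tables:

```python
import numpy as np

nS, nA, H = 5, 3, 10
rng = np.random.default_rng(0)
c = rng.random((nS, nA))             # c[s, a]: per-step cost (time-invariant here)
nxt = rng.integers(0, nS, (nS, nA))  # nxt[s, a]: index of the next state f(s, a)
c_H = rng.random(nS)                 # terminal cost

V = c_H.copy()                       # V_H = c_H
pi = np.zeros((H, nS), dtype=int)    # pi[t, s]: optimal action index
for t in reversed(range(H)):
    Qt = c + V[nxt]                  # Q_t[s, a] = c(s, a) + V_{t+1}(f(s, a))
    pi[t] = Qt.argmin(axis=1)
    V = Qt.min(axis=1)               # V_t(s) = min_a Q_t[s, a]

print(V)                             # optimal cost-to-go from each state at t = 0
```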
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$
where the matrices are defined as \(P_{H} = Q\) and
\(P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
\(K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
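A minimal numpy sketch of this backward recursion; the system below is the double-integrator example from earlier, while \(\lambda\) and the horizon are arbitrary choices:

```python
import numpy as np

def lqr_backward(A, B, Q, R, H):
    """Backward Riccati recursion: returns K_0..K_{H-1} and P_0..P_H."""
    P = Q.copy()                                    # P_H = Q
    Ps, Ks = [P], []
    for _ in range(H):
        G = R + B.T @ P @ B
        K = -np.linalg.solve(G, B.T @ P @ A)        # K_t = -(R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
        Ks.append(K)
        Ps.append(P)
    return Ks[::-1], Ps[::-1]                       # index 0 corresponds to t = 0

A = np.array([[1., 1.], [0., 1.]])
B = np.array([[0.], [1.]])
Q = np.array([[1., 0.], [0., 0.]])
lam = 0.5
K, P = lqr_backward(A, B, Q, np.array([[lam]]), H=20)
print(K[0])      # gains [gamma_pos, gamma_vel] at t = 0
```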
For the example above, the optimal action is \(a_t = \pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s = \gamma^\mathsf{pos}_t (\mathsf{pos} - x) + \gamma^\mathsf{vel}_t \mathsf{vel} \)
[Figure: the optimal gains \(\gamma^\mathsf{pos}_t\) and \(\gamma^\mathsf{vel}_t\) plotted against \(t\), from \(t=0\) to the horizon \(H\).]
1. Recap: Control & LQR
2. Optimal LQR Policy
3. Nonlinear Approximation
4. Local Linear Control
Jacobians of the dynamics: \(\nabla_s f(s,a)\) has entry \( \frac{\partial f_j (s,a)}{\partial s_i}\) in row \(i\), column \(j\), and \(\nabla_a f(s,a)\) has entry \( \frac{\partial f_j (s,a)}{\partial a_i}\) in row \(i\), column \(j\).
For example, with state \(s=(\mathsf{pos},\mathsf{vel})\) and dynamics \(f(s,a)=\begin{bmatrix} f_1(s,a)\\f_2(s,a)\end{bmatrix}\),
$$\nabla_s f(s,a) = \begin{bmatrix} \frac{\partial f_1 (s,a)}{\partial \mathsf{pos}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{pos}} \\ \frac{\partial f_1 (s,a)}{\partial \mathsf{vel}} & \frac{\partial f_2 (s,a)}{\partial \mathsf{vel}} \end{bmatrix} $$
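When \(f\) is only available as a simulator, these Jacobians can be estimated with central differences; a minimal sketch, where the discretized pendulum dynamics are a hypothetical stand-in:

```python
import numpy as np

def dyn_jacobians(f, s, a, delta=1e-5):
    """Central-difference Jacobians at (s, a), using the convention above:
    Js[i, j] ~ d f_j / d s_i and Ja[i, j] ~ d f_j / d a_i."""
    d = len(f(s, a))
    Js = np.zeros((len(s), d))
    Ja = np.zeros((len(a), d))
    for i, e in enumerate(np.eye(len(s))):
        Js[i] = (f(s + delta * e, a) - f(s - delta * e, a)) / (2 * delta)
    for i, e in enumerate(np.eye(len(a))):
        Ja[i] = (f(s, a + delta * e) - f(s, a - delta * e)) / (2 * delta)
    return Js, Ja

# Hypothetical dynamics: discretized pendulum, state (theta, omega), action = torque.
def f(s, a, dt=0.05):
    theta, omega = s
    return np.array([theta + dt * omega, omega + dt * (np.sin(theta) + a[0])])

Js, Ja = dyn_jacobians(f, np.array([0.1, 0.0]), np.array([0.0]))
print(Js, Ja)
```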
Derivatives of the cost: the gradients \(\nabla_s c(s,a)\) and \(\nabla_a c(s,a)\) have \(i\)th entries \( \frac{\partial c (s,a)}{\partial s_i}\) and \( \frac{\partial c (s,a)}{\partial a_i}\). The Hessian blocks have entries \( \frac{\partial^2 c (s,a)}{\partial s_i\partial s_j}\) (state-state), \( \frac{\partial^2c(s,a)}{\partial a_i\partial a_j}\) (action-action), and \( \frac{\partial^2 c (s,a)}{\partial a_i \partial s_j}\) (action-state) in row \(i\), column \(j\); the state-state and action-action blocks are symmetric.
These can also be approximated by finite differences. Applying a central difference in \(a_i\) to the partial derivative with respect to \(s_j\),
$$\frac{\partial^2 c (s,a)}{\partial a_i \partial s_j} \approx \frac{1}{2\delta}\Big[ \frac{\partial c (s,a +\delta e_i)}{\partial s_j} - \frac{\partial c (s,a -\delta e_i)}{\partial s_j} \Big]$$
and then approximating each inner derivative by a central difference in \(s_j\),
$$\frac{\partial^2 c (s,a)}{\partial a_i \partial s_j} \approx \frac{1}{2\delta}\Big[ \frac{c(s+\delta e_j,a +\delta e_i)- c(s-\delta e_j,a +\delta e_i)}{2\delta} - \frac{c(s+\delta e_j,a -\delta e_i)-c(s-\delta e_j,a -\delta e_i)}{2\delta} \Big]$$
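A minimal numpy sketch of these approximations; the quadratic test cost is an arbitrary choice whose mixed second derivative is known in closed form, so the output can be checked:

```python
import numpy as np

def grad_s(c, s, a, delta=1e-4):
    """Central-difference gradient of c with respect to s."""
    g = np.zeros(len(s))
    for j, e in enumerate(np.eye(len(s))):
        g[j] = (c(s + delta * e, a) - c(s - delta * e, a)) / (2 * delta)
    return g

def hess_as(c, s, a, delta=1e-4):
    """H[i, j] ~ d^2 c / (d a_i d s_j): central difference of grad_s in the a_i direction."""
    H = np.zeros((len(a), len(s)))
    for i, e in enumerate(np.eye(len(a))):
        H[i] = (grad_s(c, s, a + delta * e, delta) - grad_s(c, s, a - delta * e, delta)) / (2 * delta)
    return H

# Hypothetical quadratic cost c(s,a) = s'Qs + a'Ra + 2 a'Ns, whose mixed Hessian is 2N.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
R = np.array([[1.0]])
N = np.array([[0.3, -0.2]])
c = lambda s, a: s @ Q @ s + a @ R @ a + 2 * a @ N @ s
print(hess_as(c, np.array([0.4, -0.3]), np.array([0.7])))   # ~ [[0.6, -0.4]]
```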
1. Recap: Control & LQR
2. Optimal LQR Policy
3. Nonlinear Approximation
4. Local Linear Control
[Diagram: pendulum/cart system annotated with angle \(\theta\), angular velocity \(\omega\), gravity, position \(x\), velocity \(v\), and applied force \(f\).]
\(\displaystyle\min_{\pi}~ \sum_{t=0}^{H-1} c(s_t, a_t)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
For a symmetric matrix \(Q\in\mathbb R^{n\times n}\) the eigen-decomposition is $$Q = \sum_{i=1}^n v_iv_i^\top \sigma_i $$
To make this PSD, we replace $$Q\leftarrow \sum_{i=1}^n v_iv_i^\top (\max\{0,\sigma_i\} +\lambda)$$
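A minimal sketch of this projection using numpy's symmetric eigendecomposition (the matrix and the value of \(\lambda\) are arbitrary examples):

```python
import numpy as np

def make_psd(Q, lam=1e-3):
    """Replace Q by sum_i v_i v_i' (max(0, sigma_i) + lam), keeping the eigenvectors."""
    sig, V = np.linalg.eigh((Q + Q.T) / 2)           # symmetrize, then eigendecompose
    return (V * (np.maximum(sig, 0.0) + lam)) @ V.T

# Example: an indefinite Hessian estimate becomes positive definite after the projection.
Q = np.array([[1.0, 2.0], [2.0, -0.5]])
print(np.linalg.eigvalsh(make_psd(Q)))               # all eigenvalues are now >= lam
```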
\(\displaystyle\min_{\pi}~ \sum_{t=0}^{H-1} c(s_t, a_t)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
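Putting the pieces together: linearize the dynamics around an equilibrium, form the local LQR approximation, and run the Riccati recursion to obtain a linear policy that is then applied to the nonlinear system. A minimal sketch, where the pendulum dynamics, cost weights, horizon, and initial state are all hypothetical choices rather than the exact model from lecture:

```python
import numpy as np

dt = 0.05

def f(s, a):
    """Hypothetical pendulum near the upright: state (theta, omega), action = torque."""
    theta, omega = s
    return np.array([theta + dt * omega, omega + dt * (np.sin(theta) + a[0])])

# Equilibrium to stabilize and an assumed quadratic cost around it.
s_eq, a_eq = np.array([0.0, 0.0]), np.array([0.0])
Q, R = np.diag([1.0, 0.1]), np.array([[0.01]])

def jac(g, x, delta=1e-5):
    """Standard Jacobian dg/dx (rows = outputs) via central differences."""
    return np.array([(g(x + delta * e) - g(x - delta * e)) / (2 * delta)
                     for e in np.eye(len(x))]).T

A = jac(lambda s: f(s, a_eq), s_eq)     # local linearization: s_{t+1} ~ A s_t + B a_t
B = jac(lambda a: f(s_eq, a), a_eq)

# Riccati recursion on the local LQR problem (run long enough to settle).
P, K = Q.copy(), None
for _ in range(200):
    G = R + B.T @ P @ B
    K = -np.linalg.solve(G, B.T @ P @ A)
    P = Q + A.T @ P @ A + A.T @ P @ B @ K

# Apply the local linear policy a_t = K (s_t - s_eq) + a_eq to the nonlinear dynamics.
s = np.array([0.3, 0.0])                # start near the equilibrium
for t in range(100):
    s = f(s, K @ (s - s_eq) + a_eq)
print(s)                                # should be driven close to the equilibrium
```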