Sarah Dean, assistant professor in CS at Cornell
Prof. Sarah Dean
MW 2:55-4:10pm
255 Olin Hall
1. Continuous Control
2. UAV Example
3. Linear Quadratic Regulator
Optimal control problem \(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\):
minimize over \(\pi\): \(\displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
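To make the objective concrete, here is a minimal Python sketch that evaluates the cost of one fixed policy by rolling out the deterministic dynamics. The helper name `rollout_cost` and its argument layout (a per-step cost `c(t, s, a)`, a terminal cost `c_H`, and the policy as a list of per-step functions) are hypothetical choices for illustration; the optimal control problem then asks for the policy that makes this number smallest.

```python
from typing import Any, Callable, Sequence


def rollout_cost(
    f: Callable[[Any, Any], Any],
    c: Callable[[int, Any, Any], float],
    c_H: Callable[[Any], float],
    policy: Sequence[Callable[[Any], Any]],
    s0: Any,
    H: int,
) -> float:
    """Hypothetical helper: sum_{t<H} c_t(s_t, a_t) + c_H(s_H) along a deterministic rollout."""
    s, total = s0, 0.0
    for t in range(H):
        a = policy[t](s)      # a_t = pi_t(s_t)
        total += c(t, s, a)   # accumulate c_t(s_t, a_t)
        s = f(s, a)           # s_{t+1} = f(s_t, a_t)
    return total + c_H(s)     # add the terminal cost c_H(s_H)


# Made-up example: scalar dynamics s_{t+1} = s_t + a_t, costs s^2 + a^2, policy a_t = -0.5 s_t.
H = 2
cost = rollout_cost(
    f=lambda s, a: s + a,
    c=lambda t, s, a: s**2 + a**2,
    c_H=lambda s: s**2,
    policy=[lambda s: -0.5 * s] * H,
    s0=1.0,
    H=H,
)
print(cost)  # 1.625
```

Different policies give different totals for the same initial state; that total is the quantity being minimized over \(\pi\).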
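More generally, the dynamics may be stochastic, \(s_{t+1} = f(s_t, a_t, w_t)\) with noise \(w_t \sim \mathcal D_w\), and the objective may be finite horizon \(H\), discounted \(\gamma\), or average cost \(\mathsf{avg}\):
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, (f,\mathcal D_w), [H,\gamma,\mathsf{avg}]\}\)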
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)
Q: How would you pick actions?
Important background on matrices:
Dynamics: \(s_{t+1} = A s_t + B a_t\)
Special case of the optimal control problem with linear dynamics and quadratic costs:
minimize over \(\pi\): \(\displaystyle\sum_{t=0}^{H-1} s_t^\top Qs_t +a_t^\top Ra_t+s_H^\top Q s_H\)
s.t. \(s_{t+1}=As_t+B a_t, ~~a_t=\pi_t(s_t)\)
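Reformulating for optimal control (minimizing cost rather than maximizing reward), our general-purpose dynamic programming algorithm is:
DP: \(V_H^\star(s) = c_H(s)\) and, for \(t = H-1,\dots,0\), \(V_t^\star (s) = \min_{a} c(s, a)+V_{t+1}^\star (f(s,a))\), where \(\pi_t^\star(s)\) is the minimizing action.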
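For finite state and action sets, this recursion can be run directly by backward induction. Below is a minimal Python sketch; the helper name `dp` and the toy problem (states 0 to 4 on a line, actions in {-1, 0, 1}, cost \(c(s,a) = s + |a|\), terminal cost \(c_H(s) = s\)) are hypothetical and chosen only to illustrate the recursion.

```python
def dp(states, actions, f, c, c_H, H):
    """Backward induction: V_H = c_H, then V_t(s) = min_a c(s, a) + V_{t+1}(f(s, a))."""
    V = {s: c_H(s) for s in states}  # V_H^*
    policy = []  # per-step dicts pi_t^*, built from t = H-1 down to 0
    for _ in range(H):
        V_next = V
        # pi_t^*(s): action minimizing immediate cost plus next-step value
        pi_t = {s: min(actions, key=lambda a: c(s, a) + V_next[f(s, a)]) for s in states}
        V = {s: c(s, pi_t[s]) + V_next[f(s, pi_t[s])] for s in states}
        policy.append(pi_t)
    policy.reverse()  # reorder as pi_0^*, ..., pi_{H-1}^*
    return V, policy  # V is now V_0^*


# Hypothetical toy problem: move left/right on {0, ..., 4}, pay distance from 0 plus effort.
states = range(5)
actions = (-1, 0, 1)
f = lambda s, a: min(max(s + a, 0), 4)  # clip the next state to [0, 4]
c = lambda s, a: s + abs(a)
V0, pi = dp(states, actions, f, c, c_H=lambda s: s, H=2)
print(V0[2], pi[0][2])  # 5 -1
```

Ties are broken by the order of `actions`, since `min` returns the first minimizer it sees.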
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$
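where the matrices are defined as \(P_{H} = Q\) and, for \(t = H-1,\dots,0\),
$$K_t = -(R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} A, \qquad P_t = Q + A^\top P_{t+1} A - A^\top P_{t+1} B (R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} A.$$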
\(\pi^\star = (K_0,\dots,K_{H-1}) = \mathsf{LQR}(A,B,Q,R)\)
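A minimal NumPy sketch of \(\mathsf{LQR}(A,B,Q,R)\): starting from \(P_H = Q\), it runs the backward recursion \(K_t = -(R + B^\top P_{t+1} B)^{-1} B^\top P_{t+1} A\) and \(P_t = Q + A^\top P_{t+1}(A + B K_t)\), which expands to the same \(P_t\) update as in the theorem. The function names `lqr` and `linear_policy_cost`, the explicit horizon argument `H`, and the random test system are illustrative assumptions. The final line checks that \(s_0^\top P_0 s_0\) matches the cost of actually rolling out the gains.

```python
import numpy as np


def lqr(A, B, Q, R, H):
    """Finite-horizon LQR by backward recursion (sketch).

    Returns gains K[0..H-1] with pi_t(s) = K[t] @ s and value matrices
    P[0..H] with V_t(s) = s @ P[t] @ s."""
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = Q
    for t in range(H - 1, -1, -1):
        Pn = P[t + 1]
        # K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A, using solve instead of an explicit inverse
        K[t] = -np.linalg.solve(R + B.T @ Pn @ B, B.T @ Pn @ A)
        # P_t = Q + A^T P_{t+1} (A + B K_t)
        P[t] = Q + A.T @ Pn @ (A + B @ K[t])
    return K, P


def linear_policy_cost(A, B, Q, R, K, s0):
    """Cost sum_t (s_t^T Q s_t + a_t^T R a_t) + s_H^T Q s_H of the policy a_t = K_t s_t."""
    s, total = s0, 0.0
    for K_t in K:
        a = K_t @ s
        total += s @ Q @ s + a @ R @ a
        s = A @ s + B @ a
    return total + s @ Q @ s


# Random (hypothetical) system: 3 states, 2 inputs, horizon 10.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 2))
Q, R, H = np.eye(3), np.eye(2), 10
K, P = lqr(A, B, Q, R, H)
s0 = rng.standard_normal(3)
print(np.isclose(s0 @ P[0] @ s0, linear_policy_cost(A, B, Q, R, K, s0)))  # True
```

Using `np.linalg.solve` avoids forming \((R + B^\top P_{t+1} B)^{-1}\) explicitly, which is the usual numerically safer choice.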
\(\pi_t^\star(s) = K^\star_t s= \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s\)
[Plot: gains \(\gamma^\mathsf{pos}_t\) and \(\gamma^\mathsf{vel}_t\) versus time \(t\), up to horizon \(H\)]
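To see gains like \(\gamma^\mathsf{pos}_t\) and \(\gamma^\mathsf{vel}_t\) numerically, here is a sketch on an assumed model. We take a hypothetical double integrator with state \(s = [\text{position}, \text{velocity}]\), \(A = \begin{bmatrix}1&1\\0&1\end{bmatrix}\), \(B = \begin{bmatrix}0\\1\end{bmatrix}\), \(Q = I\), \(R = 1\), \(H = 20\); these matrices are illustrative assumptions and not necessarily the ones used in the lecture. The loop is the same backward recursion as in the theorem.

```python
import numpy as np

# Hypothetical UAV model (assumption): [position, velocity] state, scalar input acting on velocity.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
H = 20

# Backward recursion from P_H = Q.
P = Q
gains = [None] * H
for t in range(H - 1, -1, -1):
    K_t = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # shape (1, 2): [gamma_pos, gamma_vel]
    P = Q + A.T @ P @ (A + B @ K_t)
    gains[t] = K_t

for t, K_t in enumerate(gains):
    gamma_pos, gamma_vel = K_t[0]
    print(f"t={t:2d}  gamma_pos={gamma_pos:+.3f}  gamma_vel={gamma_vel:+.3f}")
```

Under these assumptions the last-step gain is \(K_{H-1} = [0,\ -0.5]\): with one step left, the action cannot change the final position in this model, only the final velocity, so the position gain is zero there.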