Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
1. Recap
2. Linear Control
3. Linear Quadratic Regulator
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)
minimize \(\displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
\(0<\lambda_2<\lambda_1<1\)
\(0<\lambda_2<1<\lambda_1\)
\(1<\lambda_2<\lambda_1\)
Trajectory is determined by the eigenstructure of \(A\)
[Figures: eigenvalue locations in the complex plane \(\mathbb C\), with axes \(\mathcal R(\lambda)\) and \(\mathcal I(\lambda)\), alongside the corresponding trajectories in the \((s_1, s_2)\) plane for each case.]
\(\lambda = \alpha \pm i \beta\)
\(0<\alpha^2+\beta^2<1\)
\(1<\alpha^2+\beta^2\)
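A quick numerical check of the complex-eigenvalue case (a sketch using numpy; the scaled rotation matrix and the values \(r=0.9\), \(\theta=0.3\) are illustrative choices, not from the slides): eigenvalues \(\alpha\pm i\beta\) produce spiraling trajectories, which converge when \(\alpha^2+\beta^2<1\).

```python
import numpy as np

# Illustrative scaled rotation: eigenvalues are r e^{±iθ} = α ± iβ with α² + β² = r²
r, theta = 0.9, 0.3
A = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

lams = np.linalg.eigvals(A)
print(np.abs(lams) ** 2)  # both ≈ r² = 0.81 < 1, so trajectories spiral inward

s = np.array([1.0, 0.0])
for _ in range(200):
    s = A @ s  # roll out s_{t+1} = A s_t
print(np.linalg.norm(s))  # ≈ 0 after 200 steps
```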
\(\lambda_1 = \lambda_2=\lambda\)
\(0<\lambda<1\)
\(\lambda>1\)
Theorem: Let \(\{\lambda_i\}_{i=1}^{n_s}\subset \mathbb C\) be the eigenvalues of \(A\).
Then for \(s_{t+1}=As_t\), the equilibrium \(s_{eq}=0\) is asymptotically stable if \(\max_i|\lambda_i|<1\) and unstable if \(\max_i|\lambda_i|>1\).
Proof
If \(A\) is diagonalizable, then any \(s_0\) can be written as a linear combination of eigenvectors: \(s_0 = \sum_{i=1}^{n_s} \alpha_i v_i\)
By definition, \(Av_i = \lambda_i v_i\)
Therefore, \(s_t = A^t s_0 = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i\)
Thus \(s_t\to 0\) for every \(s_0\) if and only if all \(|\lambda_i|<1\); and if any \(|\lambda_i|>1\), then \(\|s_t\|\to\infty\) for any \(s_0\) with \(\alpha_i\neq 0\) along that eigenvector
Proof in the non-diagonalizable case is out of scope, but it follows using the Jordan Normal Form
We call the boundary case \(\max_i|\lambda_i|=1\) "marginally (un)stable"
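The theorem can be sanity-checked numerically (a minimal sketch using numpy; the matrices below are illustrative, with eigenvalues readable off the diagonal since they are upper triangular):

```python
import numpy as np

def simulate(A, s0, T):
    """Roll out s_{t+1} = A s_t for T steps and return the final state."""
    s = s0
    for _ in range(T):
        s = A @ s
    return s

# Eigenvalues {0.5, 0.8}: all inside the unit circle, so stable
A_stable = np.array([[0.5, 0.2], [0.0, 0.8]])
# Eigenvalues {1.1, 0.5}: one outside the unit circle, so unstable
A_unstable = np.array([[1.1, 0.2], [0.0, 0.5]])
s0 = np.array([1.0, 1.0])

print(np.linalg.norm(simulate(A_stable, s0, 100)))    # ≈ 0: converges to s_eq = 0
print(np.linalg.norm(simulate(A_unstable, s0, 100)))  # very large: diverges
```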
1. Recap
2. Linear Control
3. Linear Quadratic Regulator
Full dynamics depend on actions $$ s_{t+1} = As_t+Ba_t $$
Linear policy defined by \(a_t=Ks_t\): $$ s_{t+1} = As_t+BKs_t = (A+BK)s_t$$
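To see how a choice of \(K\) changes stability, here is a small sketch in numpy (the gain \(K\) is a hand-picked illustrative value, not one derived in the slides): the double integrator is not asymptotically stable in open loop, but \(A+BK\) has spectral radius below 1.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])  # double integrator: both eigenvalues equal 1
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -1.5]])            # hand-picked illustrative feedback gain

# Open loop: spectral radius 1, so not asymptotically stable
print(np.abs(np.linalg.eigvals(A)))
# Closed loop A + BK: eigenvalues {0, 0.5}, spectral radius < 1, so stable
print(np.abs(np.linalg.eigvals(A + B @ K)))
```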
1. Recap
2. Linear Control
3. Linear Quadratic Regulator
Special case of the optimal control problem with linear dynamics \(f(s,a)=As+Ba\) and quadratic costs \(c(s,a)=s^\top Q s + a^\top R a\):
minimize \(\displaystyle\sum_{t=0}^{H-1} s_t^\top Qs_t +a_t^\top Ra_t+s_H^\top Q s_H\)
s.t. \(s_{t+1}=As_t+B a_t, ~~a_t=\pi_t(s_t)\)
\(Q = \begin{bmatrix}1&0\\ 0&0\end{bmatrix},\quad R=\lambda\)
$$\min_{a}\quad s^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s + (s')^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s' +\lambda a^2 \quad \text{s.t.} \quad s' = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s + \begin{bmatrix}0\\ 1\end{bmatrix}a $$
$$\min_{a}\quad (\begin{bmatrix}1&0\end{bmatrix}s)^2 + (\begin{bmatrix}1&1\end{bmatrix}s)^2 + \lambda a^2 \quad \implies a^\star = 0 $$
$$\min_{a_0, a_1}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_1 + s_2^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_2+\lambda a_{0}^2+\lambda a_1^2 $$
$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad \quad s_{2} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{1} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{1} $$
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 +\lambda a_{0}^2 $$
$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad $$
$$ a_0^\star = -\frac{\begin{bmatrix}1&2\end{bmatrix}s_0}{1+\lambda} $$
\(a_1^\star=0\)
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}2&1\\ 1 & 1\end{bmatrix}s_1 +\lambda a_{0}^2 \quad \text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad $$
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + \left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}\right)^\top \begin{bmatrix}2&1\\ 1 & 1\end{bmatrix}\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}\right) +\lambda a_{0} ^2$$
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_0^\top \begin{bmatrix}2&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + a_0^2 +\lambda a_{0}^2 $$
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}3&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + (1 +\lambda)a_0^2 \implies a_0^\star = -\frac{\begin{bmatrix}1&2\end{bmatrix}s_0}{1+\lambda} $$
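The closed form \(a_0^\star\) can be verified by brute force (a sketch using numpy; \(\lambda=0.1\) and the particular \(s_0\) are illustrative values, not from the slides):

```python
import numpy as np

# Double integrator example from the slides; λ and s_0 are illustrative values
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
lam = 0.1
s0 = np.array([[1.0], [1.0]])

def cost(a0):
    """Two-step cost with a_1 = 0 (optimal, since a_1 enters only via λ a_1²)."""
    s1 = A @ s0 + B * a0
    s2 = A @ s1
    return (s0.T @ Q @ s0 + s1.T @ Q @ s1 + s2.T @ Q @ s2).item() + lam * a0 ** 2

# Brute-force search over a fine grid of candidate a_0 values
grid = np.linspace(-10.0, 10.0, 20001)
a0_numeric = grid[np.argmin([cost(a) for a in grid])]

# Closed form from the slides: a_0^* = -[1 2] s_0 / (1 + λ)
a0_formula = (-np.array([[1.0, 2.0]]) @ s0).item() / (1 + lam)
print(a0_numeric, a0_formula)  # agree up to the grid resolution
```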
Reformulating as an optimal control problem, our general-purpose dynamic programming algorithm is:
DP: \(V_t^\star (s) = \min_{a} c(s, a)+V_{t+1}^\star (f(s,a))\)
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$
where the matrices are defined by the backwards recursion \(P_{H} = Q\) and $$K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A,\qquad P_t = Q + A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$$
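The backward recursion is short to implement (a sketch in numpy; `lqr_backward` is a hypothetical helper name, and \(\lambda=0.5\), \(H=2\) are illustrative values chosen to match the worked example):

```python
import numpy as np

def lqr_backward(A, B, Q, R, H):
    """Backward Riccati recursion; returns gains [K_0, ..., K_{H-1}]
    and value matrices [P_0, ..., P_H] with P_H = Q."""
    P = Q
    Ks, Ps = [], [P]
    for _ in range(H):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K  # = Q + A^T P_{t+1} (A + B K_t)
        Ks.append(K)
        Ps.append(P)
    return Ks[::-1], Ps[::-1]

# Worked example from the slides with λ = 0.5 and horizon H = 2
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[0.5]])
Ks, Ps = lqr_backward(A, B, Q, R, H=2)
print(Ks[0])  # -[1 2] / (1 + λ) = [[-2/3, -4/3]]
print(Ps[1])  # [[2, 1], [1, 1]], the intermediate cost-to-go matrix
```

Note that `Ks[0]` recovers exactly the gain from the two-step derivation above, and `Ps[1]` matches the intermediate quadratic \(\begin{bmatrix}2&1\\1&1\end{bmatrix}\).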
\(\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s\)
[Figure: the gains \(\gamma_t^\mathsf{pos}\) and \(\gamma_t^\mathsf{vel}\) plotted as a function of \(t\), for \(t\) up to the horizon \(H\).]