Prof. Sarah Dean
MW 2:55-4:10pm
255 Olin Hall
1. Recap
2. Linear Dynamics
3. Stability & Examples
4. Stability Theorem
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)
minimize \(\displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$
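where the matrices are defined as \(P_{H} = Q\) and, for \(t=H-1,\dots,0\), $$K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A, \qquad P_t = Q + A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$$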
Special case of linear dynamics & quadratic costs $$f(s,a) = As+Ba,\quad c(s,a) = s^\top Q s + a^\top R a$$
\(\pi^\star = (K_0,\dots,K_{H-1}) = \mathsf{LQR}(A,B,Q,R)\)
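A minimal sketch of how \(\mathsf{LQR}(A,B,Q,R)\) could be computed with the standard backward Riccati recursion. To stay dependency-free it assumes a scalar action (so \(B\) is \(n_s\times 1\), \(R\) is a positive number, and the matrix inverse becomes a division) and a symmetric \(Q\). The function name `lqr_single_input` and the helpers are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def lqr_single_input(A, B, Q, R, H):
    """Return [K_0, ..., K_{H-1}] (each 1 x n) so that a_t = K_t s_t.

    Assumption: scalar action, i.e. B is n x 1 and R > 0 is a float.
    Assumption: Q is symmetric, which keeps every P_t symmetric.
    """
    n = len(A)
    P = [row[:] for row in Q]  # P_H = Q
    gains = []
    for _ in range(H):  # runs over t = H-1, ..., 0
        BtP = matmul(transpose(B), P)
        BtPA = matmul(BtP, A)[0]           # row vector B^T P_{t+1} A
        denom = R + matmul(BtP, B)[0][0]   # scalar R + B^T P_{t+1} B
        K = [[-x / denom for x in BtPA]]   # K_t = -(R + B^T P B)^{-1} B^T P A
        AtPA = matmul(matmul(transpose(A), P), A)
        # P_t = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
        P = [[Q[i][j] + AtPA[i][j] - BtPA[i] * BtPA[j] / denom
              for j in range(n)] for i in range(n)]
        gains.append(K)
    gains.reverse()  # gains were collected from t = H-1 down to t = 0
    return gains


# Scalar sanity check with A = B = Q = R = 1 and horizon H = 2.
gains = lqr_single_input([[1.0]], [[1.0]], [[1.0]], 1.0, 2)
print(gains)
```

For a vector-valued action, the division by `denom` would be replaced by a linear solve.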
2. Linear Dynamics
Consider linear policy defined by \(a_t=Ks_t\): $$ s_{t+1} = As_t+BKs_t = (A+BK)s_t$$
\(\pi_t^\star(s) = K^\star_t s= \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s\)
Consider \(\pi(s) = \begin{bmatrix} -\frac{1}{2} &-1 \end{bmatrix}s\)
Simulations demonstrate the difference between
$$ s_{t+1} = \begin{bmatrix}1 & 1 \\ -1 & 1\end{bmatrix} s_t \quad \text{vs.} \quad s_{t+1} = \begin{bmatrix}1 & 1 \\ -\frac{1}{2} & 0\end{bmatrix} s_t$$
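The comparison can be reproduced with a short simulation. This is a sketch under assumptions: the matrices \(A\) and \(B\) below are not stated in this section and are chosen so that \(A+BK\) with \(K = \begin{bmatrix} -\frac{1}{2} & -1\end{bmatrix}\) matches the second system above, and the initial state `s0` is hypothetical.

```python
import math


def step(M, s):
    """One step of s_{t+1} = M s_t for a 2 x 2 matrix M."""
    return [M[0][0] * s[0] + M[0][1] * s[1],
            M[1][0] * s[0] + M[1][1] * s[1]]


def simulate_norms(M, s0, T):
    """Return [||s_0||, ..., ||s_T||] along the trajectory of s_{t+1} = M s_t."""
    s, norms = list(s0), []
    for _ in range(T + 1):
        norms.append(math.hypot(s[0], s[1]))
        s = step(M, s)
    return norms


# Assumed open-loop model (not given in this section).
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
K = [[-0.5, -1.0]]
closed_loop = [[A[i][j] + B[i][0] * K[0][j] for j in range(2)] for i in range(2)]

M_first = [[1.0, 1.0], [-1.0, 1.0]]
M_second = [[1.0, 1.0], [-0.5, 0.0]]
s0 = [1.0, 0.0]  # hypothetical initial state

norms_first = simulate_norms(M_first, s0, 20)
norms_second = simulate_norms(M_second, s0, 20)
print(closed_loop == M_second)             # A + BK reproduces the second system
print(norms_first[-1], norms_second[-1])   # first grows, second decays
```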
3. Stability & Examples
If \(A\) is diagonalizable, then any \(s_0\) can be written as a linear combination of eigenvectors
\(s_0 = \sum_{i=1}^{n_s} \alpha_i v_i\)
\(s_1 = A\sum_{i=1}^{n_s} \alpha_i v_i = \sum_{i=1}^{n_s} \alpha_i A v_i = \sum_{i=1}^{n_s} \alpha_i \lambda_i v_i\)
Claim: \(s_t = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i\)
Exercise: write the proof by induction
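Before writing the proof, the claim can be checked numerically. The sketch below uses a hypothetical \(2\times 2\) example with known eigenpairs and integer arithmetic, and compares the iterated state with the eigen-expansion at each step.

```python
def step(A, s):
    """One step of s_{t+1} = A s_t."""
    return [sum(A[i][j] * s[j] for j in range(len(s))) for i in range(len(A))]


A = [[2, 1], [1, 2]]          # hypothetical example; symmetric, hence diagonalizable
eigvals = [3, 1]              # lambda_1, lambda_2
eigvecs = [[1, 1], [1, -1]]   # v_1, v_2 with A v_i = lambda_i v_i
alphas = [2, 1]               # s_0 = 2 v_1 + 1 v_2 = [3, 1]

s = [sum(a * v[k] for a, v in zip(alphas, eigvecs)) for k in range(2)]
matches = []
for t in range(6):
    # Claimed closed form: s_t = sum_i alpha_i lambda_i^t v_i
    predicted = [sum(a * lam ** t * v[k]
                     for a, lam, v in zip(alphas, eigvals, eigvecs))
                 for k in range(2)]
    matches.append(predicted == s)
    s = step(A, s)

print(all(matches))
```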
You have investments in two companies.
Setting 1: Each dollar of investment in company \(i\) leads to \(\lambda_i\) returns. The companies are independent.
Setting 2: The companies are interdependent: each dollar of investment in company \(i\) leads to \(\alpha\) return for company \(i\), but it also leads to \(\beta\) return (\(i=1\)) or loss (\(i=2\)) to the other company.
$$\begin{bmatrix}1\\0\end{bmatrix} \to \begin{bmatrix}\alpha\\ \beta\end{bmatrix} $$
This is a rotation by \(\arctan(\beta/\alpha)\) and a scaling by \(\sqrt{\alpha^2+\beta^2}\), with eigenvalues \(\lambda = \alpha \pm i \beta\).
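The rotate-and-scale picture for Setting 2 can be checked numerically. This sketch applies the matrix \(\begin{bmatrix} \alpha & -\beta\\ \beta & \alpha\end{bmatrix}\) repeatedly to \(\begin{bmatrix}1\\0\end{bmatrix}\) and compares with \(r^t\begin{bmatrix}\cos t\theta\\ \sin t\theta\end{bmatrix}\), where \(r=\sqrt{\alpha^2+\beta^2}\) and \(\theta=\arctan(\beta/\alpha)\). The values of `alpha` and `beta` are hypothetical, with \(\alpha>0\).

```python
import math

alpha, beta = 0.6, 0.3               # hypothetical values, alpha > 0
M = [[alpha, -beta], [beta, alpha]]
r = math.hypot(alpha, beta)          # scale factor per step
theta = math.atan2(beta, alpha)      # rotation angle per step

s = [1.0, 0.0]
max_err = 0.0
for t in range(1, 11):
    s = [M[0][0] * s[0] + M[0][1] * s[1],
         M[1][0] * s[0] + M[1][1] * s[1]]
    expected = [r ** t * math.cos(t * theta), r ** t * math.sin(t * theta)]
    max_err = max(max_err, abs(s[0] - expected[0]), abs(s[1] - expected[1]))

print(max_err)  # should be at the level of floating-point round-off
```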
Setting 3: Each dollar of investment in company \(i\) leads to \(\lambda\) return for company \(i\), and company \(2\) is a subsidiary of company \(1\), which thus accumulates its returns as well.
$$ \left(\begin{bmatrix} \lambda & \\ & \lambda\end{bmatrix} + \begin{bmatrix} & 1\\ & \end{bmatrix} \right)^t = \begin{bmatrix} \lambda^t & t\lambda^{t-1}\\ & \lambda^t\end{bmatrix} $$
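The closed form for the power of this matrix can be verified exactly for small \(t\). The sketch below uses rational arithmetic and a hypothetical value of \(\lambda\).

```python
from fractions import Fraction


def matmul2(X, Y):
    """Multiply two 2 x 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


lam = Fraction(1, 2)                 # hypothetical value of lambda
J = [[lam, 1], [0, lam]]
power = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]  # J^0 = I

checks = []
for t in range(1, 9):
    power = matmul2(power, J)        # now equals J^t
    closed_form = [[lam ** t, t * lam ** (t - 1)], [0, lam ** t]]
    checks.append(power == closed_form)

print(all(checks))
```

The off-diagonal entry \(t\lambda^{t-1}\) grows linearly in \(t\) before the geometric factor takes over, which is why this case behaves differently from the diagonalizable one.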
Example 1: \(\displaystyle s_{t+1} = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix} s_t \)
General case: diagonalizable, real eigenvalues
Example 2: \(\displaystyle s_{t+1} = \begin{bmatrix} \alpha & -\beta\\\beta & \alpha\end{bmatrix} s_t \)
General case: pair of complex eigenvalues \(\lambda = \alpha \pm i \beta\)
Example 3: \(\displaystyle s_{t+1} = \begin{bmatrix} \lambda & 1\\ & \lambda\end{bmatrix} s_t \)
General case: non-diagonalizable
4. Stability Theorem
Theorem: Let \(\{\lambda_i\}_{i=1}^n\subset \mathbb C\) be the eigenvalues of \(A\).
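Then \(s_{t+1}=As_t\) is asymptotically stable (\(s_t\to 0\) for every \(s_0\)) if \(\max_i|\lambda_i|<1\), and unstable (\(\|s_t\|\to\infty\) for some \(s_0\)) if \(\max_i|\lambda_i|>1\).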
We call \(\max_i|\lambda_i|=1\) "marginally (un)stable"
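As a sketch, the eigenvalue criterion (comparing \(\max_i|\lambda_i|\) with 1) can be applied to the two closed-loop systems from earlier. The helper names, the string labels, and the tolerance `tol` are hypothetical, and the eigenvalue formula is specific to \(2\times 2\) matrices.

```python
import cmath


def eigenvalues_2x2(M):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def classify(M, tol=1e-9):
    """Classify s_{t+1} = M s_t by comparing max_i |lambda_i| with 1."""
    rho = max(abs(lam) for lam in eigenvalues_2x2(M))
    if rho < 1 - tol:
        return "stable"
    if rho > 1 + tol:
        return "unstable"
    return "marginal"


first = classify([[1.0, 1.0], [-1.0, 1.0]])     # eigenvalues 1 +/- i
second = classify([[1.0, 1.0], [-0.5, 0.0]])    # eigenvalues (1 +/- i) / 2
rotation = classify([[0.0, -1.0], [1.0, 0.0]])  # eigenvalues +/- i
print(first, second, rotation)
```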
Trajectory is determined by the eigenstructure of \(A\)
Figure: eigenvalues plotted in the complex plane \(\mathbb C\) with axes \(\mathcal R(\lambda)\) and \(\mathcal I(\lambda)\); panels include \(\lambda = \alpha \pm i \beta\) and \(\lambda_1 = \lambda_2=\lambda\).
If \(A\) is diagonalizable, then any \(s_0\) can be written as a linear combination of eigenvectors \(s_0 = \sum_{i=1}^{n_s} \alpha_i v_i\)
We previously argued that \(s_t = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i\)
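By the triangle inequality, \(\|s_t\| \leq \sum_{i=1}^{n_s}|\alpha_i| |\lambda_i|^t \|v_i\|\)
Thus if all \(|\lambda_i|<1\), then \(s_t\to 0\) for every \(s_0\). Conversely, if some \(|\lambda_i|>1\), then \(\|s_t\|\to\infty\) whenever the corresponding \(\alpha_i\neq 0\), because the eigenvectors are linearly independent.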
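The norm bound in the diagonalizable case can be checked on a small example. This sketch uses a hypothetical matrix with eigenvalues \(0.5\) and \(-0.8\) and hand-computed eigenvectors, and verifies that \(\|s_t\|\) stays below \(\sum_i|\alpha_i||\lambda_i|^t\|v_i\|\) along the trajectory.

```python
import math

A = [[0.5, 1.0], [0.0, -0.8]]      # hypothetical; eigenvalues 0.5 and -0.8
eigvals = [0.5, -0.8]
eigvecs = [[1.0, 0.0], [1.0, -1.3]]  # A v_i = lambda_i v_i
alphas = [1.0, 1.0]                # s_0 = v_1 + v_2 = [2.0, -1.3]

s = [sum(a * v[k] for a, v in zip(alphas, eigvecs)) for k in range(2)]
holds = []
for t in range(31):
    norm = math.hypot(s[0], s[1])
    bound = sum(abs(a) * abs(lam) ** t * math.hypot(v[0], v[1])
                for a, lam, v in zip(alphas, eigvals, eigvecs))
    holds.append(norm <= bound + 1e-12)  # small slack for round-off
    s = [A[0][0] * s[0] + A[0][1] * s[1],
         A[1][0] * s[0] + A[1][1] * s[1]]

final_norm = norm
print(all(holds), final_norm)
```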
The proof in the non-diagonalizable case is out of scope, but it follows using the Jordan Normal Form.