Prof. Sarah Dean
MW 2:55-4:10pm
255 Olin Hall
1. Recap
2. Linear Dynamics
3. Stability & Examples
4. Stability Theorem
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)
minimize \(\displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)\)
s.t. \(s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)\)
Special case of linear dynamics & quadratic costs $$f(s,a) = As+Ba,\quad c(s,a) = s^\top Q s + a^\top R a$$
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$
where the matrices are defined as \(P_{H} = Q\) and
$$P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B\left(R+B^\top P_{t+1}B\right)^{-1}B^\top P_{t+1}A,\qquad K_t = -\left(R+B^\top P_{t+1}B\right)^{-1}B^\top P_{t+1}A$$
We write \(\pi^\star = (K_0,\dots,K_{H-1}) = \mathsf{LQR}(A,B,Q,R)\).
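A minimal NumPy sketch of the standard backward Riccati recursion that produces the LQR gains, starting from \(P_H = Q\) (i.e. assuming the terminal cost is \(c_H(s)=s^\top Q s\)). The function name `lqr` and its list-of-matrices return format are our own choices, not the course's code.

```python
import numpy as np


def lqr(A, B, Q, R, H):
    """Finite-horizon LQR via the backward Riccati recursion.

    Returns gains K_0..K_{H-1} (policy a_t = K_t s_t) and matrices
    P_0..P_H (value V_t(s) = s^T P_t s), with terminal condition P_H = Q.
    """
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = Q
    for t in range(H - 1, -1, -1):
        Pn = P[t + 1]
        # K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A, via a linear solve
        K[t] = -np.linalg.solve(R + B.T @ Pn @ B, B.T @ Pn @ A)
        # P_t = Q + A^T P_{t+1} A + A^T P_{t+1} B K_t
        # (algebraically equal to the "minus" form of the Riccati update)
        P[t] = Q + A.T @ Pn @ A + A.T @ Pn @ B @ K[t]
    return K, P


# Hypothetical scalar example: A = B = Q = R = 1, horizon H = 1
K, P = lqr(np.eye(1), np.eye(1), np.eye(1), np.eye(1), H=1)
print(K[0], P[0])  # [[-0.5]] [[1.5]]
```

Using `np.linalg.solve` instead of forming \((R+B^\top P_{t+1}B)^{-1}\) explicitly is a numerical-hygiene choice; the result is the same gain.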
2. Linear Dynamics
Consider a linear policy \(a_t=Ks_t\). The closed-loop dynamics are then $$ s_{t+1} = As_t+BKs_t = (A+BK)s_t$$
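A minimal sketch that rolls out \(s_{t+1}=As_t+Ba_t\) step by step with \(a_t=Ks_t\) and checks that it matches applying \((A+BK)^t\) directly. The function name `rollout` and the matrices `A`, `B`, `K` are hypothetical choices for illustration, not taken from the lecture.

```python
import numpy as np


def rollout(A, B, K, s0, T):
    """Simulate s_{t+1} = A s_t + B a_t under the linear policy a_t = K s_t."""
    states = [np.asarray(s0, dtype=float)]
    for _ in range(T):
        s = states[-1]
        a = K @ s                      # linear state feedback
        states.append(A @ s + B @ a)   # open-loop dynamics with that action
    return np.array(states)            # shape (T + 1, n_s)


# Hypothetical system and gain, chosen only for illustration
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -1.0]])
s0 = np.array([1.0, 0.0])

traj = rollout(A, B, K, s0, 10)
direct = np.linalg.matrix_power(A + B @ K, 10) @ s0
print(np.allclose(traj[-1], direct))  # True: both compute (A + BK)^10 s0
```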
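The optimal policy has the form \(\pi_t^\star(s) = K^\star_t s= \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s\), with time-varying gains \(\gamma^\mathsf{pos}_t\) on position and \(\gamma^\mathsf{vel}_t\) on velocity.
[Figure: the gains \(\gamma^\mathsf{pos}_t\), \(\gamma^\mathsf{vel}_t\) and the actions \(a_t\) plotted against time \(t\) up to the horizon \(H\).]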
Consider the policy \(\pi(s) = \begin{bmatrix} -\frac{1}{2} &-1 \end{bmatrix}s\).
Simulations demonstrate the difference between
$$ s_{t+1} = \begin{bmatrix}1 & 1 \\ -1 & 1\end{bmatrix} s_t \quad \text{vs.} \quad s_{t+1} = \begin{bmatrix}1 & 1 \\ -\frac{1}{2} & 0\end{bmatrix} s_t$$
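A sketch of such a simulation (our own, not the lecture's code): it tracks \(\|s_t\|\) under each of the two closed-loop matrices from the illustrative initial state \(s_0=(1,0)\) and reports each matrix's largest eigenvalue modulus. The helper name `state_norms` is hypothetical.

```python
import numpy as np

M_left = np.array([[1.0, 1.0], [-1.0, 1.0]])
M_right = np.array([[1.0, 1.0], [-0.5, 0.0]])


def state_norms(M, s0, T):
    """Return ||s_t|| for t = 0..T under s_{t+1} = M s_t."""
    s = np.asarray(s0, dtype=float)
    norms = [np.linalg.norm(s)]
    for _ in range(T):
        s = M @ s
        norms.append(np.linalg.norm(s))
    return np.array(norms)


s0 = [1.0, 0.0]  # illustrative initial state
for name, M in [("left", M_left), ("right", M_right)]:
    rho = np.max(np.abs(np.linalg.eigvals(M)))  # largest eigenvalue modulus
    print(name, "max |lambda|:", round(rho, 3), "||s_20||:", state_norms(M, s0, 20)[-1])
```

The left matrix has eigenvalues \(1\pm i\) (modulus \(\sqrt2\)), so its norm grows; the right one has eigenvalues \((1\pm i)/2\) (modulus \(1/\sqrt2\)), so its norm decays.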
3. Stability & Examples
Consider \(s_{t+1}=As_t\). If \(A\) is diagonalizable with eigenvalues \(\lambda_i\) and eigenvectors \(v_i\), then any \(s_0\) can be written as a linear combination of eigenvectors, \(s_0 = \sum_{i=1}^{n_s} \alpha_i v_i\).
Applying the dynamics once gives \(s_1 = A\sum_{i=1}^{n_s} \alpha_i v_i = \sum_{i=1}^{n_s} \alpha_i A v_i = \sum_{i=1}^{n_s} \alpha_i \lambda_i v_i\)
Claim: \(s_t = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i\)
Exercise: write the proof by induction
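A quick numerical sanity check of the claim (not a substitute for the induction proof). The random \(3\times 3\) matrix, the seed, and \(t=7\) are arbitrary choices; a random Gaussian matrix is diagonalizable with probability one.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
s0 = rng.standard_normal(3)

lam, V = np.linalg.eig(A)        # columns of V are eigenvectors v_i
alpha = np.linalg.solve(V, s0)   # coefficients with s0 = sum_i alpha_i v_i

t = 7
s_t_direct = np.linalg.matrix_power(A, t) @ s0
s_t_modes = V @ (alpha * lam**t)  # sum_i alpha_i lambda_i^t v_i (may carry ~0 imaginary parts)
print(np.allclose(s_t_direct, s_t_modes))  # True
```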
You have investments in two companies.
Setting 1: Each dollar of investment in company \(i\) leads to \(\lambda_i\) returns. The companies are independent.
Cases: \(0<\lambda_2<\lambda_1<1\), \(0<\lambda_2<1<\lambda_1\), and \(1<\lambda_2<\lambda_1\).
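A small sketch of Setting 1, where the independent companies give \(s_{t+1}=\mathrm{diag}(\lambda_1,\lambda_2)\,s_t\). The helper `invest`, the starting dollar in each company, and the specific \(\lambda\) values (one pair per case) are illustrative assumptions.

```python
import numpy as np


def invest(lams, s0, T):
    """Setting 1: independent companies, s_{t+1} = diag(lams) s_t, run for T steps."""
    A = np.diag(lams)
    s = np.asarray(s0, dtype=float)
    for _ in range(T):
        s = A @ s
    return s


s0 = [1.0, 1.0]  # one dollar in each company (illustrative)
for lams in [(0.9, 0.5), (1.1, 0.5), (1.2, 1.1)]:  # lambda_1 > lambda_2 in each case
    print(lams, invest(lams, s0, 50))
```

Each holding evolves on its own as \(\lambda_i^t\): both shrink in the first case, only company 1 grows in the second, and both grow in the third.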
Setting 2: The companies are interdependent. Each dollar invested in company \(i\) returns \(\alpha\) to company \(i\), and it also produces a return of \(\beta\) (if \(i=1\)) or a loss of \(\beta\) (if \(i=2\)) for the other company.
Cases: \(0<\alpha^2+\beta^2<1\) and \(1<\alpha^2+\beta^2\).
The dynamics map $$\begin{bmatrix}1\\0\end{bmatrix} \to \begin{bmatrix}\alpha\\ \beta\end{bmatrix} $$
i.e. each step is a rotation by \(\arctan(\beta/\alpha)\) (for \(\alpha>0\)) combined with a scaling by \(\sqrt{\alpha^2+\beta^2}\). The eigenvalues are \(\lambda = \alpha \pm i \beta\).
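Writing Setting 2 as \(s_{t+1}=\begin{bmatrix}\alpha&-\beta\\ \beta&\alpha\end{bmatrix}s_t\) (column \(i\) holds what one dollar in company \(i\) produces), this sketch checks the eigenvalues and the rotation-plus-scaling picture numerically. The values \(\alpha=0.6\), \(\beta=0.3\) are arbitrary illustrative choices.

```python
import numpy as np

alpha, beta = 0.6, 0.3           # illustrative values with alpha^2 + beta^2 < 1
A = np.array([[alpha, -beta],
              [beta,  alpha]])   # column i: effect of one dollar in company i

print(np.linalg.eigvals(A))      # the pair alpha +/- i beta

r = np.hypot(alpha, beta)        # scaling factor sqrt(alpha^2 + beta^2)
theta = np.arctan2(beta, alpha)  # rotation angle; equals arctan(beta/alpha) when alpha > 0
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(A, r * R))     # True: A is a scaled rotation
```

Because \(A\) is a scaled rotation, every step multiplies \(\|s_t\|\) by exactly \(\sqrt{\alpha^2+\beta^2}\), which is why that quantity separates the two cases.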
Setting 3: Each dollar invested in company \(i\) returns \(\lambda\) to company \(i\); in addition, company \(2\) is a subsidiary of company \(1\), which therefore also accumulates company \(2\)'s returns.
Cases: \(0<\lambda<1\) and \(1<\lambda\).
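The dynamics matrix splits as \(\lambda I + N\) with \(N=\begin{bmatrix} & 1\\ & \end{bmatrix}\). Since \(N^2=0\) and \(\lambda I\) commutes with \(N\), the binomial expansion keeps only two terms:
$$ \left(\begin{bmatrix} \lambda & \\ & \lambda\end{bmatrix} + \begin{bmatrix} & 1\\ & \end{bmatrix} \right)^t = \lambda^t I + t\lambda^{t-1}N =\begin{bmatrix} \lambda^t & t\lambda^{t-1}\\ & \lambda^t\end{bmatrix} $$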
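A numerical check of the closed form for powers of the \(2\times 2\) Jordan block; the helper name `jordan_power_closed_form` and the test values of \(\lambda\) and \(t\) are our own.

```python
import numpy as np


def jordan_power_closed_form(lam, t):
    """Closed form of [[lam, 1], [0, lam]]^t = [[lam^t, t lam^(t-1)], [0, lam^t]]."""
    return np.array([[lam**t, t * lam**(t - 1)],
                     [0.0,    lam**t]])


for lam in (0.5, 1.3):  # one value below 1, one above
    J = np.array([[lam, 1.0], [0.0, lam]])
    ok = all(np.allclose(np.linalg.matrix_power(J, t), jordan_power_closed_form(lam, t))
             for t in range(1, 11))
    print(lam, ok)
```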
Example 1 (Setting 1): \(\displaystyle s_{t+1} = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix} s_t \). General case: diagonalizable, real eigenvalues.
Example 2 (Setting 2): \(\displaystyle s_{t+1} = \begin{bmatrix} \alpha & -\beta\\\beta & \alpha\end{bmatrix} s_t \). General case: pair of complex eigenvalues \(\lambda = \alpha \pm i \beta\).
Example 3 (Setting 3): \(\displaystyle s_{t+1} = \begin{bmatrix} \lambda & 1\\ & \lambda\end{bmatrix} s_t \). General case: non-diagonalizable.
4. Stability Theorem
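Theorem: Let \(\{\lambda_i\}_{i=1}^n\subset \mathbb C\) be the eigenvalues of \(A\).
Then \(s_{t+1}=As_t\) is stable (\(s_t\to 0\) for every \(s_0\)) if \(\max_i|\lambda_i|<1\), and unstable (\(\|s_t\|\to\infty\) for some \(s_0\)) if \(\max_i|\lambda_i|>1\).
We call \(\max_i|\lambda_i|=1\) "marginally (un)stable".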
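A minimal sketch of the theorem as a test on the spectral radius \(\max_i|\lambda_i|\). The function name `classify` and the tolerance `tol` (needed because computed eigenvalues carry rounding error) are our own assumptions; the example matrices are instances of Examples 1 to 3 with illustrative parameter values, plus a pure rotation for the boundary case.

```python
import numpy as np


def classify(A, tol=1e-6):
    """Classify s_{t+1} = A s_t by its spectral radius max_i |lambda_i|."""
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if rho < 1 - tol:
        return "stable"
    if rho > 1 + tol:
        return "unstable"
    return "marginally (un)stable"  # the theorem alone does not decide this case


examples = {
    "Example 1, diag(0.9, 0.5)": np.diag([0.9, 0.5]),
    "Example 2, alpha=0.6, beta=0.3": np.array([[0.6, -0.3], [0.3, 0.6]]),
    "Example 3, lambda=1.1": np.array([[1.1, 1.0], [0.0, 1.1]]),
    "rotation by 90 degrees": np.array([[0.0, -1.0], [1.0, 0.0]]),
}
for name, A in examples.items():
    print(name, "->", classify(A))
```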
Trajectory is determined by the eigenstructure of \(A\).
[Figures: eigenvalues in the complex plane \(\mathbb C\) (axes \(\mathcal R(\lambda)\), \(\mathcal I(\lambda)\)) next to trajectories in the \((s_1, s_2)\) plane, for real eigenvalues with \(0<\lambda_2<\lambda_1<1\), \(0<\lambda_2<1<\lambda_1\), or \(1<\lambda_2<\lambda_1\); a complex pair \(\lambda = \alpha \pm i \beta\) with \(0<\alpha^2+\beta^2<1\) or \(1<\alpha^2+\beta^2\); and a repeated eigenvalue \(\lambda_1 = \lambda_2=\lambda\) with \(0<\lambda<1\) or \(\lambda>1\).]
Proof
If \(A\) is diagonalizable, then any \(s_0\) can be written as a linear combination of eigenvectors \(s_0 = \sum_{i=1}^{n_s} \alpha_i v_i\)
We previously argued that \(s_t = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i\)
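We have \(\|s_t\| \leq \sum_{i=1}^{n_s}|\alpha_i| |\lambda_i|^t \|v_i\|\), so if all \(|\lambda_i|<1\), then \(s_t\to 0\) for every \(s_0\).
Conversely, if some \(|\lambda_j|\geq 1\), starting from \(s_0 = v_j\) gives \(s_t = \lambda_j^t v_j\) with \(\|s_t\| = |\lambda_j|^t\|v_j\|\), which does not converge to \(0\), and \(\|s_t\|\to\infty\) if \(|\lambda_j|>1\). (For complex \(\lambda_j\), the real initial state \(s_0=\mathrm{Re}(v_j)\) gives the same conclusion.)
Thus \(s_t\to 0\) for every \(s_0\) if and only if all \(|\lambda_i|<1\).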
Proof in the non-diagonalizable case is out of scope, but it follows using the Jordan Normal Form
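The Jordan-form argument rests on terms like \(t\lambda^{t-1}\), which still vanish when \(|\lambda|<1\) but can grow first. This sketch (our own illustration, with \(\lambda=0.9\) and \(s_0=(0,1)\) chosen arbitrarily) shows the transient growth followed by decay for Example 3.

```python
import numpy as np

lam = 0.9
J = np.array([[lam, 1.0],
              [0.0, lam]])
s = np.array([0.0, 1.0])  # s_0 = e_2, so s_t = (t lam^(t-1), lam^t)

norms = [np.linalg.norm(s)]
for _ in range(200):
    s = J @ s
    norms.append(np.linalg.norm(s))
norms = np.array(norms)

print("peak norm", norms.max(), "at t =", norms.argmax())
print("norm at t = 200:", norms[-1])
```

The norm rises to roughly four times its initial value before the geometric factor \(\lambda^t\) takes over, so a stable non-diagonalizable system can still amplify the state for a while.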