CS 4/5789: Introduction to Reinforcement Learning

Lecture 8: Linear Dynamics and Stability

Prof. Sarah Dean

MW 2:55-4:10pm
255 Olin Hall

Reminders

Homework
- Programming Assignment 1 due tonight
- PSet 3 released tonight
- PA 2 released later this week
First exam is Monday 3/4 during lecture
- If you have a conflict, post on Ed ASAP!

Agenda

1. Recap

2. Linear Dynamics

3. Stability & Examples

4. Stability Theorem

Recap: Optimal Control

Continuous $\mathcal S = \mathbb R^{n_s}$ and $\mathcal A = \mathbb R^{n_a}$
Cost to be minimized $c=(c_0,\dots, c_{H-1}, c_H)$
Deterministic transitions described by dynamics function $$s_{t+1} = f(s_t, a_t)$$
Finite horizon $H$

$\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}$

$$\min_{\pi}~ \sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H) \quad\text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$
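To make the objective concrete, here is a minimal sketch of evaluating the total cost of a given policy by rolling out the dynamics. The function name `rollout_cost` and the scalar example (dynamics $s+a$, policy $a=-s$, quadratic costs) are illustrative assumptions, not part of the lecture.

```python
def rollout_cost(f, policy, stage_cost, final_cost, s0, H):
    """Total cost of following policy[t] for t = 0..H-1 from s0 under dynamics f."""
    s, total = s0, 0.0
    for t in range(H):
        a = policy[t](s)              # a_t = pi_t(s_t)
        total += stage_cost(t, s, a)  # c_t(s_t, a_t)
        s = f(s, a)                   # s_{t+1} = f(s_t, a_t)
    return total + final_cost(s)      # add c_H(s_H)

# Hypothetical scalar example: s_{t+1} = s_t + a_t, policy a_t = -s_t
f = lambda s, a: s + a
policy = [lambda s: -s] * 3
cost = rollout_cost(f, policy,
                    stage_cost=lambda t, s, a: s**2 + a**2,
                    final_cost=lambda s: s**2,
                    s0=1.0, H=3)
print(cost)
```

The optimal control problem asks for the policy that makes this number as small as possible.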

Recap: LQR

Theorem: For $t=0,\dots ,H-1$, the optimal value function is quadratic and the optimal policy is linear$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$

where the matrices are defined by a backward recursion: $P_{H} = Q$ and, for $t=H-1,\dots,0$, $P_t$ and $K_t$ are given in terms of $A,B,Q,R$ and $P_{t+1}$

Special case of linear dynamics & quadratic costs $$f(s,a) = As+Ba,\quad c(s,a) = s^\top Q s + a^\top R a$$

$\pi^\star = (K_0,\dots,K_{H-1}) = \mathsf{LQR}(A,B,Q,R)$
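The backward recursion can be sketched in code. The version below uses the standard discrete-time Riccati recursion for the cost $s^\top Q s + a^\top R a$; the slides leave the formulas implicit, so treat the exact expressions, the function name `lqr`, and the horizon `H = 10` as assumptions.

```python
import numpy as np

def lqr(A, B, Q, R, H):
    """Finite-horizon LQR: returns gains K_0..K_{H-1} and value matrices P_0..P_H."""
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = Q
    for t in range(H - 1, -1, -1):
        BtP = B.T @ P[t + 1]
        # K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
        K[t] = -np.linalg.solve(R + BtP @ B, BtP @ A)
        # P_t = Q + A^T P_{t+1} A + A^T P_{t+1} B K_t
        P[t] = Q + A.T @ P[t + 1] @ A + A.T @ P[t + 1] @ B @ K[t]
    return K, P

# UAV matrices used in the examples of this lecture; H = 10 is an arbitrary choice
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[0.5]])
K, P = lqr(A, B, Q, R, H=10)
print(K[0], K[-1])
```

With these matrices the last gain $K_{H-1}$ is zero, since the terminal cost $Q$ does not penalize velocity and the final action only changes velocity.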

Linear Dynamics

Special case when dynamics $f$ has a linear form $$ s_{t+1} = As_t + Ba_t $$
$A\in\mathbb R^{n_s\times n_s}$ and $B\in\mathbb R^{n_s\times n_a}$ are dynamics matrices respectively describing the "internal" dynamics and the effects of actions
The trajectories can be written as (PSet 3) $$ s_{t} = A^t s_0 + \sum_{k=0}^{t-1}A^k Ba_{t-k-1} $$
Powers of $A$ determine the long-term effects of initial states and actions
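The trajectory formula can be checked numerically. This is a sketch only: it compares a step-by-step rollout against the closed form, using the UAV matrices from later in the lecture and random actions with an arbitrary seed.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
rng = np.random.default_rng(0)      # arbitrary seed
t = 6
s0 = rng.normal(size=2)
actions = rng.normal(size=(t, 1))   # a_0, ..., a_{t-1}

# Step-by-step rollout of s_{k+1} = A s_k + B a_k
s = s0.copy()
for k in range(t):
    s = A @ s + B @ actions[k]

# Closed form: A^t s_0 + sum_{k=0}^{t-1} A^k B a_{t-k-1}
closed = np.linalg.matrix_power(A, t) @ s0
for k in range(t):
    closed = closed + np.linalg.matrix_power(A, k) @ B @ actions[t - k - 1]

print(np.allclose(s, closed))
```

Note that the most recent action $a_{t-1}$ is multiplied by $A^0$, while the earliest action $a_0$ is multiplied by $A^{t-1}$.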

Linear Dynamics

Special case when dynamics $f$ has a linear form $$ s_{t+1} = As_t + Ba_t $$
Consider linear policy defined by $a_t=Ks_t$: $$ s_{t+1} = As_t+BKs_t = (A+BK)s_t$$
The trajectories can be written as (PSet 3) $$ s_{t} = (A+BK)^t s_0 $$
The "internal" dynamics are modified according to $B$ and $K$

Example: naive policy

Setting: hovering UAV over a target
Action: thrust right/left
State: distance from target, velocity$$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
Thrust according to distance from target $a_t = -\begin{bmatrix} 1 & 0\end{bmatrix} s_t$

$$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix} s_t + \begin{bmatrix}0\\ 1\end{bmatrix}\begin{bmatrix}-1& 0\end{bmatrix} s_t = \begin{bmatrix}1 & 1 \\ -1 & 1\end{bmatrix} s_t $$

Example: optimal policy

Setting: hovering UAV over a target
Action: thrust right/left
State: distance from target, velocity
LQR$\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\frac{1}{2}\right)$

$\pi_t^\star(s) = K^\star_t s= \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s$

[Plot: the gains $\gamma^\mathsf{pos}_t$ and $\gamma^\mathsf{vel}_t$ as functions of $t$, up to the horizon $H$]

Example: approx. optimal policy

Setting: hovering UAV over a target
Action: thrust right/left
State: distance from target, velocity
$\approx$ LQR$\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\frac{1}{2}\right)$

Consider $\pi(s) = \begin{bmatrix} -\frac{1}{2} &-1 \end{bmatrix}s$

$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix} s_t + \begin{bmatrix}0\\ 1\end{bmatrix}\begin{bmatrix}-\frac{1}{2}& -1\end{bmatrix} s_t $
$s_{t+1} = \begin{bmatrix}1 & 1 \\ -\frac{1}{2} & 0\end{bmatrix} s_t $

Simulations demonstrate difference between

$$ s_{t+1} = \begin{bmatrix}1 & 1 \\ -1 & 1\end{bmatrix} s_t \quad \text{vs.} \quad s_{t+1} = \begin{bmatrix}1 & 1 \\ -\frac{1}{2} & 0\end{bmatrix} s_t$$

What is the difference?
What causes this difference?
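A minimal simulation sketch of the two closed-loop systems, for experimenting with the questions above. The initial state (one unit from the target, at rest) and the number of steps are arbitrary choices.

```python
import numpy as np

naive = np.array([[1.0, 1.0], [-1.0, 1.0]])    # closed loop under a_t = -[1 0] s_t
approx = np.array([[1.0, 1.0], [-0.5, 0.0]])   # closed loop under a_t = [-1/2 -1] s_t

def simulate(M, s0, T):
    """Return the states s_0, ..., s_T under s_{t+1} = M s_t."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(T):
        traj.append(M @ traj[-1])
    return traj

s0 = [1.0, 0.0]   # hypothetical start: one unit from target, zero velocity
T = 20
naive_norms = [np.linalg.norm(s) for s in simulate(naive, s0, T)]
approx_norms = [np.linalg.norm(s) for s in simulate(approx, s0, T)]
print(naive_norms[-1], approx_norms[-1])
```

After 20 steps the state norm under the naive policy has grown by a factor of 1024, while under the approximately optimal policy it has shrunk by the same factor.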

Example: comparison

Stability of Linear Dynamics

For the dynamics $$ s_{t+1} = As_{t} ,\quad s_0\neq 0, $$ one of three things can happen:
1. $s_t\to 0$, which is called asymptotically stable
2. $\|s_t\|\to\infty$, which is called unstable
3. something else (e.g. $A=I$, where $s_t = s_0$ for all $t$)
Since we know $s_t = A^t s_0$, stability is determined by the matrix $A$

Diagonalizable dynamics

Our goal is to understand what happens when we raise $A$ to the $t$-th power, as in $s_t = A^t s_0$
If $A$ is diagonalizable, then any $s_0$ can be written as a linear combination of eigenvectors $v_i$ of $A$, with eigenvalues $\lambda_i$ (so $Av_i = \lambda_i v_i$)
- $s_0 = \sum_{i=1}^{n_s} \alpha_i v_i$
- $s_1 = A\sum_{i=1}^{n_s} \alpha_i v_i = \sum_{i=1}^{n_s} \alpha_i A v_i = \sum_{i=1}^{n_s} \alpha_i \lambda_i v_i$
- Claim: $s_t = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i$
  - Exercise: write the proof by induction
Another perspective on why eigenvalues matter: $A^t = (VDV^{-1})^t = VD^tV^{-1}$
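Both perspectives can be checked numerically. The sketch below uses `numpy.linalg.eig` on an example matrix with complex eigenvalues; the matrix, initial state, and time step are arbitrary choices for illustration.

```python
import numpy as np

A = np.array([[1.0, 1.0], [-0.5, 0.0]])   # example matrix; its eigenvalues are complex
s0 = np.array([1.0, 0.0])                 # arbitrary initial state
t = 7

lam, V = np.linalg.eig(A)        # columns of V are the eigenvectors v_i
alpha = np.linalg.solve(V, s0)   # coefficients with s0 = sum_i alpha_i v_i

via_eigs = V @ (alpha * lam**t)                          # sum_i alpha_i lambda_i^t v_i
via_diag = V @ np.diag(lam**t) @ np.linalg.inv(V) @ s0   # V D^t V^{-1} s0
via_power = np.linalg.matrix_power(A, t) @ s0            # A^t s0

print(np.allclose(via_eigs, via_power), np.allclose(via_diag, via_power))
```

The intermediate quantities are complex, but the imaginary parts cancel (up to rounding) because the eigenvalues and eigenvectors of a real matrix come in conjugate pairs.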

Example: investing

You have investments in two companies.

Setting 1: Each dollar of investment in company $i$ leads to $\lambda_i$ returns. The companies are independent.

$\displaystyle s_{t+1} = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix} s_t $

$0<\lambda_2<\lambda_1<1$

$0<\lambda_2<1<\lambda_1$

$1<\lambda_2<\lambda_1$
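A small sketch of the three cases, since with a diagonal matrix each coordinate simply scales by $\lambda_i^t$. The specific rates, the initial investment, and the helper name `invest` are hypothetical.

```python
def invest(lams, s0, t):
    """State after t steps of independent growth: coordinate i is scaled by lams[i]**t."""
    return [lam**t * s for lam, s in zip(lams, s0)]

s0 = [1.0, 1.0]   # hypothetical: one dollar in each company
cases = {
    "both < 1": (0.9, 0.5),   # 0 < lambda_2 < lambda_1 < 1
    "mixed": (1.1, 0.5),      # 0 < lambda_2 < 1 < lambda_1
    "both > 1": (1.5, 1.1),   # 1 < lambda_2 < lambda_1
}
results = {name: invest(lams, s0, 50) for name, lams in cases.items()}
for name, s in results.items():
    print(name, s)
```

In the mixed case the second coordinate decays while the first grows, so the state norm still diverges unless the initial investment in company 1 is exactly zero.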

Example: investing

Setting 2: The companies are interdependent: each dollar of investment in company $i$ leads to $\alpha$ return for company $i$, but it also leads to $\beta$ return ($i=1$) or loss ($i=2$) to the other company.

$\displaystyle s_{t+1} = \begin{bmatrix} \alpha & -\beta \\ \beta & \alpha \end{bmatrix} s_t $

$0<\alpha^2+\beta^2<1$

$1<\alpha^2+\beta^2$

$$\begin{bmatrix}1\\0\end{bmatrix} \to \begin{bmatrix}\alpha\\ \beta\end{bmatrix} $$

rotation by $\arctan(\beta/\alpha)$

scale by $\sqrt{\alpha^2+\beta^2}$

$\lambda = \alpha \pm i \beta$
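The rotation-and-scaling picture can be verified numerically. In this sketch the values of $\alpha$ and $\beta$ are hypothetical, and `arctan2` is used in place of $\arctan(\beta/\alpha)$; the two agree when $\alpha>0$.

```python
import numpy as np

alpha, beta = 0.6, 0.6   # hypothetical values with alpha^2 + beta^2 < 1
A = np.array([[alpha, -beta], [beta, alpha]])

r = np.hypot(alpha, beta)          # scale factor sqrt(alpha^2 + beta^2)
theta = np.arctan2(beta, alpha)    # rotation angle
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

s0 = np.array([1.0, 0.0])
t = 12
s_t = np.linalg.matrix_power(A, t) @ s0
eigs = sorted(np.linalg.eigvals(A), key=lambda z: z.imag)

print(np.allclose(A, r * rotation), np.linalg.norm(s_t), r**t)
```

Each step rotates the state by $\theta$ and scales its length by $r$, so $\|s_t\| = r^t\|s_0\|$ and the trajectory spirals inward when $r<1$ and outward when $r>1$.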

Example: investing

Setting 3: Each dollar of investment in company $i$ leads to $\lambda$ return for company $i$, and company $2$ is a subsidiary of company $1$, which thus accumulates company $2$'s returns as well.

$\displaystyle s_{t+1} = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix} s_t $

$0<\lambda<1$

$1<\lambda$

$$ \left(\begin{bmatrix} \lambda & 0\\ 0 & \lambda\end{bmatrix} + \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix} \right)^t =\begin{bmatrix} \lambda^t & t\lambda^{t-1}\\ 0 & \lambda^t\end{bmatrix} $$
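One way to see where the $t\lambda^{t-1}$ entry comes from is a binomial expansion. Write $N = \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}$ (a name introduced here only for illustration). Since $\lambda I$ commutes with $N$ and $N^2 = 0$, only the first two terms of the expansion survive:

$$ (\lambda I + N)^t = \sum_{k=0}^{t} \binom{t}{k} \lambda^{t-k} N^k = \lambda^t I + t\lambda^{t-1} N $$

For $0<\lambda<1$ the off-diagonal entry $t\lambda^{t-1}$ may grow at first but still tends to $0$, because geometric decay eventually beats linear growth. For $\lambda>1$ every nonzero entry grows without bound.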

Summary of 2D Examples

- Example 1: $\displaystyle s_{t+1} = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix} s_t $. General case: diagonalizable, real eigenvalues

- Example 2: $\displaystyle s_{t+1} = \begin{bmatrix} \alpha & -\beta\\\beta & \alpha\end{bmatrix} s_t $. General case: pair of complex eigenvalues $\lambda = \alpha \pm i \beta$

- Example 3: $\displaystyle s_{t+1} = \begin{bmatrix} \lambda & 1\\ & \lambda\end{bmatrix} s_t $. General case: non-diagonalizable

Stability Theorem

Theorem: Let $\{\lambda_i\}_{i=1}^{n_s}\subset \mathbb C$ be the eigenvalues of $A$.
Then $s_{t+1}=As_t$ is

- asymptotically stable $\iff \max_{i\in[n_s]}|\lambda_i|<1$
- unstable if $\max_{i\in[n_s]}|\lambda_i|> 1$
- called "marginally (un)stable" when $\max_{i\in[n_s]}|\lambda_i|=1$
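The theorem translates directly into a numerical test on the largest eigenvalue modulus. A minimal sketch, where the function name `classify` and the tolerance `tol` (needed because eigenvalues are computed in floating point) are assumptions:

```python
import numpy as np

def classify(A, tol=1e-6):
    """Classify s_{t+1} = A s_t by the largest eigenvalue modulus of A."""
    rho = max(abs(np.linalg.eigvals(A)))
    if rho < 1 - tol:
        return "asymptotically stable"
    if rho > 1 + tol:
        return "unstable"
    return "marginally (un)stable"

naive = np.array([[1.0, 1.0], [-1.0, 1.0]])    # UAV closed loop, naive policy
approx = np.array([[1.0, 1.0], [-0.5, 0.0]])   # UAV closed loop, approx. optimal policy
print(classify(naive))
print(classify(approx))
print(classify(np.eye(2)))
```

For the two UAV closed loops, the eigenvalues are $1\pm i$ (modulus $\sqrt 2$) and $\tfrac{1}{2}\pm\tfrac{1}{2}i$ (modulus $1/\sqrt 2$) respectively.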

Marginally (un)stable

We call $\max_i|\lambda_i|=1$ "marginally (un)stable"
Consider independent investing example: (neither stable nor unstable)$$ s_{t} = \begin{bmatrix} 1 &0 \\0 & 1\end{bmatrix}^t s_0 = s_0 $$
Consider UAV example: (unstable)$$s_{t} = \begin{bmatrix} 1 & 1 \\0 & 1 \end{bmatrix}^t s_0 =\begin{bmatrix} 1 & t\\ 0 & 1\end{bmatrix} s_0 $$
Depends on eigenvectors not just eigenvalues!
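A short numerical sketch of the contrast: both matrices have the eigenvalue $1$ repeated twice, but they have different numbers of linearly independent eigenvectors. The initial state (at the target with unit velocity) and the step count are arbitrary choices.

```python
import numpy as np

I2 = np.eye(2)
J = np.array([[1.0, 1.0], [0.0, 1.0]])   # UAV "internal" dynamics
s0 = np.array([0.0, 1.0])                # hypothetical start: at target, unit velocity

t = 100
norm_I = np.linalg.norm(np.linalg.matrix_power(I2, t) @ s0)
norm_J = np.linalg.norm(np.linalg.matrix_power(J, t) @ s0)

# Independent eigenvectors for eigenvalue 1: dimension of the null space of (A - I)
n_eigvecs_I = 2 - np.linalg.matrix_rank(I2 - np.eye(2))
n_eigvecs_J = 2 - np.linalg.matrix_rank(J - np.eye(2))

print(norm_I, norm_J, n_eigvecs_I, n_eigvecs_J)
```

The identity keeps the state fixed, while the UAV drifts away linearly in $t$ because its matrix has only one independent eigenvector and so is not diagonalizable.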

Recall: 2D Examples

Trajectory is determined by the eigenstructure of $A$

[Figures: eigenvalues plotted in the complex plane $\mathbb C$ (axes $\mathcal R(\lambda)$, $\mathcal I(\lambda)$) next to the corresponding trajectories in state space (axes $s_1$, $s_2$), for the cases below]

- Real eigenvalues: $0<\lambda_2<\lambda_1<1$, $0<\lambda_2<1<\lambda_1$, $1<\lambda_2<\lambda_1$
- Complex pair $\lambda = \alpha \pm i \beta$: $0<\alpha^2+\beta^2<1$, $1<\alpha^2+\beta^2$
- Repeated eigenvalue $\lambda_1 = \lambda_2=\lambda$: $0<\lambda<1$, $\lambda>1$; depends on if $A$ is diagonalizable

Stability Theorem

Proof

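If $A$ is diagonalizable, then any $s_0$ can be written as a linear combination of eigenvectors $s_0 = \sum_{i=1}^{n_s} \alpha_i v_i$
- We previously argued that $s_t = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i$
- By the triangle inequality, $\|s_t\| \leq \sum_{i=1}^{n_s}|\alpha_i| |\lambda_i|^t \|v_i\|$, so $s_t\to 0$ if all $|\lambda_i|<1$
- Conversely, because the $v_i$ are linearly independent, a term with $\alpha_i\neq 0$ and $|\lambda_i|\geq 1$ cannot be cancelled by the others, so $s_t\not\to 0$; if $|\lambda_i|>1$, then $\|s_t\|\to\infty$
- Thus $s_t\to 0$ for every $s_0$ if and only if all $|\lambda_i|<1$, and if any $|\lambda_i|>1$, there are initial states with $\|s_t\|\to\infty$
Proof in the non-diagonalizable case is out of scope, but it follows using the Jordan Normal Form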
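The triangle-inequality bound from the proof can be checked numerically. A sketch under assumptions: the matrix below is a hypothetical diagonalizable example with eigenvalues $0.5$ and $0.8$, and the initial state is arbitrary.

```python
import numpy as np

A = np.array([[0.5, 1.0], [0.0, 0.8]])   # hypothetical example, eigenvalues 0.5 and 0.8
s0 = np.array([1.0, 1.0])

lam, V = np.linalg.eig(A)                # columns of V are eigenvectors v_i
alpha = np.linalg.solve(V, s0)           # s0 = sum_i alpha_i v_i
v_norms = np.linalg.norm(V, axis=0)      # ||v_i||

norms, bounds = [], []
s = s0.copy()
for t in range(30):
    norms.append(np.linalg.norm(s))                                    # ||s_t||
    bounds.append(np.sum(np.abs(alpha) * np.abs(lam)**t * v_norms))    # upper bound
    s = A @ s

print(norms[-1], bounds[-1])
```

Both sequences decay to zero here, with the bound dominated at large $t$ by the term with the largest $|\lambda_i|$.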

Recap

PA 1 due tonight

Linear Dynamics
Stability

Next lecture: Locally Linear Control

Sp24 CS 4/5789: Lecture 8

By Sarah Dean

