Prof. Sarah Dean

MW 2:45-4pm
255 Olin Hall

## Reminders

• Homework this week
  • Programming Assignment 1 due TONIGHT
  • Next PSet and PA released tonight
  • PSet due next Wednesday
  • PA due in 2 weeks
• My office hours:
  • Tuesdays 10:30-11:30am in Gates 416A
  • Wednesdays 4-4:50pm in Olin 255 (right after lecture)

## Agenda

1. Recap

2. Linear Control

## Recap: Optimal Control

• Continuous $$\mathcal S = \mathbb R^{n_s}$$ and $$\mathcal A = \mathbb R^{n_a}$$
• Cost to be minimized $$c=(c_0,\dots, c_{H-1}, c_H)$$
• Deterministic transitions described by dynamics function $$s_{t+1} = f(s_t, a_t)$$
• Finite horizon $$H$$

$$\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}$$

$$\min_{\pi}\quad \displaystyle\sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H)$$

$$\text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$

## Recap: Linear Dynamics

• The dynamics function $$f$$ has a linear form $$s_{t+1} = As_t + Ba_t$$
• $$A$$ describes the evolution of the state when there is no action (internal dynamics) $$s_{t+1}=As_t$$

## Recap: Trajectories and Stability

Trajectory is determined by the eigenstructure of $$A$$. For distinct real eigenvalues:

• $$0<\lambda_2<\lambda_1<1$$: trajectories converge to the origin
• $$0<\lambda_2<1<\lambda_1$$: trajectories diverge along the $$\lambda_1$$ eigenvector
• $$1<\lambda_2<\lambda_1$$: trajectories diverge

(Figure: eigenvalues plotted in the complex plane $$\mathbb C$$ with axes $$\mathcal R(\lambda)$$ and $$\mathcal I(\lambda)$$, alongside trajectories in the $$(s_1, s_2)$$ plane.)

## Recap: Trajectories and Stability

Trajectory is determined by the eigenstructure of $$A$$. For complex eigenvalues $$\lambda = \alpha \pm i \beta$$:

• $$0<\alpha^2+\beta^2<1$$: trajectories spiral in toward the origin
• $$1<\alpha^2+\beta^2$$: trajectories spiral outward

(Figure: eigenvalues plotted in the complex plane $$\mathbb C$$ with axes $$\mathcal R(\lambda)$$ and $$\mathcal I(\lambda)$$, alongside spiral trajectories in the $$(s_1, s_2)$$ plane.)


## Recap: Trajectories and Stability

Trajectory is determined by the eigenstructure of $$A$$. For repeated eigenvalues $$\lambda_1 = \lambda_2=\lambda$$:

• behavior depends on whether $$A$$ is diagonalizable
• $$0<\lambda<1$$: trajectories converge to the origin
• $$\lambda>1$$: trajectories diverge

(Figure: eigenvalues plotted in the complex plane $$\mathbb C$$ with axes $$\mathcal R(\lambda)$$ and $$\mathcal I(\lambda)$$, alongside trajectories in the $$(s_1, s_2)$$ plane.)

## Recap: Stability Theorem

Theorem: Let $$\{\lambda_i\}_{i=1}^n\subset \mathbb C$$ be the eigenvalues of $$A$$.
Then for $$s_{t+1}=As_t$$, the equilibrium $$s_{eq}=0$$ is

• asymptotically stable $$\iff \max_{i\in[n]}|\lambda_i|<1$$
• unstable if $$\max_{i\in[n]}|\lambda_i|> 1$$
• call $$\max_{i\in[n]}|\lambda_i|=1$$ "marginally (un)stable"


## Stability Theorem

Proof

• If $$A$$ is diagonalizable, then any $$s_0$$ can be written as a linear combination of eigenvectors $$s_0 = \sum_{i=1}^{n_s} \alpha_i v_i$$

• By definition, $$Av_i = \lambda_i v_i$$

• Therefore, $$s_t = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i$$

• Thus $$s_t\to 0$$ for every $$s_0$$ if and only if all $$|\lambda_i|<1$$; and if some $$|\lambda_i|>1$$, then $$\|s_t\|\to\infty$$ for any $$s_0$$ with $$\alpha_i\neq 0$$

• Proof in the non-diagonalizable case is out of scope, but it follows using the Jordan Normal Form
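The theorem is easy to check numerically. Below is a minimal sketch (not from the lecture, with a made-up matrix $$A$$): verify the spectral radius is below one, then simulate $$s_{t+1}=As_t$$ and watch the state decay.

```python
import numpy as np

# Made-up upper-triangular A: eigenvalues are 0.9 and 0.8,
# so the spectral radius is below 1 and the origin is asymptotically stable.
A = np.array([[0.9, 0.5],
              [0.0, 0.8]])
assert np.max(np.abs(np.linalg.eigvals(A))) < 1

# Simulate s_{t+1} = A s_t from an arbitrary initial state.
s = np.array([1.0, 1.0])
for _ in range(200):
    s = A @ s
print(np.linalg.norm(s))  # tiny: the trajectory has converged to 0
```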

## Marginally (un)stable

• We call $$\max_i|\lambda_i|=1$$ "marginally (un)stable"

• Consider the independent investing example (not unstable, since $$\lambda_2<1$$): $$s_{t} = \begin{bmatrix} 1 &0 \\0 & \lambda_2 \end{bmatrix}^t s_0$$
• Consider the UAV example (unstable): $$s_{t} = \begin{bmatrix} 1 & 1 \\0 & 1 \end{bmatrix}^t s_0 =\begin{bmatrix} 1 & t\\ 0 & 1\end{bmatrix} s_0$$
• Behavior depends on the eigenvectors, not just the eigenvalues!
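The contrast can be seen numerically. In this sketch both matrices have maximum eigenvalue magnitude exactly 1, but the diagonal one stays bounded while the Jordan block's state grows linearly in $$t$$ (the $$\lambda_2=0.5$$ value is made up for illustration).

```python
import numpy as np

# Diagonalizable case: trajectories stay bounded.
A1 = np.array([[1.0, 0.0],
               [0.0, 0.5]])
# Jordan-block (UAV) case: A2^t = [[1, t], [0, 1]], so the
# first state component grows linearly despite |lambda| = 1.
A2 = np.array([[1.0, 1.0],
               [0.0, 1.0]])

s0 = np.array([1.0, 1.0])
print(np.linalg.matrix_power(A1, 100) @ s0)  # bounded, approx [1, 0]
print(np.linalg.matrix_power(A2, 100) @ s0)  # first entry is 101
```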

## Agenda

1. Recap

2. Linear Control

## Controlled Trajectories

• Full dynamics depend on actions $$s_{t+1} = As_t+Ba_t$$

• The trajectories can be written as (PSet 3) $$s_{t} = A^t s_0 + \sum_{k=0}^{t-1}A^k Ba_{t-k-1}$$
• The internal dynamics $$A$$ determines the long term effects of actions
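A quick sanity check of the closed-form trajectory (the PSet 3 formula): iterate the dynamics directly, then compare against $$A^t s_0 + \sum_k A^k B a_{t-k-1}$$. The matrices here are the UAV ones; the action sequence is made up.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
s0 = np.array([[2.0], [0.0]])
actions = [1.0, -0.5, 0.0, 0.3, -0.2]  # arbitrary inputs

# Iterate s_{t+1} = A s_t + B a_t.
s = s0
for a in actions:
    s = A @ s + B * a

# Closed form at t = len(actions): A^t s_0 + sum_k A^k B a_{t-k-1}.
t = len(actions)
s_closed = np.linalg.matrix_power(A, t) @ s0 + sum(
    np.linalg.matrix_power(A, k) @ B * actions[t - k - 1] for k in range(t)
)
assert np.allclose(s, s_closed)  # the two computations agree
```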

## Example

• Setting: hovering UAV over a target $$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
• Initially at rest, then one rightward thrust followed by one leftward thrust: $$a_0=1,\quad a_{t_0}=-1,\quad a_t=0~~\text{for } t\notin\{0,t_0\}$$


• $$s_{t} = \displaystyle \begin{bmatrix}1 & t \\ 0 & 1\end{bmatrix}\begin{bmatrix}\mathsf{pos}_0 \\ 0 \end{bmatrix}+ \sum_{k=0}^{t-1} \begin{bmatrix}1 & k\\ 0 & 1\end{bmatrix} \begin{bmatrix}0\\ 1\end{bmatrix}a_{t-k-1}$$
• $$s_{t} = \displaystyle \begin{bmatrix}\mathsf{pos}_0 \\ 0 \end{bmatrix}+ \begin{bmatrix}1 & t-1\\ 0 & 1\end{bmatrix} \begin{bmatrix}0\\ 1\end{bmatrix}- \begin{bmatrix}1 & t-t_0-1\\ 0 & 1\end{bmatrix} \begin{bmatrix}0\\ 1\end{bmatrix}$$
• for $$1\le t\leq t_0$$, $$s_{t} = \displaystyle \begin{bmatrix}\mathsf{pos}_0+ t-1 \\ 1 \end{bmatrix}$$ and for $$t> t_0$$, $$s_{t} = \displaystyle \begin{bmatrix}\mathsf{pos}_0+ t_0 \\ 0 \end{bmatrix}$$
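Simulating this thrust sequence confirms the calculation: after the counter-thrust at time $$t_0$$ the UAV sits at $$\mathsf{pos}_0+t_0$$ with zero velocity. The numbers ($$\mathsf{pos}_0=5$$, $$t_0=3$$) are made up.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([0.0, 1.0])
pos0, t0, T = 5.0, 3, 10

s = np.array([pos0, 0.0])  # initially at rest
for t in range(T):
    # rightward thrust at t=0, leftward thrust at t=t0, otherwise coast
    a = 1.0 if t == 0 else (-1.0 if t == t0 else 0.0)
    s = A @ s + B * a
print(s)  # [pos0 + t0, 0] = [8, 0]
```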

## Linear Policy

• Linear policy defined by $$a_t=Ks_t$$: $$s_{t+1} = As_t+BKs_t = (A+BK)s_t$$

• The trajectories can be written as $$s_{t} = (A+BK)^t s_0$$
• The internal dynamics $$A$$ are modified depending on $$B$$ and $$K$$
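Since the closed loop is $$s_{t+1}=(A+BK)s_t$$, the stability theorem applies directly to $$A+BK$$. A minimal sketch with the UAV dynamics and a made-up gain $$K$$:

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -1.0]])  # made-up linear policy a_t = K s_t

# Closed-loop dynamics matrix and its spectral radius.
Acl = A + B @ K
rho = np.max(np.abs(np.linalg.eigvals(Acl)))
print(rho < 1)  # True: this K stabilizes the closed loop
```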

## Example

• Setting: hovering UAV over a target $$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
• Thrust according to distance from target $$a_t = -(\mathsf{pos}_t- x)$$


• $$s_{t+1} - \begin{bmatrix}x\\ 0\end{bmatrix} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}\left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right) + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
• $$\left(s_{t+1} - \begin{bmatrix}x\\ 0\end{bmatrix}\right) = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}\left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right) + \begin{bmatrix}0\\ 1\end{bmatrix}\begin{bmatrix}-1& 0\end{bmatrix} \left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right)$$
• $$\left(s_{t} - \begin{bmatrix}x\\ 0\end{bmatrix}\right) = \begin{bmatrix}1 & 1 \\ -1& 1\end{bmatrix}^t\left(s_0 -\begin{bmatrix}x\\ 0\end{bmatrix}\right)$$


## Example

• Setting: hovering UAV over a target $$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
• Thrust according to distance from target $$a_t = -(\mathsf{pos}_t+\mathsf{vel}_t- x)$$


• $$\left(s_{t+1} - \begin{bmatrix}x\\ 0\end{bmatrix}\right) = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}\left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right) + \begin{bmatrix}0\\ 1\end{bmatrix}\begin{bmatrix}-1& -1\end{bmatrix} \left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right)$$
• $$\left(s_{t} - \begin{bmatrix}x\\ 0\end{bmatrix}\right) = \begin{bmatrix}1 & 1 \\ -1 & 0\end{bmatrix}^t\left(s_0 -\begin{bmatrix}x\\ 0\end{bmatrix}\right)$$
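Comparing the two feedback laws numerically: position-only feedback yields closed-loop matrix $$\begin{bmatrix}1&1\\-1&1\end{bmatrix}$$ with $$|\lambda|=\sqrt 2$$ (trajectories spiral out), while position-plus-velocity feedback yields $$\begin{bmatrix}1&1\\-1&0\end{bmatrix}$$ with $$|\lambda|=1$$ (marginally stable).

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])

rhos = []
for K in (np.array([[-1.0, 0.0]]),     # position feedback only
          np.array([[-1.0, -1.0]])):   # position + velocity feedback
    rho = np.max(np.abs(np.linalg.eigvals(A + B @ K)))
    rhos.append(rho)
    print(rho)  # sqrt(2) for the first gain, 1.0 for the second
```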

## Agenda

1. Recap

2. Linear Control

## Linear Quadratic Regulator (LQR)

Special case of the optimal control problem with

• Quadratic cost $$c_t(s,a) = s^\top Qs+ a^\top Ra,\quad c_H(s) = s^\top Qs$$
• Linear dynamics $$s_{t+1} = As_t+ Ba_t$$

$$\min_{\pi}\quad \displaystyle\sum_{t=0}^{H-1} s_t^\top Qs_t +a_t^\top Ra_t+s_H^\top Q s_H$$

$$\text{s.t.}\quad s_{t+1}=As_t+B a_t, ~~a_t=\pi_t(s_t)$$

## Example

• Setting: hovering UAV over a target
• Action: thrust right/left
• State is $$s_t = \begin{bmatrix}\mathsf{position}_t - x\\ \mathsf{velocity}_t\end{bmatrix}$$
• $$c_t(s_t, a_t) = (\mathsf{position}_t-x)^2+\lambda a_t^2$$
• $$f(s_t, a_t) = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$


$$Q = \begin{bmatrix}1&0\\ 0&0\end{bmatrix},\quad R=\lambda$$

## Example

• Setting: hovering UAV over a target
• Action: thrust right/left
• State: distance from target, velocity
• Consider $$H=1$$

$$\min_{a}\quad s^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s + (s')^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s' +\lambda a^2 \quad \text{s.t.} \quad s' = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s + \begin{bmatrix}0\\ 1\end{bmatrix}a$$

$$\min_{a}\quad (\begin{bmatrix}1&0\end{bmatrix}s)^2 + (\begin{bmatrix}1&1\end{bmatrix}s)^2 + \lambda a^2 \quad \implies a^\star = 0$$

(The action does not affect the position at time 1, so only the $$\lambda a^2$$ term depends on $$a$$.)


## Example

• Setting: hovering UAV over a target
• Action: thrust right/left
• State: distance from target, velocity
• Consider $$H=2$$

$$\min_{a_0, a_1}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_1 + s_2^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_2+\lambda a_{0}^2+\lambda a_1^2$$

$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad \quad s_{2} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{1} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{1}$$

• As in the $$H=1$$ case, the position at time 2 does not depend on $$a_1$$, so $$a_1^\star=0$$ and the problem reduces to

$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 +\lambda a_{0}^2$$

$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}$$

• Collecting the two $$s_1$$ terms into a single quadratic:

$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}2&1\\ 1 & 1\end{bmatrix}s_1 +\lambda a_{0}^2 \quad \text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}$$

• Substituting the constraint:

$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + \left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}\right)^\top \begin{bmatrix}2&1\\ 1 & 1\end{bmatrix}\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}\right) +\lambda a_{0} ^2$$

$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_0^\top \begin{bmatrix}2&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + a_0^2 +\lambda a_{0}^2$$

$$\min_{a_0}\quad s_0^\top \begin{bmatrix}3&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + (1 +\lambda)a_0^2 \implies a_0^\star = -\frac{\begin{bmatrix}1&2\end{bmatrix}s_0}{1+\lambda}$$

• So the optimal actions are $$a_0^\star = -\frac{\begin{bmatrix}1&2\end{bmatrix}s_0}{1+\lambda}$$ and $$a_1^\star=0$$
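A brute-force check of this derivation: with $$a_1^\star=0$$ plugged in, grid-search over $$a_0$$ and compare against the closed-form $$a_0^\star = -\begin{bmatrix}1&2\end{bmatrix}s_0/(1+\lambda)$$. The initial state $$s_0=[2,1]$$ and $$\lambda=\tfrac12$$ are made-up numbers for illustration.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([0.0, 1.0])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
lam = 0.5
s0 = np.array([2.0, 1.0])

def cost(a0):
    # Total H=2 cost with the inner minimizer a_1* = 0 substituted.
    s1 = A @ s0 + B * a0
    s2 = A @ s1
    return s0 @ Q @ s0 + s1 @ Q @ s1 + s2 @ Q @ s2 + lam * a0**2

grid = np.linspace(-5, 5, 20001)
a_best = grid[np.argmin([cost(a) for a in grid])]
a_formula = -(np.array([1.0, 2.0]) @ s0) / (1 + lam)
print(a_best, a_formula)  # both approximately -8/3
```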

## DP for Optimal Control

Reformulating for optimal control, our general purpose dynamic programming algorithm is:

• Initialize $$V^\star_H(s) = c_H(s)$$
• For $$t=H-1, H-2, ..., 0$$:
• $$Q_t^\star(s,a) = c_t(s,a)+V^\star_{t+1}(f(s,a))$$ (transitions are deterministic, so the expectation over $$s'$$ reduces to evaluating at $$s'=f(s,a)$$)
• $$\pi_t^\star(s) = \arg\min_a Q_t^\star(s,a)$$
• $$V^\star_{t}(s)=Q_t^\star(s,\pi_t^\star(s) )$$
• Return $$\pi^\star = (\pi^\star_0,\dots ,\pi^\star_{H-1})$$


## LQR via DP

• $$V_H^\star(s) = s^\top Q s$$
• $$t=H-1$$: $$\quad \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top Q (As+Ba)$$
• $$\quad \min_{a} s^\top (Q+A^\top QA) s+a^\top (R+B^\top QB) a+2s^\top A^\top Q Ba$$
• General minimization: $$\arg\min_a c + a^\top M a + 2m^\top a$$
• $$2Ma_\star + 2m = 0 \implies a_\star = -M^{-1} m$$
• $$\pi_{H-1}^\star(s)=-(R+B^\top QB)^{-1}B^\top QAs$$
• minimum is $$c-m^\top M^{-1} m$$
• $$V_{H-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s$$

DP: $$V_t^\star (s) = \min_{a} c(s, a)+V_{t+1}^\star (f(s,a))$$

## LQR via DP

• $$V_H^\star(s) = s^\top Q s$$
• $$t=H-1$$: $$\quad \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top Q (As+Ba)$$
• $$\pi_{H-1}^\star(s)=-(R+B^\top QB)^{-1}B^\top QAs$$
• $$V_{H-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s$$

Theorem: For $$t=0,\dots ,H-1$$, the optimal value function is quadratic and the optimal policy is linear:

$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$

where the matrices are defined as $$P_{H} = Q$$ and

• $$P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$$
• $$K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$$
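The recursion in the theorem translates directly into code. This sketch runs the backward pass and returns the gains $$K_t$$ and cost-to-go matrices $$P_t$$ (the function name and interface are my own choices):

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, H):
    """Backward Riccati recursion: P_H = Q, then for t = H-1, ..., 0
    compute K_t and P_t from P_{t+1} as in the theorem."""
    Ks = [None] * H
    Ps = [None] * (H + 1)
    P = Q
    Ps[H] = Q
    for t in range(H - 1, -1, -1):
        S = R + B.T @ P @ B
        K = -np.linalg.solve(S, B.T @ P @ A)       # K_t
        P = Q + A.T @ P @ A + A.T @ P @ B @ K      # P_t (equals the Riccati update)
        Ks[t], Ps[t] = K, P
    return Ks, Ps
```

For the UAV instance with $$\lambda=\tfrac12$$ and $$H=2$$, `Ks[0]` recovers the gain $$-\begin{bmatrix}1&2\end{bmatrix}/(1+\lambda)$$ computed by hand in the earlier example, and `Ks[1]` is zero, matching $$a_1^\star=0$$.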

## LQR Proof

• Base case: $$V_H^\star(s) = s^\top Q s$$
• Inductive step: Assume true at $$t+1$$.
• DP at $$t$$: $$V_t^\star(s)= \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top P_{t+1} (As+Ba)$$
• $$\quad \min_{a} s^\top (Q+A^\top P_{t+1}A) s+a^\top (R+B^\top P_{t+1} B) a+2s^\top A^\top P_{t+1} Ba$$
• General minimization: $$\arg\min_a c + a^\top M a + 2m^\top a$$ gives $$a_\star = -M^{-1} m$$ and minimum is $$c-m^\top M^{-1} m$$
• $$\pi_{t}^\star(s)=-(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}As$$
• $$V_{t}^\star(s) = s^\top (Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A) s$$

## Example

• Setting: hovering UAV over a target
• Action: thrust right/left
• State: distance from target, velocity
• LQR$$\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\frac{1}{2}\right)$$

$$a_t$$

$$\pi_t^\star(s) = \begin{bmatrix}{ \gamma^\mathsf{pos}_t }& {\gamma_t^\mathsf{vel}} \end{bmatrix}s$$

$$\gamma^\mathsf{pos}$$

$$\gamma^\mathsf{vel}$$

$$-1$$

$$t$$

$$H$$
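To reproduce the gains in this example, run the Riccati recursion on the instance $$\mathrm{LQR}\left(\begin{bmatrix}1&1\\0&1\end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix},\begin{bmatrix}1&0\\0&0\end{bmatrix},\tfrac12\right)$$. A compact sketch (the horizon 50 is a made-up choice, long enough for the gain far from the horizon to settle):

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])
R = np.array([[0.5]])

P = Q
for _ in range(50):  # backward in time from t = H
    S = R + B.T @ P @ B
    K = -np.linalg.solve(S, B.T @ P @ A)
    P = Q + A.T @ P @ A + A.T @ P @ B @ K

print(K)  # gains [gamma_pos, gamma_vel] far from the horizon
print(np.max(np.abs(np.linalg.eigvals(A + B @ K))))  # < 1: stabilizing
```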

## LQR Extensions

• The same dynamic programming method extends in a straightforward manner when:
  1. Dynamics and costs are time varying
  2. Affine term in the dynamics, cross terms in the costs
• General form: $$c_t(s,a) = s^\top Q_t s+a^\top R_t a + s^\top M_ta + m_t$$ and $$f_t(s_t,a_t) = A_ts_t + B_t a_t +c_t$$
• Many applications can be reformulated this way:
  • e.g. trajectory tracking $$c_t(s,a) = \|s-\bar s_t\|_2^2 + \|a\|_2^2$$ for a given reference $$\bar s_t$$
• Next lecture: general (nonlinear) dynamics and costs

## Recap

• PA 1 due TONIGHT

• Linear Control
• LQR

• Next lecture: Nonlinear Control
