CS 4/5789: Introduction to Reinforcement Learning
Lecture 8: Linear Quadratic Regulator
Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
Reminders
- Homework this week
- Programming Assignment 1 due TONIGHT
- Next PSet and PA released tonight
- PSet due next Wednesday
- PA due in 2 weeks
- My office hours:
- Tuesdays 10:30-11:30am in Gates 416A
- Wednesdays 4-4:50pm in Olin 255 (right after lecture)
Agenda
1. Recap
2. Linear Control
3. Linear Quadratic Regulator
Recap: Optimal Control
- Continuous \(\mathcal S = \mathbb R^{n_s}\) and \(\mathcal A = \mathbb R^{n_a}\)
- Cost to be minimized \(c=(c_0,\dots, c_{H-1}, c_H)\)
- Deterministic transitions described by dynamics function $$s_{t+1} = f(s_t, a_t)$$
- Finite horizon \(H\)
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, c, f, H\}\)
$$\min_{\pi}\quad \sum_{t=0}^{H-1} c_t(s_t, a_t)+c_H(s_H) \quad \text{s.t.}\quad s_{t+1}=f(s_t, a_t), ~~a_t=\pi_t(s_t)$$
Recap: Linear Dynamics
- The dynamics function \(f\) has a linear form $$ s_{t+1} = As_t + Ba_t $$
- \(A\) describes the evolution of the state when there is no action (internal dynamics) $$ s_{t+1}=As_t$$
Recap: Trajectories and Stability
Trajectory is determined by the eigenstructure of \(A\)
[Figure: trajectories in the \((s_1, s_2)\) plane alongside eigenvalues in the complex plane \(\mathbb C\) (axes \(\mathcal R(\lambda)\), \(\mathcal I(\lambda)\)), for the real-eigenvalue cases \(0<\lambda_2<\lambda_1<1\), \(0<\lambda_2<1<\lambda_1\), and \(1<\lambda_2<\lambda_1\)]
Recap: Trajectories and Stability
\(\lambda = \alpha \pm i \beta\)
Trajectory is determined by the eigenstructure of \(A\)
[Figure: for complex eigenvalues \(\lambda=\alpha\pm i\beta\), trajectories in the \((s_1,s_2)\) plane spiral, decaying when \(0<\alpha^2+\beta^2<1\) and growing when \(1<\alpha^2+\beta^2\); eigenvalues shown in \(\mathbb C\) with axes \(\mathcal R(\lambda)\), \(\mathcal I(\lambda)\)]
Recap: Trajectories and Stability
\(\lambda_1 = \lambda_2=\lambda\)
Trajectory is determined by the eigenstructure of \(A\)
- depends on whether \(A\) is diagonalizable
[Figure: trajectories in the \((s_1,s_2)\) plane for a repeated eigenvalue, in the cases \(0<\lambda<1\) and \(\lambda>1\)]
Recap: Stability Theorem
Theorem: Let \(\{\lambda_i\}_{i=1}^n\subset \mathbb C\) be the eigenvalues of \(A\).
Then for \(s_{t+1}=As_t\), the equilibrium \(s_{eq}=0\) is
- asymptotically stable \(\iff \max_{i\in[n]}|\lambda_i|<1\)
- unstable if \(\max_{i\in[n]}|\lambda_i|> 1\)
- call \(\max_{i\in[n]}|\lambda_i|=1\) "marginally (un)stable"
Stability Theorem
Proof
- If \(A\) is diagonalizable, then any \(s_0\) can be written as a linear combination of eigenvectors: \(s_0 = \sum_{i=1}^{n_s} \alpha_i v_i\)
- By definition, \(Av_i = \lambda_i v_i\)
- Therefore, \(s_t = A^t s_0 = \sum_{i=1}^{n_s}\alpha_i \lambda_i^t v_i\)
- Thus \(s_t\to 0\) for every \(s_0\) if and only if all \(|\lambda_i|<1\); and if any \(|\lambda_i|>1\), then \(\|s_t\|\to\infty\) whenever the corresponding \(\alpha_i\neq 0\)
- The proof in the non-diagonalizable case is out of scope, but it follows using the Jordan Normal Form
Marginally (un)stable
- We call \(\max_i|\lambda_i|=1\) "marginally (un)stable"
- Consider the investing example (not unstable, since \(\lambda_2<1\)): $$ s_{t} = \begin{bmatrix} 1 &0 \\0 & \lambda_2 \end{bmatrix}^t s_0 $$
- Consider the UAV example (unstable): $$s_{t} = \begin{bmatrix} 1 & 1 \\0 & 1 \end{bmatrix}^t s_0 =\begin{bmatrix} 1 & t\\ 0 & 1\end{bmatrix} s_0 $$
- Stability depends on the eigenvectors, not just the eigenvalues! (See the sketch below.)
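To make the contrast concrete, here is a minimal numpy sketch; the specific values (\(\lambda_2=0.5\), the initial state, and the horizon) are illustrative choices, not from the course materials.

```python
import numpy as np

# Both matrices have max |eigenvalue| = 1, but only the Jordan block blows up.
A_invest = np.array([[1.0, 0.0],
                     [0.0, 0.5]])   # diagonalizable, eigenvalues {1, 0.5}
A_uav = np.array([[1.0, 1.0],
                  [0.0, 1.0]])      # Jordan block, repeated eigenvalue 1

s0 = np.array([1.0, 1.0])
for name, A in [("investing", A_invest), ("UAV", A_uav)]:
    s = s0.copy()
    for _ in range(50):
        s = A @ s                   # roll out s_{t+1} = A s_t
    print(name, np.abs(np.linalg.eigvals(A)).max(), np.linalg.norm(s))
# investing: ||s_50|| stays bounded (about 1); UAV: ||s_50|| grows linearly (about 51)
```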
Agenda
1. Recap
2. Linear Control
3. Linear Quadratic Regulator
Controlled Trajectories
- Full dynamics depend on actions $$ s_{t+1} = As_t+Ba_t $$
- The trajectories can be written as (PSet 3) $$ s_{t} = A^t s_0 + \sum_{k=0}^{t-1}A^k Ba_{t-k-1} $$
- The internal dynamics \(A\) determine the long-term effects of actions (illustrated in the sketch below)
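A quick check of this closed form against a direct rollout, on a random instance (dimensions and horizon are arbitrary):

```python
import numpy as np

# Verify  s_t = A^t s_0 + sum_{k=0}^{t-1} A^k B a_{t-k-1}
# against rolling out s_{t+1} = A s_t + B a_t directly.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 1))
s0 = rng.normal(size=2)
actions = rng.normal(size=(10, 1))          # a_0, ..., a_9

s = s0.copy()
for a in actions:                           # direct rollout
    s = A @ s + B @ a

t = len(actions)                            # closed form at t = 10
s_closed = np.linalg.matrix_power(A, t) @ s0 + sum(
    np.linalg.matrix_power(A, k) @ B @ actions[t - k - 1] for k in range(t)
)
assert np.allclose(s, s_closed)
```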
Example
- Setting: hovering UAV over a target $$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
- Initially at rest, then one rightward thrust followed by one leftward thrust $$a_0=1,\quad a_{t_0}=-1,\quad a_k=0~~k\notin\{0,t_0\} $$
- \(s_{t} = \displaystyle \begin{bmatrix}1 & t \\ 0 & 1\end{bmatrix}\begin{bmatrix}\mathsf{pos}_0 \\ 0 \end{bmatrix}+ \sum_{k=0}^{t-1} \begin{bmatrix}1 & k\\ 0 & 1\end{bmatrix} \begin{bmatrix}0\\ 1\end{bmatrix}a_{t-k-1}\)
- \(s_{t} = \displaystyle \begin{bmatrix}\mathsf{pos}_0 \\ 0 \end{bmatrix}+ \begin{bmatrix}1 & t-1\\ 0 & 1\end{bmatrix} \begin{bmatrix}0\\ 1\end{bmatrix}- \begin{bmatrix}1 & t-t_0-1\\ 0 & 1\end{bmatrix} \begin{bmatrix}0\\ 1\end{bmatrix}\)
- for \(1\le t\leq t_0\), \(s_{t} = \displaystyle \begin{bmatrix}\mathsf{pos}_0+ t-1 \\ 1 \end{bmatrix}\) and for \(t> t_0\), \(s_{t} = \displaystyle \begin{bmatrix}\mathsf{pos}_0+ t_0 \\ 0 \end{bmatrix}\) (checked numerically below)
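A sanity check of this trajectory, taking \(\mathsf{pos}_0=0\) and \(t_0=5\) for illustration:

```python
import numpy as np

# One rightward thrust at t = 0 and one leftward thrust at t = t0:
# the UAV should drift t0 units and come to rest.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
b = np.array([0.0, 1.0])
t0 = 5
s = np.array([0.0, 0.0])                    # pos_0 = 0, at rest
for t in range(2 * t0):
    a = 1.0 if t == 0 else (-1.0 if t == t0 else 0.0)
    s = A @ s + b * a
print(s)                                    # [5. 0.] = [pos_0 + t0, 0]
```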
Linear Policy
- Linear policy defined by \(a_t=Ks_t\): $$ s_{t+1} = As_t+BKs_t = (A+BK)s_t$$
- The trajectories can be written as $$ s_{t} = (A+BK)^t s_0 $$
- The internal dynamics \(A\) are modified depending on \(B\) and \(K\) (see the sketch below)
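A minimal simulation sketch; the gain \(K\) here is an arbitrary stabilizing choice for illustration:

```python
import numpy as np

# Under a_t = K s_t the rollout is just powers of the closed-loop matrix A + BK.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -1.0]])                # illustrative gain

Acl = A + B @ K
print(np.abs(np.linalg.eigvals(Acl)))       # both < 1, so this loop is stable
s = np.array([1.0, 0.0])
for t in range(5):
    print(t, s)
    s = Acl @ s
```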
Example
- Setting: hovering UAV over a target $$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
- Thrust according to distance from target \(a_t = -(\mathsf{pos}_t- x)\)
- \(s_{t+1} - \begin{bmatrix}x\\ 0\end{bmatrix} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}\left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right) + \begin{bmatrix}0\\ 1\end{bmatrix}a_t\)
- \(\left(s_{t+1} - \begin{bmatrix}x\\ 0\end{bmatrix}\right) = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}\left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right) + \begin{bmatrix}0\\ 1\end{bmatrix}\begin{bmatrix}-1& 0\end{bmatrix} \left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right)\)
- \(\left(s_{t} - \begin{bmatrix}x\\ 0\end{bmatrix}\right) = \begin{bmatrix}1 & 1 \\ -1& 1\end{bmatrix}^t\left(s_0 -\begin{bmatrix}x\\ 0\end{bmatrix}\right)\)
Example
- Setting: hovering UAV over a target $$s_{t+1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t$$
- Thrust according to distance from target and velocity: \(a_t = -(\mathsf{pos}_t+\mathsf{vel}_t- x)\) (both gains are compared numerically below)
- \(\left(s_{t+1} - \begin{bmatrix}x\\ 0\end{bmatrix}\right) = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}\left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right) + \begin{bmatrix}0\\ 1\end{bmatrix}\begin{bmatrix}-1& -1\end{bmatrix} \left(s_t -\begin{bmatrix}x\\ 0\end{bmatrix}\right)\)
- \(\left(s_{t} - \begin{bmatrix}x\\ 0\end{bmatrix}\right) = \begin{bmatrix}1 & 1 \\ -1 & 0\end{bmatrix}^t\left(s_0 -\begin{bmatrix}x\\ 0\end{bmatrix}\right)\)
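A quick numerical comparison of the two feedback gains from these examples, reading stability off the closed-loop eigenvalues:

```python
import numpy as np

# Position-only feedback vs. position + velocity feedback for the UAV.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
for K in (np.array([[-1.0, 0.0]]),          # a_t = -(pos_t - x)
          np.array([[-1.0, -1.0]])):        # a_t = -(pos_t + vel_t - x)
    eigs = np.linalg.eigvals(A + B @ K)
    print(K, np.abs(eigs))
# position-only: |lambda| = sqrt(2) > 1, spirals away from the target;
# position + velocity: |lambda| = 1, marginally (un)stable
```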
Agenda
1. Recap
2. Linear Control
3. Linear Quadratic Regulator
Linear Quadratic Regulator
Special case of optimal control problem with
- Quadratic cost, with \(Q\succeq 0\) and \(R\succ 0\): $$c_t(s,a) = s^\top Qs+ a^\top Ra,\quad c_H(s) = s^\top Qs$$
- Linear dynamics $$s_{t+1} = As_t+ Ba_t$$
$$\min_{\pi}\quad \sum_{t=0}^{H-1} \left(s_t^\top Qs_t +a_t^\top Ra_t\right)+s_H^\top Q s_H \quad \text{s.t.}\quad s_{t+1}=As_t+B a_t, ~~a_t=\pi_t(s_t)$$
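For reference, a short sketch (the function name and interface are my own) that evaluates this objective for a fixed linear policy \(\pi_t(s)=Ks\):

```python
import numpy as np

# Total LQR cost of a fixed linear policy a_t = K s_t:
# roll out the dynamics and sum the quadratic stage costs.
def lqr_cost(A, B, Q, R, K, s0, H):
    R = np.atleast_2d(R)
    s, total = np.asarray(s0, dtype=float), 0.0
    for _ in range(H):
        a = K @ s
        total += s @ Q @ s + a @ R @ a      # stage cost s^T Q s + a^T R a
        s = A @ s + B @ a
    return total + s @ Q @ s                # terminal cost s_H^T Q s_H
```

For instance, `lqr_cost(A, B, Q, 0.5, K, s0, 20)` scores any candidate gain \(K\) on the UAV instance below.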
Example
- Setting: hovering UAV over a target
- Action: thrust right/left
- State is \(s_t = \begin{bmatrix}\mathsf{position}_t - x\\ \mathsf{velocity}_t\end{bmatrix}\)
- \(c_t(s_t, a_t) = (\mathsf{position}_t-x)^2+\lambda a_t^2\)
- \(f(s_t, a_t) = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_t + \begin{bmatrix}0\\ 1\end{bmatrix}a_t\)
\(Q = \begin{bmatrix}1&0\\ 0&0\end{bmatrix},\quad R=\lambda\)
Example
- Setting: hovering UAV over a target
- Action: thrust right/left
- State: distance from target, velocity
- Consider \(H=1\)
$$\min_{a}\quad s^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s + (s')^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s' +\lambda a^2 \quad \text{s.t.} \quad s' = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s + \begin{bmatrix}0\\ 1\end{bmatrix}a $$
$$\min_{a}\quad (\begin{bmatrix}1&0\end{bmatrix}s)^2 + (\begin{bmatrix}1&1\end{bmatrix}s)^2 + \lambda a^2 \quad \implies a^\star = 0 $$
since the thrust does not affect the position after a single step, \(a\) appears only in the \(\lambda a^2\) term.
Example
- Setting: hovering UAV over a target
- Action: thrust right/left
- State: distance from target, velocity
- Consider \(H=2\)
$$\min_{a_0, a_1}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_1 + s_2^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_2+\lambda a_{0}^2+\lambda a_1^2 $$
$$\text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} , \quad \quad s_{2} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{1} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{1} $$
By the \(H=1\) argument, the optimal final action is \(a_1^\star=0\), so the problem reduces to
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + (\begin{bmatrix}1&0\end{bmatrix}s_1)^2 + (\begin{bmatrix}1&1\end{bmatrix}s_1)^2 +\lambda a_{0}^2 \quad \text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} $$
Collecting the \(s_1\) terms into a single quadratic,
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_1^\top \begin{bmatrix}2&1\\ 1 & 1\end{bmatrix}s_1 +\lambda a_{0}^2 \quad \text{s.t.} \quad s_{1} = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0} $$
and substituting the constraint,
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + \left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}\right)^\top \begin{bmatrix}2&1\\ 1 & 1\end{bmatrix}\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}s_{0} + \begin{bmatrix}0\\ 1\end{bmatrix}a_{0}\right) +\lambda a_{0} ^2$$
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}1&0\\ 0&0\end{bmatrix}s_0 + s_0^\top \begin{bmatrix}2&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + a_0^2 +\lambda a_{0}^2 $$
$$\min_{a_0}\quad s_0^\top \begin{bmatrix}3&3\\ 3&5\end{bmatrix}s_0 + 2 s_0^\top \begin{bmatrix}1\\2\end{bmatrix}a_0 + (1 +\lambda)a_0^2 \implies a_0^\star = -\frac{\begin{bmatrix}1&2\end{bmatrix}s_0}{1+\lambda} $$
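A numerical spot-check of this closed form, taking \(\lambda=1/2\) and \(s_0=\begin{bmatrix}1&0\end{bmatrix}^\top\) for illustration:

```python
import numpy as np

# Spot-check a_0* = -[1 2]s_0 / (1 + lambda), using a_1* = 0 from the H=1 step.
lam = 0.5
A = np.array([[1.0, 1.0], [0.0, 1.0]])
b = np.array([0.0, 1.0])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
s0 = np.array([1.0, 0.0])

def cost(a0):
    s1 = A @ s0 + b * a0
    s2 = A @ s1                              # a_1 = 0
    return s0 @ Q @ s0 + s1 @ Q @ s1 + s2 @ Q @ s2 + lam * a0**2

grid = np.linspace(-2.0, 2.0, 40001)
print(grid[np.argmin([cost(a) for a in grid])])  # grid search: about -0.6667
print(-(np.array([1.0, 2.0]) @ s0) / (1 + lam))  # closed form: -2/3
```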
DP for Optimal Control
Reformulated for optimal control, our general-purpose dynamic programming algorithm is:
- Initialize \(V^\star_H(s) = c_H(s)\)
- For \(t=H-1, H-2, ..., 0\):
- \(Q_t^\star(s,a) = c_t(s,a)+V^\star_{t+1}(f(s,a))\) (transitions are deterministic, so the expectation over \(s'\) reduces to evaluating at \(s'=f(s,a)\))
- \(\pi_t^\star(s) = \arg\min_a Q_t^\star(s,a)\)
- \(V^\star_{t}(s)=Q_t^\star(s,\pi_t^\star(s) )\)
- Return \(\pi^\star = (\pi^\star_0,\dots ,\pi^\star_{H-1})\)
LQR via DP
- \(V_H^\star(s) = s^\top Q s\)
- \(t=H-1\): \(\quad \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top Q (As+Ba)\)
- \(=\min_{a}~ s^\top (Q+A^\top QA) s+a^\top (R+B^\top QB) a+2s^\top A^\top Q Ba\)
- General minimization: \(\arg\min_a c + a^\top M a + 2m^\top a\)
- \(2Ma_\star + 2m = 0 \implies a_\star = -M^{-1} m\)
- \( \pi_{H-1}^\star(s)=-(R+B^\top QB)^{-1}B^\top QAs\)
- minimum is \(c-m^\top M^{-1} m\)
- \(V_{H-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s\)
DP: \(V_t^\star (s) = \min_{a} c(s, a)+V_{t+1}^\star (f(s,a))\)
Theorem: For \(t=0,\dots ,H-1\), the optimal value function is quadratic and the optimal policy is linear$$V^\star_t (s) = s^\top P_t s \quad\text{ and }\quad \pi_t^\star(s) = K_t s$$
where the matrices are defined as \(P_{H} = Q\) and
- \(P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
- \(K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
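A direct implementation sketch of this backward recursion (the function name and interface are my own; the updates are exactly the \(P_t\), \(K_t\) above):

```python
import numpy as np

# Backward (Riccati) recursion for finite-horizon deterministic LQR.
def lqr_gains(A, B, Q, R, H):
    R = np.atleast_2d(R)
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = Q                                            # V_H(s) = s^T Q s
    for t in range(H - 1, -1, -1):
        S = R + B.T @ P[t + 1] @ B                      # R + B^T P_{t+1} B
        K[t] = -np.linalg.solve(S, B.T @ P[t + 1] @ A)  # K_t from the theorem
        P[t] = Q + A.T @ P[t + 1] @ A + A.T @ P[t + 1] @ B @ K[t]  # P_t
    return K, P
```

Using `np.linalg.solve` rather than forming the inverse \((R+B^\top P_{t+1}B)^{-1}\) explicitly is the standard numerically preferable choice.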
LQR Proof
- Base case: \(V_H^\star(s) = s^\top Q s\)
- Inductive step: assume the claim holds at \(t+1\), i.e. \(V^\star_{t+1}(s)=s^\top P_{t+1}s\).
- DP at \(t\): \(V_t^\star(s)= \min_{a} s^\top Q s+a^\top Ra+ (As+Ba)^\top P_{t+1} (As+Ba)\)
- \(=\min_{a}~ s^\top (Q+A^\top P_{t+1}A) s+a^\top (R+B^\top P_{t+1} B) a+2s^\top A^\top P_{t+1} Ba\)
- General minimization: \(\arg\min_a c + a^\top M a + 2m^\top a\) gives \(a_\star = -M^{-1} m\) and minimum is \(c-m^\top M^{-1} m\)
- \( \pi_{t}^\star(s)=-(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}As\)
- \(V_{t}^\star(s) = s^\top (Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A) s\)
Example
- Setting: hovering UAV over a target
- Action: thrust right/left
- State: distance from target, velocity
- LQR\(\left(\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix},\begin{bmatrix}0\\ 1\end{bmatrix},\begin{bmatrix}1&0\\ 0&0\end{bmatrix},\frac{1}{2}\right)\)
\(\pi_t^\star(s) = \begin{bmatrix}\gamma_t^\mathsf{pos} & \gamma_t^\mathsf{vel} \end{bmatrix}s\)
[Plot: the gain entries \(\gamma_t^\mathsf{pos}\) and \(\gamma_t^\mathsf{vel}\) as functions of \(t\) from \(0\) to \(H\)]
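Running the recursion on this instance (a self-contained sketch; the horizon \(H=30\) is an arbitrary choice) reproduces the gains from the worked \(H=2\) example:

```python
import numpy as np

# Riccati recursion for LQR([[1,1],[0,1]], [[0],[1]], [[1,0],[0,0]], 1/2).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[0.5]])
H = 30

P = Q                                       # P_H = Q
K = [None] * H
for t in range(H - 1, -1, -1):
    K[t] = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A + A.T @ P @ B @ K[t]

print(K[H - 1])   # [[0, 0]]: thrust cannot reduce the final position cost
print(K[H - 2])   # [[-2/3, -4/3]] = -[1, 2]/(1 + lambda), matching the worked example
print(K[0])       # far from the horizon, the gains level off toward constants
```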
LQR Extensions
- The same dynamic programming method extends in a straightforward manner when:
- Dynamics and costs are time varying
- Affine term in the dynamics, cross terms in the costs
- General form: $$c_t(s,a) = s^\top Q_t s+a^\top R_t a + s^\top M_ta + m_t$$ $$ f_t(s_t,a_t) = A_ts_t + B_t a_t +w_t$$
- Many applications can be reformulated this way:
- e.g. trajectory tracking \(c_t(s,a) = \|s-\bar s_t\|_2^2 + \|a\|_2^2\) for given \(\bar s_t\)
- Next lecture: general (nonlinear) dynamics and costs
Recap
- PA 1 due TONIGHT
- Linear Control
- LQR
- Next lecture: Nonlinear Control