Optimal Control

ML in Feedback Sys #15

Prof Sarah Dean

Reminders

Office hours this week moved to Friday 9-10am
- cancelled next week due to travel
Feedback on final project proposal this week
Upcoming paper presentations starting next week
Project midterm update due 11/11

Action in a dynamic world

Goal: select actions $a_t$ to bring the environment to low-cost states

Feedback loop (figure): the policy $\pi_t:\mathcal S\to\mathcal A$ maps the observation $s_t$ to an action $a_t$, the environment $F$ updates its state $s$, and the data $\{(s_t, a_t, c_t)\}$ accumulate.

Controlled systems

$$ s_{t+1} = F(s_t, a_t, w_t) $$


Controlled systems

$$ s_{t+1} = As_t+B a_t+ w_t$$

Linear System: State space $\mathcal S = \mathbb R^n$ and actions $\mathcal A=\mathbb R^m$ with dynamics defined by $A\in\mathbb R^{n\times n}$ and $B\in\mathbb R^{n\times m}$

Reachability & Controllability

$s_\star$ is reachable from $s_0$ if there exists $a_{0:t-1} \in\mathcal A^t$ such that $s_{t}=s_\star$ for some $t$
A system is controllable if any $s_\star\in\mathcal S$ is reachable from any $s_0 \in\mathcal S$.
A linear system is controllable if and only if $$\mathrm{rank}\Big(\underbrace{\begin{bmatrix}B&AB &\dots & A^{n-1}B\end{bmatrix}}_{\mathcal C}\Big) = n$$
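The rank condition is easy to check numerically. The following is a minimal sketch, assuming NumPy; the helper names `controllability_matrix` and `is_controllable` and the matrices `A`, `B`, `B_bad` are illustrative choices, not part of the lecture.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^{n-1}B] column-wise."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(A, B):
    """Rank test: controllable iff the controllability matrix has rank n."""
    C = controllability_matrix(A, B)
    return bool(np.linalg.matrix_rank(C) == A.shape[0])

# Illustrative two-state system: the input drives the second state,
# which in turn feeds the first state through the off-diagonal entry.
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = controllability_matrix(A, B)
controllable = is_controllable(A, B)

# Illustrative counter-example: the input only reaches the first state,
# and the second state evolves on its own.
B_bad = np.array([[1.0], [0.0]])
controllable_bad = is_controllable(A, B_bad)
```

In the second case $\mathcal C$ has a zero second row, so no input sequence can steer the second state.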

Optimal Control & Dynamic Programming

Stochastic Optimal Control Problem

$$ \min_{\pi_{0:T}} ~~\underbrace{\mathbb E_w\Big[\sum_{k=0}^{T} c(s_k, \pi_k(s_k)) \Big]}_{J^\pi(s_0)}\quad \text{s.t.}\quad s_0~~\text{given},~~s_{k+1} = F(s_k, \pi_k(s_k),w_k) $$

Dynamic Programming Algorithm

Initialize $J_{T+1}^\star (s) = 0$
For $k=T,T-1,\dots,0$:
- Compute $J_k^\star (s) = \min_{a\in\mathcal A} c(s, a)+\mathbb E_w[J_{k+1}^\star (F(s,a,w))]$
- Record minimizing argument as $\pi_k^\star(s)$

Reference: Ch 1 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas
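As an illustration of the backward recursion, here is a minimal sketch on a small tabular problem. Everything specific in it is an assumption made for the example: the integer state grid, the three actions, the noise distribution, the clipped dynamics `F`, and the cost.

```python
import numpy as np

T = 5
states = np.arange(-3, 4)                    # s in {-3, ..., 3} (assumed)
actions = np.array([-1, 0, 1])               # assumed action set
noise = [(-1, 0.25), (0, 0.5), (1, 0.25)]    # (w, probability), assumed

def F(s, a, w):
    """Assumed dynamics: integrator clipped to the state grid."""
    return int(np.clip(s + a + w, -3, 3))

def cost(s, a):
    """Assumed stage cost."""
    return s**2 + 0.5 * a**2

idx = {int(s): i for i, s in enumerate(states)}
J = np.zeros((T + 2, len(states)))           # J[T+1] = 0 initializes the recursion
pi = np.zeros((T + 1, len(states)), dtype=int)

for k in range(T, -1, -1):
    for s in states:
        # Q-values: stage cost plus expected cost-to-go over the noise
        q = [cost(s, a) + sum(p * J[k + 1, idx[F(s, a, w)]] for w, p in noise)
             for a in actions]
        best = int(np.argmin(q))
        J[k, idx[int(s)]] = q[best]          # J_k^*(s)
        pi[k, idx[int(s)]] = actions[best]   # record minimizing argument
```

At the last step the expected cost-to-go is zero, so the recorded action is the one minimizing the stage cost alone; earlier steps trade action cost against moving toward low-cost states.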

Linear Quadratic Regulator

Linear dynamics: $F(s, a, w) = A s+Ba+w$
Quadratic costs: $ c(s, a) = s^\top Qs + a^\top Ra $ where $Q,R\succ 0$
Stochastic and independent noise $\mathbb E[w_k] = 0$ and $\mathbb E[w_kw_k^\top] = \sigma^2 I$

LQR Problem

$$ \min_{\pi_{0:T}} ~~\mathbb E_w\Big[\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t.}\quad s_{k+1} = A s_k+ Ba_k+w_k,~~a_k=\pi_k(s_k) $$

LQR Example

$$ s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t + w_t $$

The state is position & velocity $s=[\theta,\omega]$, input is a force $a\in\mathbb R$.

Goal: stay near origin and be energy efficient

$c(s,a) = s^\top \begin{bmatrix} 10 & \\ & 0.1 \end{bmatrix}s + 5a^2 $

LQR via DP

DP: $J_k^\star (s) = \min_{a\in\mathcal A} c(s, a)+\mathbb E_w[J_{k+1}^\star (F(s,a,w))]$

$k=T$: $\qquad\min_{a} s^\top Q s+a^\top Ra+0$
- $J_T^\star(s) = s^\top Q s$ and $\pi_T^\star(s) =0$
$k=T-1$: $\quad \min_{a} s^\top Q s+a^\top Ra+\mathbb E_w[(As+Ba+w)^\top Q (As+Ba+w)]$

$\mathbb E[(As+Ba+w)^\top Q (As+Ba+w)]$
- $=(As+Ba)^\top Q (As+Ba)+\mathbb E[ 2w^\top Q(As+Ba) + w^\top Q w]$
- $=(As+Ba)^\top Q (As+Ba)+\sigma^2\mathrm{tr}( Q )$

Substituting into the $k=T-1$ step:

$k=T-1$: $\quad \min_{a} s^\top (Q+A^\top QA) s+a^\top (R+B^\top QB) a+2s^\top A^\top Q Ba+\sigma^2\mathrm{tr}( Q )$

$\min_a a^\top M a + m^\top a + c$
- $2Ma_\star + m = 0 \implies a_\star = -\frac{1}{2}M^{-1} m$
With $M = R+B^\top QB$ and $m = 2B^\top QAs$:
- $\pi_{T-1}^\star(s)=-\frac{1}{2}(R+B^\top QB)^{-1}(2B^\top QAs)=-(R+B^\top QB)^{-1}B^\top QAs$
- $J_{T-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s +\sigma^2\mathrm{tr}( Q )$

Linear Quadratic Regulator

Claim: For $t=0,\dots, T$, the optimal cost-to-go function is quadratic and the optimal policy is linear

$J^\star_t (s) = s^\top P_t s + p_t$ and $\pi_t^\star(s) = K_t s$

Exercise: Using DP and induction, prove the claim for (with $P_{T+1}=0$ and $p_{T+1}=0$):
- $P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$
- $p_t = p_{t+1} + \sigma^2\mathrm{tr}(P_{t+1})$
- $K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A$
Exercise: Derive expressions for optimal controllers when
1. Time varying cost: $c_t(s,a) = s^\top Q_t s+a^\top R_t a$
2. General noise covariance: $\mathbb E[w_tw_t^\top] = \Sigma_t$
3. Trajectory tracking: $c_t(s,a) = \|s-\bar s_t\|_2^2 + \|a\|_2^2$ for given $\bar s_t$
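The backward recursion for $P_t$, $p_t$, $K_t$ can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `lqr_backward`, the horizon, and the value of `sigma2` are choices made for the example, and the matrices are those of the position/velocity example.

```python
import numpy as np

def lqr_backward(A, B, Q, R, T, sigma2=1.0):
    """Finite-horizon LQR by dynamic programming.

    Returns lists P[0..T+1], p[0..T+1], K[0..T] with
    J_t(s) = s' P[t] s + p[t] and pi_t(s) = K[t] s.
    """
    n = A.shape[0]
    P = [None] * (T + 2)
    p = [0.0] * (T + 2)
    K = [None] * (T + 1)
    P[T + 1] = np.zeros((n, n))                     # J_{T+1} = 0
    for t in range(T, -1, -1):
        Pn = P[t + 1]
        G = R + B.T @ Pn @ B
        K[t] = -np.linalg.solve(G, B.T @ Pn @ A)    # optimal gain
        # Riccati update; A' Pn B K[t] equals -A' Pn B G^{-1} B' Pn A
        P[t] = Q + A.T @ Pn @ A + A.T @ Pn @ B @ K[t]
        p[t] = p[t + 1] + sigma2 * np.trace(Pn)     # noise contribution
    return P, p, K

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 0.1])
R = np.array([[5.0]])
P, p, K = lqr_backward(A, B, Q, R, T=10, sigma2=0.5)
```

The first pass of the loop recovers $P_T = Q$ and $K_T = 0$, matching the $k=T$ step of the derivation.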

LQR Example


$\pi_\star(s) \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix} s$

Convexity of Open-Loop LQR

$$ \min_{a_{0:T}} ~~\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \quad \text{s.t}\quad s_{k+1} = A s_k+ Ba_k $$

Quadratic cost, linear constraints $\implies$ Quadratic Program
Define $$\mathbf s = \begin{bmatrix}s_0 \\ \vdots \\ s_T\end{bmatrix},\quad \mathbf a = \begin{bmatrix}a_0 \\ \vdots \\ a_{T-1}\end{bmatrix}, \quad \bar A = \begin{bmatrix}A \\ &\ddots \\&& A\end{bmatrix}$$ and similarly for $\bar B,\bar Q,\bar R$.

Convexity of Open-Loop LQR

$$ \min_{\mathbf a} ~~\mathbf s^\top \bar Q\mathbf s + \mathbf a^\top \bar R\mathbf a\quad \text{s.t}\quad \mathbf s_{1:T} = \bar A \mathbf s_{0:T-1}+ \bar B\mathbf a $$

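To make the quadratic program concrete, the sketch below eliminates the states using the noise-free dynamics and solves the resulting unconstrained least-squares problem in $\mathbf a$. It then compares the open-loop actions with a rollout of the Riccati feedback gains. The names `Gamma`, `G`, `H`, the horizon, and the initial state are assumptions for the illustration.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 0.1])
R = np.array([[5.0]])
T = 8
n, m = B.shape
s0 = np.array([1.0, -0.5])

# Noise-free stacked dynamics: s = Gamma s0 + G a, with s = (s_0..s_T), a = (a_0..a_{T-1})
Gamma = np.vstack([np.linalg.matrix_power(A, i) for i in range(T + 1)])
G = np.zeros(((T + 1) * n, T * m))
for i in range(1, T + 1):
    for j in range(i):
        G[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - 1 - j) @ B
Qbar = np.kron(np.eye(T + 1), Q)
Rbar = np.kron(np.eye(T), R)

# Unconstrained convex quadratic in a: set the gradient to zero
H = G.T @ Qbar @ G + Rbar
a_open = np.linalg.solve(H, -G.T @ Qbar @ Gamma @ s0)
s_open = Gamma @ s0 + G @ a_open
cost_open = s_open @ Qbar @ s_open + a_open @ Rbar @ a_open

# Riccati feedback gains (P_T = Q), rolled out without noise for comparison
P = Q.copy()
gains = []
for t in range(T - 1, -1, -1):
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    gains.append(K)
    P = Q + A.T @ P @ A + A.T @ P @ B @ K
gains.reverse()                       # gains[t] = K_t; P now holds P_0
s, a_closed = s0.copy(), []
for t in range(T):
    a = gains[t] @ s
    a_closed.append(a)
    s = A @ s + B @ a
a_closed = np.concatenate(a_closed)
```

Without noise the two action sequences coincide; the feedback form only matters once disturbances move the state off the planned trajectory.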

Non-convexity of LQR

$$ \min_{K_{0:T}} ~~\mathbb E_w\Big[\sum_{k=0}^{T} s_k^\top (Q + K_k^\top R K_k)s_k \Big]\quad \text{s.t.}\quad s_{k+1} = (A +BK_k)s_k+w_k $$

Exercise: Prove $s_{t+1}=A_t s_t + w_t\implies $ $$s_t = (A_{t-1}A_{t-2}\cdots A_0) s_0 + \sum_{k=0}^{t-2} (A_{t-1}\cdots A_{k+1}) w_{k}+w_{t-1}$$

Example: For a 1D system with $A=B=1$, $\mathbb E[s_2] = (1+K_1)(1+K_0)s_0$, which involves the product of the gains, so costs on the state are not convex in $(K_0, K_1)$.
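A quick numerical check of this, as a sketch: take the squared mean of $s_2$ as a stand-in cost and compare its value at two gain pairs with the value at their midpoint. The function names and the particular gain pairs are illustrative assumptions.

```python
def s2_mean(K0, K1, s0=1.0):
    """E[s_2] for the 1D system with A = B = 1 and zero-mean noise."""
    return (1 + K1) * (1 + K0) * s0

def f(K0, K1):
    """Stand-in cost: squared mean of s_2 (an assumption for illustration)."""
    return s2_mean(K0, K1) ** 2

Ka = (-1.0, 1.0)      # deadbeat at the first step
Kb = (1.0, -1.0)      # deadbeat at the second step
Kmid = (0.0, 0.0)     # midpoint of Ka and Kb: no control at all

fa, fb, fmid = f(*Ka), f(*Kb), f(*Kmid)
violates_convexity = fmid > 0.5 * (fa + fb)
```

Both endpoints drive the mean to zero, while their average does nothing, so the midpoint value exceeds the average of the endpoint values.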

System-level Reparametrization

Closed loop (figure): the system $(A, B)$ with state $s$ and disturbance $w_t$, in feedback with the controller $\mathbf{K}$.

$s_{t+1} = As_{t}+Ba_{t}+w_{t} = \prod_{\ell=0}^{t}(A+BK_{\ell}) s_0 + \sum_{k=0}^{t} \prod_{\ell=k+1}^{t}(A+BK_\ell) w_{k}$

$a_t = K_t s_t = K_t \prod_{\ell=0}^{t-1}(A+BK_{\ell}) s_0 + K_t \sum_{k=0}^{t-1} \prod_{\ell=k+1}^{t-1}(A+BK_\ell) w_{k}$

System response $\mathbf{\Phi}$:

$s_{t} = \Phi_s^{t,t} s_0 + \sum_{k=0}^{t-1} \Phi_s^{t, k}w_{t-1-k}$

$a_{t} = \Phi_a^{t,t} s_0 + \sum_{k=0}^{t-1} \Phi_a^{t, k}w_{t-1-k}$

In matrix form:

$\begin{bmatrix} s_{0}\\\vdots \\s_T\end{bmatrix} = \begin{bmatrix} \Phi_s^{0,0}\\ \Phi_s^{1, 1}& \Phi_s^{1,0}\\ \vdots & \ddots & \ddots \\ \Phi_s^{T,T} & \Phi_s^{T,T-1} & \dots & \Phi_s^{T,0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}$

$\begin{bmatrix} a_{0}\\\vdots \\a_T\end{bmatrix} = \begin{bmatrix} \Phi_a^{0,0}\\ \Phi_a^{1, 1}& \Phi_a^{1,0}\\ \vdots & \ddots & \ddots \\ \Phi_a^{T,T} & \Phi_a^{T,T-1} & \dots & \Phi_a^{T,0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}$

$\mathbf s = \mathbf \Phi_s \mathbf w$

$\mathbf a = \mathbf \Phi_a \mathbf w$

Reparametrized objective:

$ \mathbf s^\top \bar Q\mathbf s + \mathbf a^\top \bar R\mathbf a = \mathbf w^\top (\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s + \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\mathbf w$

System-level Reparametrization

Dynamics in stacked form:

$\mathbf s_{1:T} = \bar A \mathbf s_{0:T-1}+ \bar B\mathbf a+\mathbf w_{0:T-1}$

$$\begin{bmatrix}s_0 \\ \vdots \\ s_T\end{bmatrix}= \underbrace{\begin{bmatrix}0\\ A \\ &\ddots \\&& A\end{bmatrix} }_{\mathcal Z \bar A}\begin{bmatrix}s_0 \\ \vdots \\ s_T\end{bmatrix} + \underbrace{\begin{bmatrix}0\\ B \\ &\ddots \\&& B\end{bmatrix} }_{\mathcal Z \bar B} \begin{bmatrix}a_0 \\ \vdots \\ a_{T-1}\end{bmatrix} +\begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}$$

Reparametrized constraints:

$\iff\quad\mathbf \Phi_s\mathbf w = \mathcal Z \bar A \mathbf \Phi_s\mathbf w + \mathcal Z \bar B \mathbf \Phi_a\mathbf w + \mathbf w $

System-level Reparametrization

$s_{t+1} = As_{t}+Ba_{t}+w_{t}$

Reparametrized constraints:

$\sum_{k=0}^{t+1} \Phi_s^{t+1, k}w_{t-k} = A\sum_{k=0}^t \Phi_s^{t, k}w_{t-1-k} + B\sum_{k=0}^t \Phi_a^{t, k}w_{t-1-k} + w_{t}$

(let $w_{-1}=s_0$)

Claim: The above equality is implied by

$$\Phi_s^{t,0}=I,\quad \Phi_s^{t+1, k+1} = A \Phi_s^{t, k}+B\Phi_a^{t, k} \quad \forall ~t,~k\leq t $$

References: System Level Synthesis by Anderson, Doyle, Low, Matni

System Level Synthesis

Theorem: For a linear system in feedback with a linear controller over the horizon $t=0,\dots, T$:

1. The affine subspace $\{(I - \mathcal Z \bar A )\mathbf \Phi_s- \mathcal Z \bar B \mathbf \Phi_a = I\} $ parametrizes all possible system responses.
2. For any block-lower-triangular matrices $(\mathbf \Phi_s,\mathbf \Phi_a)$ in the affine subspace, there exists a linear feedback controller achieving this response.

Example: For a 1D system with $A=B=1$,

$s_1 = (1 + K_0) s_0 + w_0$
$s_2 = (1+K_1)(1+K_0)s_0 + (1+K_1)w_0 + w_1$

1D Example

Suppose $K_0 = K_1=-\frac{1}{2}$. What are $\mathbf \Phi_s$ and $\mathbf \Phi_a$?
- $\mathbf \Phi_s = \begin{bmatrix} 1\\ \frac{1}{2} & 1\\ \frac{1}{4} & \frac{1}{2} & 1\end{bmatrix}$ and $\mathbf \Phi_a = \begin{bmatrix} -\frac{1}{2} \\ -\frac{1}{4}& -\frac{1}{2}\end{bmatrix}$
Is there some $K_0,K_1$ such that $\mathbf \Phi_s = \begin{bmatrix} \frac{1}{2}\\ \frac{1}{4} & \frac{1}{2}\\ \frac{1}{8} & \frac{1}{4} & \frac{1}{2}\end{bmatrix}$?
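The system responses for given gains can be built by propagating each disturbance through the closed loop, and the affine constraint can then be checked numerically. The sketch below assumes NumPy; the helper names `system_response` and `shifted` are hypothetical, and a third gain $K_2=-\frac12$ is assumed so that $\mathbf\Phi_a$ has one block row per time step.

```python
import numpy as np

def system_response(A, B, Ks):
    """Phi_s, Phi_a for a_t = K_t s_t over t = 0..T, where len(Ks) = T + 1."""
    n, m = B.shape
    T = len(Ks) - 1
    Phi_s = np.zeros(((T + 1) * n, (T + 1) * n))
    Phi_a = np.zeros(((T + 1) * m, (T + 1) * n))
    for j in range(T + 1):           # block column j: s_0 (j = 0) or w_{j-1}
        X = np.eye(n)                # the disturbance enters s_j directly
        for t in range(j, T + 1):
            Phi_s[t*n:(t+1)*n, j*n:(j+1)*n] = X
            Phi_a[t*m:(t+1)*m, j*n:(j+1)*n] = Ks[t] @ X
            X = (A + B @ Ks[t]) @ X  # propagate one closed-loop step
    return Phi_s, Phi_a

def shifted(M, T):
    """Block matrix with copies of M on the first block sub-diagonal."""
    return np.kron(np.eye(T + 1, k=-1), M)

A = np.array([[1.0]])
B = np.array([[1.0]])
Ks = [np.array([[-0.5]])] * 3        # K_0 = K_1 = -1/2, plus an assumed K_2
T = len(Ks) - 1
Phi_s, Phi_a = system_response(A, B, Ks)

# Affine constraint: (I - Z Abar) Phi_s - Z Bbar Phi_a = I
lhs = (np.eye(T + 1) - shifted(A, T)) @ Phi_s - shifted(B, T) @ Phi_a
```

The constraint forces identity blocks on the diagonal of $\mathbf\Phi_s$, which is one way to approach the question about a diagonal of $\frac12$.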

System Level LQR

$$ \min_{\mathbf \Phi} ~~\mathbb E_w\Big[ \mathbf w^\top(\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s+ \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\mathbf w \Big]\quad \text{s.t}\quad (I - \mathcal Z \bar A )\mathbf \Phi_s- \mathcal Z \bar B \mathbf \Phi_a = I $$

$\mathbb E_w\Big[ \mathbf w^\top(\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s+ \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\mathbf w \Big]$
- $=\mathbb E_w\Big[ \mathrm{tr}((\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s+ \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\mathbf w\mathbf w^\top) \Big]$
- $=\sigma^2\mathrm{tr}(\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s+ \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )$
- $=\sigma^2\left\|\begin{bmatrix}\bar Q^{1/2} \\&\bar R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a \end{bmatrix}\right\|_F^2$
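As a sanity check on this identity, the sketch below evaluates the Frobenius-norm cost at the system response induced by the Riccati gains and compares it with the DP value $\sigma^2\sum_{t=0}^T \mathrm{tr}(P_t)$. It assumes, as in the derivation, that $s_0$ is random with the same covariance $\sigma^2 I$ as the noise; the horizon and `sigma2` are arbitrary choices for the example.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 0.1])
R = np.array([[5.0]])
T, sigma2 = 6, 0.3
n, m = B.shape

# Backward Riccati recursion from P_{T+1} = 0, accumulating tr(P_T) + ... + tr(P_0)
P = np.zeros((n, n))
Ks, trace_sum = [], 0.0
for t in range(T, -1, -1):
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A + A.T @ P @ B @ K
    Ks.append(K)
    trace_sum += np.trace(P)
Ks.reverse()                          # Ks[t] = K_t
dp_cost = sigma2 * trace_sum          # E[J_0(s_0)] with E[s_0 s_0'] = sigma2 * I

# System response of the closed loop a_t = K_t s_t
Phi_s = np.zeros(((T + 1) * n, (T + 1) * n))
Phi_a = np.zeros(((T + 1) * m, (T + 1) * n))
for j in range(T + 1):
    X = np.eye(n)
    for t in range(j, T + 1):
        Phi_s[t*n:(t+1)*n, j*n:(j+1)*n] = X
        Phi_a[t*m:(t+1)*m, j*n:(j+1)*n] = Ks[t] @ X
        X = (A + B @ Ks[t]) @ X

# Square-root weights via Cholesky factors (Q = L L', so ||L' Phi||_F^2 = tr(Phi' Q Phi))
Q_half = np.kron(np.eye(T + 1), np.linalg.cholesky(Q).T)
R_half = np.kron(np.eye(T + 1), np.linalg.cholesky(R).T)
stacked = np.vstack([Q_half @ Phi_s, R_half @ Phi_a])
sls_cost = sigma2 * np.linalg.norm(stacked, "fro") ** 2
```

The two numbers agree because both evaluate the same expected cost of the same closed loop, once through the cost-to-go and once through the system response.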

A System Level Perspective

In closed loop, state and input are linear functions of disturbance

$x_t = \sum_{k=0}^t A^{k}(Bu_{t-k} + w_{t-k})$

$u_t = \sum_{k=0}^t K_kx_{t-k}$

$\begin{bmatrix} x_t\\u_t \end{bmatrix} = \sum_{k=0}^t \begin{bmatrix} \Phi_x(k)\\ \Phi_u(k) \end{bmatrix} w_{t-k}$

Instead of reasoning about a controller $\mathbf{K}$, we reason about the interconnection $\mathbf\Phi$ directly.

Instead of a loop between the system $(A,B)$ and the controller $\mathbf{K}$ (signals $\bf x$, $\bf u$, $\bf w$), the system looks like a line: $\bf w$ passes through $\mathbf{\Phi}$ to produce $\bf x$ and $\bf u$.

Controller-based problem, with $u_t = \sum_{k=0}^t K_k x_{t-k}$:

$ \underset{\mathbf u }{\min}$ $\displaystyle\lim_{T\to\infty}\mathbb{E}\left[ \frac{1}{T}\sum_{t=0}^T x_t^\top Q x_t + u_t^\top R u_t\right]$

$\text{s.t.}~~x_{t+1} = Ax_t + Bu_t + w_t$

System-level problem, with $\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w $:

$ \underset{\mathbf{\Phi}}{\min}$ $\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_x \\ \mathbf{\Phi}_u \end{bmatrix} \right\|_{\mathcal{H}_2}^2$

$\text{s.t.}~~ \mathbf\Phi \in\mathrm{Affine}(A, B)$

To implement resulting controller:

$w_{-1}=s_0$ and $a_0=\Phi_a^{0, 0}w_{-1}$
for $t=1, \dots, T$
- $w_{t-1} = s_{t}-As_{t-1}-Ba_{t-1}$
- $a_{t} = \sum_{k=0}^{t} \Phi_a^{t, k}w_{t-k-1}$
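The implementation steps above can be sketched for a scalar system. The code reconstructs each disturbance from consecutive states and applies the response $\Phi_a$ to the reconstructed sequence. The scalar values `A`, `B`, `K`, the seed, and the helper `phi_a` are assumptions for the illustration; here $\Phi_a$ is taken from a constant gain $K$ so that the result can be compared with $a_t = K s_t$.

```python
import numpy as np

A, B, K = 1.0, 1.0, -0.5          # scalar system and reference gain (assumed)
T = 6
rng = np.random.default_rng(0)
w = rng.normal(size=T)            # true disturbances w_0..w_{T-1}
s0 = 2.0

def phi_a(t, k):
    """Phi_a^{t,k} induced by the constant gain K: response of a_t to w_{t-1-k}."""
    return K * (A + B * K) ** k

w_hat = [s0]                      # w_hat[j] stores w_{j-1}, with w_{-1} = s_0
s = [s0]
a = [phi_a(0, 0) * w_hat[0]]      # a_0 = Phi_a^{0,0} w_{-1}
for t in range(1, T + 1):
    s.append(A * s[t - 1] + B * a[t - 1] + w[t - 1])     # environment step
    w_hat.append(s[t] - A * s[t - 1] - B * a[t - 1])     # recover w_{t-1}
    a.append(sum(phi_a(t, k) * w_hat[t - k] for k in range(t + 1)))

s, a = np.array(s), np.array(a)
```

The controller never uses $K$ directly inside the loop; it only needs $\Phi_a$ and the model $(A, B)$ to reconstruct disturbances.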

System Level LQR

$$ \min_{\mathbf \Phi} ~~\left\|\begin{bmatrix}\bar Q^{1/2} \\&\bar R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a \end{bmatrix}\right\|_F^2\quad \text{s.t}\quad \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a \end{bmatrix}= I $$

Recap: System Level LQR

Controller-based problem, with $a_t = K_t s_{t}$:

$ \underset{\mathbf a }{\min}$ $\displaystyle\mathbb{E}\left[\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]$

$\text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t$

System-level problem, with $\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w $:

$ \underset{\mathbf{\Phi}}{\min}$ $\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix} \right\|_{F}^2$

$\text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix}= I $

Instead of a loop between the system $(A,B)$ and the controller $\mathbf{K}$, the system looks like a line through $\mathbf{\Phi}$.

Steady State LQR

Infinite Horizon LQR Problem

$$ \min_{\pi} ~~\lim_{T\to\infty}\mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t.}\quad s_{k+1} = A s_k+ Ba_k+w_k $$

Claim: The optimal cost-to-go function is quadratic and the optimal policy is linear $$J^\star (s) = s^\top P s,\qquad \pi^\star(s) = K s$$

$P = Q+A^\top PA - A^\top PB(R+B^\top PB)^{-1}B^\top PA$
- Discrete Algebraic Riccati Equation: $P=\mathrm{DARE}(A,B,Q,R)$
$K = -(R+B^\top PB)^{-1}B^\top PA$
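One simple way to obtain $P$ numerically is to iterate the finite-horizon Riccati update until it stops changing. The sketch below does this with NumPy only; the function name `dare_by_iteration` and the fixed iteration count are assumptions for the illustration, and a dedicated solver such as SciPy's `scipy.linalg.solve_discrete_are` would be the usual choice in practice.

```python
import numpy as np

def dare_by_iteration(A, B, Q, R, iters=500):
    """Approximate the DARE solution by repeating the Riccati update."""
    P = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # steady-state gain
    return P, K

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 0.1])
R = np.array([[5.0]])
P, K = dare_by_iteration(A, B, Q, R)

# Residual of the fixed-point equation and closed-loop spectral radius
G = R + B.T @ P @ B
residual = P - (Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A))
rho = np.max(np.abs(np.linalg.eigvals(A + B @ K)))
```

A spectral radius below one confirms that the steady-state gain stabilizes the closed loop $A+BK$.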

Steady State System Response

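Closed loop (figure): the system $(A, B)$ with state $s$ and disturbance $w_t$, in feedback with the static controller $\mathbf{K}$.

$s_{t+1} = As_{t}+Ba_{t}+w_{t} = (A+BK)^{t+1} s_0 + \sum_{k=0}^{t} (A+BK)^{t-k} w_{k}$

$a_t = K s_t = K(A+BK)^{t} s_0 + \sum_{k=0}^{t-1} K(A+BK)^{t-1-k} w_{k}$

System response $\mathbf{\Phi}$, with $\Phi_s^{k} = (A+BK)^k$ and $\Phi_a^{k} = K(A+BK)^k$:

$s_{t} = \Phi_s^{t} s_0 + \sum_{k=0}^{t-1} \Phi_s^{k}w_{t-1-k}$

$a_{t} = \Phi_a^{t} s_0 + \sum_{k=0}^{t-1} \Phi_a^{k}w_{t-1-k}$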

$\begin{bmatrix} s_{0}\\\vdots \\s_T\end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots & \ddots & \ddots \\ \Phi_s^{T} & \Phi_s^{T-1} & \dots & \Phi_s^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}$

$\begin{bmatrix} a_{0}\\\vdots \\a_T\end{bmatrix} = \begin{bmatrix} \Phi_a^{0}\\ \Phi_a^{1}& \Phi_a^{0}\\ \vdots & \ddots & \ddots \\ \Phi_a^{T} & \Phi_a^{T-1} & \dots & \Phi_a^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}$

Sequences & Operators

Cost depends on the (semi-)infinite sequence $\mathbf s = (s_0, s_1, s_2,\dots)$
Generated by convolution between disturbance sequence $\mathbf w = (w_{-1}, w_0, w_1,\dots)$ and (semi-)infinite operator $\mathbf \Phi_s = (\Phi_s^0, \Phi_s^1,\dots)$
We represent this convolution with the notation $\mathbf s = \mathbf \Phi_s\mathbf w$
Concretely,
- semi-infinite vectors and Toeplitz matrices $$\begin{bmatrix} s_{0}\\\vdots \\s_t\\\vdots \end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots & \ddots & \ddots \\ \Phi_s^{t} & \Phi_s^{t-1} & \dots & \Phi_s^{0} \\ \vdots & & \ddots &&\ddots \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{t-1} \\\vdots \end{bmatrix}$$
- frequency domain
  - define time shift operator $z$ such that $$z(s_0, s_1,s_2 \dots) = (s_1, s_2,\dots)$$
  - represent $\mathbf s(z) = \sum_{t=0}^\infty z^{-t}s_t$, $\mathbf w(z) = \sum_{t=0}^\infty z^{-t}w_{t-1}$, and $\mathbf \Phi_s(z) = \sum_{t=0}^\infty z^{-t}\Phi_s^t$
  - multiplication of polynomials: $$ \mathbf \Phi_s(z) \mathbf w(z) = \Big(\sum_{t=0}^\infty z^{-t}\Phi_s^t\Big)\Big(\sum_{t=0}^\infty z^{-t}w_{t-1}\Big) = \sum_{t=0}^\infty z^{-t} \sum_{k=0}^t \Phi_s^k w_{t-1-k} $$
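The equivalence between multiplying the series and convolving the sequences can be checked on a scalar closed loop. In this sketch the closed-loop coefficient `a_cl` (standing in for $A+BK$), the seed, and the horizon are assumptions; `np.convolve` computes exactly the coefficient sums of the product.

```python
import numpy as np

a_cl = 0.5                       # assumed scalar closed-loop map A + B K
T = 8
rng = np.random.default_rng(1)
w = rng.normal(size=T)           # w_0..w_{T-1}
s0 = 1.5

# Simulate s_{t+1} = a_cl * s_t + w_t
s_sim = [s0]
for t in range(T):
    s_sim.append(a_cl * s_sim[-1] + w[t])
s_sim = np.array(s_sim)

# Impulse response Phi_s^k = a_cl^k and disturbance sequence (w_{-1}, w_0, ...) with w_{-1} = s_0
phi = a_cl ** np.arange(T + 1)
w_ext = np.concatenate([[s0], w])

# Truncated convolution: s_t = sum_{k=0}^{t} Phi_s^k * w_ext[t - k]
s_conv = np.convolve(phi, w_ext)[:T + 1]
```

Only the first $T+1$ coefficients are kept because later ones would need disturbances beyond the simulated horizon.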

Infinite Horizon LQR

Controller-based problem, with $a_t = K s_{t}$:

$ \underset{\mathbf a }{\min}$ $\displaystyle\lim_{T\to\infty}\mathbb{E}\left[\frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]$

$\text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t$

System-level problem, with $\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w $:

$ \underset{\mathbf{\Phi}}{\min}$ $\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2$

$\text{s.t.}~~ \begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix}= I $

Exercise: Using the frequency domain notation, derive the expression for the SLS cost and constraints, where we define the norm:

$$ \|\mathbf \Phi\|_{\mathcal H_2}^2 = \sum_{t=0}^\infty \|\Phi^t\|_F^2 $$

Recap

Linear quadratic regulator
- $\pi_t^\star(s) = K_t s$
System response parametrization
- $\mathbf s = \mathbf \Phi_s \mathbf w, \quad\mathbf a = \mathbf \Phi_a \mathbf w$
Steady-state controllers and infinite horizons
- $\pi^\star(s) = Ks$

References: System Level Synthesis by Anderson, Doyle, Low, Matni and Ch 2 in Machine Learning in Feedback Systems by Sarah Dean

Sarah Dean, asst prof in CS at Cornell

sdean.website
