Prof Sarah Dean

## Reminders

• Office hours this week moved to Friday 9-10am
• Cancelled next week due to travel
• Feedback on final project proposal
• Upcoming paper presentations starting next week
• Project midterm update due 11/11

## Recap: System Level LQR

$$a_t = {\color{Goldenrod} K_t }s_{t}$$

$$\underset{\mathbf a }{\min}$$   $$\displaystyle\mathbb{E}\left[\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]$$

$$\text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t$$

$$\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w$$

$$\underset{\color{teal}\mathbf{\Phi}}{\min}$$$$\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{F}^2$$

$$\text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s\\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I$$

(Block diagram: feedback interconnection of the linear system (dynamics $$A$$, $$B$$, disturbance $$w_t$$, state $$s_t$$) with the controller $$\mathbf{K}$$ producing $$a_t$$; $$\mathbf{\Phi}$$ denotes the resulting system response.)


References: System Level Synthesis by Anderson, Doyle, Low, Matni

## System Level Synthesis

Theorem: For a linear system in feedback with a linear controller over the horizon $$t=0,\dots, T$$:

1. The affine subspace $$\{(I - \mathcal Z \bar A )\mathbf \Phi_s- \mathcal Z \bar B \mathbf \Phi_a = I\}$$ parametrizes all possible system responses.
2. For any block-lower-triangular matrices $$(\mathbf \Phi_s,\mathbf \Phi_a)$$ in the affine subspace, there exists a linear feedback controller achieving this response.
A cvxpy implementation (assuming the system matrices `A`, `B`, cost square roots `Q_sqrt`, `R_sqrt`, horizon `T`, and dimensions `n`, `p` are already defined):

```python
import cvxpy as cvx
import numpy as np

# System response variables (block lower triangular by construction of the constraints)
Phi_s = cvx.Variable((T*n, T*n), name="Phi_s")
Phi_a = cvx.Variable((T*p, T*n), name="Phi_a")

# Affine dynamics constraint: Phi_s^0 = I and Phi_s^{k+1} = A Phi_s^k + B Phi_a^k
constr = [Phi_s[:n, :] == np.eye(n)]
for k in range(T-1):
    constr.append(Phi_s[n*(k+1):n*(k+2), :]
                  == A @ Phi_s[n*k:n*(k+1), :] + B @ Phi_a[p*k:p*(k+1), :])
constr.append(A @ Phi_s[n*(T-1):, :] + B @ Phi_a[p*(T-1):, :] == 0)

# Stack the weighted response blocks; minimizing the Frobenius norm
# is equivalent to minimizing its square
cost_matrix = cvx.bmat([[Q_sqrt @ Phi_s[n*k:n*(k+1), :]] for k in range(T)]
                       + [[R_sqrt @ Phi_a[p*k:p*(k+1), :]] for k in range(T)])
objective = cvx.norm(cost_matrix, 'fro')

prob = cvx.Problem(cvx.Minimize(objective), constr)
prob.solve()
Phi_s = np.array(Phi_s.value)
Phi_a = np.array(Phi_a.value)
```
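Part 2 of the theorem can also be checked numerically: the controller achieving a given response is recovered from the response variables as $$\mathbf\Phi_a\mathbf\Phi_s^{-1}$$, which for a response generated by a static gain $$K$$ equals $$I\otimes K$$. A minimal numpy sketch, with a hypothetical system and gain (not solver output):

```python
import numpy as np

# Hypothetical example: build the block-Toeplitz responses Phi_s^t = (A+BK)^t
# and Phi_a^t = K(A+BK)^t for a fixed gain K, then recover the controller.
n, p, T = 2, 1, 5
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -0.3]])
Acl = A + B @ K

Phi_s = np.zeros((T*n, T*n))
Phi_a = np.zeros((T*p, T*n))
for i in range(T):           # block row
    for j in range(i + 1):   # block column (lower triangular)
        blk = np.linalg.matrix_power(Acl, i - j)
        Phi_s[i*n:(i+1)*n, j*n:(j+1)*n] = blk
        Phi_a[i*p:(i+1)*p, j*n:(j+1)*n] = K @ blk

# The controller is recovered as Phi_a Phi_s^{-1} = I kron K
K_rec = Phi_a @ np.linalg.inv(Phi_s)
assert np.allclose(K_rec, np.kron(np.eye(T), K), atol=1e-8)
```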

## Optimal Control on Arbitrary Horizons

• In many cases, task horizon is long or not pre-defined
• Allow $$T\to\infty$$
• average cost $$\min_{\pi} ~~\mathbb E_w\Big[\lim_{T\to\infty} \frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k)) \Big]$$
• discounted cost $$\min_{\pi} ~~\mathbb E_w\Big[\sum_{k=0}^{\infty} \gamma^k c(s_k, \pi(s_k)) \Big]$$
• Policy is stationary (no longer depends on time) $$\pi:\mathcal S\to\mathcal A$$

Infinite Horizon LQR Problem

$$\min_{\pi} ~~\lim_{T\to\infty}\mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t}\quad s_{k+1} = A s_k+ Ba_k+w_k$$

Claim:  The optimal cost-to-go function is quadratic and the optimal policy is linear $$J^\star (s) = s^\top P s,\qquad \pi^\star(s) = K s$$

• $$P = Q+A^\top PA - A^\top PB(R+B^\top PB)^{-1}B^\top PA$$
• Discrete Algebraic Riccati Equation: $$P=\mathrm{DARE}(A,B,Q,R)$$
• $$K = -(R+B^\top PB)^{-1}B^\top PA$$

## Infinite Horizon Optimal Control

Stochastic Infinite Horizon Optimal Control Problem

$$\min_{\pi} ~~\underbrace{\lim_{T\to\infty} \mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k)) \Big]}_{J^\pi(s_0)}\quad \text{s.t}\quad s_0~~\text{given},~~s_{k+1} = F(s_k, \pi(s_k),w_k)$$

Bellman Optimality Equation

• $$\underbrace{J^\star (s)}_{\text{value function}} = \min_{a\in\mathcal A} \underbrace{c(s, a)+\mathbb E_w[J^\star (F(s,a,w))]}_{\text{state-action function}}$$

• Minimizing argument is $$\pi^\star(s)$$
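For intuition, the Bellman optimality equation can be solved by value iteration on a small finite MDP. A toy sketch using the discounted formulation from above, with hypothetical costs and transition probabilities:

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers): solve the discounted
# Bellman optimality equation J*(s) = min_a c(s,a) + gamma * E[J*(s')]
gamma = 0.9
c = np.array([[1.0, 2.0],     # c[s, a]
              [0.5, 0.1]])
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.9, 0.1]]])

# value iteration: repeatedly apply the Bellman operator (a contraction)
J = np.zeros(2)
for _ in range(1000):
    J = (c + gamma * np.einsum('sat,t->sa', P, J)).min(axis=1)

# state-action function and the minimizing argument pi*(s)
Qsa = c + gamma * np.einsum('sat,t->sa', P, J)
pi_star = Qsa.argmin(axis=1)

# at the fixed point, the Bellman residual vanishes
assert np.abs(J - Qsa.min(axis=1)).max() < 1e-8
```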

Reference: Ch 1 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas

## LQR Example

$$s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t + w_t$$

The state is position & velocity $$s=[\theta,\omega]$$, input is a force $$a\in\mathbb R$$.

Goal: stay near origin and be energy efficient

• $$c(s,a) = s^\top \begin{bmatrix} 10 & \\ & 0.1 \end{bmatrix}s + 5a^2$$

$$\pi_\star(s) \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix} s$$

$$J^\star(s) \approx s^\top \begin{bmatrix} 33.5 & 5.8 \\ 5.8 & 2.4 \end{bmatrix} s$$
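The claimed quadratic/linear structure can be checked by iterating the Riccati recursion to a fixed point for this example's $$A$$, $$B$$, $$Q$$, $$R$$. The sketch below verifies the DARE and closed-loop stability, without relying on the (approximate) values shown above:

```python
import numpy as np

# Example system and cost from the slide
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 0.1])
R = np.array([[5.0]])

# Riccati iteration: P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
P = Q.copy()
for _ in range(500):
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        R + B.T @ P @ B, B.T @ P @ A)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# P satisfies the DARE (up to tolerance) and A + BK is stable
residual = P - (Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
    R + B.T @ P @ B, B.T @ P @ A))
assert np.abs(residual).max() < 1e-8
assert np.abs(np.linalg.eigvals(A + B @ K)).max() < 1
```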


$$s_{t+1} = As_{t}+Ba_{t}+w_{t}$$

$$a_t = K s_t$$

$$s_{t+1} = (A+BK)^{t+1} s_0 + \sum_{k=0}^{t} (A+BK)^{t-k} w_{k}$$

$$a_{t+1} = K(A+BK)^{t+1} s_0 + \sum_{k=0}^{t} K(A+BK)^{t-k} w_{k}$$

$$s_{t} = \Phi_s^{t} s_0 + \sum_{k=1}^t \Phi_s^{k-1}w_{t-k}$$

$$a_{t} = \Phi_a^{t} s_0 + \sum_{k=1}^t \Phi_a^{k-1}w_{t-k}$$

$$\begin{bmatrix} s_{0}\\\vdots \\s_T\end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots & \ddots & \ddots \\ \Phi_s^{T} & \Phi_s^{T-1} & \dots & \Phi_s^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}$$

$$\begin{bmatrix} a_{0}\\\vdots \\a_T\end{bmatrix} = \begin{bmatrix} \Phi_a^{0}\\ \Phi_a^{1}& \Phi_a^{0}\\ \vdots & \ddots & \ddots \\ \Phi_a^{T} & \Phi_a^{T-1} & \dots & \Phi_a^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}$$
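The stacked representation can be sanity-checked numerically: simulate the closed loop and compare against the block-Toeplitz matrix built from $$\Phi_s^t = (A+BK)^t$$. A sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical closed loop: simulate s_{t+1} = (A+BK) s_t + w_t and check
# that the stacked states equal the block-Toeplitz matrix times (s_0, w_0, ..., w_{T-1}).
rng = np.random.default_rng(0)
n, T = 2, 6
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -0.3]])
Acl = A + B @ K

s0 = rng.standard_normal(n)
w = rng.standard_normal((T, n))

# rollout: stack s_0, ..., s_T
s = [s0]
for t in range(T):
    s.append(Acl @ s[-1] + w[t])
s_stacked = np.concatenate(s)

# block-Toeplitz response matrix with Phi_s^t = Acl^t
Toep = np.zeros(((T+1)*n, (T+1)*n))
for i in range(T+1):
    for j in range(i+1):
        Toep[i*n:(i+1)*n, j*n:(j+1)*n] = np.linalg.matrix_power(Acl, i-j)
driving = np.concatenate([s0] + [w[t] for t in range(T)])
assert np.allclose(s_stacked, Toep @ driving)
```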

• Cost depends on the (semi-)infinite sequence $$\mathbf s = (s_0, s_1, s_2,\dots)$$
• Generated by (semi-)infinite operator $$\mathbf \Phi_s = (\Phi_s^0, \Phi_s^1,\dots)$$ acting on the disturbance sequence $$\mathbf w = (w_{-1}, w_0, w_1,\dots)$$, with the convention $$w_{-1} := s_0$$
• the operation is a convolution $$s_{t} = \sum_{k=1}^{t+1} \Phi_s^{k-1}w_{t-k}$$
• We represent this operation with the notation $$\mathbf s = \mathbf \Phi_s\mathbf w$$
• Concretely,
• semi-infinite vectors and Toeplitz matrices
• frequency domain

## Sequences & Operators

• $$\mathbf s = (s_0, s_1, s_2,\dots)$$, $$\mathbf w = (w_{-1}, w_0, w_1,\dots)$$, and $$\mathbf \Phi_s = (\Phi_s^0, \Phi_s^1,\dots)$$ $$\mathbf s = \mathbf \Phi_s\mathbf w$$
• Concretely,
• semi-infinite vectors and Toeplitz matrices $$\begin{bmatrix} s_{0}\\\vdots \\s_t\\\vdots \end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots & \ddots & \ddots \\ \Phi_s^{t} & \Phi_s^{t-1} & \dots & \Phi_s^{0} \\ \vdots & & \ddots &&\ddots \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{t-1} \\\vdots \end{bmatrix}$$
• frequency domain

• define time shift operator $$z$$ such that $$z(s_0, s_1,s_2 \dots) = (s_1, s_2,\dots)$$
• represent $$\mathbf s(z) = \sum_{t=0}^\infty z^{-t}s_t$$ and $$\mathbf \Phi_s(z) = \sum_{t=0}^\infty z^{-t}\Phi_s^t$$
• multiplication of polynomials: $$\mathbf \Phi_s(z) \mathbf w(z) = (\sum_{t=-1}^\infty z^{-t}w_{t})(\sum_{t=0}^\infty z^{-t}\Phi_s^t) = \sum_{t=0}^\infty z^{-t} \sum_{k=1}^{t+1} \Phi_s^{k-1} w_{t-k}$$
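In the scalar case, the polynomial-multiplication view is exactly `np.convolve` applied to the coefficient sequences. A small sketch with an assumed geometric impulse response:

```python
import numpy as np

# Scalar sketch: multiplying the polynomial coefficients of Phi(z) and w(z)
# (np.convolve) gives exactly the convolution that generates s from w.
rng = np.random.default_rng(1)
phi = 0.9 ** np.arange(8)       # impulse response Phi^0, Phi^1, ...
w = rng.standard_normal(8)      # w_{-1}, w_0, ..., w_6  (with w_{-1} = s_0)

s_poly = np.convolve(phi, w)    # coefficients of the product Phi(z) w(z)

# direct convolution: s_t = sum_{k=1}^{t+1} Phi^{k-1} w_{t-k}
# (array index i holds w_{i-1}, so w_{t-k} lives at index t-k+1)
s_direct = np.array([sum(phi[k-1] * w[t-k+1] for k in range(1, t+2))
                     for t in range(8)])
assert np.allclose(s_direct, s_poly[:8])
```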

## LQR Example

$$s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t + w_t$$

The state is position & velocity $$s=[\theta,\omega]$$, input is a force $$a\in\mathbb R$$.

$$\pi_\star(s) \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix} s$$

$$\Phi_s^t \approx \begin{bmatrix}0.9 & 0.1 \\ -0.070 & 0.86\end{bmatrix}^{t} \quad \Phi_a^t \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix}\begin{bmatrix}0.9 & 0.1 \\ -0.070 & 0.86\end{bmatrix}^{t}$$

eigenvalues $$\approx 0.88\pm 0.082j$$

$$a_t = {\color{Goldenrod} K}s_{t}$$

$$\underset{\mathbf a }{\min}$$   $$\displaystyle\lim_{T\to\infty}\mathbb{E}\left[\frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]$$

$$\text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t$$

$$\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w$$

$$\underset{\color{teal}\mathbf{\Phi}}{\min}$$$$\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2$$

$$\text{s.t.}~~ \begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I$$

## Infinite Horizon LQR

Exercise: Using the frequency domain notation, derive the expression for the SLS cost and constraints. Hint: in signal notation, the dynamics can be written $$z\mathbf s = A\mathbf s + B\mathbf a + \mathbf w$$

Where we use the $$\mathcal H_2$$ norm:

$$\|\mathbf \Phi\|_{\mathcal H_2}^2 = \sum_{t=0}^\infty \|\Phi^t\|_F^2$$
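For a stable closed loop with $$\Phi^t = A_{cl}^t$$, this sum can be evaluated in closed form through a discrete Lyapunov equation, $$\sum_t \|A_{cl}^t\|_F^2 = \mathrm{tr}(X)$$ with $$A_{cl} X A_{cl}^\top - X + I = 0$$. A numpy sketch using the closed-loop matrix from the earlier example:

```python
import numpy as np

# Closed-loop matrix from the LQR example (spectral radius < 1)
Acl = np.array([[0.9, 0.1], [-0.07, 0.86]])

# Truncated H2 norm squared: sum of squared Frobenius norms of powers
h2_sq = sum(np.linalg.norm(np.linalg.matrix_power(Acl, t), 'fro')**2
            for t in range(500))

# Closed form: solve the discrete Lyapunov equation Acl X Acl' - X + I = 0
# via vectorization, (I - Acl kron Acl) vec(X) = vec(I)
X = np.linalg.solve(np.eye(4) - np.kron(Acl, Acl),
                    np.eye(2).flatten()).reshape(2, 2)
assert abs(h2_sq - np.trace(X)) < 1e-8
```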

## Recap: LQR

• Goal: minimize quadratic cost ($$Q,R$$) in a system with linear dynamics ($$A,B$$)
• Classic approach: Dynamic programming/Bellman optimality
• $$P = \mathrm{DARE}(A,B,Q,R)$$ and $$K_\star = -(R+B^\top PB)^{-1}B^\top PA$$
• System level synthesis: Convex optimization
• $$\underset{\color{teal}\mathbf{\Phi}}{\min}$$$$\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2~~\text{s.t.}~~ \begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I$$

• Both require knowledge of dynamics and costs!

## Action in an unknown dynamic world

Goal: select actions $$a_t$$ to bring environment to low-cost states

(Diagram: interaction loop with an unknown environment "$$?$$": the policy $$\pi_t:\mathcal S\to\mathcal A$$ receives observation $$s_t$$, selects action $$a_{t}$$, and accumulates data $$\{(s_t, a_t, c_t)\}$$.)

Setting: dynamics (and cost) functions are not known, but we have data $$\{s_k, a_k, c_k\}_{k=0}^N$$. Approaches include a focus on:

1. Model: learn dynamics/costs from data, then do policy design
• For LQR: estimate $$\hat A,\hat B,\hat Q,\hat R$$ then design $$\hat K$$
• "model based"
2. Bellman: learn value or state-action function
• For LQR: estimate $$\hat J$$ then determine $$\hat K$$ as the $$\arg\min$$
• "model free"
3. Policy: estimate gradients and update policy directly
• For LQR: $$\hat K \leftarrow \hat K -\alpha\widehat{\nabla J}(\hat K)$$
• "model free"

## Data-driven Policy Design

Setting: dynamics $$A,B$$ are not known, but we have data $$\{s_k, a_k\}_{k=0}^N$$

1. Learn Model:
• estimate $$\hat A,\hat B$$ via least-squares $$\hat A,\hat B = \arg\min_{A,B} \sum_{k=0}^{N-1} \|s_{k+1}-As_k-Ba_k\|_2^2$$
• error bounds  $$\|A-\hat A\|_2\leq \varepsilon_A,\quad \|B-\hat B\|_2\leq \varepsilon_B$$
• system identification guarantees $$\max\{\varepsilon_A, \varepsilon_B\}\lesssim \sqrt{\frac{m+n}{N}}$$
2. Design Policy:
• nominal or certainty equivalent approach uses $$\hat A, \hat B$$
• robust approach uses $$\hat A, \hat B, \varepsilon_A, \varepsilon_B$$
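Step 1 can be sketched end-to-end: simulate a trajectory with exciting inputs and regress $$s_{k+1}$$ on $$(s_k, a_k)$$. The system and noise scales here are hypothetical:

```python
import numpy as np

# Simulate a trajectory of a hypothetical linear system with random inputs
rng = np.random.default_rng(2)
n, m, N = 2, 1, 5000
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])

s = np.zeros((N+1, n))
a = 0.5 * rng.standard_normal((N, m))   # exciting inputs
w = 0.1 * rng.standard_normal((N, n))   # process noise
for k in range(N):
    s[k+1] = A @ s[k] + B @ a[k] + w[k]

# Least squares: s_{k+1} ~ [A B] [s_k; a_k]
Z = np.hstack([s[:-1], a])                        # N x (n+m) regressors
Theta, *_ = np.linalg.lstsq(Z, s[1:], rcond=None)
A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]

# Estimation error shrinks like sqrt((m+n)/N); at N=5000 it is small
assert np.linalg.norm(A_hat - A, 2) < 0.05
assert np.linalg.norm(B_hat - B, 2) < 0.05
```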

## Model-based LQR Example

true dynamics $$\left(\begin{bmatrix} 1.01 & 0.1\\ & 1.01 \end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right)$$ but we estimate $$\left(\begin{bmatrix} 0.99 & 0.1\\ & 0.99 \end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right )$$

The state is position & velocity $$s=[\theta,\omega]$$, input is a force $$a\in\mathbb R$$.

Goal: be energy efficient

• $$c(s,a) = s^\top \begin{bmatrix} 0.01& \\ & 0.01 \end{bmatrix}s + 100a^2$$

The certainty equivalent policy $$\hat\pi_\star(s) \approx -\begin{bmatrix} 6.1\times 10^{-5}& 2.8\times 10^{-4}\end{bmatrix} s$$ does not stabilize the system!

Even though $$\varepsilon=0.02$$, $$J(\hat K)$$ is infinite!

## Robust design is worst-case

$$\underset{\mathbf a=\mathbf{Ks}}{\min}$$  $$\underset{\|A-\widehat A\|\leq \varepsilon_A \atop \|B-\widehat B\|\leq \varepsilon_B}{\max}$$ $$\mathbb{E}\left[\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]$$

s.t.  $$s_{t+1} = As_t + Ba_t + w_t$$

Challenge: translating predictions

$$\hat s_{t+1} = \hat A\hat s_t + \hat B \hat a_t$$

to reality

$$s_{t+1} = As_t + Ba_t$$

Lemma: if the system response variables satisfy

• the nominal system constraint $$\begin{bmatrix} zI - \hat A & - \hat B\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix}= I$$
• then, if the inverse exists, $$\begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix} (I-\mathbf \Delta)^{-1}= I$$ where $$\mathbf \Delta = (\underbrace{A-\hat A}_{\Delta_A})\hat{\mathbf \Phi}_s + (\underbrace{B-\hat B}_{\Delta_B})\hat{\mathbf \Phi}_a$$

## Robust synthesis with SLS

Proof:

• $$(zI - \hat A)\hat{\mathbf{\Phi}}_s - \hat B \hat{\mathbf{\Phi}}_a = I$$
• $$(zI - \hat A+A-A)\hat{\mathbf{\Phi}}_s -( \hat B-B+B) \hat{\mathbf{\Phi}}_a=I$$
• $$(zI -A)\hat{\mathbf{\Phi}}_s -B\hat{\mathbf{\Phi}}_a + (A- \hat A)\hat{\mathbf{\Phi}}_s + (B-\hat B)\hat{\mathbf{\Phi}}_a=I$$
• $$(zI -A)\hat{\mathbf{\Phi}}_s -B\hat{\mathbf{\Phi}}_a=I - \Delta_A\hat{\mathbf{\Phi}}_s - \Delta_B\hat{\mathbf{\Phi}}_a$$
• $$\begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix} = I-\mathbf\Delta$$

Therefore, the estimated cost is $$\hat J(\hat{\mathbf \Phi}) = \left\|\begin{bmatrix} Q^{1/2}&\\ & R^{1/2}\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix}\right\|_{\mathcal H_2}^2$$ while the cost actually achieved is $$J(\hat{\mathbf \Phi}) = \left\|\begin{bmatrix} Q^{1/2}&\\ & R^{1/2}\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix}(I-\mathbf \Delta)^{-1} \right\|_{\mathcal H_2}^2$$

## Robust synthesis with SLS

Theorem (Anderson et al., 2019): A policy designed from system responses satisfying $$\begin{bmatrix} zI - \hat A & - \hat B\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix}= I$$ will achieve the response $$\begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix} (I-\mathbf \Delta)^{-1}$$

where $$\mathbf \Delta = (\underbrace{A-\hat A}_{\Delta_A})\hat{\mathbf \Phi}_s + (\underbrace{B-\hat B}_{\Delta_B})\hat{\mathbf \Phi}_a$$ if the inverse exists.

$$\underset{\mathbf{\Phi}}{\min}$$ $$\underset{\|\Delta_A\|\leq \varepsilon_A \atop \|\Delta_B\|\leq \varepsilon_B}{\max}$$ $$\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi}{\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_2}$$

$$\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B),\qquad {\color{teal} \mathbf \Delta = \begin{bmatrix}\Delta_A&\Delta_B\end{bmatrix}\mathbf{\Phi}}$$

Where we use the $$\mathcal H_\infty$$ norm:

$$\|\mathbf \Phi\|_{\mathcal H_\infty} = \max_{\|\mathbf x\|_2\leq 1} \|\mathbf \Phi\mathbf x\|_2 \quad\text{induced by}\quad \|\mathbf x\|_2 =\sqrt{\sum_{t=0}^\infty \|x_t\|_2^2}$$

Upper bounding this nonconvex objective leads to

$$\widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{teal} \gamma}}{\arg\min}$$ $$\frac{1}{1-\gamma}$$$$\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}$$

$$\qquad\qquad\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I,\qquad \left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\leq\gamma$$

## Review of matrix norms

• Euclidean norm on vectors
• $$\|x\|_2 =\sqrt{\sum_{i=1}^n x_i^2} = \sqrt{x^\top x}$$
• Frobenius norm: how big are matrix entries
• $$\|A\|_F = \sqrt{\sum_{i=1}^n\sum_{j=1}^m A_{ij}^2} = \sqrt{\mathrm{tr}(A^\top A)}$$
• Operator norm: how big can this matrix make a vector
• $$\|A\|_2 = \max_{\|x\|_2\leq 1} \|Ax\|_2 = \sqrt{\lambda_{\max}(A^\top A)} = \sigma_{\max}(A)$$
• Relationships:
• $$\|A\|_2 \leq \|A\|_F$$
• $$\|Ax\|_2 \leq \|A\|_2\|x\|_2$$
• $$\|AB\|_F \leq \|A\|_2 \|B\|_F$$
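These relationships are easy to spot-check numerically on random matrices:

```python
import numpy as np

# Spot-check the three norm relationships on random matrices
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
x = rng.standard_normal(3)

spec = lambda M: np.linalg.norm(M, 2)      # operator (spectral) norm
fro = lambda M: np.linalg.norm(M, 'fro')   # Frobenius norm

assert spec(A) <= fro(A) + 1e-12
assert np.linalg.norm(A @ x) <= spec(A) * np.linalg.norm(x) + 1e-12
assert fro(A @ B) <= spec(A) * fro(B) + 1e-12
```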

## Signal and operator norms

• $$\ell_2$$ norm
• $$\|\mathbf x\|_2 =\sqrt{\sum_{t=0}^\infty \|\mathbf x_t\|_2^2}$$
• $$\mathcal H_2$$ norm
• $$\|\mathbf \Phi\|_{\mathcal H_2} = \sqrt{\sum_{t=0}^\infty \|\Phi^t\|_F^2}$$
• $$\mathcal H_\infty$$ norm
• $$\|\mathbf \Phi\|_{\mathcal H_\infty} = \max_{\|\mathbf x\|_2\leq 1} \|\mathbf \Phi\mathbf x\|_2$$
• Relationships:
• $$\|\mathbf \Phi\|_{\mathcal H_\infty} \leq \|\mathbf \Phi\|_{\mathcal H_2}$$
• $$\|\mathbf \Phi\mathbf x\|_2 \leq \|\mathbf \Phi\|_{\mathcal H_\infty} \|\mathbf x\|_2$$
• $$\|\mathbf \Phi \mathbf \Psi\|_{\mathcal H_2} \leq \|\mathbf \Phi \|_{\mathcal H_\infty} \|\mathbf \Psi\|_{\mathcal H_2}$$

• $$\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi}{\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_2} \leq \left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2} \end{bmatrix}\mathbf{\Phi}\right\|_{\mathcal{H}_2} \left\| {\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_\infty}$$
• $$\left\| {\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_\infty} \leq \frac{1}{1- \|\mathbf \Delta\|_{\mathcal{H}_\infty}}$$
• $$\left\|\begin{bmatrix}\Delta_A&\Delta_B\end{bmatrix}\mathbf{\Phi}\right\|_{\mathcal{H}_\infty} \leq \left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}$$

## Robust synthesis derivation

$$\underset{\mathbf{\Phi}}{\min}$$ $$\underset{\|\Delta_A\|\leq \varepsilon_A \atop \|\Delta_B\|\leq \varepsilon_B}{\max}$$ $$\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi}{\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_2}$$

$$\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)$$

$$~~~~~\color{teal} \mathbf \Delta = \begin{bmatrix}\Delta_A&\Delta_B\end{bmatrix}\mathbf{\Phi}$$

$$\underset{\mathbf{\Phi}, {\color{teal} \gamma}}{\min}$$ $$\frac{1}{1-\gamma}$$$$\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}$$

$$\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I$$

$$\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\leq\gamma$$


Informal Theorem (Suboptimality):

For $$\hat{\mathbf{\Phi}}$$ synthesized as above and $$\mathbf\Phi_\star$$ the true optimal system response,

$$J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \lesssim J(\mathbf \Phi_\star)\left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty}$$

## Robust synthesis with SLS

1. Learn Model:
• estimate $$\hat A,\hat B$$ via least-squares, guarantee $$\max\{\varepsilon_A, \varepsilon_B\}\lesssim \sqrt{\frac{m+n}{N}}$$
2. Design Policy:
• robust approach uses $$\hat A, \hat B, \varepsilon_A, \varepsilon_B$$
• $$J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \lesssim J(\mathbf \Phi_\star)\left\|\mathbf \Phi_\star\right\|_{\mathcal H_\infty} \sqrt{\frac{m+n}{N}}$$

• nominal or certainty equivalent approach uses $$\hat A, \hat B$$
• for small enough $$\varepsilon$$, can show that $$J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \lesssim \varepsilon^2$$
• thus faster rate, $$J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \lesssim \frac{m+n}{N}$$

## Model-based LQR

Using an explore-then-commit algorithm, the regret decomposes as $$R(T) = R_{\text{explore}}(N) + R_{\text{commit}}(N, T)$$

• Robust: $$R(T) \leq C_1 N + C_2 \frac{T}{\sqrt{N}}$$
• $$N\propto T^{2/3}\implies R(T)\lesssim O(T^{2/3})$$
• stability guaranteed
• Certainty equivalent: $$R(T) \leq C_1 N + C_2 \frac{T}{N}$$
• $$N\propto \sqrt{T}\implies R(T)\lesssim O(\sqrt{T})$$
• only holds for $$T$$ large enough that estimation errors are small
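The claimed $$T^{2/3}$$ scaling for the robust approach follows from balancing the two regret terms; a numerical sketch with hypothetical constants $$C_1$$, $$C_2$$:

```python
import numpy as np

# Robust tradeoff R(T) <= C1*N + C2*T/sqrt(N): the minimizing N scales as T^(2/3)
C1, C2, T = 1.0, 1.0, 10**6
N_grid = np.arange(1, 10**5)
R = C1 * N_grid + C2 * T / np.sqrt(N_grid)
N_best = N_grid[np.argmin(R)]

# Setting dR/dN = 0 gives N* = (C2*T / (2*C1))^(2/3), i.e. N proportional to T^(2/3)
N_formula = (C2 * T / (2 * C1)) ** (2/3)
assert abs(N_best - N_formula) / N_formula < 0.01
```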

## Recap

• Steady-state controllers and infinite horizons
• $$\pi^\star(s) = Ks$$
• Taxonomy of RL
• policy, value, model
• Model-based LQR & Robustness

References: System Level Synthesis by Anderson, Doyle, Low, Matni and Ch 2-3 in Machine Learning in Feedback Systems by Sarah Dean
