Optimal Control

ML in Feedback Sys #15

Prof Sarah Dean

Reminders

  • Office hours this week moved to Friday 9-10am
    • cancelled next week due to travel
  • Feedback on final project proposal this week
  • Upcoming paper presentations starting next week
  • Project midterm update due 11/11

Action in a dynamic world

Goal: select actions \(a_t\) to bring environment to low-cost states

[Diagram: a policy \(\pi_t:\mathcal S\to\mathcal A\) maps the observation \(s_t\) to an action \(a_{t}\), which is applied to the environment \(F\) with state \(s\); the data \(\{(s_t, a_t, c_t)\}\) accumulate over time.]

Controlled systems

$$ s_{t+1} = F(s_t, a_t, w_t) $$

[Diagram: system \(F\) with state \(s\), driven by the action \(a_t\) and the disturbance \(w_t\), producing the state \(s_t\).]

Controlled systems

$$ s_{t+1} = As_t+B a_t+ w_t$$

[Diagram: linear system with matrices \(A\), \(B\) and state \(s\), driven by the action \(a_t\) and the disturbance \(w_t\), producing the state \(s_t\).]

Linear System: State space \(\mathcal S = \mathbb R^n\) and actions \(\mathcal A=\mathbb R^m\) with dynamics defined by \(A\in\mathbb R^{n\times n}\) and \(B\in\mathbb R^{n\times m}\)

Reachability & Controllability

  • \(s_\star\) is reachable from \(s_0 \) if there exists \(a_{0:t-1} \in\mathcal A^t\) such that \(s_{t}=s_\star\) for some \(t\)
  • system is controllable if any \(s_\star\in\mathcal S\) is reachable from any \(s_0 \in\mathcal S\).
  • a linear system is controllable if and only if $$\mathrm{rank}\Big(\underbrace{\begin{bmatrix}B&AB &\dots & A^{n-1}B\end{bmatrix}}_{\mathcal C}\Big) = n$$
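The rank condition can be checked numerically. The following is a minimal sketch using NumPy; the helper names `controllability_matrix` and `is_controllable` and the two input matrices are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^{n-1}B] column-wise."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(A, B):
    """Rank test: controllable iff rank(C) == n."""
    C = controllability_matrix(A, B)
    return bool(np.linalg.matrix_rank(C) == A.shape[0])

# Hypothetical two-state system: the first state is driven by the second.
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
B_force = np.array([[0.0], [1.0]])  # input enters through the second state
B_stuck = np.array([[1.0], [0.0]])  # input only touches the first state

print(is_controllable(A, B_force))
print(is_controllable(A, B_stuck))
```

With `B_stuck` the second state evolves on its own and can never be steered, which the rank test detects.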

Optimal Control & Dynamic Programming

Stochastic Optimal Control Problem

$$ \min_{\pi_{0:T}} ~~\underbrace{\mathbb E_w\Big[\sum_{k=0}^{T} c(s_k, \pi_k(s_k)) \Big]}_{J^\pi(s_0)}\quad \text{s.t}\quad s_0~~\text{given},~~s_{k+1} = F(s_k, \pi_k(s_k),w_k) $$

Dynamic Programming Algorithm

  • Initialize \(J_{T+1}^\star (s) = 0\)
  • For \(k=T,T-1,\dots,0\):
    • Compute \(J_k^\star (s) = \min_{a\in\mathcal A} c(s, a)+\mathbb E_w[J_{k+1}^\star (F(s,a,w))]\)
    • Record minimizing argument as \(\pi_k^\star(s)\)
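For finite state and action sets, the backward recursion above can be written directly. This is a minimal sketch in plain Python; the function name `finite_horizon_dp`, the toy chain, and the representation of noise as `(value, probability)` pairs are assumptions made for illustration.

```python
def finite_horizon_dp(states, actions, cost, step, noise, T):
    """Backward DP over k = T, ..., 0.

    noise is a list of (w, probability) pairs.
    Returns J[k][s] (cost-to-go) and pi[k][s] (minimizing action).
    """
    J = [dict() for _ in range(T + 2)]
    pi = [dict() for _ in range(T + 1)]
    J[T + 1] = {s: 0.0 for s in states}  # initialize J_{T+1} = 0
    for k in range(T, -1, -1):
        for s in states:
            best_a, best_q = None, float("inf")
            for a in actions:
                expected_next = sum(p * J[k + 1][step(s, a, w)] for w, p in noise)
                q = cost(s, a) + expected_next
                if q < best_q:
                    best_a, best_q = a, q
            J[k][s], pi[k][s] = best_q, best_a
    return J, pi

# Toy problem (hypothetical): walk on {-2, ..., 2}, pay |s| per step, no noise.
states = [-2, -1, 0, 1, 2]
actions = [-1, 0, 1]
cost = lambda s, a: abs(s)
step = lambda s, a, w: max(-2, min(2, s + a + w))
noise = [(0, 1.0)]

J, pi = finite_horizon_dp(states, actions, cost, step, noise, T=2)
print(J[0][2], pi[0][2])
```

Starting from \(s=2\) the recorded policy walks toward the origin, paying \(2+1+0\) over the three stages.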

Reference: Ch 1 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas

Linear Quadratic Regulator

  • Linear dynamics: \(F(s, a, w) = A s+Ba+w\)
  • Quadratic costs: \( c(s, a) = s^\top Qs + a^\top Ra \) where \(Q,R\succ 0\)
  • Stochastic and independent noise \(\mathbb E[w_k] = 0\) and \(\mathbb E[w_kw_k^\top] = \sigma^2 I\)

LQR Problem

$$ \min_{\pi_{0:T}} ~~\mathbb E_w\Big[\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t}\quad s_{k+1} = A s_k+ Ba_k+w_k,~~a_k=\pi_k(s_k) $$

LQR Example

$$ s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t + w_t $$

The state is position & velocity \(s=[\theta,\omega]\), input is a force \(a\in\mathbb R\).

Goal: stay near origin and be energy efficient

  • \(c(s,a) = s^\top \begin{bmatrix} 10 & \\ & 0.1 \end{bmatrix}s + 5a^2 \)

LQR via DP

DP: \(J_k^\star (s) = \min_{a\in\mathcal A} c(s, a)+\mathbb E_w[J_{k+1}^\star (F(s,a,w))]\)

  • \(k=T\): \(\qquad\min_{a} s^\top Q s+a^\top Ra+0\)
    • \(J_T^\star(s) = s^\top Q s\) and \(\pi_T^\star(s) =0\)
  • \(k=T-1\): \(\quad \min_{a} s^\top Q s+a^\top Ra+\mathbb E_w[(As+Ba+w)^\top Q (As+Ba+w)]\)
    • \(\mathbb E[(As+Ba+w)^\top Q (As+Ba+w)]\)
      • \(=(As+Ba)^\top Q (As+Ba)+\mathbb E[ 2w^\top Q(As+Ba) + w^\top Q w]\)
      • \(=(As+Ba)^\top Q (As+Ba)+\sigma^2\mathrm{tr}( Q )\), since \(\mathbb E[w]=0\) and \(\mathbb E[ww^\top]=\sigma^2 I\)
    • so the problem is \(\min_{a} s^\top (Q+A^\top QA) s+a^\top (R+B^\top QB) a+2s^\top A^\top Q Ba+\sigma^2\mathrm{tr}( Q )\)
    • General quadratic: \(\min_a a^\top M a + m^\top a + c\) with \(M\succ 0\)
      • \(2Ma_\star + m = 0 \implies a_\star = -\frac{1}{2}M^{-1} m\), with minimum value \(c-\frac{1}{4}m^\top M^{-1}m\)
    • With \(M=R+B^\top QB\) and \(m=2B^\top QAs\):
      • \(\pi_{T-1}^\star(s)=-\frac{1}{2}(R+B^\top QB)^{-1}(2B^\top QAs)=-(R+B^\top QB)^{-1}B^\top QAs\)
      • \(J_{T-1}^\star(s) = s^\top (Q+A^\top QA - A^\top QB(R+B^\top QB)^{-1}B^\top QA) s +\sigma^2\mathrm{tr}( Q )\)

Linear Quadratic Regulator

Claim: For \(t=0,\dots, T\), the optimal cost-to-go function is quadratic and the optimal policy is linear

  • \(J^\star_t (s) = s^\top P_t s + p_t\) and \(\pi_t^\star(s) = K_t s\)
  • Exercise: Using DP and induction, prove the claim for (with \(P_{T+1}=0\) and \(p_{T+1}=0\)):
    • \(P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
    • \(p_t = p_{t+1} + \sigma^2\mathrm{tr}(P_{t+1})\)
    • \(K_t = -(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
  • Exercise: Derive expressions for optimal controllers when
    1. Time varying cost: \(c_t(s,a) = s^\top Q_t s+a^\top R_t a\)
    2. General noise covariance: \(\mathbb E[w_tw_t^\top] = \Sigma_t\)
    3. Trajectory tracking: \(c_t(s,a) = \|s-\bar s_t\|_2^2 + \|a\|_2^2\) for given \(\bar s_t\)
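The backward recursion for \(P_t\), \(p_t\), \(K_t\) can be sketched in a few lines of NumPy. The code below assumes the recursion \(P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\) started from \(P_{T+1}=0\); the function name `lqr_backward`, the horizon, and the noise level are illustrative choices, and the matrices are those of the LQR example slide.

```python
import numpy as np

def lqr_backward(A, B, Q, R, sigma2, T):
    """Return lists Ks, Ps, ps indexed by t = 0..T, starting from J_{T+1} = 0."""
    n = A.shape[0]
    P_next, p_next = np.zeros((n, n)), 0.0
    Ks, Ps, ps = [None] * (T + 1), [None] * (T + 1), [None] * (T + 1)
    for t in range(T, -1, -1):
        G = R + B.T @ P_next @ B
        K = -np.linalg.solve(G, B.T @ P_next @ A)
        # A'PBK equals -A'PB G^{-1} B'PA, so this is the Riccati update.
        P = Q + A.T @ P_next @ A + A.T @ P_next @ B @ K
        p = p_next + sigma2 * np.trace(P_next)
        Ks[t], Ps[t], ps[t] = K, P, p
        P_next, p_next = P, p
    return Ks, Ps, ps

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 0.1])
R = np.array([[5.0]])

Ks, Ps, ps = lqr_backward(A, B, Q, R, sigma2=1.0, T=50)
print("K_0 =", Ks[0])
print("K_T =", Ks[-1])  # zero: the last action only adds cost
```

Far from the end of the horizon the gains barely change with \(t\), which anticipates the steady-state controller discussed later.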

LQR Example

For the position and velocity system and the cost of the earlier LQR Example slide, the optimal policy is approximately

\(\pi_\star(s) \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix} s\)

Convexity of Open-Loop LQR

$$ \min_{a_{0:T}} ~~\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \quad \text{s.t}\quad s_{k+1} = A s_k+ Ba_k $$

  • Quadratic cost, linear constraints \(\implies\) Quadratic Program
  • Define $$\mathbf s = \begin{bmatrix}s_0 \\ \vdots \\ s_T\end{bmatrix},\quad \mathbf a = \begin{bmatrix}a_0 \\ \vdots \\ a_{T-1}\end{bmatrix}, \quad \bar A = \begin{bmatrix}A \\ &\ddots \\&& A\end{bmatrix}$$ and similarly for \(\bar B,\bar Q,\bar R\).
  • In stacked form: $$ \min_{\mathbf a} ~~\mathbf s^\top \bar Q\mathbf s + \mathbf a^\top \bar R\mathbf a\quad \text{s.t}\quad \mathbf s_{1:T} = \bar A \mathbf s_{0:T-1}+ \bar B\mathbf a $$
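Because the open-loop problem is a convex quadratic program, it can be solved with a single linear solve after eliminating the states. The sketch below is one way to do this in NumPy, assuming actions \(a_0,\dots,a_{T-1}\) and states \(s_0,\dots,s_T\); the name `open_loop_lqr`, the horizon, and the initial state are illustrative assumptions.

```python
import numpy as np

def open_loop_lqr(A, B, Q, R, s0, T):
    """Minimize sum_k s_k'Q s_k + sum_k a_k'R a_k subject to s_{k+1} = A s_k + B a_k."""
    n, m = B.shape
    # Eliminate the states: s_k = A^k s0 + sum_{j<k} A^{k-1-j} B a_j, i.e. s = G s0 + H a.
    G = np.vstack([np.linalg.matrix_power(A, k) for k in range(T + 1)])
    H = np.zeros(((T + 1) * n, T * m))
    for k in range(1, T + 1):
        for j in range(k):
            H[k * n:(k + 1) * n, j * m:(j + 1) * m] = (
                np.linalg.matrix_power(A, k - 1 - j) @ B
            )
    Qbar = np.kron(np.eye(T + 1), Q)
    Rbar = np.kron(np.eye(T), R)
    # Stationarity of the convex quadratic: (H'Qbar H + Rbar) a = -H'Qbar G s0.
    a = np.linalg.solve(H.T @ Qbar @ H + Rbar, -H.T @ Qbar @ G @ s0)
    s = G @ s0 + H @ a
    return s.reshape(T + 1, n), a.reshape(T, m)

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 0.1])
R = np.array([[5.0]])
s0 = np.array([1.0, 0.0])
T = 10

s_traj, a_traj = open_loop_lqr(A, B, Q, R, s0, T)
print(a_traj.ravel())
```

In the noise-free case this action sequence should coincide with what the DP feedback gains would produce along the same trajectory; the two differ only once disturbances enter.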

Non-convexity of LQR

Optimizing directly over linear feedback gains \(a_k = K_k s_k\):

$$ \min_{K_{0:T}} ~~\mathbb E_w\Big[\sum_{k=0}^{T} s_k^\top (Q + K_k^\top R K_k)s_k \Big]\quad \text{s.t}\quad s_{k+1} = (A +BK_k)s_k+w_k $$

Exercise: Prove \(s_{t+1}=A_t s_t + w_t\implies \) $$s_t = (A_{t-1}A_{t-2}\cdots A_0) s_0 + \sum_{k=0}^{t-2} (A_{t-1}\cdots A_{k+1}) w_{k}+w_{t-1}$$

Example: For a 1D system with \(A=B=1\), \(\mathbb E[s_2] = (1+K_1)(1+K_0)s_0\), which involves the product of the gains, so the cost is not convex in \((K_0, K_1)\).
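A quick numerical check makes the non-convexity concrete. The sketch below evaluates the noise-free quantity \(s_2^2\) for the 1D system at two gain pairs and at their midpoint; the function name `s2_squared` and the chosen points are illustrative assumptions.

```python
def s2_squared(K0, K1, s0=1.0):
    """Noise-free s_2^2 for the 1D system A = B = 1 under a_t = K_t s_t."""
    return ((1 + K1) * (1 + K0) * s0) ** 2

p = (-1.0, 1.0)   # K0 = -1 cancels the state after one step
q = (1.0, -1.0)   # K1 = -1 cancels the state after two steps
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

endpoint_avg = 0.5 * s2_squared(*p) + 0.5 * s2_squared(*q)
midpoint_val = s2_squared(*mid)
print(endpoint_avg, midpoint_val)
```

Both endpoints drive \(s_2\) to zero, while their midpoint \((0,0)\) leaves \(s_2=s_0\). A convex function can never exceed the average of its endpoint values at the midpoint, so this function of \((K_0,K_1)\) is not convex.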

System-level Reparametrization

[Diagram: feedback loop between the system \((A, B)\) with state \(s\) and the controller \(\mathbf{K}\), with signals \(s_t\), \(a_t\), \(w_t\); equivalently, a single map \(\mathbf{\Phi}\) from disturbances to states and actions.]

Closed loop: \(s_{t+1} = As_{t}+Ba_{t}+w_{t}\) with \(a_t = K_t s_t\), so (products taken with later indices on the left)

  • \(s_{t+1} = \prod_{\ell=0}^{t}(A+BK_{\ell}) s_0 + \sum_{k=0}^{t} \prod_{\ell=k+1}^{t}(A+BK_\ell) w_{k}\)
  • \(a_t = K_t \prod_{\ell=0}^{t-1}(A+BK_{\ell}) s_0 + K_t \sum_{k=0}^{t-1} \prod_{\ell=k+1}^{t-1}(A+BK_\ell) w_{k}\)

System response (let \(w_{-1}=s_0\)):

  • \(s_{t} = \sum_{k=0}^t \Phi_s^{t, k}w_{t-1-k}\)
  • \(a_{t} = \sum_{k=0}^t \Phi_a^{t, k}w_{t-1-k}\)

\(\begin{bmatrix} s_{0}\\\vdots \\s_T\end{bmatrix} = \begin{bmatrix} \Phi_s^{0,0}\\ \Phi_s^{1, 1}& \Phi_s^{1,0}\\ \vdots  & \ddots & \ddots \\ \Phi_s^{T,T} & \Phi_s^{T,T-1} & \dots & \Phi_s^{T,0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}\)

\(\begin{bmatrix} a_{0}\\\vdots \\a_T\end{bmatrix} = \begin{bmatrix} \Phi_a^{0,0}\\ \Phi_a^{1, 1}& \Phi_a^{1,0}\\ \vdots  & \ddots & \ddots \\ \Phi_a^{T,T} & \Phi_a^{T,T-1} & \dots & \Phi_a^{T,0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}\)

\(\mathbf s = \mathbf \Phi_s \mathbf w\) and \(\mathbf a = \mathbf \Phi_a \mathbf w\)

Reparametrized objective: \( \mathbf s^\top \bar Q\mathbf s + \mathbf a^\top \bar R\mathbf a = \mathbf w^\top (\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s  + \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\mathbf w\)

Reparametrized constraints: the dynamics \(\mathbf s_{1:T} = \bar A \mathbf s_{0:T-1}+ \bar B\mathbf a+\mathbf w_{0:T-1}\) in stacked form are

$$\begin{bmatrix}s_0 \\ \vdots \\ s_T\end{bmatrix}= \underbrace{\begin{bmatrix}0\\ A & 0 \\ &\ddots & \ddots \\&& A & 0\end{bmatrix} }_{\mathcal Z \bar A}\begin{bmatrix}s_0 \\ \vdots \\ s_T\end{bmatrix} + \underbrace{\begin{bmatrix}0\\ B & 0 \\ &\ddots & \ddots \\&& B & 0\end{bmatrix} }_{\mathcal Z \bar B} \begin{bmatrix}a_0 \\ \vdots \\ a_{T}\end{bmatrix} +\begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}$$

\(\iff\quad\mathbf \Phi_s\mathbf w = \mathcal Z \bar A \mathbf \Phi_s\mathbf w + \mathcal Z \bar B \mathbf \Phi_a\mathbf w + \mathbf w \)

Entry by entry, \(s_{t+1} = As_{t}+Ba_{t}+w_{t}\) reads

\(\sum_{k=0}^{t+1} \Phi_s^{t+1, k}w_{t-k} = A\sum_{k=0}^t \Phi_s^{t, k}w_{t-1-k} + B\sum_{k=0}^t \Phi_a^{t, k}w_{t-1-k} + w_{t}\)

Claim: The above equality is implied by

$$\Phi_s^{t,0}=I,\quad \Phi_s^{t+1, k+1} = A \Phi_s^{t, k}+B\Phi_a^{t, k} \quad \forall ~t,~k\leq t $$

References: System Level Synthesis by Anderson, Doyle, Low, Matni

System Level Synthesis

Theorem: For a linear system in feedback with a linear controller over the horizon \(t=0,\dots, T\):

  1. The affine subspace \(\{(I - \mathcal Z \bar A )\mathbf \Phi_s- \mathcal Z \bar B \mathbf \Phi_a = I\} \) parametrizes all possible system responses.
  2. For any block-lower-triangular matrices \((\mathbf \Phi_s,\mathbf \Phi_a)\) in the affine subspace, there exists a linear feedback controller achieving this response.
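The first part of the theorem can be checked numerically: build the response induced by some gains and verify the affine constraint. The following NumPy sketch does this for randomly drawn gains; the names `system_response` and `sls_residual`, the random seed, and the dimensions are assumptions made for illustration.

```python
import numpy as np

def system_response(A, B, Ks):
    """Phi_s, Phi_a induced by a_t = K_t s_t, t = 0..T (Ks holds T+1 gains).

    Block column 0 multiplies s_0 = w_{-1}; block column j multiplies w_{j-1}.
    """
    n, m = B.shape
    N = len(Ks)
    Phi_s = np.zeros((N * n, N * n))
    Phi_a = np.zeros((N * m, N * n))
    Phi_s[:n, :n] = np.eye(n)
    for t in range(N):
        row_s = Phi_s[t * n:(t + 1) * n, :]
        Phi_a[t * m:(t + 1) * m, :] = Ks[t] @ row_s
        if t < N - 1:
            nxt = (A + B @ Ks[t]) @ row_s
            nxt[:, (t + 1) * n:(t + 2) * n] += np.eye(n)  # direct effect of w_t
            Phi_s[(t + 1) * n:(t + 2) * n, :] = nxt
    return Phi_s, Phi_a

def sls_residual(A, B, Phi_s, Phi_a):
    """(I - Z Abar) Phi_s - Z Bbar Phi_a - I, which should vanish."""
    n = A.shape[0]
    N = Phi_s.shape[0] // n
    ZA = np.kron(np.eye(N, k=-1), A)  # block down-shift of diag(A, ..., A)
    ZB = np.kron(np.eye(N, k=-1), B)
    return (np.eye(N * n) - ZA) @ Phi_s - ZB @ Phi_a - np.eye(N * n)

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
rng = np.random.default_rng(0)
Ks = [rng.normal(size=(1, 2)) for _ in range(5)]

Phi_s, Phi_a = system_response(A, B, Ks)
print(np.max(np.abs(sls_residual(A, B, Phi_s, Phi_a))))
```

The residual is zero up to rounding for any choice of gains, which is the sense in which the affine subspace contains every achievable response.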

1D Example

Example: For a 1D system with \(A=B=1\),

  • \(s_1 = (1 + K_0) s_0 + w_0\)
  • \(s_2 = (1+K_1)(1+K_0)s_0 + (1+K_1)w_0 + w_1\)
  • Suppose \(K_0 = K_1=-\frac{1}{2}\). What are \(\mathbf \Phi_s\) and \(\mathbf \Phi_a\)?
    • \(\mathbf \Phi_s = \begin{bmatrix} 1\\ \frac{1}{2} & 1\\ \frac{1}{4} & \frac{1}{2} & 1\end{bmatrix}\) and \(\mathbf \Phi_a = \begin{bmatrix} -\frac{1}{2} \\ -\frac{1}{4}& -\frac{1}{2}\end{bmatrix}\), since \(a_1 = K_1 s_1 = -\frac{1}{4}s_0-\frac{1}{2}w_0\)
  • Is there some \(K_0,K_1\) such that \(\mathbf \Phi_s = \begin{bmatrix} \frac{1}{2}\\ \frac{1}{4} & \frac{1}{2}\\ \frac{1}{8} & \frac{1}{4} & \frac{1}{2}\end{bmatrix}\)?
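The response matrices for the 1D example can be generated mechanically from the gains. This is a small sketch in plain Python; the function name `scalar_system_response` and the row-by-row list layout are illustrative assumptions.

```python
def scalar_system_response(a, b, gains):
    """Rows of Phi_s and Phi_a for s_{t+1} = a s_t + b a_t + w_t with a_t = K_t s_t.

    Row t lists the coefficients on (s_0, w_0, ..., w_{t-1}).
    """
    phi_s = [[1.0]]  # s_0 = 1 * s_0
    phi_a = []
    for K in gains:
        row = phi_s[-1]
        phi_a.append([K * c for c in row])                    # a_t = K_t s_t
        phi_s.append([(a + b * K) * c for c in row] + [1.0])  # new disturbance enters with weight 1
    return phi_s, phi_a

phi_s, phi_a = scalar_system_response(1.0, 1.0, [-0.5, -0.5])
for row in phi_s:
    print(row)
for row in phi_a:
    print(row)
```

Every row of \(\mathbf\Phi_s\) built this way ends in 1, because the most recent disturbance always enters the state directly, regardless of the gains.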

System Level LQR

$$ \min_{\mathbf \Phi} ~~\mathbb E_w\Big[ \mathbf w^\top(\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s+ \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\mathbf w \Big]\quad \text{s.t}\quad (I - \mathcal Z \bar A )\mathbf \Phi_s- \mathcal Z \bar B \mathbf \Phi_a = I $$

  • \(\mathbb E_w\Big[ \mathbf w^\top(\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s+ \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\mathbf w \Big]\)
    • \(=\mathbb E_w\Big[ \mathrm{tr}((\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s+ \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\mathbf w\mathbf w^\top) \Big]\)
    • \(=\sigma^2\mathrm{tr}(\mathbf \Phi_s^\top \bar Q \mathbf \Phi_s+ \mathbf \Phi_a^\top \bar R \mathbf \Phi_a )\)
    • \(=\sigma^2\left\|\begin{bmatrix}\bar Q^{1/2} \\&\bar R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a \end{bmatrix}\right\|_F^2\)
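The last step, rewriting the trace as a squared Frobenius norm, can be sanity-checked numerically. The sketch below uses randomly generated matrices and diagonal weights purely for illustration; none of the sizes or values come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi_s = rng.normal(size=(6, 6))
Phi_a = rng.normal(size=(3, 6))
Qbar = np.diag(rng.uniform(0.5, 2.0, size=6))  # diagonal, so sqrt is elementwise
Rbar = np.diag(rng.uniform(0.5, 2.0, size=3))

# tr(Phi_s' Qbar Phi_s + Phi_a' Rbar Phi_a)
trace_form = np.trace(Phi_s.T @ Qbar @ Phi_s + Phi_a.T @ Rbar @ Phi_a)

# || blkdiag(Qbar^{1/2}, Rbar^{1/2}) [Phi_s; Phi_a] ||_F^2
stacked = np.vstack([np.sqrt(Qbar) @ Phi_s, np.sqrt(Rbar) @ Phi_a])
frob_form = np.linalg.norm(stacked, "fro") ** 2

print(trace_form, frob_form)
```

The two numbers agree up to rounding, so minimizing the expected cost amounts to a weighted least-squares problem in \(\mathbf\Phi\).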

A System Level Perspective

In closed loop, state and input are linear functions of disturbance

(On this slide \(x\) and \(u\) denote the state and input, written \(s\) and \(a\) elsewhere.)

\(x_t =  \sum_{k=1}^t A^{k-1}(Bu_{t-k} + w_{t-k})\) (taking \(x_0=0\))

\(u_t =  \sum_{k=0}^t K_kx_{t-k}\)

\(\begin{bmatrix} x_t\\u_t \end{bmatrix} =  \sum_{k=0}^t \begin{bmatrix} \Phi_x(k)\\ \Phi_u(k) \end{bmatrix} w_{t-k}\)

Instead of reasoning about a controller \(\mathbf{K}\), we reason about the interconnection \(\mathbf\Phi\) directly.

[Diagram: instead of a loop between \((A,B)\) and \(\mathbf{K}\) with signals \(\bf x\), \(\bf u\), \(\bf w\), the system looks like a line: \(\bf w\) passes through \(\mathbf{\Phi}\) to produce \(\bf x\) and \(\bf u\).]

Controller parametrization:

$$ \min_{\mathbf u} ~~\lim_{T\to\infty}\mathbb{E}\left[ \frac{1}{T}\sum_{t=0}^T x_t^\top Q x_t + u_t^\top R u_t\right]\quad \text{s.t.}~~x_{t+1} = Ax_t + Bu_t + w_t,~~u_t = \sum_{k=0}^t K_k x_{t-k} $$

System response parametrization, with \(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \):

$$ \min_{\mathbf{\Phi}} ~~\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_x \\ \mathbf{\Phi}_u \end{bmatrix} \right\|_{\mathcal{H}_2}^2\quad \text{s.t.}~~ \mathbf\Phi \in\mathrm{Affine}(A, B) $$

System Level LQR

$$ \min_{\mathbf \Phi} ~~\left\|\begin{bmatrix}\bar Q^{1/2} \\&\bar R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a \end{bmatrix}\right\|_F^2\quad \text{s.t}\quad \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix} \mathbf \Phi_s\\  \mathbf \Phi_a \end{bmatrix}= I $$

To implement the resulting controller:

  • \(w_{-1}=s_0\) and \(a_0=\Phi_a^{0, 0}w_{-1}\)
  • for \(t=1, \dots, T\)
    • \(w_{t-1} = s_{t}-As_{t-1}-Ba_{t-1}\)
    • \(a_{t} = \sum_{k=0}^{t} \Phi_a^{t, k}w_{t-k-1}\)
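The implementation steps (reconstruct each disturbance from observed states, then apply \(\mathbf\Phi_a\)) can be sketched for a scalar system. The code below builds \(\mathbf\Phi_a\) from hypothetical gains, so that the result can be compared against plain state feedback; the function names, gains, and noise sequence are all illustrative assumptions.

```python
import random

def scalar_phi_a(a, b, gains):
    """Rows of Phi_a induced by a_t = K_t s_t; entry j of row t multiplies w_{j-1}."""
    row_s, rows_a = [1.0], []
    for K in gains:
        rows_a.append([K * c for c in row_s])
        row_s = [(a + b * K) * c for c in row_s] + [1.0]
    return rows_a

def run_sls_controller(a, b, phi_a, s0, noise):
    """Apply a_t = sum_j phi_a[t][j] * w_hat[j], with w_hat rebuilt from observed states."""
    states, actions, w_hat = [s0], [], [s0]  # w_{-1} = s_0
    for t in range(len(phi_a)):
        if t > 0:
            # w_{t-1} = s_t - A s_{t-1} - B a_{t-1}
            w_hat.append(states[t] - a * states[t - 1] - b * actions[t - 1])
        act = sum(c * w for c, w in zip(phi_a[t], w_hat))
        actions.append(act)
        states.append(a * states[t] + b * act + noise[t])
    return states, actions

def run_feedback(a, b, gains, s0, noise):
    """Reference rollout with a_t = K_t s_t."""
    states, actions = [s0], []
    for t, K in enumerate(gains):
        actions.append(K * states[t])
        states.append(a * states[t] + b * actions[t] + noise[t])
    return states, actions

rng = random.Random(0)
gains = [-0.5, -0.3, -0.8, -0.1]
noise = [rng.gauss(0.0, 1.0) for _ in gains]
phi_a = scalar_phi_a(1.0, 1.0, gains)

sls_states, sls_actions = run_sls_controller(1.0, 1.0, phi_a, 2.0, noise)
fb_states, fb_actions = run_feedback(1.0, 1.0, gains, 2.0, noise)
print(max(abs(x - y) for x, y in zip(sls_actions, fb_actions)))
```

The two rollouts agree up to rounding, illustrating that a response in the affine subspace can be realized without ever forming the gains \(K_t\) explicitly.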

Recap: System Level LQR

Controller parametrization:

$$ \min_{\mathbf a} ~~\mathbb{E}\left[\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\quad \text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t,~~a_t = K_t s_{t} $$

System response parametrization, with \(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \):

$$ \min_{\mathbf{\Phi}} ~~\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix} \right\|_{F}^2\quad \text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix}= I $$

[Diagram: instead of a loop between the system \((A,B)\) and the controller \(\mathbf{K}\) with signals \(s_t\), \(a_t\), \(w_t\), the system looks like a line through \(\mathbf{\Phi}\).]

Steady State LQR

Infinite Horizon LQR Problem

$$ \min_{\pi} ~~\lim_{T\to\infty}\mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t}\quad s_{k+1} = A s_k+ Ba_k+w_k $$

Claim:  The optimal cost-to-go function is quadratic and the optimal policy is linear $$J^\star (s) = s^\top P s,\qquad \pi^\star(s) = K s$$

  • \(P = Q+A^\top PA - A^\top PB(R+B^\top PB)^{-1}B^\top PA\)
    • Discrete Algebraic Riccati Equation: \(P=\mathrm{DARE}(A,B,Q,R)\)
  • \(K = -(R+B^\top PB)^{-1}B^\top PA\)
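One simple way to obtain \(P\) is to iterate the finite-horizon Riccati update until it stops changing. The sketch below assumes the equation \(P = Q+A^\top PA - A^\top PB(R+B^\top PB)^{-1}B^\top PA\) and uses the matrices of the LQR example slide; the function names, iteration cap, and tolerance are illustrative choices. Library routines such as `scipy.linalg.solve_discrete_are` solve the same equation directly.

```python
import numpy as np

def riccati_step(P, A, B, Q, R):
    """One application of the Riccati update to P."""
    G = R + B.T @ P @ B
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)

def solve_dare_by_iteration(A, B, Q, R, iters=2000, tol=1e-12):
    """Fixed-point iteration starting from P = Q (a sketch, not a robust solver)."""
    P = Q.copy()
    for _ in range(iters):
        P_new = riccati_step(P, A, B, Q, R)
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    return P

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 0.1])
R = np.array([[5.0]])

P = solve_dare_by_iteration(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("K =", K)
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A + B @ K))))
```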

Steady State System Response

[Diagram: feedback loop between the system \((A,B)\) with state \(s\) and the static controller \(\mathbf{K}\), with signals \(s_t\), \(a_t\), \(w_t\); equivalently, a single map \(\mathbf{\Phi}\).]

With a static gain, \(s_{t+1} = As_{t}+Ba_{t}+w_{t}\) and \(a_t = K s_t\) give

  • \(s_{t+1} = (A+BK)^{t+1} s_0 + \sum_{k=0}^{t} (A+BK)^{t-k} w_{k}\)
  • \(a_{t+1} = K(A+BK)^{t+1} s_0 + \sum_{k=0}^{t} K(A+BK)^{t-k} w_{k}\)

The response depends only on the lag \(k\) (let \(w_{-1}=s_0\)):

  • \(s_{t} = \sum_{k=0}^t \Phi_s^{k}w_{t-1-k}\) with \(\Phi_s^k = (A+BK)^k\)
  • \(a_{t} = \sum_{k=0}^t \Phi_a^{k}w_{t-1-k}\) with \(\Phi_a^k = K(A+BK)^k\)

\(\begin{bmatrix} s_{0}\\\vdots \\s_T\end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots  & \ddots & \ddots \\ \Phi_s^{T} & \Phi_s^{T-1} & \dots & \Phi_s^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}\)

\(\begin{bmatrix} a_{0}\\\vdots \\a_T\end{bmatrix} = \begin{bmatrix} \Phi_a^{0}\\ \Phi_a^{1}& \Phi_a^{0}\\ \vdots  & \ddots & \ddots \\ \Phi_a^{T} & \Phi_a^{T-1} & \dots & \Phi_a^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}\)

Sequences & Operators

  • Cost depends on the (semi-)infinite sequence \(\mathbf s = (s_0, s_1, s_2,\dots)\)
  • Generated by convolution between disturbance sequence \(\mathbf w = (w_{-1}, w_0, w_1,\dots)\) and (semi-)infinite operator \(\mathbf \Phi_s = (\Phi_s^0, \Phi_s^1,\dots)\)
  • We represent this convolution with the notation \(\mathbf s = \mathbf \Phi_s\mathbf w\)
  • Concretely,
    • semi-infinite vectors and Toeplitz matrices $$\begin{bmatrix} s_{0}\\\vdots \\s_t\\\vdots \end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots  & \ddots & \ddots \\ \Phi_s^{t} & \Phi_s^{t-1} & \dots & \Phi_s^{0} \\ \vdots & & \ddots &&\ddots \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{t-1} \\\vdots \end{bmatrix}$$
    • frequency domain
      • define time shift operator \(z\) such that $$z(s_0, s_1,s_2 \dots) = (s_1, s_2,\dots)$$
      • represent \(\mathbf s(z) = \sum_{t=0}^\infty z^{-t}s_t\) and \(\mathbf \Phi_s(z) = \sum_{t=0}^\infty z^{-t}\Phi_s^t\)
      • multiplication of power series, where \(w_t\) here denotes the \(t\)-th entry of \(\mathbf w\): $$ \mathbf \Phi_s(z) \mathbf w(z) = \Big(\sum_{k=0}^\infty z^{-k}\Phi_s^k\Big)\Big(\sum_{j=0}^\infty z^{-j}w_j\Big) = \sum_{t=0}^\infty z^{-t} \sum_{k=0}^t \Phi_s^k w_{t-k} $$
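The convolution view can be checked on a scalar example: convolving the impulse response \(\Phi_s^k=(A+BK)^k\) with the disturbance sequence should reproduce a direct simulation of the closed loop. The sketch below is plain Python; the function name `convolve_response`, the gain, and the disturbance values are illustrative assumptions.

```python
def convolve_response(phi, w):
    """s_t = sum_{k=0}^t phi[k] * w[t-k] for scalar sequences of equal length."""
    return [sum(phi[k] * w[t - k] for k in range(t + 1)) for t in range(len(w))]

a, b, K = 1.0, 1.0, -0.5
# Disturbance sequence (w_{-1}, w_0, w_1, ...): the first entry plays the role of s_0.
w = [1.0, 0.3, -0.2, 0.5, 0.0]
phi_s = [(a + b * K) ** k for k in range(len(w))]

s_conv = convolve_response(phi_s, w)

# Direct simulation of s_{t+1} = (a + bK) s_t + w_t.
s_sim = [w[0]]
for t in range(1, len(w)):
    s_sim.append((a + b * K) * s_sim[-1] + w[t])

print(s_conv)
print(s_sim)
```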

Infinite Horizon LQR

Controller parametrization:

$$ \min_{\mathbf a} ~~\lim_{T\to\infty}\mathbb{E}\left[\frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\quad \text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t,~~a_t = K s_{t} $$

System response parametrization, with \(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \):

$$ \min_{\mathbf{\Phi}} ~~\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2\quad \text{s.t.}~~ \begin{bmatrix} zI -  A & - B\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix}= I $$

Exercise: Using the frequency domain notation, derive the expression for the SLS cost and constraints, where we define the \(\mathcal H_2\) norm:

$$ \|\mathbf \Phi\|_{\mathcal H_2}^2 = \sum_{t=0}^\infty \|\Phi^t\|_F^2 $$
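For a scalar closed loop the \(\mathcal H_2\) norm has a closed form, which makes a convenient check of the definition. With \(\Phi_s^t = c^t\) and \(c = A+BK\), the sum is a geometric series equal to \(1/(1-c^2)\) whenever \(|c|<1\). The sketch below truncates the sum at a finite horizon; the function name `h2_norm_squared`, the truncation length, and the gain are illustrative assumptions.

```python
def h2_norm_squared(impulse_response):
    """Sum of squared entries of a (truncated) scalar impulse response."""
    return sum(x * x for x in impulse_response)

a, b, K = 1.0, 1.0, -0.5
c = a + b * K                       # closed-loop multiplier, |c| < 1
horizon = 200                       # truncation of the infinite sum

phi_s = [c ** t for t in range(horizon)]
phi_a = [K * c ** t for t in range(horizon)]

h2_s = h2_norm_squared(phi_s)
h2_a = h2_norm_squared(phi_a)
closed_form_s = 1.0 / (1.0 - c * c)

print(h2_s, closed_form_s)
print(h2_a, K * K * closed_form_s)
```

The truncation error decays like \(c^{2\cdot\text{horizon}}\), so it is negligible here; for a gain with \(|A+BK|\ge 1\) the sum diverges and the norm is infinite.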

Recap

  • Linear quadratic regulator
    • \(\pi_t^\star(s) = K_t s\)
  • System response parametrization
    • \(\mathbf s = \mathbf \Phi_s \mathbf w, \quad\mathbf a = \mathbf \Phi_a \mathbf w\)
  • Steady-state controllers and infinite horizons
    • \(\pi^\star(s) = Ks\)

References: System Level Synthesis by Anderson, Doyle, Low, Matni and Ch 2 in Machine Learning in Feedback Systems by Sarah Dean
