Sarah Dean
Assistant Professor in CS at Cornell
\(a_t = {\color{Goldenrod} K_t }s_{t}\)
\( \underset{\mathbf a }{\min}\) \(\displaystyle\mathbb{E}\left[\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\)
\(\text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t\)
\(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \)
\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{F}^2\)
\(\text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s\\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I \)
[Block diagram: feedback loop with plant \(s_{t+1}=As_t+Ba_t+w_t\) and controller \(\mathbf{K}\) mapping state \(s_t\) to action \(a_t\), driven by noise \(w_t\). Written in terms of system responses, instead of a loop, the system looks like a line from \(\mathbf w\) to \((\mathbf s, \mathbf a)\).]
References: System Level Synthesis by Anderson, Doyle, Low, Matni
Theorem: For a linear system in feedback with a linear controller over the horizon \(t=0,\dots, T\):
import cvxpy as cvx
import numpy as np

# Given: dynamics A (n x n), B (n x p), cost square roots Q_sqrt, R_sqrt, horizon T
Phi_s = cvx.Variable((T*n, T*n), name="Phi_s")
Phi_a = cvx.Variable((T*p, T*n), name="Phi_a")
# Affine dynamics constraint: first block row of Phi_s is [I 0 ... 0]
constr = [Phi_s[:n, :] == np.eye(n, T*n)]
for k in range(T-1):
    constr.append(Phi_s[n*(k+1):n*(k+2), :]
                  == A @ Phi_s[n*k:n*(k+1), :] + B @ Phi_a[p*k:p*(k+1), :])
constr.append(A @ Phi_s[n*(T-1):, :] + B @ Phi_a[p*(T-1):, :] == 0)
# Quadratic cost (minimizing the Frobenius norm has the same argmin as its square)
cost_matrix = cvx.bmat([[Q_sqrt @ Phi_s[n*k:n*(k+1), :]] for k in range(T)]
                       + [[R_sqrt @ Phi_a[p*k:p*(k+1), :]] for k in range(T)])
objective = cvx.norm(cost_matrix, 'fro')
prob = cvx.Problem(cvx.Minimize(objective), constr)
prob.solve()
Phi_s = np.array(Phi_s.value)
Phi_a = np.array(Phi_a.value)
Infinite Horizon LQR Problem
$$ \min_{\pi} ~~\lim_{T\to\infty}\mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t}\quad s_{k+1} = A s_k+ Ba_k+w_k $$
Claim: The optimal cost-to-go function is quadratic and the optimal policy is linear $$J^\star (s) = s^\top P s,\qquad \pi^\star(s) = K s$$
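The claim can be checked numerically: iterating the discrete Riccati recursion converges to \(P\), and the gain \(K\) follows. A minimal sketch; the system and weights below are placeholders, not taken from the slides.

```python
import numpy as np

def lqr_infinite_horizon(A, B, Q, R, iters=500):
    """Fixed-point iteration on the discrete-time Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # a = K s
        P = Q + A.T @ P @ (A + B @ K)
    return P, K

# placeholder system and weights (assumptions, not from the slides)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = lqr_infinite_horizon(A, B, Q, R)
# At convergence, P satisfies the Riccati equation and A + BK is stable
```

The returned \(P\) gives the quadratic cost-to-go \(s^\top P s\) and \(K\) the linear policy.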
Stochastic Infinite Horizon Optimal Control Problem
$$ \min_{\pi} ~~\lim_{T\to\infty} \mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k)) \Big]\quad \text{s.t}\quad s_0~~\text{given},~~s_{k+1} = F(s_k, \pi(s_k),w_k) $$
\(\underbrace{\qquad\qquad}_{J^\pi(s_0)}\)
Bellman Optimality Equation
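In this notation, the Bellman optimality equation states that the optimal cost-to-go is a fixed point (shown in undiscounted total-cost form; in the average-cost setting above, the optimal average cost \(\lambda^\star\) is added on the left-hand side):

$$ J^\star(s) = \min_{a\in\mathcal A}~\mathbb E_w\big[c(s,a) + J^\star(F(s,a,w))\big] $$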
Reference: Ch 1 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas
$$ s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t + w_t $$
The state is position & velocity \(s=[\theta,\omega]\), input is a force \(a\in\mathbb R\).
Goal: stay near origin and be energy efficient
\(\pi_\star(s) \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix} s\)
\(J^\star(s) \approx s^\top \begin{bmatrix} 33.5 & 5.8 \\ 5.8 & 2.4 \end{bmatrix} s\)
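These numbers can be reproduced with a Riccati solver once \(Q\) and \(R\) are fixed; the slide's \(Q, R\) are not shown, so the placeholder weights below (\(Q=I\), \(R=1\)) yield a different gain, but the same stability check applies.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)   # placeholder weights (not from the slide)

P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # pi(s) = K s
rho = max(abs(np.linalg.eigvals(A + B @ K)))        # spectral radius of closed loop
```

A stabilizing solution exists here, so `rho < 1` regardless of the particular positive definite weights chosen.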
\(s_{t+1} = As_{t}+Ba_{t}+w_{t}\)
\(a_t = K s_t\)
\(s_{t+1} = (A+BK)^{t+1} s_0 + \sum_{k=0}^{t} (A+BK)^{t-k} w_{k}\)
\(a_{t+1} = K(A+BK)^{t+1} s_0 + \sum_{k=0}^{t} K(A+BK)^{t-k} w_{k}\)
\(s_{t} = \Phi_s^{t} s_0 + \sum_{k=0}^{t-1} \Phi_s^{k}w_{t-1-k}\)
\(a_{t} = \Phi_a^{t} s_0 + \sum_{k=0}^{t-1} \Phi_a^{k}w_{t-1-k}\)
\(\begin{bmatrix} s_{0}\\\vdots \\s_T\end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots & \ddots & \ddots \\ \Phi_s^{T} & \Phi_s^{T-1} & \dots & \Phi_s^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}\)
\(\begin{bmatrix} a_{0}\\\vdots \\a_T\end{bmatrix} = \begin{bmatrix} \Phi_a^{0}\\ \Phi_a^{1}& \Phi_a^{0}\\ \vdots & \ddots & \ddots \\ \Phi_a^{T} & \Phi_a^{T-1} & \dots & \Phi_a^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}\)
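A quick sanity check of the block-Toeplitz structure: for a static gain \(K\), the blocks are \(\Phi_s^k = (A+BK)^k\), and stacking them should reproduce a simulated rollout exactly. A sketch with the example gain from the slides (system matrices assumed as in the example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
K = -np.array([[0.070, 0.037]])   # example gain from the slides
n, T = 2, 10

# Block-Toeplitz map from [s_0; w_0; ...; w_{T-1}] to [s_0; ...; s_T]
Acl = A + B @ K
powers = [np.linalg.matrix_power(Acl, k) for k in range(T + 1)]
Phi_s = np.zeros(((T + 1) * n, (T + 1) * n))
for t in range(T + 1):
    for j in range(t + 1):
        Phi_s[t*n:(t+1)*n, j*n:(j+1)*n] = powers[t - j]

# Simulate s_{t+1} = (A + BK) s_t + w_t and compare
s0 = rng.standard_normal(n)
w = rng.standard_normal((T, n))
traj = [s0]
for t in range(T):
    traj.append(Acl @ traj[-1] + w[t])
stacked = Phi_s @ np.concatenate([s0, w.reshape(-1)])
# stacked matches np.concatenate(traj)
```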
$$ s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t + w_t $$
The state is position & velocity \(s=[\theta,\omega]\), input is a force \(a\in\mathbb R\).
\(\pi_\star(s) \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix} s\)
\(\Phi_s^t \approx \begin{bmatrix}0.9 & 0.1 \\ -0.070 & 0.86\end{bmatrix}^{t-1} \quad \Phi_a^t \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix}\begin{bmatrix}0.9 & 0.1 \\ -0.070 & 0.86\end{bmatrix}^{t-1} \)
eigenvalues \(\approx 0.88\pm 0.082j\)
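The eigenvalue claim can be checked directly (a sketch, with the gain rounded as above):

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
K = -np.array([[0.070, 0.037]])

eigs = np.linalg.eigvals(A + B @ K)
# complex pair roughly 0.88 +/- 0.082j, inside the unit circle
```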
\(a_t = {\color{Goldenrod} K}s_{t}\)
\( \underset{\mathbf a }{\min}\) \(\displaystyle\lim_{T\to\infty}\mathbb{E}\left[\frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\)
\(\text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t\)
\(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \)
\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2\)
\(\text{s.t.}~~ \begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I \)
Exercise: Using the frequency domain notation, derive the expression for the SLS cost and constraints. Hint: in signal notation, the dynamics can be written \(z\mathbf s = A\mathbf s + B\mathbf a + \mathbf w\)
Where we use the norm:
$$ \|\mathbf \Phi\|_{\mathcal H_2}^2 = \sum_{t=0}^\infty \|\Phi^t\|_F^2 $$
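For a response of the form \(\Phi^t = (A+BK)^{t-1}\), this sum equals the trace of the closed-loop Gramian, which solves a discrete Lyapunov equation. A sketch comparing a truncated sum to the exact value (closed-loop matrix taken from the earlier example):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

Acl = np.array([[0.9, 0.1], [-0.070, 0.863]])   # A + BK from the example

# Truncated sum: sum_t ||Acl^t||_F^2 over t = 0, 1, 2, ...
h2_truncated = sum(np.linalg.norm(np.linalg.matrix_power(Acl, t), 'fro')**2
                   for t in range(200))
# Exact value: trace of X solving X = Acl X Acl^T + I
h2_exact = np.trace(solve_discrete_lyapunov(Acl, np.eye(2)))
```

Since the spectral radius is below one, the tail of the truncated sum is negligible and the two values agree to high precision.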
\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2~~\text{s.t.}~~ \begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I \)
[Diagram: interaction loop between policy and environment — the policy \(\pi_t:\mathcal S\to\mathcal A\) maps the observation \(s_t\) to an action \(a_{t}\), and we accumulate data \(\{(s_t, a_t, c_t)\}\).]
Goal: select actions \(a_t\) to bring environment to low-cost states
Setting: dynamics (and cost) functions are not known, but we have data \(\{(s_k, a_k, c_k)\}_{k=0}^N\). Approaches differ in what they focus on.
Setting: dynamics \(A,B\) are not known, but we have data \(\{s_k, a_k\}_{k=0}^N\)
The state is position & velocity \(s=[\theta,\omega]\), input is a force \(a\in\mathbb R\).
Goal: be energy efficient
\(\hat\pi_\star(s) \approx -\begin{bmatrix} 6.1\times 10^{-5}& 2.8\times 10^{-4}\end{bmatrix} s\) does not stabilize the system!
Even though \(\varepsilon=0.02\), \(J(\hat K)\) is infinite!
true dynamics \(\left(\begin{bmatrix} 1.01 & 0.1\\ & 1.01 \end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right) \) but we estimate \(\left(\begin{bmatrix} 0.99 & 0.1\\ & 0.99 \end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right )\)
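This failure is easy to verify: with the certainty-equivalent gain, the true closed loop has spectral radius above one, so trajectories (and hence the cost) blow up. A sketch with the gain rounded as on the slide:

```python
import numpy as np

A_true = np.array([[1.01, 0.1], [0.0, 1.01]])
B = np.array([[0.0], [1.0]])
K_hat = -np.array([[6.1e-5, 2.8e-4]])   # certainty-equivalent gain (rounded)

rho = max(abs(np.linalg.eigvals(A_true + B @ K_hat)))
# rho exceeds 1: the closed loop is unstable
```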
\( \underset{\mathbf a=\mathbf{Ks}}{\min}\) \(\underset{\|A-\widehat A\|\leq \varepsilon_A \atop \|B-\widehat B\|\leq \varepsilon_B}{\max}\) \(\mathbb{E}\left[\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\)
s.t. \(s_{t+1} = As_t + Ba_t + w_t\)
Challenge: translating predictions
\(\hat s_{t+1} = \hat A\hat s_t + \hat B \hat a_t\)
to reality
\(s_{t+1} = As_t + Ba_t\)
Lemma: if the system response variables satisfy \(\begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix}\mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix}= I\), then the controller \(\mathbf K = \mathbf\Phi_a\mathbf\Phi_s^{-1}\) achieves these responses: \(\mathbf s = \mathbf\Phi_s\mathbf w\), \(\mathbf a = \mathbf\Phi_a\mathbf w\).
Proof: substitute \(\mathbf a = \mathbf\Phi_a\mathbf\Phi_s^{-1}\mathbf s\) into \(z\mathbf s = A\mathbf s + B\mathbf a + \mathbf w\) and apply the affine constraint to conclude \(\mathbf s = \mathbf\Phi_s\mathbf w\).
Therefore, the estimated cost is $$ \hat J(\hat{\mathbf \Phi}) = \left\|\begin{bmatrix} Q^{1/2}\\ & R^{1/2}\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix}\right\|_{\mathcal H_2}^2 $$ while the cost actually achieved is $$ J(\hat{\mathbf \Phi}) = \left\|\begin{bmatrix} Q^{1/2}\\ & R^{1/2}\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix}(I-\mathbf \Delta)^{-1} \right\|_{\mathcal H_2}^2 $$
Theorem (Anderson et al., 2019): A policy designed from system responses satisfying \(\begin{bmatrix} zI - \hat A & - \hat B\end{bmatrix} \begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix}= I\) will achieve response \(\begin{bmatrix} \hat{\mathbf{\Phi}}_s \\ \hat{\mathbf{\Phi}}_a \end{bmatrix} (I-\mathbf \Delta)^{-1}\)
where \(\mathbf \Delta = (\underbrace{A-\hat A}_{\Delta_A})\hat{\mathbf \Phi}_s + (\underbrace{B-\hat B}_{\Delta_B})\hat{\mathbf \Phi}_a\) if the inverse exists.
\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{teal} \gamma}}{\arg\min}\) \(\frac{1}{1-\gamma}\)\(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}\)
\(\qquad\qquad\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I\)
\(\qquad\qquad\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\leq\gamma\)
\( \underset{\mathbf{\Phi}}{\min}\) \(\underset{\|\Delta_A\|\leq \varepsilon_A \atop \|\Delta_B\|\leq \varepsilon_B}{\max}\) \(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi}{\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_2}\)
\(\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)
\(~~~~~\color{teal} \mathbf \Delta = \begin{bmatrix}\Delta_A&\Delta_B\end{bmatrix}\mathbf{\Phi}\)
\( \underset{\mathbf a=\mathbf{Ks}}{\min}\) \(\underset{\|A-\widehat A\|\leq \varepsilon_A \atop \|B-\widehat B\|\leq \varepsilon_B}{\max}\) \(\mathbb{E}\left[\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\)
s.t. \(s_{t+1} = As_t + Ba_t + w_t\)
Where we use the norm:
\(\|\mathbf \Phi\|_{\mathcal H_\infty} = \max_{\|\mathbf x\|_2\leq 1} \|\mathbf \Phi\mathbf x\|_2 \) induced by \(\|\mathbf x\|_2 =\sqrt{\sum_{t=0}^\infty \|\mathbf x_t\|_2^2} \)
Upper bounding this nonconvex objective leads to a tractable convex problem. Upper bounds follow by submultiplicativity: if \(\|\mathbf\Delta\|_{\mathcal H_\infty}\leq\gamma<1\), then \(\|\mathbf\Phi(I-\mathbf\Delta)^{-1}\|_{\mathcal H_2}\leq \frac{1}{1-\gamma}\|\mathbf\Phi\|_{\mathcal H_2}\), and \(\|\mathbf\Delta\|_{\mathcal H_\infty}\) is bounded using \(\|\Delta_A\|\leq\varepsilon_A\) and \(\|\Delta_B\|\leq\varepsilon_B\).
\( \underset{\mathbf{\Phi}}{\min}\) \(\underset{\|\Delta_A\|\leq \varepsilon_A \atop \|\Delta_B\|\leq \varepsilon_B}{\max}\) \(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi}{\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_2}\)
\(\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)
\(~~~~~\color{teal} \mathbf \Delta = \begin{bmatrix}\Delta_A&\Delta_B\end{bmatrix}\mathbf{\Phi}\)
\( \underset{\mathbf{\Phi}, {\color{teal} \gamma}}{\min}\) \(\frac{1}{1-\gamma}\)\(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}\)
\(\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I\)
\(\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\leq\gamma\)
\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{teal} \gamma}}{\arg\min}\) \(\frac{1}{1-\gamma}\)\(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}\)
\(\qquad\qquad\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I\)
\(\qquad\qquad\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\leq\gamma\)
Informal Theorem (Suboptimality):
For \(\hat\mathbf{\Phi}\) synthesized as above and \(\mathbf\Phi_\star\) the true optimal system response,
$$ J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \lesssim J(\mathbf \Phi_\star)\left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty} $$
\(J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \lesssim J(\mathbf \Phi_\star)\left\|\mathbf \Phi_\star\right\|_{\mathcal H_\infty} \sqrt{\frac{m+n}{N}}\)
Using an explore then commit algorithm, we have $$R(T) = R_{\text{explore}}(N) + R_{\text{commit}}(N, T)$$
References: System Level Synthesis by Anderson, Doyle, Low, Matni and Ch 2-3 in Machine Learning in Feedback Systems by Sarah Dean