Data-Driven Optimal Control

ML in Feedback Sys #16

Prof Sarah Dean

Reminders

  • Office hours this week moved to Friday 9-10am
    • cancelled next week due to travel
  • Feedback on final project proposal
  • Upcoming paper presentations starting next week
  • Project midterm update due 11/11

Recap: System Level LQR

       \(a_t = {\color{Goldenrod} K_t }s_{t}\)

\( \underset{\mathbf a }{\min}\)   \(\displaystyle\mathbb{E}\left[\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\)

\(\text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t\)

\(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \)

\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{F}^2\)

\(\text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s\\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I \)

[Block diagram: the feedback loop \(s_{t+1}=As_t+Ba_t+w_t\), \(a_t=\mathbf K s_t\); instead of a loop, the system response \(\mathbf{\Phi}\) maps the disturbance \(\mathbf w\) directly to \((\mathbf s, \mathbf a)\), so the system looks like a line.]

References: System Level Synthesis by Anderson, Doyle, Low, Matni

System Level Synthesis

Theorem: For a linear system in feedback with a linear controller over the horizon \(t=0,\dots, T\):

  1. The affine subspace \(\{(I - \mathcal Z \bar A )\mathbf \Phi_s- \mathcal Z \bar B \mathbf \Phi_a = I\} \) parametrizes all possible system responses.
  2. For any block-lower-triangular matrices \((\mathbf \Phi_s,\mathbf \Phi_a)\) in the affine subspace, there exists a linear feedback controller achieving this response.
The corresponding finite-horizon program in CVXPY (a sketch; it assumes the system matrices A (n x n) and B (n x p), the horizon T, and the cost square roots Q_sqrt, R_sqrt are already defined):

import cvxpy as cvx
import numpy as np

Phi_s = cvx.Variable((T*n, T*n), name="Phi_s")
Phi_a = cvx.Variable((T*p, T*n), name="Phi_a")

# Affine dynamics constraint [I - Z A_bar, -Z B_bar][Phi_s; Phi_a] = I, block row by block row
I_big = np.eye(T*n)
constr = [Phi_s[:n, :] == I_big[:n, :]]
for k in range(T-1):
    constr.append(Phi_s[n*(k+1):n*(k+2), :]
                  == A @ Phi_s[n*k:n*(k+1), :] + B @ Phi_a[p*k:p*(k+1), :]
                  + I_big[n*(k+1):n*(k+2), :])
    # causality: actions cannot depend on future disturbances (block-lower-triangular Phi_a)
    constr.append(Phi_a[p*k:p*(k+1), n*(k+1):] == 0)

# Quadratic cost: Frobenius norm of diag(Q_sqrt, R_sqrt) [Phi_s; Phi_a]
# (minimizing the norm or its square gives the same argmin)
cost_matrix = cvx.bmat([[Q_sqrt @ Phi_s[n*k:n*(k+1), :]] for k in range(T)]
                       + [[R_sqrt @ Phi_a[p*k:p*(k+1), :]] for k in range(T)])
objective = cvx.norm(cost_matrix, 'fro')

prob = cvx.Problem(cvx.Minimize(objective), constr)
prob.solve()
Phi_s_val = np.array(Phi_s.value)
Phi_a_val = np.array(Phi_a.value)
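One way to recover a feedback controller achieving the synthesized response is the standard SLS construction \(\mathbf K = \mathbf\Phi_a\mathbf\Phi_s^{-1}\) (a sketch; Phi_s_val and Phi_a_val are the arrays extracted above):

# Phi_s has identity diagonal blocks, so it is invertible; the resulting gain is
# block-lower-triangular, i.e., a causal (time-varying) state-feedback law a = K s.
K = Phi_a_val @ np.linalg.inv(Phi_s_val)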

Policies via convex programming

Optimal Control on Arbitrary Horizons

  • In many cases, task horizon is long or not pre-defined
  • Allow \(T\to\infty\)
    • average cost $$ \min_{\pi} ~~\mathbb E_w\Big[\lim_{T\to\infty} \frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k)) \Big]$$
    • discounted average cost $$ \min_{\pi} ~~\mathbb E_w\Big[\sum_{k=0}^{\infty} \gamma^k c(s_k, \pi(s_k)) \Big]$$
  • Policy is stationary (no longer depends on time) \(\pi:\mathcal S\to\mathcal A\)

Steady State LQR

Infinite Horizon LQR Problem

$$ \min_{\pi} ~~\lim_{T\to\infty}\mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t.}\quad s_{k+1} = A s_k+ Ba_k+w_k $$

Claim:  The optimal cost-to-go function is quadratic and the optimal policy is linear $$J^\star (s) = s^\top P s,\qquad \pi^\star(s) = K s$$

  • \(P = Q+A^\top PA - A^\top PB(R+B^\top PB)^{-1}B^\top PA\)
    • Discrete Algebraic Riccati Equation: \(P=\mathrm{DARE}(A,B,Q,R)\)
  • \(K = -(R+B^\top PB)^{-1}B^\top PA\)
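A small numerical sketch of this claim, using scipy's DARE solver (the helper name is illustrative):

import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Steady-state LQR: P solves the DARE, K = -(R + B'PB)^{-1} B'PA."""
    P = solve_discrete_are(A, B, Q, R)
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P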

Infinite Horizon Optimal Control

Stochastic Infinite Horizon Optimal Control Problem

$$ \min_{\pi} ~~\lim_{T\to\infty} \mathbb  E_w\Big[\frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k)) \Big]\quad \text{s.t.}\quad s_0~~\text{given},~~s_{k+1} = F(s_k, \pi(s_k),w_k) $$

The objective, viewed as a function of the initial state for a fixed policy \(\pi\), defines the cost-to-go \(J^\pi(s_0)\).

Bellman Optimality Equation

  • \(\underbrace{J^\star (s)}_{\text{value function}} = \min_{a\in\mathcal A} \underbrace{c(s, a)+\mathbb E_w[J^\star (F(s,a,w))]}_{\text{state-action function}}\)
     
  • Minimizing argument is \(\pi^\star(s)\)

Reference: Ch 1 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas
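For LQR, the Bellman backup above can be computed in closed form: substituting a quadratic guess \(J(s)=s^\top P s\) into the right-hand side and minimizing over \(a\) returns another quadratic, with \(P\) updated by the Riccati map. A minimal sketch of iterating this backup to a fixed point (names are illustrative; this is one way to solve the DARE):

import numpy as np

def riccati_backup(P, A, B, Q, R):
    """One Bellman backup for LQR with cost-to-go J(s) = s' P s.
    (Additive noise only shifts J by a constant, so it does not change P or the minimizer.)"""
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def solve_dare_by_iteration(A, B, Q, R, iters=500):
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = riccati_backup(P, A, B, Q, R)
    return P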

LQR Example

$$ s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t + w_t $$

The state is position & velocity \(s=[\theta,\omega]\), input is a force \(a\in\mathbb R\).

Goal: stay near origin and be energy efficient

  • \(c(s,a) = s^\top \begin{bmatrix} 10 & \\ & 0.1 \end{bmatrix}s + 5a^2 \)

\(\pi_\star(s) \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix} s\)

\(J^\star(s) \approx s^\top \begin{bmatrix} 33.5 & 5.8 \\ 5.8 & 2.4 \end{bmatrix} s\)

Steady State System Response

Closing the loop \(a_t = K s_t\) with the dynamics \(s_{t+1} = As_{t}+Ba_{t}+w_{t}\) and unrolling the recursion gives

\(s_{t+1} = (A+BK)^{t+1} s_0 + \sum_{k=0}^{t} (A+BK)^{t-k} w_{k}\)

\(a_{t+1} = K(A+BK)^{t+1} s_0 + \sum_{k=0}^{t} K(A+BK)^{t-k} w_{k}\)

Instead of a feedback loop with \(\mathbf{K}\), the closed-loop system looks like a line: the system response \(\mathbf{\Phi}\) maps the disturbances directly to states and actions,

\(s_{t} = \Phi_s^{t} s_0 + \sum_{k=1}^t \Phi_s^{k-1}w_{t-k}\)

\(a_{t} = \Phi_a^{t} s_0 + \sum_{k=1}^t \Phi_a^{k-1}w_{t-k}\)

or, stacked over the horizon,

\(\begin{bmatrix} s_{0}\\\vdots \\s_T\end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots  & \ddots & \ddots \\ \Phi_s^{T} & \Phi_s^{T-1} & \dots & \Phi_s^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}\)

\(\begin{bmatrix} a_{0}\\\vdots \\a_T\end{bmatrix} = \begin{bmatrix} \Phi_a^{0}\\ \Phi_a^{1}& \Phi_a^{0}\\ \vdots  & \ddots & \ddots \\ \Phi_a^{T} & \Phi_a^{T-1} & \dots & \Phi_a^{0} \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{T-1}\end{bmatrix}\)

 

Sequences & Operators

  • Cost depends on the (semi-)infinite sequence \(\mathbf s = (s_0, s_1, s_2,\dots)\)
  • Generated by the (semi-)infinite operator \(\mathbf \Phi_s = (\Phi_s^0, \Phi_s^1,\dots)\) acting on the disturbance sequence \(\mathbf w = (w_{-1}, w_0, w_1,\dots)\)
    • the operation is a convolution \(s_{t} = \sum_{k=1}^{t+1} \Phi_s^{k-1}w_{t-k}\)
  • We represent this operation with the notation \(\mathbf s = \mathbf \Phi_s\mathbf w\)
  • Concretely,
    • semi-infinite vectors and block-Toeplitz matrices $$\begin{bmatrix} s_{0}\\\vdots \\s_t\\\vdots \end{bmatrix} = \begin{bmatrix} \Phi_s^{0}\\ \Phi_s^{ 1}& \Phi_s^{0}\\ \vdots  & \ddots & \ddots \\ \Phi_s^{t} & \Phi_s^{t-1} & \dots & \Phi_s^{0} \\ \vdots & & \ddots &&\ddots \end{bmatrix} \begin{bmatrix} s_0\\w_0\\ \vdots \\w_{t-1} \\\vdots \end{bmatrix}$$
    • frequency domain
      • define the time shift operator \(z\) such that $$z(s_0, s_1,s_2, \dots) = (s_1, s_2,\dots)$$
      • represent \(\mathbf s(z) = \sum_{t=0}^\infty z^{-t}s_t\) and \(\mathbf \Phi_s(z) = \sum_{t=0}^\infty z^{-t}\Phi_s^t\)
      • multiplication of polynomials: $$ \mathbf \Phi_s(z) \mathbf w(z) = \Big(\sum_{t=0}^\infty z^{-t}\Phi_s^t\Big)\Big(\sum_{t=-1}^\infty z^{-t}w_{t}\Big) = \sum_{t=0}^\infty z^{-t} \sum_{k=1}^{t+1} \Phi_s^{k-1} w_{t-k} $$
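As a concrete illustration of the Toeplitz representation, here is a small numpy sketch (helper names are ours) that builds the finite block-Toeplitz matrix from impulse-response blocks and checks it against a direct simulation of the closed loop:

import numpy as np

def block_toeplitz(blocks):
    """Lower block-triangular Toeplitz matrix with blocks[k] on the k-th sub-diagonal."""
    T = len(blocks)
    n_out, n_in = blocks[0].shape
    M = np.zeros((T * n_out, T * n_in))
    for i in range(T):           # block row
        for j in range(i + 1):   # block column
            M[i*n_out:(i+1)*n_out, j*n_in:(j+1)*n_in] = blocks[i - j]
    return M

# closed-loop response Phi_s^k = (A + BK)^k applied to the stacked vector (s_0, w_0, ..., w_{T-2})
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.070, -0.037]])
Acl = A + B @ K
T = 5
Phi_blocks = [np.linalg.matrix_power(Acl, k) for k in range(T)]

rng = np.random.default_rng(0)
s0 = rng.standard_normal(2)
w = rng.standard_normal((T - 1, 2))
stacked = np.concatenate([s0, w.reshape(-1)])
states = block_toeplitz(Phi_blocks) @ stacked    # stacked (s_0, ..., s_{T-1})

# sanity check against a direct rollout of s_{t+1} = Acl s_t + w_t
s, traj = s0.copy(), [s0.copy()]
for t in range(T - 1):
    s = Acl @ s + w[t]
    traj.append(s.copy())
assert np.allclose(states, np.concatenate(traj))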

LQR Example

$$ s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t + \begin{bmatrix}0\\1\end{bmatrix}a_t + w_t $$

The state is position & velocity \(s=[\theta,\omega]\), input is a force \(a\in\mathbb R\).

\(\pi_\star(s) \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix} s\)

\(\Phi_s^t \approx \begin{bmatrix}0.9 & 0.1 \\ -0.070 & 0.86\end{bmatrix}^{t-1} \quad \Phi_a^t \approx -\begin{bmatrix} 7.0\times 10^{-2}& 3.7\times 10^{-2}\end{bmatrix}\begin{bmatrix}0.9 & 0.1 \\ -0.070 & 0.86\end{bmatrix}^{t-1} \)

eigenvalues of \(A+BK\) are \(\approx 0.88\pm 0.082j\)

       \(a_t = {\color{Goldenrod} K}s_{t}\)

\( \underset{\mathbf a }{\min}\)   \(\displaystyle\lim_{T\to\infty}\mathbb{E}\left[\frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\)

\(\text{s.t.}~~s_{t+1} = As_t + Ba_t + w_t\)

\(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \)

\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2\)

\(\text{s.t.}~~ \begin{bmatrix} zI -  A & - B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I \)

Infinite Horizon LQR

Exercise: Using the frequency domain notation, derive the expression for the SLS cost and constraints. Hint: in signal notation, the dynamics can be written \(z\mathbf s = A\mathbf s + B\mathbf a + \mathbf w\)

Where we use the \(\mathcal H_2\) norm:

$$ \|\mathbf \Phi\|_{\mathcal H_2}^2 = \sum_{t=0}^\infty \|\Phi^t\|_F^2 $$

Recap: LQR

  • Goal: minimize quadratic cost (\(Q,R\)) in a system with linear dynamics (\(A,B\))
  • Classic approach: Dynamic programming/Bellman optimality
    • \(P = \mathrm{DARE}(A,B,Q,R)\) and \(K_\star = -(R+B^\top PB)^{-1}B^\top PA\)
  • System level synthesis: Convex optimization
    • \( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2~~\text{s.t.}~~ \begin{bmatrix} zI -  A & - B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a \end{bmatrix}= I \)

  • Both require knowledge of dynamics and costs!

Action in an unknown dynamic world

[Figure: agent-environment loop. The policy \(\pi_t:\mathcal S\to\mathcal A\) selects an action \(a_{t}\) from the observation \(s_t\); the unknown environment (\(?\)) returns the next state, and the agent accumulates data \(\{(s_t, a_t, c_t)\}\).]

Goal: select actions \(a_t\) to bring the environment to low-cost states

Setting: dynamics (and cost) functions are not known, but we have data \(\{s_k, a_k, c_k\}_{k=0}^N\). Approaches include a focus on:

  1. Model: learn dynamics/costs from data, then do policy design
    • For LQR: estimate \(\hat A,\hat B,\hat Q,\hat R\) then design \(\hat K\)
    • "model based"
  2. Bellman: learn value or state-action function
    • For LQR: estimate \(\hat J\), then determine \(\hat K\) as the minimizing argument
    • "model free"
  3. Policy: estimate gradients and update policy directly
    • For LQR: \(\hat K \leftarrow \hat K -\alpha\widehat{\nabla J}(\hat K)\)
    • "model free"

Data-driven Policy Design

Setting: dynamics \(A,B\) are not known, but we have data \(\{s_k, a_k\}_{k=0}^N\)

  1. Learn Model:
    • estimate \(\hat A,\hat B\) via least-squares (see the code sketch after this list) $$\hat A,\hat B = \arg\min_{A,B} \sum_{k=0}^{N-1} \|s_{k+1}-As_k-Ba_k\|_2^2$$
    • error bounds  $$ \|A-\hat A\|_2\leq  \varepsilon_A,\quad \|B-\hat B\|_2\leq \varepsilon_B$$
    • system identification guarantees \(\max\{\varepsilon_A, \varepsilon_B\}\lesssim \sqrt{\frac{m+n}{N}}\)
  2. Design Policy:
    • nominal or certainty equivalent approach uses \(\hat A, \hat B\)
    • robust approach uses \(\hat A, \hat B, \varepsilon_A, \varepsilon_B\)
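A minimal sketch of the least-squares step in item 1 (assuming a single trajectory stored as arrays `states` of shape (N+1, n) and `actions` of shape (N, m); names are illustrative):

import numpy as np

def estimate_dynamics(states, actions):
    """Least-squares fit of s_{k+1} ~ A s_k + B a_k."""
    X = np.hstack([states[:-1], actions])          # regressors [s_k, a_k]
    Y = states[1:]                                 # targets s_{k+1}
    Theta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # Theta = [A^T; B^T]
    n = states.shape[1]
    A_hat, B_hat = Theta[:n].T, Theta[n:].T
    return A_hat, B_hat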

Model-based LQR

LQR Example

true dynamics \(\left(\begin{bmatrix} 1.01 & 0.1\\ & 1.01 \end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right) \) but we estimate \(\left(\begin{bmatrix} 0.99 & 0.1\\ & 0.99 \end{bmatrix},\begin{bmatrix}0\\1\end{bmatrix}\right )\)

The state is position & velocity \(s=[\theta,\omega]\), input is a force \(a\in\mathbb R\).

Goal: be energy efficient

  • \(c(s,a) = s^\top \begin{bmatrix} 0.01& \\ & 0.01 \end{bmatrix}s + 100a^2 \)

\(\hat\pi_\star(s) \approx -\begin{bmatrix} 6.1\times 10^{-5}& 2.8\times 10^{-4}\end{bmatrix} s\) does not stabilize the system!

Even though \(\varepsilon=0.02\), \(J(\hat K)\) is infinite!
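A quick numerical check of this claim (a sketch: design the certainty-equivalent gain on the estimated model, then inspect the spectral radius of the true closed loop):

import numpy as np
from scipy.linalg import solve_discrete_are

A_true = np.array([[1.01, 0.1], [0.0, 1.01]])
A_hat  = np.array([[0.99, 0.1], [0.0, 0.99]])
B = np.array([[0.0], [1.0]])
Q = 0.01 * np.eye(2)
R = np.array([[100.0]])

# certainty-equivalent design on the estimated (stable) model
P_hat = solve_discrete_are(A_hat, B, Q, R)
K_hat = -np.linalg.solve(R + B.T @ P_hat @ B, B.T @ P_hat @ A_hat)

# the true system is unstable, and the tiny gain does not stabilize it:
# the spectral radius of A_true + B K_hat stays above 1, so J(K_hat) is infinite
rho = max(abs(np.linalg.eigvals(A_true + B @ K_hat)))
print(K_hat, rho)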

Robust design is worst-case

\( \underset{\mathbf a=\mathbf{Ks}}{\min}\)  \(\underset{\|A-\widehat A\|\leq \varepsilon_A \atop \|B-\widehat B\|\leq \varepsilon_B}{\max}\) \(\mathbb{E}\left[\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\)

s.t.  \(s_{t+1} = As_t + Ba_t + w_t\)

Challenge: translating predictions \(\hat s_{t+1} = \hat A\hat s_t + \hat B \hat a_t\) to reality \(s_{t+1} = As_t + Ba_t\)

Lemma: if the system response variables satisfy

  • the nominal system constraint \( \begin{bmatrix} zI -  \hat A & - \hat B\end{bmatrix} \begin{bmatrix} \hat\mathbf{\Phi}_s \\  \hat\mathbf{\Phi}_a \end{bmatrix}= I  \)
  • then if the inverse exists,  \( \begin{bmatrix} zI -   A & -  B\end{bmatrix} \begin{bmatrix} \hat\mathbf{\Phi}_s \\  \hat\mathbf{\Phi}_a \end{bmatrix} (I-\mathbf \Delta)^{-1}= I \) where \(\mathbf \Delta = (\underbrace{A-\hat A}_{\Delta_A})\hat\mathbf{\Phi}_s + (\underbrace{B-\hat B}_{\Delta_B})\hat\mathbf{\Phi}_a\)

Robust synthesis with SLS

Proof:

  • \((zI -  \hat A)\hat\mathbf{\Phi}_s - \hat B \hat\mathbf{\Phi}_a = I\)
  • \((zI -  \hat A+A-A)\hat\mathbf{\Phi}_s -( \hat B-B+B) \hat\mathbf{\Phi}_a=I\)
  • \((zI -A)\hat\mathbf{\Phi}_s -B\hat\mathbf{\Phi}_a + (A-  \hat A)\hat\mathbf{\Phi}_s + (B-\hat B)\hat\mathbf{\Phi}_a=I\)
  • \((zI -A)\hat\mathbf{\Phi}_s -B\hat\mathbf{\Phi}_a=I - \Delta_A\hat\mathbf{\Phi}_s - \Delta_B\hat\mathbf{\Phi}_a\)
  • \( \begin{bmatrix} zI -   A & -  B\end{bmatrix} \begin{bmatrix} \hat\mathbf{\Phi}_s \\  \hat\mathbf{\Phi}_a \end{bmatrix} = I-\mathbf\Delta\)

Therefore, the estimated cost is $$ \hat J(\hat{\mathbf \Phi}) = \left\|\begin{bmatrix} Q^{1/2} & \\ & R^{1/2}\end{bmatrix} \begin{bmatrix} \hat\mathbf{\Phi}_s \\  \hat\mathbf{\Phi}_a \end{bmatrix}\right\|_{\mathcal H_2}^2  $$ while the cost actually achieved is  $$ J(\hat{\mathbf \Phi}) = \left\|\begin{bmatrix} Q^{1/2} & \\ & R^{1/2}\end{bmatrix} \begin{bmatrix} \hat\mathbf{\Phi}_s \\  \hat\mathbf{\Phi}_a \end{bmatrix}(I-\mathbf \Delta)^{-1} \right\|_{\mathcal H_2}^2   $$

Robust synthesis with SLS

Theorem (Anderson et al., 2019): A policy designed from system responses satisfying  \(\begin{bmatrix} zI -  \hat A & - \hat B\end{bmatrix} \begin{bmatrix} \hat\mathbf{\Phi}_s \\  \hat\mathbf{\Phi}_a \end{bmatrix}= I\) will achieve response \(\begin{bmatrix} \hat\mathbf{\Phi}_s \\  \hat\mathbf{\Phi}_a \end{bmatrix} (I-\mathbf \Delta)^{-1}\)

where \(\mathbf \Delta = (\underbrace{A-\hat A}_{\Delta_A})\hat\mathbf{\Phi}_s + (\underbrace{B-\hat B}_{\Delta_B})\hat\mathbf{\Phi}_a\) if the inverse exists.

\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{teal} \gamma}}{\arg\min}\) \(\frac{1}{1-\gamma}\)\(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}\)

\(\qquad\qquad\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I\)

       \(\qquad\qquad\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\leq\gamma\)

Robust synthesis with SLS

\( \underset{\mathbf{\Phi}}{\min}\) \(\underset{\|\Delta_A\|\leq \varepsilon_A \atop \|\Delta_B\|\leq \varepsilon_B}{\max}\) \(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi}{\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_2}\)

\(\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)

\(~~~~~\color{teal} \mathbf \Delta = \begin{bmatrix}\Delta_A&\Delta_B\end{bmatrix}\mathbf{\Phi}\)

\( \underset{\mathbf a=\mathbf{Ks}}{\min}\)  \(\underset{\|A-\widehat A\|\leq \varepsilon_A \atop \|B-\widehat B\|\leq \varepsilon_B}{\max}\) \(\mathbb{E}\left[\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\right]\)

s.t.  \(s_{t+1} = As_t + Ba_t + w_t\)

Where we use the \(\mathcal H_\infty\) norm:

\(\|\mathbf \Phi\|_{\mathcal H_\infty} = \max_{\|\mathbf x\|_2\leq 1} \|\mathbf \Phi\mathbf x\|_2 \) induced by \(\|\mathbf x\|_2 =\sqrt{\sum_{t=0}^\infty \|\mathbf x_t\|_2^2} \)

Upper bounding this nonconvex objective leads to the convex program derived below, after a review of the relevant norms.

Review of matrix norms

  • Euclidean norm on vectors
    • \(\|x\|_2 =\sqrt{\sum_{i=1}^n x_i^2} = \sqrt{x^\top x}\)
  • Frobenius norm: how big are matrix entries
    • \(\|A\|_F = \sqrt{\sum_{i=1}^n\sum_{j=1}^m A_{ij}^2} = \sqrt{\mathrm{tr}(A^\top A)}\)
  • Operator norm: how big can this matrix make a vector
    • \(\|A\|_2 = \max_{\|x\|_2\leq 1} \|Ax\|_2 = \sqrt{\lambda_{\max}(A^\top A)} = \sigma_{\max}(A)\)
  • Relationships:
    • \(\|A\|_2 \leq \|A\|_F\)
    • \(\|Ax\|_2 \leq \|A\|_2\|x\|_2\)
    • \(\|AB\|_F \leq \|A\|_2 \|B\|_F\)
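A quick numerical sanity check of these relationships (sketch):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
x = rng.standard_normal(3)

op = np.linalg.norm(A, 2)       # operator norm: largest singular value
fro = np.linalg.norm(A, 'fro')  # Frobenius norm
assert op <= fro + 1e-12
assert np.linalg.norm(A @ x) <= op * np.linalg.norm(x) + 1e-12
assert np.linalg.norm(A @ B, 'fro') <= op * np.linalg.norm(B, 'fro') + 1e-12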

Signal and operator norms

  • \(\ell_2\) norm
    • \(\|\mathbf x\|_2 =\sqrt{\sum_{t=0}^\infty \|\mathbf x_t\|_2^2} \)
  • \(\mathcal H_2\) norm
    • \(\|\mathbf \Phi\|_{\mathcal H_2} = \sqrt{\sum_{t=0}^\infty \|\Phi^t\|_F^2}\)
  • \(\mathcal H_\infty\) norm
    • \(\|\mathbf \Phi\|_{\mathcal H_\infty} = \max_{\|\mathbf x\|_2\leq 1} \|\mathbf \Phi\mathbf x\|_2 \)
  • Relationships:
    • \(\|\mathbf \Phi\|_{\mathcal H_\infty} \leq \|\mathbf \Phi\|_{\mathcal H_2}\)
    • \(\|\mathbf \Phi\mathbf x\|_2 \leq \|\mathbf \Phi\|_{\mathcal H_\infty} \|\mathbf x\|_2\)
    • \(\|\mathbf \Phi \mathbf \Psi\|_{\mathcal H_2} \leq \|\mathbf \Phi \|_{\mathcal H_\infty} \|\mathbf \Psi\|_{\mathcal H_2}\)

Upper bounds follow by:

  • \(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi}{\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_2}  \leq \left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2} \end{bmatrix}\mathbf{\Phi}\right\|_{\mathcal{H}_2} \left\| {\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_\infty} \)
  • \(\left\| {\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_\infty} \leq \frac{1}{1- \|\mathbf \Delta\|_{\mathcal{H}_\infty}}\)
  • \(\left\|\begin{bmatrix}\Delta_A&\Delta_B\end{bmatrix}\mathbf{\Phi}\right\|_{\mathcal{H}_\infty} \leq \left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\)

Robust synthesis derivation

\( \underset{\mathbf{\Phi}}{\min}\) \(\underset{\|\Delta_A\|\leq \varepsilon_A \atop \|\Delta_B\|\leq \varepsilon_B}{\max}\) \(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi}{\color{teal}(I-\mathbf \Delta)^{-1}} \right\|_{\mathcal{H}_2}\)

\(\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)

\(~~~~~\color{teal} \mathbf \Delta = \begin{bmatrix}\Delta_A&\Delta_B\end{bmatrix}\mathbf{\Phi}\)

\( \underset{\mathbf{\Phi}, {\color{teal} \gamma}}{\min}\) \(\frac{1}{1-\gamma}\)\(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}\)

\(\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I\)

       \(\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\leq\gamma\)

\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{teal} \gamma}}{\arg\min}\) \(\frac{1}{1-\gamma}\)\(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}\)

\(\qquad\qquad\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I\)

       \(\qquad\qquad\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|_{\mathcal H_\infty}\leq\gamma\)
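A rough finite-horizon sketch of this program, reusing the CVXPY variables Phi_s, Phi_a, constr, and cost_matrix from the earlier finite-horizon SLS code (built with the estimates \(\hat A,\hat B\)); eps_A and eps_B denote the error bounds \(\varepsilon_A,\varepsilon_B\), the block-Toeplitz operator norm stands in for the \(\mathcal H_\infty\) norm, and \(\gamma\) is treated as a fixed parameter to be swept or bisected over \((0,1)\), since the \(1/(1-\gamma)\) factor makes the joint problem quasi-convex:

import cvxpy as cvx

gamma = 0.5   # illustrative value; in practice sweep/bisect over gamma in (0, 1)
robust_constr = constr + [
    cvx.sigma_max(cvx.bmat([[eps_A * Phi_s], [eps_B * Phi_a]])) <= gamma
]
robust_prob = cvx.Problem(
    cvx.Minimize(cvx.norm(cost_matrix, 'fro') / (1 - gamma)),
    robust_constr,
)
robust_prob.solve()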

Informal Theorem (Suboptimality):

For \(\hat\mathbf{\Phi}\) synthesized as above and \(\mathbf\Phi_\star\) the true optimal system response,

$$  J(\hat{\mathbf \Phi}) -  J(\mathbf \Phi_\star) \lesssim J(\mathbf \Phi_\star)\left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty} $$

Robust synthesis with SLS

  1. Learn Model:
    • estimate \(\hat A,\hat B\) via least-squares, guarantee \(\max\{\varepsilon_A, \varepsilon_B\}\lesssim \sqrt{\frac{m+n}{N}}\)
  2. Design Policy:
    • robust approach uses \(\hat A, \hat B, \varepsilon_A, \varepsilon_B\)
      • \(J(\hat{\mathbf \Phi}) -  J(\mathbf \Phi_\star) \lesssim J(\mathbf \Phi_\star)\left\|\mathbf \Phi_\star\right\|_{\mathcal H_\infty} \sqrt{\frac{m+n}{N}}\)

    • nominal or certainty equivalent approach uses \(\hat A, \hat B\)
      • for small enough \(\varepsilon\), can show that \(J(\hat{\mathbf \Phi}) -  J(\mathbf \Phi_\star) \lesssim \varepsilon^2\)
      • thus faster rate, \(J(\hat{\mathbf \Phi}) -  J(\mathbf \Phi_\star) \lesssim \frac{m+n}{N}\)

Model-based LQR

Using an explore-then-commit algorithm, we have $$R(T) = R_{\text{explore}}(N) + R_{\text{commit}}(N, T)$$

  • Robust: \(R(T) \leq C_1 N + C_2 \frac{T}{\sqrt{N}}\)
    • balancing the two terms with \(N\propto T^{2/3}\implies R(T)\lesssim O(T^{2/3})\)
    • stability guaranteed
  • Certainty equivalent: \(R(T) \leq C_1 N + C_2 \frac{T}{N}\)
    • balancing with \(N\propto \sqrt{T}\implies R(T)\lesssim O(\sqrt{T})\)
    • only holds for \(T\) large enough that estimation errors are small

Online Model-based LQR

Recap

  • Steady-state controllers and infinite horizons
    • \(\pi^\star(s) = Ks\)
  • Taxonomy of RL
    • policy, value, model
  • Model-based LQR & Robustness

References: System Level Synthesis by Anderson, Doyle, Low, Matni and Ch 2-3 in Machine Learning in Feedback Systems by Sarah Dean
