# Learning Enabled Control

## Goals and Motivation

• Learning theory in dynamic settings
• Getting robots (or other ML systems) to work

## Framework

1. Collect and learn from data: models and uncertainty quantification (learning theory, statistics)
2. Use learned model for control (robust synthesis)

Interplay between these steps: online learning, exploration, adaptivity

## Outline

1. Review analysis of LQR + extensions
2. Perception-based control

## Linear Quadratic Regulator

minimize $$\lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T} \mathbb{E} [x_t^\top Q x_t + u_t^\top R u_t]$$

such that $$x_{t+1} = A_\star x_t + B_\star u_t + w_t$$

Classic optimal control problem

The solution is static state feedback, $$u_t = K_\star x_t$$, where $$K_\star$$ arises from the discrete algebraic Riccati equation DARE($$A_\star,B_\star,Q,R$$)
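As a sketch, the optimal gain can be computed with `scipy`; the double-integrator plant and cost matrices below are hypothetical choices for illustration:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical discretized double integrator; Q, R are illustrative choices.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

# Solve the DARE for P, then recover the optimal static gain. With the
# convention u_t = K x_t, the gain is K = -(R + B^T P B)^{-1} B^T P A.
P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The closed loop A + B K is stable: spectral radius < 1.
rho = max(abs(np.linalg.eigvals(A + B @ K)))
```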

## 1. Learning Dynamics

Dynamics learned by least squares:

$$\widehat A, \widehat B = \arg\min_{A,B} \sum_{k=0}^{T-1} \|x_{k+1} - Ax_k - Bu_k\|^2 = \arg\min_{A,B} \|X-Z[A~B]^\top \|_F^2$$

so that, stacking $$z_k = \begin{bmatrix} x_k\\u_k\end{bmatrix}$$ into the rows of $$Z$$,

$$[\widehat A~\widehat B]^\top = [A_\star~B_\star]^\top + (Z^\top Z)^{-1} Z^\top W$$

A simple error bound follows from this decomposition; controlling its terms is a problem of non-asymptotic statistics:

$$\|[\widehat A-A_\star~~\widehat B-B_\star]\|_2 \leq \frac{\|Z^\top W\|_2}{\lambda_{\min}(Z^\top Z)}$$
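A minimal sketch of this identification step, on an assumed two-state system with Gaussian excitation, checks the data-dependent bound numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 2, 1, 500
A_star = np.array([[0.9, 0.2], [0.0, 0.8]])  # assumed true dynamics
B_star = np.array([[0.0], [1.0]])
sigma_w, sigma_u = 0.1, 1.0

# Roll out the system with exciting Gaussian inputs.
X = np.zeros((T + 1, n))
U = sigma_u * rng.standard_normal((T, d))
W = sigma_w * rng.standard_normal((T, n))
for k in range(T):
    X[k + 1] = A_star @ X[k] + B_star @ U[k] + W[k]

# Least squares: stack z_k = [x_k; u_k] and regress x_{k+1} on z_k.
Z = np.hstack([X[:-1], U])                        # T x (n + d)
Theta = np.linalg.lstsq(Z, X[1:], rcond=None)[0]  # (n + d) x n
A_hat, B_hat = Theta[:n].T, Theta[n:].T

# Data-dependent bound ||Z^T W||_2 / lambda_min(Z^T Z) on the parameter error.
err = np.linalg.norm(np.hstack([A_hat - A_star, B_hat - B_star]), 2)
bound = np.linalg.norm(Z.T @ W, 2) / np.linalg.eigvalsh(Z.T @ Z)[0]
```

In practice $$W$$ is unknown, so the numerator itself must be bounded with high-probability arguments; here it is computed directly to verify the inequality.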

## 2. Robust Control

The robust control problem:

minimize $$\max_{A,B} ~\lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T} \mathbb{E} [x_t^\top Q x_t + u_t^\top R u_t]$$

such that $$x_{t+1} = A x_t + B u_t + w_t$$

and $$\|A-\widehat A\|_2\leq\varepsilon_A,~ \|B-\widehat B\|_2\leq\varepsilon_B$$

Any method for solving this problem yields upper bounds on stability and realized performance, but we also want to bound suboptimality relative to $$K_\star$$

## 2. Robust Control via SLS

System level synthesis maps planned trajectories to realized trajectories under mismatches in dynamics:

$$\begin{bmatrix} \widehat{\mathbf x}\\ \widehat{\mathbf u}\end{bmatrix} = \mathbf{\widehat{\Phi} w}$$

$$\begin{bmatrix} {\mathbf x}\\ {\mathbf u}\end{bmatrix} = \mathbf{\widehat{\Phi}}(I - \mathbf{\widehat \Delta})^{-1}\mathbf w$$

(Block diagram: the disturbance $$\mathbf w$$ drives the interconnection $$\mathbf{\Phi}$$, realized by the plant $$(A,B)$$ in feedback with the controller $$\mathbf{K}$$, producing $$\mathbf x$$ and $$\mathbf u$$.)

### LQR with SLS

The robust problem combines a robust cost with the nominal achievable subspace and a sensitivity constraint:

$$\min_{\mathbf{\Phi},\gamma}~\frac{1}{1-\gamma}\,\text{cost}(\mathbf{\Phi})$$

$$\text{s.t.}~\begin{bmatrix}zI- \widehat A&- \widehat B\end{bmatrix} \mathbf\Phi = I \qquad\text{(nominal achievable subspace)}$$

$$\Big\|\begin{bmatrix}\varepsilon_A I&\\&\varepsilon_B I\end{bmatrix}\mathbf \Phi\Big\|_{H_\infty}\leq\gamma \qquad\text{(sensitivity constraint)}$$


### LQR with System Level Synthesis

The LQR problem

minimize  $$\mathbb{E}[$$cost$$(x_0,u_0,x_1...)]$$

s.t.  $$x_{t+1} = Ax_t + Bu_t + w_t$$

is equivalent to an optimization over system responses in the achievable subspace:

minimize  cost($$\mathbf{\Phi}$$)

s.t. $$\begin{bmatrix}zI- A&- B\end{bmatrix} \mathbf\Phi = I$$

Instead of reasoning about a controller $$\mathbf{K}$$, we reason about the interconnection $$\mathbf\Phi$$ directly
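As a sketch with an assumed plant and static gain (all numbers hypothetical), the time-domain counterpart of the achievability constraint $$[zI-A~~{-B}]\mathbf\Phi = I$$ can be checked numerically: for the closed loop $$A+BK$$, the impulse responses from disturbance to state and input satisfy the recursion below.

```python
import numpy as np

# Assumed plant and static gain K (hypothetical numbers).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.1, -0.5]])
Acl = A + B @ K  # closed-loop dynamics

# System responses from disturbance to state and input: the k-th impulse
# response matrices are Phi_x[k] = Acl^k and Phi_u[k] = K Acl^k.
T = 20
Phi_x = [np.linalg.matrix_power(Acl, k) for k in range(T)]
Phi_u = [K @ Pk for Pk in Phi_x]

# Time-domain form of the achievability constraint [zI - A, -B] Phi = I:
# Phi_x[0] = I and Phi_x[k+1] = A Phi_x[k] + B Phi_u[k].
ok = np.allclose(Phi_x[0], np.eye(2)) and all(
    np.allclose(Phi_x[k + 1], A @ Phi_x[k] + B @ Phi_u[k]) for k in range(T - 1)
)
```

Any pair of responses satisfying this recursion is achievable by some controller, which is what lets the synthesis problem optimize over $$\mathbf\Phi$$ directly.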


As long as $$T$$ is large enough, then w.p. $$1-\delta$$, the suboptimality of $$\widehat{\mathbf K}$$ satisfies

$$\text{subopt}(\widehat{\mathbf K})\lesssim \frac{\sigma_w C_u}{\sigma_u} \sqrt{\frac{(n+d)\log(1/\delta)}{T}}\, \|\mathrm{CL}(A_\star,B_\star, K_\star)\|_{H_\infty}$$

where $$\mathrm{CL}$$ denotes the closed-loop system response.

## LQR Sample Complexity

Extensions

1. Online robust control: decaying excitation over time
2. Constrained LQR: polytopic constraints
3. Online CE control: no need for robustness

## Perception-Based Control

It is difficult to make use of complex sensing modalities. Example: cameras on drones.

Classic pipeline:

• Physics-based state estimation with an EKF or UKF
• Careful control design and tuning

End-to-end approach:

• Deep networks map images directly to actions

## Optimal Control Perspective

minimize $$\mathrm{cost}(\mathbf x, \mathbf u)$$

such that $$x_{k+1} = f(x_k, u_k, w_k),~~z_k = g(x_k,\nu_k),~~u_k = \gamma_k(z_{0:k})$$

State-feedback simplification:

minimize $$\mathrm{cost}(\widehat{\mathbf x}, \mathbf u)$$

such that $$\widehat x_{k+1} = f(\widehat x_k, u_k, w_k),~~u_k = \gamma(\widehat x_k)$$

End-to-end:

minimize $$\mathrm{cost}(\mathbf x, \mathbf u)$$

such that $$u_k = \gamma_k(z_{0:k})$$

Our Approach: Use perception map

minimize $$\mathrm{cost}(\mathbf x, \mathbf u)$$

such that $$x_{k+1} = f(x_k, u_k, w_k),~~z_k = g(x_k,\nu_k),$$

$$u_k = \gamma_k(y_{0:k}),~~y_k=p(z_k)\approx Cx_k$$

## Problem Setting

• Linear dynamics: $$x_{k+1} = Ax_k + B u_k + Hw_k$$
• State-dependent observations: $$z_k \approx q(x_k)$$
• Perception map acts as virtual sensor, $$y_k = p(z_k) = C x_k + e_k$$

## 1. Learning Perception Map

We need to fit some $$p$$ and then guarantee something useful about errors $$e=Cx-p(z)$$.

Robust regression:

minimize $$\varepsilon_e + \lambda R(p)$$

such that $$\|p(z_k) - Cx_k\|\leq \varepsilon_e$$ for all training pairs $$(x_k,z_k)$$

## 1. Learning Perception Map

More generally, the errors might be better characterized as

$$p(z_k) - Cx_k = \Delta(x_k) + n_k = \Delta_{C,k} x_k + n_k$$

where $$\Delta_C$$ and $$n$$ are norm bounded.

Regressing error profile:

minimize $$M\varepsilon_C +\varepsilon_n$$

such that $$\|p(z_k) - Cx_k\|\leq \varepsilon_C\|x_k\| + \varepsilon_n$$
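For a linear perception map, this robust regression is a linear program. A sketch in one dimension with synthetic data (all numbers assumed) using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 1-D example: z_k are noisy readings of x_k; fit a linear
# perception map p(z) = w0 + w1*z subject to the error profile
# |p(z_k) - x_k| <= eps_C |x_k| + eps_n, with M weighting the two terms.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=50)
z = x + 0.05 * x * rng.standard_normal(50) + 0.02 * rng.standard_normal(50)

M = 2.0
# Decision variables: [w0, w1, eps_C, eps_n]; minimize M*eps_C + eps_n.
c = np.array([0.0, 0.0, M, 1.0])
# Constraints: +/-(w0 + w1 z_k - x_k) - eps_C |x_k| - eps_n <= 0.
Zmat = np.column_stack([np.ones_like(z), z])
A_ub = np.vstack([
    np.column_stack([Zmat, -np.abs(x), -np.ones_like(x)]),
    np.column_stack([-Zmat, -np.abs(x), -np.ones_like(x)]),
])
b_ub = np.concatenate([x, -x])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None), (0, None), (0, None)])
w0, w1, eps_C, eps_n = res.x
```

The fitted $$(\varepsilon_C, \varepsilon_n)$$ pair is exactly the error profile the downstream robust controller consumes.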

## 2. Output Feedback Control

If we know that $$\|p(z_k) - Cx_k\|= \|e_k\|\leq \varepsilon_e$$, we can plug $$\varepsilon_e$$ into our favorite robust control formulation

The SLS problem looks like:

minimize $$\Big \| \begin{bmatrix}Q^{1/2} &\\&R^{1/2}\end{bmatrix} \mathbf{\Phi} \begin{bmatrix}\sigma_w H \\ \varepsilon_e I\end{bmatrix} \Big \|$$

such that $$\mathbf{\Phi} \in \mathrm{affine}(A,B,C)$$

## 2. Robust Output Feedback Control

Instead: $$p(z_k) - Cx_k=\Delta_{C,k}x_k + n_k$$ for $$\|\Delta_C\| \leq \varepsilon_C$$ and $$\|n\| \leq \varepsilon_n$$

Adapt the previous approach to handle uncertainty in the sensing matrix.

The robust SLS problem looks like:

minimize $$\Big \| \begin{bmatrix}Q^{1/2} &\\&R^{1/2}\end{bmatrix} \begin{bmatrix}\mathbf{\Phi_{xw}} & \mathbf{\Phi_{xn}} \\ \mathbf{\Phi_{uw}} & \mathbf{\Phi_{un}} \end{bmatrix} \begin{bmatrix}\sigma_w H \\ \varepsilon_n I\end{bmatrix} \Big \|$$

$$+ \frac{\varepsilon_C(\gamma\sigma_w + \tau\varepsilon_n)}{1-\tau\varepsilon_C}\Big \| \begin{bmatrix}Q^{1/2} \\R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi_{xn}} \\ \mathbf{\Phi_{un}} \end{bmatrix} \Big \|$$

such that $$\mathbf{\Phi} \in \mathrm{affine}(A,B,C)$$, $$\|\mathbf{\Phi_{xw}}H\|\leq \gamma$$, $$\|\mathbf{\Phi_{xn}}\|\leq \tau$$

## Main Result

We can synthesize a controller $$\mathbf \Phi$$ with performance

cost$$(\mathbf \Phi)\leq \Big \| \begin{bmatrix}Q^{1/2} &\\&R^{1/2}\end{bmatrix} \mathbf{\Phi} \begin{bmatrix}\sigma_w H \\ (\widehat\varepsilon_e + \varepsilon_\mathrm{gen}) I\end{bmatrix} \Big \|$$

where $$\varepsilon_\mathrm{gen}$$ depends on smoothness and robustness of the perception map, bounded closed-loop response to errors $$\mathbf\Phi_{xe}$$, and the planned trajectory's distance to training points

## Generalization

If we learn some perception map $$p$$ and error bound $$\widehat \varepsilon_e$$ on training data, what can we say about the performance of $$p$$ during operation?

Non-parametric guarantees from learning theory on risk $$\mathcal R$$

$$\mathcal R(p) = \mathcal R_N(p) +\mathcal R(p)- \mathcal R_N(p) \leq \mathcal R_N(p) +\varepsilon_\mathrm{gen}$$

e.g. for classic ERM, $$\mathbb{E}_{\mathcal D}[\ell(p;x,z)] \leq \mathbb{E}_{\mathcal D_N}[\ell(p;x,z)] + \mathcal{O}(\sqrt{\frac{1}{N}})$$

This usual statistical generalization argument relies on $$\mathcal D_N$$ being drawn from $$\mathcal D$$.

## Distribution shift

Our setting looks more like

$$\mathbb{E}_{\mathcal D}[\ell(p;x,z)] = \mathbb{E}_{\mathcal D_N'}[\ell(p;x,z)]$$

$$+(\mathbb{E}_{\mathcal D_N}[\ell(p;x,z)]- \mathbb{E}_{\mathcal D_N'}[\ell(p;x,z)] )$$

$$+ (\mathbb{E}_{\mathcal D}[\ell(p;x,z)] - \mathbb{E}_{\mathcal D_N}[\ell(p;x,z)] )$$

$$\leq \mathbb{E}_{\mathcal D_N'}[\ell(p;x,z)] +\varepsilon_\mathrm{shift}\rho(\mathcal{D,D'})+ \varepsilon_N$$

But our training and closed-loop distributions will be different, especially since $$\mathcal D$$ depends on the errors $$\mathbf e$$ themselves,

$$\mathbf x = \mathbf{\Phi_{xw} w + \Phi_{xe} e}$$

## Generalization Argument

High level generalization argument

1. Closeness implies generalization: if $$\rho(\mathcal D,\mathcal D')$$ bounded, then $$\mathcal R(p)$$ bounded.
2. Generalization implies closeness: if $$\mathcal R(p)$$ bounded, then $$\rho(\mathcal D,\mathcal D')$$ bounded


## Generalization Argument

1. Closeness implies generalization (statistical, structural):
if $$\rho(\mathcal D,\mathcal D')\leq r$$, then $$\mathcal R(p) \leq \mathcal R_N(p) +\varepsilon_N +\varepsilon_\mathrm{shift}\rho(\mathcal D,\mathcal D')$$
2. Generalization implies closeness (control, planning):
$$\rho(\mathcal D,\mathcal D') \leq \rho_0 +\varepsilon_\mathrm{rob}\mathcal R(p)$$

From this, we can bound $$\mathcal R(p)$$  and $$\rho(\mathcal{D,D'})$$ as long as $$\varepsilon_\mathrm{shift}\varepsilon_\mathrm{rob}< 1$$

$$\implies$$   guarantees on performance and stability.
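Chaining the two inequalities gives closed-form bounds in the style of a small-gain argument; a numerical sketch with assumed constants:

```python
# Illustrative (assumed) constants. Chaining
#   R <= R_N + eps_N + eps_shift * rho    (closeness => generalization)
#   rho <= rho_0 + eps_rob * R            (generalization => closeness)
# yields finite bounds whenever eps_shift * eps_rob < 1.
R_N, eps_N = 0.1, 0.05
eps_shift, eps_rob, rho_0 = 0.4, 0.5, 1.0

assert eps_shift * eps_rob < 1  # small-gain condition
R_bound = (R_N + eps_N + eps_shift * rho_0) / (1 - eps_shift * eps_rob)
rho_bound = rho_0 + eps_rob * R_bound
```

If the product of gains exceeds one, the inequalities are mutually consistent with arbitrarily large risk and shift, and no guarantee follows.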

## Non-statistical assumptions

We take a robust and adversarial view,

training distribution $$\mathcal D'$$ specified by points $$\{(x_d,z_d)\}$$, "testing" distribution $$\mathcal D$$ specified by trajectory $$\mathbf x, \mathbf z$$

Then $$\mathcal{R}(p) = \|C\mathbf x - p(\mathbf z)\|$$, $$\mathcal{R}_N(p) = \max_k \|Cx_k - p(z_k)\|$$, and $$\rho(\mathcal{D,D'}) = \min_{\mathbf x_d}\|\mathbf{x-x_d}\|$$

In lieu of distributional assumptions, we assume smoothness.

• $$z_k = T_k(q(x_k),x_k)$$ for nuisance transformations $$T_k$$
• e.g. sensor noise, lighting, scene changes
• $$\|p(z)-p(T(z,x))\|\leq \varepsilon_p$$ for all  $$T$$ and $$z=q(x)$$
• $$p(q(x)) - Cx$$ is locally $$L_e$$ Lipschitz with a radius of $$r$$
• e.g. $$L_e \leq L_pL_q+\|C\|$$

## Non-statistical generalization

1. Determine set with bounded generalization error $$\varepsilon_\mathrm{gen}$$
• $$\min_{x_d}\|\mathbf{x-x_d}\|\leq \frac{\varepsilon_\mathrm{gen} - 2\varepsilon_p}{L_e} \leq r$$
2. Design controller to remain within set:
• constraint $$\|\mathbf{ \Phi_{xw}w-x_d}\|+\|\mathbf{ \Phi_{xe}}\|(\hat \varepsilon_e + \varepsilon_\mathrm{gen}) \leq \frac{\varepsilon_\mathrm{gen} - 2\varepsilon_p}{L_e}$$

This is possible as long as $$L_e\|\mathbf{ \Phi_{xe}}\|< 1$$

## Controller Design

Design controller to remain within set:

• constraint $$\|\mathbf{ \Phi_{xw}}H\mathbf{w-x_d}\|+\|\mathbf{ \Phi_{xe}}\|(\hat \varepsilon_e + \varepsilon_\mathrm{gen}) \leq \frac{\varepsilon_\mathrm{gen} - 2\varepsilon_p}{L_e}$$

For example, with reference tracking:

$$\|\mathbf{x_\mathrm{ref}}-\mathbf{x_d}\|+\|\mathbf{ \Phi_{xw}}H\|\varepsilon_\mathrm{ref}$$

Controller design balances responsiveness to references $$\|\mathbf{ \Phi_{xw}}H\|$$ with sensor response $$\|\mathbf{ \Phi_{xe}}\|$$

## Main Result

We can synthesize a controller $$\mathbf \Phi$$ with performance

cost$$(\mathbf \Phi)\leq \Big \| \begin{bmatrix}Q^{1/2} &\\&R^{1/2}\end{bmatrix} \mathbf{\Phi} \begin{bmatrix}\sigma_w H \\ (\widehat\varepsilon_e + \varepsilon_\mathrm{gen}) I\end{bmatrix} \Big \|$$

where $$\varepsilon_\mathrm{gen}$$ depends on smoothness $$L_e$$ and robustness $$\varepsilon_p$$ of the perception map, bounded closed-loop response to errors $$\mathbf\Phi_{xe}$$, and the planned trajectory's distance to training points $$\rho_0$$.

$$\varepsilon_\mathrm{gen} = \frac{2\varepsilon_p+L_e\rho_0}{1-L_e\|\mathbf{\Phi_{xe}}\|}$$
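A quick numerical instance of this formula, with all constants assumed for illustration:

```python
# Illustrative (assumed) constants for
#   eps_gen = (2 eps_p + L_e rho_0) / (1 - L_e * ||Phi_xe||).
eps_p, L_e, rho_0 = 0.01, 0.5, 0.2
Phi_xe_norm = 0.8

assert L_e * Phi_xe_norm < 1  # required for the bound to be meaningful
eps_gen = (2 * eps_p + L_e * rho_0) / (1 - L_e * Phi_xe_norm)
```

The denominator makes the trade-off explicit: a more sensor-responsive controller (larger $$\|\mathbf{\Phi_{xe}}\|$$) inflates the generalization error it must tolerate.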