Reliable Machine Learning in Feedback Systems

Sarah Dean

Dissertation Talk

July 27, 2021

Machine learning is a promising tool for processing complex information

velocity, steering angle, acceleration \(\to\) velocity

camera image \(\to\) position

historical movie ratings \(\to\) new movie rating

Dangers of static predictions in a dynamic world

Catastrophic failure
Inequality and bias
Addiction

Promises of data-driven solutions

Safety
Equity
Discovery

My approach: data-driven and robust

Ensure reliable outcomes by encoding values through the lens of reachability

Where could the system go?

ML: data and uncertainty quantification

control: robust reachability condition

optimization: design a policy

Reasoning about reachability

Where the system goes (trajectory) depends on:

where the system is (state),

how the system changes (dynamics),

and which actions are chosen (policy/controller)

Reasoning about uncertainty

in how the system changes (dynamics)

in where the system is (state)

dependent on which actions are chosen (policy/controller)

Reliable outcomes via reachability

Safety: system must remain in safe region

Discovery: system must be able to reach many regions

Talk outline

Safety with unknown dynamics

Safety with complex observations

Discovery in recommendations

Safety with unknown dynamics


How much data do we have to collect from a system to safely control it?

Optimal control (reinforcement learning) problem

Tasks can be modeled as optimal control problems

\(\displaystyle \min_{\pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)\)

\(~~~\mathrm{s.t.}~~ u_t = \pi_t(x_{0:t})\)

\(~~~~~~~~~~x_{t+1} = \mathrm{dynamics}_t(x_t,u_t, w_t)\)

\(~~~~~~~~~~x_t\in\mathcal X~~\text{for all}~ t~\text{and all}~w_t\in\mathcal W\)

with state \(x_t\), action/input \(u_t\), and disturbance \(w_t\). We write \(\mathrm{cost}(\pi)\) for the cost incurred under policy \(\pi\), and \(\pi_\star\) for the optimal policy.

[Diagram: feedback loop between the policy \(\pi\) and the \(\mathrm{dynamics}\), exchanging state \(x_t\) and action \(u_t\)]

Sample complexity

Goals:

Guarantee safety
Achieve good performance
suboptimality: \(\mathrm{cost}(\widehat \pi)\) vs. \(\mathrm{cost}( \pi_\star)\)

How much data do we have to collect from a system to safely control it?

[Diagram: feedback loop between the unknown \(\mathrm{dynamics}\) and the policy \(\widehat\pi\), exchanging state \(x_t\) and action \(u_t\)]

Example: double integrator dynamics

1. Collect data

2. Estimate model

3. Robust control


Problem setting: LQR

Linear dynamics and quadratic costs: the linear quadratic regulator (LQR)

Unknown dynamics: \(x_{t+1}=Ax_t+Bu_t+w_t\)

Cost: \(\sum_{t=0}^T x_t^\top Qx_t + u_t^\top R u_t\)

Constraints: \(F x_{t}\leq b\)

Our goal is to design a linear controller from data:

\(u_t=\widehat{\mathbf K}(x_{0:t})\)
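To make the problem setting concrete, here is a minimal sketch that rolls out a linear system under a static linear controller, accumulates the quadratic cost, and checks the polytope constraint \(Fx_t \leq b\) at every step. The double integrator matrices, the gain `K`, and the constraint values are assumptions chosen for illustration; they are not taken from the talk.

```python
# Illustrative sketch only: all numerical values below are assumptions.

def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def quad(M, v):
    """Quadratic form v^T M v."""
    return sum(vi * mi for vi, mi in zip(v, matvec(M, v)))

def rollout(A, B, K, Q, R, F, b, x0, ws):
    """Simulate x_{t+1} = A x_t + B u_t + w_t under u_t = K x_t.

    Returns the cost summed over t = 0..T (with T = len(ws)), whether
    F x_t <= b held at every step, and the final state x_T.
    """
    x = list(x0)
    cost, safe = 0.0, True
    for t in range(len(ws) + 1):
        u = matvec(K, x)
        cost += quad(Q, x) + quad(R, u)
        safe = safe and all(fx <= bi for fx, bi in zip(matvec(F, x), b))
        if t < len(ws):
            x = [ax + bu + w
                 for ax, bu, w in zip(matvec(A, x), matvec(B, u), ws[t])]
    return cost, safe, x

# Double integrator (position, velocity) with an assumed static gain.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
K = [[-0.5, -1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0]]
F = [[1.0, 0.0], [-1.0, 0.0]]   # encodes |position| <= 2
b = [2.0, 2.0]
no_noise = [[0.0, 0.0], [0.0, 0.0]]

cost, safe, x_final = rollout(A, B, K, Q, R, F, b, [1.0, 0.0], no_noise)
print(cost, safe, x_final)
```

Tightening `b` so that the initial position already violates the bound makes `safe` come back `False`, which is the kind of failure the constrained formulation rules out.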

Safe data-driven control in 3 steps

1. Collect data: run system for \(T\) steps

2. Fit model and characterize errors: least squares estimation of \((\widehat A,\widehat B)\) and \((\varepsilon_A,\varepsilon_B)\)

3. Robust control: synthesize \(\mathbf{\widehat K}\) via convex optimization using estimates
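The first two steps can be sketched for a scalar system \(x_{t+1} = a x_t + b u_t + w_t\). This is a simplified illustration, not the procedure from the cited papers: the function names, the Gaussian excitation and noise, and all numerical values are assumptions, the safety constraints are ignored during data collection, and the error bounds \((\varepsilon_A,\varepsilon_B)\) are not computed.

```python
import random

def collect_data(a, b, k0, sigma_eta, sigma_w, T, seed=0):
    """Run x_{t+1} = a x_t + b u_t + w_t for T steps, u_t = k0 x_t + eta_t."""
    rng = random.Random(seed)
    xs, us = [0.0], []
    for _ in range(T):
        u = k0 * xs[-1] + rng.gauss(0.0, sigma_eta)  # feedback plus excitation
        w = rng.gauss(0.0, sigma_w)                  # process noise
        us.append(u)
        xs.append(a * xs[-1] + b * u + w)
    return xs, us

def least_squares(xs, us):
    """Minimize sum_t (a x_t + b u_t - x_{t+1})^2 via 2x2 normal equations."""
    sxx = sum(x * x for x in xs[:-1])
    sxu = sum(x * u for x, u in zip(xs[:-1], us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs[:-1], xs[1:]))
    suy = sum(u * y for u, y in zip(us, xs[1:]))
    det = sxx * suu - sxu * sxu
    a_hat = (sxy * suu - sxu * suy) / det   # Cramer's rule
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat

# Assumed true system and an assumed stabilizing initial gain k0.
a_true, b_true, k0 = 1.0, 1.0, -0.5
xs, us = collect_data(a_true, b_true, k0, sigma_eta=1.0, sigma_w=0.1, T=2000)
a_hat, b_hat = least_squares(xs, us)
print(a_hat, b_hat)
```

Raising `sigma_eta` or `T` should shrink the estimation error, and raising `sigma_w` should grow it, in line with the noise-to-excitation ratio and the \(1/\sqrt{T}\) scaling in the informal result.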

Main result (informal)

As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)\)

[D., Mania, Matni, Recht, Tu, FoCM '19; D., Tu, Matni, Recht, ACC '19]


DMMTR, FoCM '19: First general sample complexity bound with guaranteed stability when \(\mathcal X = \mathbb R^n\)

Previous work:

Classic system identification provides asymptotic guarantees
Fiechter (1997) makes strong stability assumptions
Abbasi-Yadkori & Szepesvari (2011) study a computationally intractable adaptive method under a stability assumption

DTMR, ACC '19: Sample complexity and guaranteed safety
when \(\mathcal X\) is a polytope, given initial coarse estimates \(\widehat A_0\) and \(\widehat B_0\)

Ingredients:

1. Statistical learning rate

2. Robust constraint-satisfying control

3. Sub-optimality analysis

[D., Tu, Matni, Recht, ACC '19]

Step 1: Collect data from system with safe excitation

\(u_t = {\color{goldenrod}\mathbf{K}_0}(x_{0:t})+ {\color{teal}\eta_t}\)

Step 2: Least squares estimation:

\((\widehat A, \widehat B) \in \underset{(A,B)}{\arg\min} \sum_{t=0}^T \|Ax_t +B u_t - x_{t+1}\|^2 \)

Learning rate (error characterization):

For stabilizing linear control and large enough \(T\), we have w.p. \(1-\delta\)

\(\Big\|\begin{bmatrix} \Delta_A \\ \Delta_B\end{bmatrix}\Big\| \lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log\left(\frac{\mathrm{dim}}{\delta}\right)} \)

Step 3:

Controller must keep system safe despite process noise and excitation, and uncertain dynamics

[Diagram: controller \(\mathbf K\) in feedback with the system and an uncertainty block \(\mathbf \Delta\)]

System level synthesis

Optimal control problem is nonconvex in \(\mathbf K\),

so we parametrize controller using system level synthesis (SLS) (Anderson et al., ARC 2019)

instead of a loop, system looks like a line

\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)

[Diagram: the feedback loop between \((A,B)\) and \(\mathbf{K}\), with signals \(\bf x\), \(\bf u\), \(\bf w\), redrawn as a single map \(\mathbf{\Phi}\) from \(\bf w\) to \((\bf x, \bf u)\)]
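The "loop versus line" picture can be checked numerically in the simplest case. The sketch below assumes a static gain \(u_t = Kx_t\) and zero initial state, for which the system response elements are \(\Phi_x[k] = (A+BK)^{k-1}\) and \(\Phi_u[k] = K\Phi_x[k]\). It then compares the state obtained by simulating the feedback loop with the state obtained by applying \(\mathbf\Phi_x\) directly to the disturbance sequence. The matrices, gain, and disturbances are illustrative assumptions.

```python
def matmul(X, Y):
    """Matrix-matrix product for nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def system_response(A, B, K, horizon):
    """Return Phi_x[k], Phi_u[k] for k = 1..horizon (stored at index k-1)."""
    n = len(A)
    BK = matmul(B, K)
    A_cl = [[A[i][j] + BK[i][j] for j in range(n)] for i in range(n)]
    Phi_x = [[[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]]
    for _ in range(horizon - 1):
        Phi_x.append(matmul(A_cl, Phi_x[-1]))
    Phi_u = [matmul(K, P) for P in Phi_x]
    return Phi_x, Phi_u

def state_from_loop(A, B, K, ws):
    """Simulate the closed loop from x_0 = 0 and return x_T."""
    x = [0.0] * len(A)
    for w in ws:
        u = matvec(K, x)
        x = [ax + bu + wi
             for ax, bu, wi in zip(matvec(A, x), matvec(B, u), w)]
    return x

def state_from_line(Phi_x, ws):
    """Compute x_T = sum_{k=1}^{T} Phi_x[k] w_{T-k} without any feedback."""
    T = len(ws)
    x = [0.0] * len(ws[0])
    for k in range(1, T + 1):
        term = matvec(Phi_x[k - 1], ws[T - k])
        x = [xi + ti for xi, ti in zip(x, term)]
    return x

# Assumed double integrator, gain, and disturbance sequence.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]
K = [[-0.5, -1.0]]
ws = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1], [0.2, 0.0]]

Phi_x, Phi_u = system_response(A, B, K, horizon=len(ws))
x_loop = state_from_loop(A, B, K, ws)
x_line = state_from_line(Phi_x, ws)
print(x_loop, x_line)
```

The two states agree up to floating point error. The point of the parametrization is that the cost and constraints depend on \(\mathbf\Phi\) directly, while their dependence on \(\mathbf K\) goes through powers of \(A+BK\).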

Robust control with SLS

\( \underset{\mathbf u }{\min}~\displaystyle\lim_{T\to\infty}\mathbb{E}\left[ \frac{1}{T}\sum_{t=0}^T \left(x_t^\top Q x_t + u_t^\top R u_t\right)\right]\)

\(\text{s.t.}~~x_{t+1} = Ax_t + Bu_t + w_t\)

\(\qquad x_{t} \in\mathcal X~~\forall~\|w_t\|\leq \sigma_w\)

\(\qquad u_t = \sum_{k=0}^t{\color{Goldenrod} K_k} x_{t-k}\)

We parametrize controller using system level synthesis (SLS) (Anderson et al., ARC 2019)

\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)

Equivalent problem with quadratic costs, linear dynamics, and tightened polytope constraints

\( \underset{\color{teal}\mathbf{\Phi}}{\min}~\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix}{\color{teal} \mathbf{\Phi}} \right\|_{\mathcal{H}_2}^2\)

\(\text{s.t.}~~ {\color{teal}\mathbf\Phi }\in\mathrm{Affine}(A, B)\)

\(\qquad {\color{teal}\mathbf\Phi }\in\mathrm{Polytope}_{\sigma_w}(\mathcal X)\)

Robust control with SLS

[Diagram: controller \(\mathbf{K}\) in feedback with the uncertain dynamics \((\widehat A+\Delta_A,~\widehat B+\Delta_B)\), with signals \(\bf x\), \(\bf u\), \(\bf w\)]

\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \widehat{\mathbf \Phi}_x\\ \widehat{\mathbf \Phi}_u\end{bmatrix}(I+\mathbf\Delta)^{-1}\mathbf w \)

There is a closed-form translation:

SLS makes apparent the effect of estimation errors in the dynamics

[Diagram: signals \(\bf x\), \(\bf u\), \(\bf w\) related through \(\widehat{\mathbf{\Phi}}\) and \(\mathbf{\Delta}\)]

Robust control with SLS

\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{firebrick} \gamma}}{\arg\min}~\frac{1}{1-\gamma}\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}^2\)

\(\qquad\qquad\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)

\(\qquad\qquad\mathbf\Phi \in\mathrm{Polytope}_{\sigma_w,{\color{firebrick} \gamma}}(\mathcal X)\)

\(\qquad\qquad\left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|\leq\gamma\)

Convex optimization problem for fixed \(\gamma\)

[D., Tu, Matni, Recht, ACC '19]

Use this robust constraint-satisfying controller for collecting data (step 1) and robust control (step 3)

Robust safety and suboptimality

[D., Tu, Matni, Recht, ACC '19]

Safety and Suboptimality:

As long as the robust problem is feasible, \(\widehat{\mathbf K} = \widehat{\mathbf \Phi}_u\widehat{\mathbf \Phi}_x^{-1}\) keeps the system safe.

If \(\varepsilon_A, \varepsilon_B\) are small enough,

\(\displaystyle \frac{\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)}{\mathrm{cost}(\mathbf K_\star)}\lesssim \left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty} \)

Design controller with robust system level synthesis in terms of \(\widehat A,\widehat B\) and \(\varepsilon_A,\varepsilon_B\)
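One way to read \(\mathbf K = \mathbf \Phi_u\mathbf \Phi_x^{-1}\) is as a controller that first reconstructs past disturbances from the observed states (applying \(\mathbf\Phi_x^{-1}\)) and then applies \(\mathbf\Phi_u\) to them. The sketch below does this for a scalar system. As a sanity check, the response is generated from a known static gain, so the resulting inputs should match plain state feedback. The function names and numerical values are assumptions, and the response is assumed to be at least as long as the simulated horizon.

```python
def static_feedback_inputs(a, b, k, x0, ws):
    """Inputs produced by u_t = k x_t on x_{t+1} = a x_t + b u_t + w_t."""
    x, inputs = x0, []
    for w in ws:
        u = k * x
        inputs.append(u)
        x = a * x + b * u + w
    return inputs

def sls_inputs(phi_x, phi_u, a, b, x0, ws):
    """Inputs produced by the controller Phi_u Phi_x^{-1}.

    phi_x[j], phi_u[j] hold the response elements Phi_x[j+1], Phi_u[j+1].
    w_hat[t] is the controller's internal estimate: w_hat[0] recovers x_0
    and w_hat[t] recovers the disturbance w_{t-1} for t >= 1.
    """
    x, w_hat, inputs = x0, [], []
    for t in range(len(ws)):
        # Invert Phi_x: subtract the predicted effect of past estimates.
        w_hat.append(x - sum(phi_x[j] * w_hat[t - j] for j in range(1, t + 1)))
        # Apply Phi_u to the estimates collected so far.
        u = sum(phi_u[j] * w_hat[t - j] for j in range(t + 1))
        inputs.append(u)
        x = a * x + b * u + ws[t]
    return inputs

# Assumed scalar system and gain; the closed loop a + b*k is stable.
a, b, k = 1.2, 1.0, -0.9
ws = [0.3, -0.1, 0.2, 0.0, -0.4, 0.1]
a_cl = a + b * k
phi_x = [a_cl ** j for j in range(len(ws))]
phi_u = [k * p for p in phi_x]

u_static = static_feedback_inputs(a, b, k, 1.0, ws)
u_sls = sls_inputs(phi_x, phi_u, a, b, 1.0, ws)
print(u_static, u_sls)
```

The two input sequences coincide up to floating point error, which illustrates that a controller can be run directly from \((\mathbf\Phi_x, \mathbf\Phi_u)\) without forming the inverse explicitly.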


Safety with unknown dynamics

Data-driven controller design:

ensures safety & optimizes performance
learns unknown linear dynamics
with finite sample guarantees

[D., Mania, Matni, Recht, Tu, NeurIPS '18]

Method extends to adaptive setting

[D., Mania, Matni, Recht, Tu, FoCM '19]

[D., Tu, Matni, Recht, ACC '19]


Safety with complex observations

How to remain safe with imperfect perception?

Problem setting: perception-based control

Known linear dynamics

\(x_{t+1} =Ax_t+Bu_t+w_t\)

Complex observations (unknown appearance map)

\(z_t = q(Cx_t)\)

Virtual sensor:

\(y_t = p(z_t)=Cx_t+e_t\), where \(e_t = e(x_t)\)

Output-feedback optimal control

\(\mathbf K = \arg\min ~~\textrm{cost}(x_0, u_0, x_1,\dots)\)

\(\text{s.t.}~~x_{t+1} =Ax_t+Bu_t+w_t\)

\(\qquad y_t = Cx_t+ e_t\)

The optimal controller uses a perfect perception map

\(\pi_\star(\mathbf z) = \mathbf K p_\star(\mathbf z)\), with \(p_\star(z_t) = Cx_t\)

The certainty equivalent controller

\(\widehat \pi (\mathbf z)= \mathbf K p(\mathbf z)\)

Suboptimality is bounded if errors are bounded

\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \leq \left\|\begin{bmatrix}\mathbf \Phi_{xe}\\ \mathbf \Phi_{ue}\end{bmatrix}\right\| \|\mathbf e\|\)

[Diagram: the closed loop of \((A,B,C)\) and \(\mathbf{K}\), with signals \(\bf y\), \(\bf u\), \(\bf w\), \(\bf e\), redrawn as the map \(\begin{bmatrix} \mathbf \Phi_{xw} & \mathbf \Phi_{xe} \\ \mathbf \Phi_{uw} & \mathbf \Phi_{ue} \end{bmatrix}\) from \((\bf w, \bf e)\) to \((\bf x, \bf u)\)]

Uniform convergence

Learn perception map \(p(z)\) via nonparametric regression from uniformly sampled training data \(\{(z_i, y^\mathrm{train}_i)\}_{i=1}^T\)
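As a rough illustration of learning a perception map by nonparametric regression, the sketch below uses a Nadaraya-Watson estimator with a Gaussian kernel on scalar observations. The kernel, the bandwidth, the appearance map `q`, the evenly spaced training grid, and the noiseless labels are all assumptions made for this example. The slide does not specify them.

```python
import math

def nw_predict(z, train_z, train_y, bandwidth):
    """Kernel-weighted average of training labels around the query z."""
    weights = [math.exp(-0.5 * ((z - zi) / bandwidth) ** 2) for zi in train_z]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query point is too far from all training data")
    return sum(w * y for w, y in zip(weights, train_y)) / total

def q(x):
    """Hypothetical appearance map from the state (C = 1) to observations."""
    return x + x ** 3

# Training data on an evenly spaced grid of states, with noiseless labels.
T = 201
train_x = [-1.0 + 2.0 * i / (T - 1) for i in range(T)]
train_z = [q(x) for x in train_x]
train_y = list(train_x)

# Evaluate the learned map p(z) away from the edges of the training range.
test_x = [-0.8 + 0.1 * i for i in range(17)]
errors = [abs(nw_predict(q(x), train_z, train_y, 0.1) - x) for x in test_x]
max_error = max(errors)
print(max_error)
```

Here `max_error` plays the role of a uniform bound on the perception error \(\|\mathbf e\|\) over the sampled region, which is the quantity the suboptimality bound depends on.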

[Figure: nonparametric regression over \(z\), showing the data, the bumps placed around them, and the resulting prediction]

Main result (informal)

As long as \(T\) is large enough, with probability at least \(1-\delta\),
\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \lesssim\) \(rL_q L_p \left(\frac{\mathsf{dim}^2\sigma^4}{T}\right)^{\frac{1}{\mathsf{dim}+4}} \left\|\begin{bmatrix} \mathbf \Phi_{xe}\\ \mathbf \Phi_{ue} \end{bmatrix}\right\|\)

Assume: