# Reliable Machine Learning in Feedback Systems

### Sarah Dean

Dissertation Talk

July 27, 2021

## Machine learning is a promising tool for processing complex information

Examples of complex inputs mapped to predictions:

• camera image → position
• velocity, steering angle, acceleration → velocity
• historical movie ratings → new movie rating

### Dangers of static predictions in a dynamic world

Dangers:

• Catastrophic failure
• Inequality and bias

Values at stake:

• Safety
• Equity
• Discovery

## My approach: data-driven and robust

Ensure reliable outcomes by encoding values through the lens of reachability: where could the system go?

From data and uncertainty quantification, formulate a robust reachability condition, then design a policy via optimization.

Where the system goes (trajectory) depends on:

• where the system is (state)
• which actions are chosen (policy/controller)
• how the system changes (dynamics)

Uncertainty enters in where the system is (state), in which actions are chosen (policy/controller), and in how the system changes (dynamics).

## Reliable outcomes via reachability

### Safety

system must remain in a safe region

### Discovery

system must be able to reach many regions

## Talk outline

1. Safety with unknown dynamics
1. Safety with complex observations
1. Discovery in recommendations

## Safety with unknown dynamics

How much data do we have to collect from a system to safely control it?

## Optimal control (reinforcement learning) problem

Tasks can be modeled as optimal control problems:

$$\displaystyle \min_{\pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)$$

$$~~~\mathrm{s.t.}~~ u_t = \pi_t(x_{0:t})$$

$$~~~~~~~~~~x_{t+1} = \mathrm{dynamics}_t(x_t,u_t, w_t)$$

$$~~~~~~~~~~x_t\in\mathcal X~~\text{for all}~ t~\text{and all}~w_t\in\mathcal W$$

Here $$x_t$$ is the state, $$u_t$$ the action/input, $$w_t$$ the disturbance, and $$\pi_\star$$ the optimal policy.

## Sample complexity

### Goals:

1. Guarantee safety
2. Achieve good performance
suboptimality: $$\mathrm{cost}(\widehat \pi)$$ vs. $$\mathrm{cost}( \pi_\star)$$

How much data do we have to collect from a system to safely control it?

## Example: double integrator dynamics

1. Collect data

2. Estimate model

3. Robust control

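The three steps can be grounded in this example. Below is a minimal simulation sketch of a discrete-time double integrator (position and velocity driven by an acceleration input); the discretization step, noise scale, and feedback gain are illustrative assumptions, not values from the talk.

```python
import numpy as np

def double_integrator(dt=0.1):
    """Discrete-time double integrator: state = (position, velocity), input = acceleration."""
    A = np.array([[1.0, dt],
                  [0.0, 1.0]])
    B = np.array([[0.0],
                  [dt]])
    return A, B

def simulate(A, B, K, x0, T, noise_scale=0.01, rng=None):
    """Roll out x_{t+1} = A x_t + B u_t + w_t under state feedback u_t = K x_t."""
    rng = np.random.default_rng(rng)
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(T):
        u = K @ xs[-1]
        w = noise_scale * rng.standard_normal(A.shape[0])
        xs.append(A @ xs[-1] + (B @ u).ravel() + w)
    return np.stack(xs)
```

With a stabilizing gain such as `K = [[-1.0, -1.7]]`, the trajectory converges toward the origin.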

## Problem setting: LQR

Unknown dynamics: $$x_{t+1}=Ax_t+Bu_t+w_t$$

Cost: $$\sum_{t=0}^T x_t^\top Qx_t + u_t^\top R u_t$$

Constraints: $$F x_{t}\leq b$$

Our goal is to design a linear controller from data:

$$u_t=\widehat{\mathbf K}(x_{0:t})$$

## Safe data-driven control in 3 steps

1. Collect data: run system for $$T$$ steps with safe excitation
1. Fit model and characterize errors: least squares estimation of $$(\widehat A,\widehat B)$$ and error bounds $$(\varepsilon_A,\varepsilon_B)$$
1. Robust control: synthesize $$\mathbf{\widehat K}$$ via convex optimization using the estimates

### Main result (informal)

As long as $$T$$ is large enough, with probability at least $$1-\delta$$,
the system remains safe during learning and operation and
$$\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)$$

[D., Mania, Matni, Recht, Tu, FoCM '19; D., Tu, Matni, Recht, ACC '19]


## Safe data-driven control in 3 steps

### Main result (informal)

As long as $$T$$ is large enough, with probability at least $$1-\delta$$,
the system remains safe during learning and operation and
$$\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)$$

DMMTR, FoCM '19: First general sample complexity bound with guaranteed stability when $$\mathcal X = \mathbb R^n$$

Previous work:

• Classic system identification provides asymptotic guarantees
• Fiechter (1997) makes strong stability assumptions
• Abbasi-Yadkori & Szepesvari (2011) study a computationally intractable adaptive method under stability assumption

DTMR, ACC '19: Sample complexity and guaranteed safety when $$\mathcal X$$ is a polytope, given initial coarse estimates $$\widehat A_0$$ and $$\widehat B_0$$


Ingredients:

1. Statistical learning rate

2. Robust constraint-satisfying control

3. Sub-optimality analysis

## Step 1: Collect data

[D., Tu, Matni, Recht, ACC '19]

Least squares estimation:

$$(\widehat A, \widehat B) \in \underset{(A,B)}{\arg\min} \sum_{t=0}^T \|Ax_t +B u_t - x_{t+1}\|^2$$

Learning rate (error characterization):

For stabilizing linear control and large enough $$T$$, we have w.p. $$1-\delta$$

$$\Big\|\begin{bmatrix} \Delta_A \\ \Delta_B\end{bmatrix}\Big\| \lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})}$$

$$u_t = {\color{goldenrod}\mathbf{K}_0}(x_{0:t})+ {\color{teal}\eta_t}$$

Collect data from system with safe excitation
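The least squares estimation step can be sketched in a few lines of numpy. The toy double-integrator system, horizon, excitation, and noise scales below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def estimate_dynamics(xs, us):
    """Least-squares fit of (A, B) from a trajectory with x_{t+1} ~ A x_t + B u_t."""
    # Stack regressors [x_t; u_t] and targets x_{t+1}.
    Z = np.hstack([xs[:-1], us])            # shape (T, n + m)
    Y = xs[1:]                              # shape (T, n)
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    n = xs.shape[1]
    return theta[:n].T, theta[n:].T         # (A_hat, B_hat)

# Collect data from a true (but "unknown") system with exciting inputs u_t ~ N(0, 1).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
T = 500
xs = np.zeros((T + 1, 2))
us = rng.standard_normal((T, 1))
for t in range(T):
    xs[t + 1] = A @ xs[t] + B @ us[t] + 0.01 * rng.standard_normal(2)

A_hat, B_hat = estimate_dynamics(xs, us)
```

As the learning-rate bound suggests, the estimation error shrinks with more data and larger excitation relative to the noise.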

## Step 3: Robust control

Design a controller $$\mathbf K$$ that handles process noise and excitation, as well as uncertain dynamics $$\mathbf \Delta$$.

## System level synthesis

The optimal control problem is nonconvex in $$\mathbf K$$, so we parametrize the controller using system level synthesis (SLS) (Anderson et al., ARC 2019).

The closed loop of the system $$(A,B)$$ with controller $$\mathbf{K}$$ acts as a linear map $$\mathbf{\Phi}$$ from the disturbance $$\bf w$$ to the state and input $$(\mathbf x, \mathbf u)$$:

$$\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w$$
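For intuition, the map $$\mathbf\Phi$$ can be computed explicitly in the special case of static state feedback $$u_t = Kx_t$$, where $$\mathbf\Phi_x^{(k)} = (A+BK)^{k-1}$$ and $$\mathbf\Phi_u^{(k)} = K(A+BK)^{k-1}$$. A minimal sketch (the matrices in the check below are illustrative):

```python
import numpy as np

def system_response(A, B, K, horizon):
    """Impulse responses of the closed loop x_{t+1} = (A + B K) x_t + w_t.

    Returns lists Phi_x, Phi_u with Phi_x[k] = (A+BK)^(k-1) and
    Phi_u[k] = K (A+BK)^(k-1) for k >= 1 (zero blocks at k = 0), so that
    x_t = sum_k Phi_x[k] w_{t-k} and u_t = sum_k Phi_u[k] w_{t-k}.
    """
    Acl = A + B @ K
    n, m = A.shape[0], K.shape[0]
    Phi_x, Phi_u = [np.zeros((n, n))], [np.zeros((m, n))]
    P = np.eye(n)
    for _ in range(horizon):
        Phi_x.append(P)
        Phi_u.append(K @ P)
        P = Acl @ P
    return Phi_x, Phi_u
```

Convolving these responses with past disturbances reproduces a rollout of the closed loop exactly, which is what makes $$\mathbf\Phi$$ a faithful (and convex) reparametrization of the controller.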

## Robust control with SLS

$$\underset{\mathbf u }{\min}~~\displaystyle\lim_{T\to\infty}\mathbb{E}\left[ \frac{1}{T}\sum_{t=0}^T x_t^\top Q x_t + u_t^\top R u_t\right]$$

$$\text{s.t.}~~x_{t+1} = Ax_t + Bu_t + w_t$$

$$~~~~~~~~x_{t} \in\mathcal X~~\forall~\|w_t\|\leq \sigma_w$$

$$~~~~~~~~u_t = \sum_{k=0}^t{\color{Goldenrod} K_k} x_{t-k}$$

We parametrize the controller using system level synthesis (SLS) (Anderson et al., ARC 2019):

$$\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w$$

This yields an equivalent problem with quadratic costs, linear dynamics, and tightened polytope constraints:

$$\underset{\color{teal}\mathbf{\Phi}}{\min}~~\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix}{\color{teal} \mathbf{\Phi}} \right\|_{\mathcal{H}_2}^2$$

$$\text{s.t.}~~ {\color{teal}\mathbf\Phi }\in\mathrm{Affine}(A, B)$$

$$~~~~~~~~{\color{teal}\mathbf\Phi }\in\mathrm{Polytope}_{\sigma_w}(\mathcal X)$$

## Robust control with SLS

SLS makes apparent the effect of estimation errors in the dynamics. When the true system is $$(\widehat A+\Delta_A,\, \widehat B+\Delta_B)$$, there is a closed-form translation:

$$\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \widehat{\mathbf \Phi}_x\\ \widehat{\mathbf \Phi}_u\end{bmatrix}(I+\mathbf\Delta)^{-1}\mathbf w$$

## Robust control with SLS

$$\widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{firebrick} \gamma}}{\arg\min}~~\frac{1}{1-\gamma}\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}^2$$

$$\qquad\qquad\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)$$

$$\qquad\qquad~~~~~\mathbf\Phi \in\mathrm{Polytope}_{\sigma_w,{\color{firebrick} \gamma}}(\mathcal X)$$

$$\qquad\qquad~~~~~\|[{\varepsilon_A\atop ~} ~{~\atop \varepsilon_B}]\mathbf \Phi\|\leq\gamma$$

Convex optimization problem for fixed $$\gamma$$

[D., Tu, Matni, Recht, ACC '19]

Use this robust constraint-satisfying controller for collecting data (step 1) and robust control (step 3)
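Because the objective is convex in $$\mathbf\Phi$$ only for fixed $$\gamma$$, one natural recipe is a scalar search over $$\gamma\in(0,1)$$ with an inner convex solve at each candidate. The sketch below stubs the inner solve with a hypothetical toy cost (`solve_inner` is an illustrative placeholder, not the paper's implementation).

```python
import math

def solve_inner(gamma):
    """Hypothetical inner convex solve for fixed gamma (stubbed with a toy cost).

    In the real problem this would minimize the H2 norm over Phi subject to the
    affine, polytope, and ||[eps_A; eps_B] Phi|| <= gamma constraints; here the
    cost simply shrinks as the robustness budget gamma loosens.
    """
    return 2.0 / (0.1 + gamma)

def robust_cost(gamma):
    # Outer objective: inner cost inflated by the robustness factor 1/(1 - gamma).
    return solve_inner(gamma) / (1.0 - gamma)

def golden_section(f, lo=1e-3, hi=1 - 1e-3, iters=60):
    """Golden-section search for the minimizer of a quasiconvex scalar function."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

gamma_star = golden_section(robust_cost)
```

A one-dimensional search like this is cheap relative to the inner solves, so the overall procedure stays tractable.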

## Robust safety and suboptimality

[D., Tu, Matni, Recht, ACC '19]

Safety and Suboptimality:

As long as the robust problem is feasible, $$\widehat{\mathbf K} = \widehat{\mathbf \Phi}_u\widehat{\mathbf \Phi}_x^{-1}$$ keeps the system safe.

If $$\varepsilon_A, \varepsilon_B$$ are small enough,

$$\displaystyle \frac{\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)}{\mathrm{cost}(\mathbf K_\star)}\lesssim \left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty}$$

Design controller with robust system level synthesis in terms of $$\widehat A,\widehat B$$ and $$\varepsilon_A,\varepsilon_B$$

## Safe data-driven control in 3 steps

Ingredients:

1. Statistical learning rate

2. Robust constraint-satisfying control

3. Sub-optimality analysis

### Main result (informal)

As long as $$T$$ is large enough, with probability at least $$1-\delta$$,
the system remains safe during learning and operation and
$$\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{robustness}(\mathbf K_\star)$$

## Safety with unknown dynamics

Data-driven controller design:

• ensures safety & optimizes performance
• learns unknown linear dynamics
• with finite sample guarantees

[D., Mania, Matni, Recht, Tu, NeurIPS '18]

[D., Mania, Matni, Recht, Tu, FoCM '19]

[D., Tu, Matni, Recht, ACC '19]

## Talk outline

1. Safety with unknown dynamics
1. Safety with complex observations
1. Discovery in recommendations

## Safety with complex observations

How to remain safe with imperfect perception?

## Problem setting: perception-based control

Known linear dynamics:

$$x_{t+1} =Ax_t+Bu_t+w_t$$

Complex observations (unknown appearance map $$q$$):

$$z_t = q(Cx_t)$$

Virtual sensor: learn a perception map $$p$$ so that

$$y_t = p(z_t)=Cx_t+e_t$$

## Output-feedback optimal control

$$\min ~~\textrm{cost}(x_0, u_0, x_1,\dots)$$

$$\text{s.t.}~~x_{t+1} =Ax_t+Bu_t+w_t$$

$$~~~~~~~~y_t = Cx_t+ e_t$$

The optimal controller uses a perfect perception map: $$\pi_\star(\mathbf z) = \mathbf K p_\star(\mathbf z)$$ with $$y_t=p_\star(z_t)$$.

The certainty equivalent controller uses the learned map: $$\widehat \pi (\mathbf z)= \mathbf K p(\mathbf z)$$.

For the system $$(A,B,C)$$ in feedback with $$\mathbf{K}$$, the closed-loop response $$\begin{bmatrix} \mathbf \Phi_{xw} & \mathbf \Phi_{xe} \\ \mathbf \Phi_{uw} & \mathbf \Phi_{ue} \end{bmatrix}$$ maps the disturbance $$\bf w$$ and perception error $$\bf e$$ to the state $$\bf x$$ and input $$\bf u$$.

Suboptimality is bounded if errors are bounded:

$$\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \leq \left\|\begin{bmatrix}\mathbf \Phi_{xe}\\ \mathbf \Phi_{ue}\end{bmatrix}\right\| \|\mathbf e\|$$

## Uniform convergence

Learn perception map $$p(z)$$ via nonparametric regression from uniformly sampled training data $$\{(z_i, y^\mathrm{train}_i)\}_{i=1}^T$$


### Main result (informal)

As long as $$T$$ is large enough, with probability at least $$1-\delta$$,
$$\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \lesssim$$ $$rL_q L_p \left(\frac{\mathsf{dim}^2\sigma^4}{T}\right)^{\frac{1}{\mathsf{dim}+4}} \left\|\begin{bmatrix} \mathbf \Phi_{xe}\\ \mathbf \Phi_{ue} \end{bmatrix}\right\|$$

Assume:

• $$p_\star$$ and $$q$$ are continuous
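As an illustration of learning a map by nonparametric regression, here is a minimal Nadaraya-Watson (kernel-weighted average) estimator; the Gaussian kernel, bandwidth, and 1-D target below are illustrative assumptions, not the talk's exact estimator.

```python
import numpy as np

def nadaraya_watson(z_train, y_train, bandwidth):
    """Nonparametric regression: kernel-weighted average of training targets.

    Returns a predictor p(z); accuracy is uniform wherever the (uniformly
    sampled) training data covers the query region.
    """
    z_train = np.atleast_2d(np.asarray(z_train, dtype=float))
    y_train = np.asarray(y_train, dtype=float)

    def predict(z):
        # Squared distances from the query to every training observation.
        d2 = np.sum((z_train - np.asarray(z, dtype=float)) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        return w @ y_train / w.sum()

    return predict

# Illustrative 1-D example: recover y = z^2 from noiseless uniform samples.
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=(500, 1))
y = z[:, 0] ** 2
p = nadaraya_watson(z, y, bandwidth=0.05)
```

The bandwidth trades off smoothing bias against variance, which is where the dimension-dependent rate in the bound comes from.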

## Talk outline

1. Safety with unknown dynamics
1. Safety with complex observations
1. Discovery in recommendations

## Feedback in automated decision systems

Fairness: equality criteria on decisions

Example: lending decisions made from financial status.

[Liu, D., Simchowitz, Rolf, Hardt. ICML '18]
[Rolf, Simchowitz, D., Liu, Bjorn, Hardt, Blumenstock. ICML '20]

Wellbeing: impact of decisions

## Two-step mechanism


Compared to physical dynamics, social outcomes

• have limited predictability
• present difficulties of measurement
• are of indeterminate or contested value

Optimizing a policy is ultimately a form of social control


## Discovery in recommendations

Does this system enable discovery?

## Discovery in recommendations

Which items can an individual discover?


## Measure discovery via reachability

[D., Rich, Recht. FAccT '20]

[Curmei, D., Recht. ICML '21]

Definition: An individual can discover item $$i$$ if they can take an action $$\mathbf a$$ so that item $$i$$ is recommended

## Measure discovery via reachability

[D., Rich, Recht. FAccT '20]

User $$u$$ can discover item $$i$$ if they can take an action $$\mathbf a$$ so that item $$i$$ is recommended:

$$\exists~~\mathbf a \in \mathcal A(u) ~~\text{s.t.}~~ \mathrm{\pi}(u, \mathbf{a}) = i$$

This is a convex condition as long as:

1. linear preference models

2. top-1 selection rules
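Under linear preference models and top-1 selection, the reachability check is a linear feasibility problem. A minimal sketch, assuming item scores affine in the user's action and box-constrained ratings (the score matrix `V`, offsets `b`, and rating bounds are illustrative, not the paper's exact model):

```python
import numpy as np
from scipy.optimize import linprog

def is_reachable(V, b, i, lo=1.0, hi=5.0):
    """Check whether item i can be made the top-1 recommendation.

    Scores are assumed affine in the user's action a: s_j(a) = V[j] @ a + b[j],
    with actions box-constrained in [lo, hi]^d. Item i is reachable iff some
    feasible a gives s_i(a) >= s_j(a) for every other item j, which is a
    linear feasibility problem.
    """
    n, d = V.shape
    # Constraints (V[j] - V[i]) @ a <= b[i] - b[j] for all j != i.
    rows = [V[j] - V[i] for j in range(n) if j != i]
    rhs = [b[i] - b[j] for j in range(n) if j != i]
    res = linprog(c=np.zeros(d), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(lo, hi)] * d, method="highs")
    return res.status == 0  # 0 = feasible solution found
```

Running this check for every (user, item) pair yields the kind of audit described on the next slides.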

## Auditing discovery

[D., Rich, Recht. FAccT '20]

Motivating questions:

1. Does system provide discovery to new users? Old users?
2. How is this affected by the learned preference model?

(Experimental setup: matrix factorization (MF) preference model with top-1 selection; users act by rating next items; outcome measured as amount of discovery.)

## Auditing discovery

[D., Rich, Recht. FAccT '20]

### It is impossible

• for an individual to discover more than a limited number of movies
• for some movies to be recommended to any user at all


### Publications

DMMRT, FoCM '19; DMMRT, NeurIPS '18; DTMR, ACC '19; DMRY, L4DC '20; DR20, arXiv '20; DTCRA, CoRL '20; TDDRYA20, arXiv '20; LDRSH, ICML '18; RSDLBHB, ICML '20; KDZGCRJ, arXiv '20; DRR, FAccT '20; DDGK, FAT/ML '18; PDRW, BOE '19; DGLZ, IEEE TTS '20

## Future work: ensuring safety

Principled & robust data-driven control with guarantees

• from complex observations: online calibration for rich sensing modalities
• for nonlinear systems: adaptivity to friction and contact forces

## Future work: ensuring discovery

• Design principles for recommendation systems

• Relationship to strategic behavior and markets

## Future work: articulating values

Integrating data-driven automation into important domains requires ensuring safety, discovery, equity, wellbeing, and more

Many challenges in formally defining these properties as technical specifications as well as in ensuring them in dynamic and uncertain systems

## My committee

Ben Recht

Moritz Hardt

Francesco Borrelli

Claire Tomlin
