Reliable Machine Learning in Feedback Systems

Sarah Dean

Dissertation Talk

July 27, 2021

Machine learning is a promising tool for processing complex information

velocity,

steering angle,

acceleration

velocity

\(\to\)

camera image

position

historical movie ratings

new movie rating

Dangers of static predictions in a dynamic world

  • Catastrophic failure
  • Inequality and bias
  • Addiction

Promises of data-driven solutions

  • Safety
  • Equity
  • Discovery

My approach: data-driven and robust

Ensure reliable outcomes by encoding values through the lens of reachability

Where could the system go?

ML

control

optimization

data and uncertainty quantification

robust reachability condition

design a policy

Reasoning about reachability

Where the system goes (trajectory) depends on:

  • where the system is (state),
  • how the system changes (dynamics),
  • and which actions are chosen (policy/controller)

Reasoning about uncertainty

  • in how the system changes (dynamics)
  • in where the system is (state)
  • dependent on which actions are chosen (policy/controller)

Reliable outcomes via reachability

Safety

Discovery

system must remain in safe region

system must be able to reach many regions

Talk outline

  1. Safety with unknown dynamics
  2. Safety with complex observations
  3. Discovery in recommendations

Safety with unknown dynamics

How much data do we have to collect from a system to safely control it?

Optimal control (reinforcement learning) problem

Tasks can be modeled as optimal control problems

\(\displaystyle \min_{\pi}~~ \mathrm{cost}(\pi)\)

\(\displaystyle \min_{\pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)\)


\(~~~\mathrm{s.t.}~~ u_t = \pi_t(x_{0:t})\)

\(~~~~~~~~~~x_{t+1} = \mathrm{dynamics}_t(x_t,u_t, w_t)\)

\(~~~~~~~~~~x_t\in\mathcal X~~\text{for all}~ t~\text{and all}~w_t\in\mathcal W\)

state, action/input

\(\mathrm{dynamics}\)

\(\pi\)

\(x_t\)

\(u_t\)

?

\(\pi_\star\)

?

disturbance

Sample complexity

Goals:

  1. Guarantee safety
  2. Achieve good performance
    suboptimality: \(\mathrm{cost}(\widehat \pi)\) vs. \(\mathrm{cost}( \pi_\star)\)

?

How much data do we have to collect from a system to safely control it?

\(\mathrm{dynamics}\)

\(x_t\)

\(u_t\)

\(\widehat\pi\)

Example: double integrator dynamics

1. Collect data

2. Estimate model

3. Robust control

\(u_t\)

Problem setting: LQR

Linear dynamics and quadratic costs: the linear quadratic regulator (LQR)

Unknown dynamics: \(x_{t+1}=Ax_t+Bu_t+w_t\)

Cost: \(\sum_{t=0}^T x_t^\top Qx_t + u_t^\top R u_t\)

Constraints: \(F x_{t}\leq b\)

Our goal is to design a linear controller from data:

\(u_t=\widehat{\mathbf K}(x_{0:t})\)
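As a rough illustration of this setting, the sketch below simulates a double integrator under a static linear controller and evaluates the quadratic cost and the constraint \(Fx_t\leq b\) along one rollout. The matrices, the gain `K`, the noise bound, and the position limit are illustrative assumptions, not values from the talk. A static gain is only a special case of the controller \(\widehat{\mathbf K}(x_{0:t})\).

```python
import random

# Double integrator: state x = (position, velocity), scalar input u.
# All numeric values below are illustrative assumptions.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [0.0, 1.0]
Q = [[1.0, 0.0], [0.0, 1.0]]   # state cost weights
R = 1.0                        # input cost weight
K = [-0.4, -1.0]               # hypothetical static stabilizing gain, u = K x
POS_LIMIT = 5.0                # F x <= b encodes |position| <= POS_LIMIT
SIGMA_W = 0.05                 # disturbance bound: |w_i| <= SIGMA_W


def step(x, u, w):
    """One step of x_{t+1} = A x_t + B u_t + w_t."""
    return [A[i][0] * x[0] + A[i][1] * x[1] + B[i] * u + w[i] for i in range(2)]


def rollout(x0, horizon, seed=0):
    """Simulate the closed loop; return (total cost, constraint satisfied?)."""
    rng = random.Random(seed)
    x, cost, safe = list(x0), 0.0, True
    for _ in range(horizon):
        u = K[0] * x[0] + K[1] * x[1]
        # Accumulate x^T Q x + u^T R u for this time step.
        cost += sum(x[i] * Q[i][j] * x[j] for i in range(2) for j in range(2))
        cost += R * u * u
        safe = safe and abs(x[0]) <= POS_LIMIT
        w = [rng.uniform(-SIGMA_W, SIGMA_W) for _ in range(2)]
        x = step(x, u, w)
    return cost, safe


total_cost, stayed_safe = rollout(x0=[1.0, 0.0], horizon=50)
print(f"cost = {total_cost:.3f}, safe = {stayed_safe}")
```

Checking the constraint on a single rollout is weaker than the guarantee in the talk, which must hold for all admissible disturbances.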

Safe data-driven control in 3 steps

  1. Collect data: run system for \(T\) steps
  2. Fit model and characterize errors: least squares estimation of \((\widehat A,\widehat B)\) and \((\varepsilon_A,\varepsilon_B)\)
  3. Robust control: synthesize \(\mathbf{\widehat K}\) via convex optimization using estimates

Main result (informal)

As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)\)

[D., Mania, Matni, Recht, Tu, FoCM '19; D., Tu, Matni, Recht, ACC '19]

safe excitation


DMMRT, FoCM '19: First general sample complexity bound with guaranteed stability when \(\mathcal X = \mathbb R^n\)

Previous work:

  • Classic system identification provides asymptotic guarantees
  • Fiechter (1997) makes strong stability assumptions
  • Abbasi-Yadkori & Szepesvári (2011) study a computationally intractable adaptive method under a stability assumption

DTMR, ACC '19: Sample complexity and guaranteed safety when \(\mathcal X\) is a polytope, given initial coarse estimates \(\widehat A_0\) and \(\widehat B_0\)

Ingredients:

1. Statistical learning rate

2. Robust constraint-satisfying control

3. Sub-optimality analysis

Step 1:

[D., Tu, Matni, Recht, ACC '19]

Least squares estimation:

\((\widehat A, \widehat B) \in \underset{(A,B)}{\arg\min} \sum_{t=0}^T \|Ax_t +B u_t - x_{t+1}\|^2 \)

Learning rate (error characterization):

For stabilizing linear control and large enough \(T\), we have w.p. \(1-\delta\)

\(\Big\|\begin{bmatrix} \Delta_A \\ \Delta_B\end{bmatrix}\Big\| \lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})}\)

\(u_t = {\color{goldenrod}\mathbf{K}_0}(x_{0:t})+ {\color{teal}\eta_t}\)

Collect data from system with safe excitation
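A minimal sketch of Step 1, assuming a scalar system so that the least squares problem reduces to 2x2 normal equations. The true parameters, the gain `K0`, and the noise and excitation sizes are hypothetical values chosen for illustration. The input follows the safe-excitation form \(u_t = \mathbf K_0 x_t + \eta_t\) with a static gain.

```python
import random

# Scalar stand-in for x_{t+1} = A x_t + B u_t + w_t.
# All numeric values are illustrative assumptions.
A_TRUE, B_TRUE = 0.9, 0.5
K0 = -0.4          # hypothetical stabilizing gain (closed loop is about 0.7)
SIGMA_W = 0.01     # size of process noise
SIGMA_ETA = 1.0    # size of excitation


def collect_data(T, seed=0):
    """Run u_t = K0 x_t + eta_t for T steps; return tuples (x_t, u_t, x_{t+1})."""
    rng = random.Random(seed)
    x, data = 0.0, []
    for _ in range(T):
        u = K0 * x + rng.uniform(-SIGMA_ETA, SIGMA_ETA)
        x_next = A_TRUE * x + B_TRUE * u + rng.uniform(-SIGMA_W, SIGMA_W)
        data.append((x, u, x_next))
        x = x_next
    return data


def least_squares(data):
    """Solve the normal equations for min sum (a x_t + b u_t - x_{t+1})^2."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    a_hat = (suu * sxy - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


for T in (50, 500, 5000):
    a_hat, b_hat = least_squares(collect_data(T))
    print(T, abs(a_hat - A_TRUE), abs(b_hat - B_TRUE))
```

Increasing `SIGMA_ETA` or decreasing `SIGMA_W` should shrink the printed errors, in line with the noise-to-excitation ratio in the learning rate.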

Step 2:

Step 3:

Controller must keep system safe despite

  • process noise and excitation
  • uncertain dynamics \(\mathbf \Delta\)

System level synthesis

Optimal control problem is nonconvex in \(\mathbf K\),

so we parametrize controller using system level synthesis (SLS) (Anderson et al., ARC 2019)

instead of a loop, system looks like a line

\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)

Robust control with SLS

\( \underset{\mathbf u }{\min}~~\displaystyle\lim_{T\to\infty}\mathbb{E}\left[ \frac{1}{T}\sum_{t=0}^T x_t^\top Q x_t + u_t^\top R u_t\right]\)

\(\text{s.t.}~~x_{t+1} = Ax_t + Bu_t + w_t\)

\(~~~~~~~~~u_t = \sum_{k=0}^t{\color{Goldenrod} K_k} x_{t-k}\)

\(~~~~~~~~~x_{t} \in\mathcal X~~\forall~\|w_t\|\leq \sigma_w\)

We parametrize controller using system level synthesis (SLS) (Anderson et al., ARC 2019)

\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)

Equivalent problem with quadratic costs, linear dynamics, and tightened polytope constraints

\( \underset{\color{teal}\mathbf{\Phi}}{\min}~~\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix}{\color{teal} \mathbf{\Phi}} \right\|_{\mathcal{H}_2}^2\)

\(\text{s.t.}~~ {\color{teal}\mathbf\Phi }\in\mathrm{Affine}(A, B)\)

\(~~~~~~~~~{\color{teal}\mathbf\Phi }\in\mathrm{Polytope}_{\sigma_w}(\mathcal X)\)
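To make the "line instead of a loop" picture concrete, here is a small sketch for a scalar system with a static gain. It assumes that \(\mathrm{Affine}(A,B)\) stands for the usual SLS constraint \(\Phi_x(1)=I\), \(\Phi_x(k+1)=A\Phi_x(k)+B\Phi_u(k)\), which the slide abbreviates. The numbers are illustrative. The code checks that simulating the feedback loop and convolving the disturbance with the system response give the same trajectory.

```python
# Scalar system x_{t+1} = a x_t + b u_t + w_t with static feedback u_t = k x_t.
# Numeric values are illustrative assumptions.
a, b, k = 0.9, 0.5, -0.4
HORIZON = 20

# System response elements for a static gain:
# phi_x[j] maps w_{t-1-j} to x_t, phi_u[j] maps w_{t-1-j} to u_t.
phi_x = [(a + b * k) ** j for j in range(HORIZON)]
phi_u = [k * p for p in phi_x]

# Affine constraint linking the response to the dynamics (a, b).
assert phi_x[0] == 1.0
for j in range(HORIZON - 1):
    assert abs(phi_x[j + 1] - (a * phi_x[j] + b * phi_u[j])) < 1e-12

# Simulate the loop from x_0 = 0 under an arbitrary bounded disturbance.
w = [0.1 * (-1) ** t for t in range(HORIZON)]
x, xs, us = 0.0, [], []
for t in range(HORIZON):
    u = k * x
    xs.append(x)
    us.append(u)
    x = a * x + b * u + w[t]


def respond(phi, t):
    """Convolve the response with past disturbances: sum_j phi[j] * w[t-1-j]."""
    return sum(phi[j] * w[t - 1 - j] for j in range(t))


max_gap = max(
    max(abs(xs[t] - respond(phi_x, t)), abs(us[t] - respond(phi_u, t)))
    for t in range(HORIZON)
)
print(f"largest mismatch between loop and line: {max_gap:.2e}")

# The gain is recovered from the response: K = Phi_u Phi_x^{-1}.
assert abs(phi_u[0] / phi_x[0] - k) < 1e-12
```

Because the cost and constraints are expressed directly in terms of `phi_x` and `phi_u`, which enter the trajectory linearly, optimizing over the response rather than over the gain is what makes the problem convex.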

Robust control with SLS

SLS makes apparent the effect of estimation errors in the dynamics \((\widehat A+\Delta_A,~\widehat B+\Delta_B)\)

\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \widehat{\mathbf \Phi}_x\\ \widehat{\mathbf \Phi}_u\end{bmatrix}(I+\mathbf\Delta)^{-1}\mathbf w \)

There is a closed-form translation: \(\mathbf K = \widehat{\mathbf \Phi}_u\widehat{\mathbf \Phi}_x^{-1}\)
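The identity \(\widehat{\mathbf\Phi}(I+\mathbf\Delta)^{-1}\) can be checked numerically in a scalar example. The sketch assumes a static gain, evaluates the transfer functions at a single complex point `z`, and takes \(\mathbf\Delta = -(\Delta_A\widehat{\mathbf\Phi}_x + \Delta_B\widehat{\mathbf\Phi}_u)\). That sign convention is an assumption chosen to match true dynamics of the form estimate plus error, and all numbers are illustrative.

```python
# Scalar check of Phi_true = Phi_hat (1 + Delta)^{-1}, evaluated at one point z.
# Numeric values and the sign convention for Delta are illustrative assumptions.
a_hat, b_hat = 0.9, 0.5          # estimated dynamics
delta_a, delta_b = 0.05, -0.03   # estimation errors: true = estimate + error
k = -0.4                         # static gain designed on the estimates
z = complex(1.2, 0.3)            # any point where all responses are finite


def response(a, b):
    """Closed-loop maps from w to (x, u) for x+ = a x + b u + w, u = k x."""
    phi_x = 1.0 / (z - (a + b * k))
    return phi_x, k * phi_x


phi_x_hat, phi_u_hat = response(a_hat, b_hat)
phi_x_true, phi_u_true = response(a_hat + delta_a, b_hat + delta_b)

# Error term that the nominal response picks up on the true system.
delta = -(delta_a * phi_x_hat + delta_b * phi_u_hat)

gap_x = abs(phi_x_true - phi_x_hat / (1.0 + delta))
gap_u = abs(phi_u_true - phi_u_hat / (1.0 + delta))
print(f"gap_x = {gap_x:.2e}, gap_u = {gap_u:.2e}")
```

When `delta` is small in magnitude, the true response stays close to the nominal one, which is the mechanism the robust synthesis problem exploits.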

Robust control with SLS

\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{firebrick} \gamma}}{\arg\min}~~\frac{1}{1-\gamma}\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}^2\)

\(\qquad\qquad\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)

\(\qquad\qquad\mathbf\Phi \in\mathrm{Polytope}_{\sigma_w,{\color{firebrick} \gamma}}(\mathcal X)\)

\(\qquad\qquad\left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|\leq\gamma\)

Convex optimization problem for fixed \(\gamma\)

[D., Tu, Matni, Recht, ACC '19]

Use this robust constraint-satisfying controller for collecting data (step 1) and robust control (step 3)

Robust safety and suboptimality

[D., Tu, Matni, Recht, ACC '19]

Safety and Suboptimality:

As long as the robust problem is feasible, \(\widehat{\mathbf K} = \widehat{\mathbf \Phi}_u\widehat{\mathbf \Phi}_x^{-1}\) keeps the system safe.

If \(\varepsilon_A, \varepsilon_B\) are small enough,

\(\displaystyle \frac{\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)}{\mathrm{cost}(\mathbf K_\star)}\lesssim \left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty} \)

Design controller with robust system level synthesis in terms of \(\widehat A,\widehat B\) and \(\varepsilon_A,\varepsilon_B\)

Safe data-driven control in 3 steps

Ingredients:

1. Statistical learning rate

2. Robust constraint-satisfying control

3. Sub-optimality analysis

Main result (informal)

As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)\)

Safety with unknown dynamics

Data-driven controller design:

  • ensures safety & optimizes performance
  • learns unknown linear dynamics
  • with finite sample guarantees

[D., Mania, Matni, Recht, Tu, NeurIPS '18]

Method extends to adaptive setting

[D., Mania, Matni, Recht, Tu, FoCM '19]

[D., Tu, Matni, Recht, ACC '19]

Talk outline

  1. Safety with unknown dynamics
  2. Safety with complex observations
  3. Discovery in recommendations

Safety with complex observations

How to remain safe with imperfect perception?

Problem setting: perception-based control

Known linear dynamics

\(x_{t+1} =Ax_t+Bu_t+w_t\)

Complex observations (unknown appearance map)

\(z_t = q(Cx_t)\)

Virtual sensor:

\(y_t = p(z_t)=Cx_t+e_t,~~e_t = e(x_t)\)

Output-feedback optimal control

\(y_t = Cx_t+ e_t\)

\(\min ~~\textrm{cost}(x_0, u_0, x_1,\dots)\)

\(\text{s.t.}\)

\(\pi_\star(\mathbf z) = \mathbf K p_\star(\mathbf z)\)

\(\widehat \pi (\mathbf z)= \mathbf K p(\mathbf z)\)

Suboptimality is bounded if errors are bounded

\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \leq \left\|\begin{bmatrix}\mathbf \Phi_{xe}\\ \mathbf \Phi_{ue}\end{bmatrix}\right\| \|\mathbf e\|\)

\(x_{t+1} =Ax_t+Bu_t+w_t\)

\(\mathbf K = \arg\)

The optimal controller uses a perfect perception map

The certainty equivalent controller


\((A,B,C)\)

\(\mathbf{K}\)

\(\bf y\)

\(\bf u\)

\(\bf w\)

\(\bf e\)

\(\bf x\)

\(\bf u\)

\(\begin{bmatrix} \mathbf \Phi_{xw} & \mathbf \Phi_{xe} \\ \mathbf \Phi_{uw} & \mathbf \Phi_{ue} \end{bmatrix}\)

\(=p_\star(z_t)\)

Uniform convergence

Learn perception map \(p(z)\) via nonparametric regression from uniformly sampled training data \(\{(z_i, y^\mathrm{train}_i)\}_{i=1}^T\)
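One simple form of nonparametric regression is a kernel-weighted average of nearby training labels (a Nadaraya-Watson style estimator). The sketch below uses that form as an assumption about the regression method. The one-dimensional map `p_star`, the triangular bump, the bandwidth, and the evenly spaced training grid are all hypothetical choices for illustration.

```python
import random


def p_star(z):
    """Hypothetical ground-truth perception map (assumed for illustration)."""
    return 2.0 * z + 1.0


NOISE = 0.05       # bound on training label noise
BANDWIDTH = 0.1    # width of each kernel bump


def make_training_data(T, seed=0):
    """Evenly spaced z_i in [0, 1] with noisy labels y_i = p_star(z_i) + noise."""
    rng = random.Random(seed)
    zs = [i / (T - 1) for i in range(T)]
    return [(z, p_star(z) + rng.uniform(-NOISE, NOISE)) for z in zs]


def bump(distance):
    """Triangular kernel supported on distances below BANDWIDTH."""
    return max(0.0, 1.0 - distance / BANDWIDTH)


def predict(train, z):
    """Kernel-weighted average of the labels of nearby training points."""
    weights = [bump(abs(z - zi)) for zi, _ in train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training data near the query point")
    return sum(wt * yi for wt, (_, yi) in zip(weights, train)) / total


train = make_training_data(T=201)
estimate = predict(train, 0.5)
print(f"p(0.5) = {estimate:.3f}, p_star(0.5) = {p_star(0.5):.3f}")
```

The estimator is only defined where training data is dense enough, which is one way to read the bounded radius of operation assumed in the main result.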

Figure: data, bumps, and prediction as functions of \(z\)

Main result (informal)

As long as \(T\) is large enough, with probability at least \(1-\delta\),
\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \lesssim rL_q L_p \left(\frac{\mathsf{dim}^2\sigma^4}{T}\right)^{\frac{1}{\mathsf{dim}+4}} \left\|\begin{bmatrix} \mathbf \Phi_{xe}\\ \mathbf \Phi_{ue} \end{bmatrix}\right\|\)

Assume:

  • bounded radius of operation
  • \(p_\star\) and \(q\) are continuous

Talk outline

  1. Safety with unknown dynamics
  2. Safety with complex observations
  3. Discovery in recommendations

Feedback in automated decision systems

Fairness: equality criteria on decisions

financial status

lending decision

academic history

admission decision

[Liu, D., Simchowitz, Rolf, Hardt. ICML '18]
[Rolf, Simchowitz, D., Liu, Björkegren, Hardt, Blumenstock. ICML '20]

Wellbeing: impact of decisions

Two-step mechanism

financial status

lending decision

Compared to physical dynamics, social outcomes

  • have limited predictability
  • present difficulties of measurement
  • are of indeterminate or contested value

Optimizing a policy is ultimately a form of social control

financial status

Discovery in recommendations

Does this system enable discovery?

Discovery in recommendations

Which items can an individual discover?


Measure discovery via reachability

[D., Rich, Recht. FAccT '20]

[Curmei, D., Recht. ICML '21]

Definition: An individual can discover item \(i\) if they can take an action \(\mathbf a\) so that item \(i\) is recommended

Measure discovery via reachability

[D., Rich, Recht. FAccT '20]

User \(u\) can discover item \(i\) if they can take an action \(\mathbf a\) so that item \(i\) is recommended

Convex condition as long as

  1. linear preference models

  2. top-1 selection rules

\(\exists~~\mathbf a \in \mathcal A(u) ~~\text{s.t.}~~ \mathrm{\pi}(u, \mathbf{a}) = i \)
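The existence condition above can be illustrated by brute force on a toy example. This is only a sketch: it enumerates a small discrete action set instead of solving the convex feasibility problem, and the linear preference model, the scores in `BASE` and `GAIN`, and the action set are hypothetical.

```python
# Hypothetical linear preference model: score_i(a) = BASE[i] + GAIN[i] * a,
# where the action a is the rating the user gives to one "next item".
# All numbers are illustrative assumptions.
BASE = [3.0, 1.0, 2.6]
GAIN = [-0.5, 0.5, 0.0]
ACTIONS = [1, 2, 3, 4, 5]      # allowed ratings, playing the role of A(u)


def recommend(action):
    """Top-1 selection rule: index of the highest predicted score."""
    scores = [base + gain * action for base, gain in zip(BASE, GAIN)]
    return max(range(len(scores)), key=scores.__getitem__)


def reachable_items():
    """Items i for which some action a satisfies recommend(a) == i."""
    return {recommend(action) for action in ACTIONS}


reachable = reachable_items()
unreachable = set(range(len(BASE))) - reachable
print("reachable:", sorted(reachable), "unreachable:", sorted(unreachable))
```

In this toy model item 0 is never the top recommendation for any allowed rating, so the user cannot discover it no matter which action they take.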

Auditing discovery

[D., Rich, Recht. FAccT '20]

Motivating questions:

  1. Does system provide discovery to new users? Old users?
  2. How is this affected by the learned preference model?

MF

top-1

rate next items

Amount of discovery

Auditing discovery

[D., Rich, Recht. FAccT '20]

It is impossible

  • for an individual to discover more than a limited number of movies
  • for some movies to be recommended to any user at all

Amount of discovery

Discovery

Safety

Wellbeing

DMMRT, FoCM '19

DMMRT, NeurIPS '18

DTMR, ACC '19

DMRY, L4DC '20

DR20, arXiv '20

DTCRA, CoRL '20

TDDRYA20, arXiv '20

LDRSH, ICML '18

RSDLBHB, ICML '20

KDZGCRJ, arXiv '20

DRR, FAccT '20

ADGLZ, ISTAS '20

DDGK, FAT/ML '18

PDRW, BOE '19

DGLZ, IEEE TTS '20

Future work: ensuring safety

Principled & robust data-driven control with guarantees

  • from complex observations
  • for nonlinear systems

online calibration for rich sensing modalities

adaptivity to friction and contact forces

Future work: ensuring discovery

  • Design principles for recommendation systems

  • Relationship to strategic behavior and markets

Future work: articulating values

Integrating data-driven automation into important domains requires ensuring safety, discovery, equity, wellbeing, and more

Many challenges in formally defining these properties as technical specifications as well as in ensuring them in dynamic and uncertain systems

Thank you for your attention!

And thanks to my advisor, Ben Recht,

my committee, Moritz Hardt, Francesco Borrelli, and Claire Tomlin,

my collaborators,

my colleagues, friends, and family
