Reliable Machine Learning in Feedback Systems
Sarah Dean
Dissertation Talk
July 27, 2021
Machine learning is a promising tool for processing complex information
camera image \(\to\) steering angle
position, velocity \(\to\) acceleration
historical movie ratings \(\to\) new movie rating
Dangers of static predictions in a dynamic world
 Catastrophic failure
 Inequality and bias
 Addiction
Promises of data-driven solutions
 Safety
 Equity
 Discovery
My approach: data-driven and robust
Ensure reliable outcomes by encoding values through the lens of reachability
Where could the system go?
ML
control
optimization
data and uncertainty quantification
robust reachability condition
design a policy
Reasoning about reachability
Where the system goes (trajectory) depends on:
 where the system is (state),
 which actions are chosen (policy/controller),
 and how the system changes (dynamics)
 in how the system changes (dynamics),
 in where the system is (state),
 depending on which actions are chosen (policy/controller)
Reasoning about uncertainty
Reliable outcomes via reachability
Safety
Discovery
system must remain in safe region
system must be able to reach many regions
Talk outline
 Safety with unknown dynamics
 Safety with complex observations
 Discovery in recommendations
Safety with unknown dynamics
How much data do we have to collect from a system to safely control it?
Optimal control (reinforcement learning) problem
Tasks can be modeled as optimal control problems
\(\displaystyle \min_{\pi}~~ \mathrm{cost}(\pi)\)
\(\displaystyle \min_{\pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)\)
\(~~~\mathrm{s.t.}~~ u_t = \pi_t(x_{0:t})\)
\(~~~~~~~~~~x_{t+1} = \mathrm{dynamics}_t(x_t,u_t, w_t)\)
\(~~~~~~~~~~x_t\in\mathcal X~~\text{for all}~ t~\text{and all}~w_t\in\mathcal W\)
state, action/input
\(\mathrm{dynamics}\)
\(\pi\)
\(x_t\)
\(u_t\)
\(\pi_\star\)
disturbance
Sample complexity
Goals:
 Guarantee safety
 Achieve good performance
suboptimality: \(\mathrm{cost}(\widehat \pi)\) vs. \(\mathrm{cost}( \pi_\star)\)
How much data do we have to collect from a system to safely control it?
\(\mathrm{dynamics}\)
\(x_t\)
\(u_t\)
\(\widehat\pi\)
Example: double integrator dynamics
1. Collect data
2. Estimate model
3. Robust control
\(u_t\)
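As a rough illustration of the first two steps (an assumed discretization and toy noise levels, not the talk's exact experiment), a double integrator can be excited and its dynamics recovered by least squares:

```python
import numpy as np

# Toy sketch of steps 1-2 (assumed values): double integrator with
# state x = [position, velocity] and input u = acceleration.
dt = 0.1
A_true = np.array([[1.0, dt], [0.0, 1.0]])
B_true = np.array([[0.0], [dt]])

rng = np.random.default_rng(0)
T = 500
x = np.zeros(2)
X, U, X_next = [], [], []
for _ in range(T):
    u = rng.normal(size=1)           # exciting input
    w = 0.01 * rng.normal(size=2)    # process noise w_t
    x_new = A_true @ x + B_true @ u + w
    X.append(x); U.append(u); X_next.append(x_new)
    x = x_new

# Least squares: X_next ~ [X U] Theta with Theta = [A B]^T
Z = np.hstack([np.array(X), np.array(U)])          # shape (T, 3)
Theta, *_ = np.linalg.lstsq(Z, np.array(X_next), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]
eps_A = np.linalg.norm(A_hat - A_true)             # estimation error
print(eps_A)
```

The estimation error shrinks roughly at a \(1/\sqrt{T}\) rate as more data is collected.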
Problem setting: LQR
Linear dynamics and quadratic costs: the linear quadratic regulator (LQR)
Unknown dynamics: \(x_{t+1}=Ax_t+Bu_t+w_t\)
Cost: \(\sum_{t=0}^T x_t^\top Qx_t + u_t^\top R u_t\)
Constraints: \(F x_{t}\leq b\)
Our goal is to design a linear controller from data:
\(u_t=\widehat{\mathbf K}(x_{0:t})\)
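For intuition, the nominal version of this problem, with known dynamics and no state constraints, has a classical solution via the discrete algebraic Riccati equation. A minimal sketch with assumed toy double-integrator values (the talk's setting adds unknown \((A,B)\) and polytope constraints on top of this):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Nominal, unconstrained LQR with known dynamics (assumed toy values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

# Solve the discrete algebraic Riccati equation and form u_t = K x_t
P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The closed loop A + BK is stable (spectral radius below 1)
rho = max(abs(np.linalg.eigvals(A + B @ K)))
print(rho)
```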
Safe data-driven control in 3 steps
 Collect data
 Fit model and characterize errors
 Robust control
run system for \(T\) steps
least squares estimation
\((\widehat A,\widehat B)\) and \((\varepsilon_A,\varepsilon_B)\)
synthesize \(\mathbf{\widehat K}\) via convex optimization using estimates
Main result (informal)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)\)
[D., Mania, Matni, Recht, Tu, FoCM '19; D., Tu, Matni, Recht, ACC '19]
safe excitation
Safe data-driven control in 3 steps
Main result (informal)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)\)
DMMTR, FoCM '19: First general sample complexity bound with guaranteed stability when \(\mathcal X = \mathbb R^n\)
Previous work:
 Classic system identification provides asymptotic guarantees
 Fiechter (1997) makes strong stability assumptions
 Abbasi-Yadkori & Szepesvari (2011) study a computationally intractable adaptive method under a stability assumption
DTMR, ACC '19: Sample complexity and guaranteed safety
when \(\mathcal X\) is a polytope, given initial coarse estimates \(\widehat A_0\) and \(\widehat B_0\)
Ingredients:
1. Statistical learning rate
2. Robust constraint-satisfying control
3. Suboptimality analysis
Step 1:
[D., Tu, Matni, Recht, ACC '19]Ā Ā Ā Ā Ā
Least squares estimation:
\((\widehat A, \widehat B) \in \underset{(A,B)}{\arg\min} \sum_{t=0}^T \|Ax_t +B u_t - x_{t+1}\|^2 \)
Learning rate (error characterization):
For stabilizing linear control and large enough \(T\), we have w.p. \(1-\delta\)
\(\Big\|\begin{bmatrix} \Delta_A \\ \Delta_B\end{bmatrix}\Big\| \lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})}\)
\(u_t = {\color{goldenrod}\mathbf{K}_0}(x_{0:t})+ {\color{teal}\eta_t}\)
Collect data from system with safe excitation
Step 2:
Step 3:
process noise and excitation
\(\mathbf K\)
uncertain dynamics
\(\mathbf \Delta\)
Controller must keep system safe despite
System level synthesis
so we parametrize the controller using system level synthesis (SLS) (Anderson et al., ARC 2019):
instead of a loop, the system looks like a line
\((A,B)\)
\(\mathbf{K}\)
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\mathbf{\Phi}\)
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)
Optimal control problem is nonconvex in \(\mathbf K\)
Robust control with SLS
\( \underset{\mathbf u }{\min}~~\displaystyle\lim_{T\to\infty}\mathbb{E}\left[ \frac{1}{T}\sum_{t=0}^T x_t^\top Q x_t + u_t^\top R u_t\right]\)
\(\text{s.t.}~~x_{t+1} = Ax_t + Bu_t + w_t\)
\(~~~~~~~~x_{t} \in\mathcal X~~\forall~\|w_t\|\leq \sigma_w\)
\(~~~~~~~~u_t = \sum_{k=0}^t{\color{Goldenrod} K_k} x_{t-k}\)
We parametrize the controller using system level synthesis (SLS) (Anderson et al., ARC 2019):
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)
Equivalent problem with quadratic costs, linear dynamics, and tightened polytope constraints:
\( \underset{\color{teal}\mathbf{\Phi}}{\min}~~\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix}{\color{teal} \mathbf{\Phi}} \right\|_{\mathcal{H}_2}^2\)
\(\text{s.t.}~~ {\color{teal}\mathbf\Phi }\in\mathrm{Affine}(A, B)\)
\(~~~~~~~~{\color{teal}\mathbf\Phi }\in\mathrm{Polytope}_{\sigma_w}(\mathcal X)\)
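A minimal numerical sanity check of the SLS parametrization, with assumed toy matrices and an assumed stabilizing gain (not the talk's synthesis): for static feedback, the closed-loop system responses satisfy the affine achievability constraint.

```python
import numpy as np

# For static feedback u = K x on x+ = A x + B u + w, the closed-loop
# responses Phi_x[k] = (A + BK)^k and Phi_u[k] = K (A + BK)^k (k >= 0)
# satisfy Phi_x[0] = I and Phi_x[k+1] = A Phi_x[k] + B Phi_u[k].
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -2.0]])          # assumed stabilizing gain
A_cl = A + B @ K

H = 20
Phi_x = [np.linalg.matrix_power(A_cl, k) for k in range(H)]
Phi_u = [K @ P for P in Phi_x]

# Check the affine (achievability) constraint at every step
for k in range(H - 1):
    assert np.allclose(Phi_x[k + 1], A @ Phi_x[k] + B @ Phi_u[k])
assert np.allclose(Phi_x[0], np.eye(2))
print("affine constraint holds for", H - 1, "steps")
```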
Robust control with SLS
\(\widehat A+\)\(\Delta_A\)
\(\widehat B+\)\(\Delta_B\)
\(\mathbf{K}\)
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \widehat{\mathbf \Phi}_x\\ \widehat{\mathbf \Phi}_u\end{bmatrix}(I+\mathbf\Delta)^{-1}\mathbf w \)
There is a closedform translation:
SLS makes apparent the effect of estimation errors in the dynamics
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\widehat{\mathbf{\Phi}}\)
\(\mathbf{\Delta}\)
Robust control with SLS
\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{firebrick} \gamma}}{\arg\min}~~\frac{1}{1-{\color{firebrick}\gamma}}\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}^2\)
\(\qquad\qquad\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)
\(\qquad\qquad~~~~~\mathbf\Phi \in\mathrm{Polytope}_{\sigma_w,{\color{firebrick} \gamma}}(\mathcal X)\)
\(\qquad\qquad~~~~~\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|\leq{\color{firebrick}\gamma}\)
Convex optimization problem for fixed \(\gamma\)
[D., Tu, Matni, Recht, ACC '19]Ā Ā Ā Ā Ā
Use this robust constraint-satisfying controller for collecting data (step 1) and robust control (step 3)
Robust safety and suboptimality
[D., Tu, Matni, Recht, ACC '19]Ā Ā Ā Ā Ā
Safety and Suboptimality:
As long as the robust problem is feasible, \(\widehat{\mathbf K} = \widehat{\mathbf \Phi}_u\widehat{\mathbf \Phi}_x^{-1}\) keeps the system safe.
If \(\varepsilon_A, \varepsilon_B\) are small enough,
\(\displaystyle \frac{\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)}{\mathrm{cost}(\mathbf K_\star)}\lesssim \left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty} \)
Design controller with robust system level synthesis in terms of \(\widehat A,\widehat B\) and \(\varepsilon_A,\varepsilon_B\)
Safe data-driven control in 3 steps
Ingredients:
1. Statistical learning rate
2. Robust constraint-satisfying control
3. Suboptimality analysis
Main result (informal)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{robustness}(\mathbf K_\star)\)
Safety with unknown dynamics
Data-driven controller design:
 ensures safety & optimizes performance
 learns unknown linear dynamics
 with finite sample guarantees
[D., Mania, Matni, Recht, Tu, NeurIPS '18]
Method extends to adaptive setting
[D., Mania, Matni, Recht, Tu, FoCM '19]
[D., Tu, Matni, Recht, ACC '19]
Talk outline
 Safety with unknown dynamics
 Safety with complex observations
 Discovery in recommendations
Safety with complex observations
How to remain safe with imperfect perception?
Problem setting: perceptionbased control
Known linear dynamics
\(x_{t+1} =Ax_t+Bu_t+w_t\)
\(z_t = q(Cx_t)\)
Complex observations (unknown appearance map)
\(z = q(Cx)\)
\( y = p(z) \)
\(y = Cx + e(x) \)
\(y_t = p(z_t)=Cx_t+e_t\)
Virtual sensor:
Outputfeedback optimal control
\(y_t = Cx_t+ e_t\)
\(\min ~~\textrm{cost}(x_0, u_0, x_1,\dots)\)
\(\text{s.t.}\)
\(\pi_\star(\mathbf z) = \mathbf K p_\star(\mathbf z)\)
\(\widehat \pi (\mathbf z)= \mathbf K p(\mathbf z)\)
Suboptimality is bounded if errors are bounded
\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \leq \left\|\begin{bmatrix}\mathbf \Phi_{xe}\\ \mathbf \Phi_{ue}\end{bmatrix}\right\| \|\mathbf e\|\)
\(x_{t+1} =Ax_t+Bu_t+w_t\)
\(\mathbf K = \arg\)
The optimal controller uses a perfect perception map
The certainty equivalent controller
\((A,B,C)\)
\(\mathbf{K}\)
\(\bf y\)
\(\bf u\)
\(\bf w\)
\(\bf e\)
\(\bf x\)
\(\bf u\)
\(\begin{bmatrix} \mathbf \Phi_{xw} & \mathbf \Phi_{xe} \\ \mathbf \Phi_{uw} & \mathbf \Phi_{ue} \end{bmatrix}\)
\(=p_\star(z_t)\)
Uniform convergence
Learn perception map \(p(z)\) via nonparametric regression from uniformly sampled training data \(\{(z_i, y^\mathrm{train}_i)\}_{i=1}^T\)
[Plot: nonparametric (kernel) regression: training data, kernel bumps, and prediction as functions of \(z\)]
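One way to sketch such a nonparametric regressor is Nadaraya-Watson kernel smoothing; the estimator, the toy appearance map, and all numbers below are illustrative assumptions, not necessarily the choices made in the work.

```python
import numpy as np

rng = np.random.default_rng(1)

def q(cx):
    """Toy unknown appearance map: an invertible nonlinearity."""
    return np.tanh(cx)

T = 400
y_train = rng.uniform(-2, 2, size=T)   # uniformly sampled labels y = Cx
z_train = q(y_train)                   # observed measurements z = q(Cx)

def p_hat(z, h=0.05):
    """Learned perception map: kernel-weighted average of training labels."""
    w = np.exp(-(z_train - z) ** 2 / (2 * h ** 2))
    return (w @ y_train) / w.sum()

# Virtual sensor: y = p_hat(q(Cx)) approximately recovers Cx,
# so the perception error e(x) is small
cx = 0.7
err = abs(p_hat(q(cx)) - cx)
print(err)
```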
Main result (informal)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \lesssim\) \(rL_q L_p \left(\frac{\mathsf{dim}^2\sigma^4}{T}\right)^{\frac{1}{\mathsf{dim}+4}} \left\|\begin{bmatrix} \mathbf \Phi_{xe}\\ \mathbf \Phi_{ue} \end{bmatrix}\right\|\)
Assume:
 bounded radius of operation
 \(p_\star\) and \(q\) are continuous
Talk outline
 Safety with unknown dynamics
 Safety with complex observations
 Discovery in recommendations
Feedback in automated decision systems
Fairness: equality criteria on decisions
financial status
lending decision
academic history
admission decision
[Liu, D., Simchowitz, Rolf, Hardt. ICML '18]
[Rolf, Simchowitz, D., Liu, Björkegren, Hardt, Blumenstock. ICML '20]
Wellbeing: impact of decisions
Two-step mechanism
financial status
lending decision
Compared to physical dynamics, social outcomes
 have limited predictability
 present difficulties of measurement
 are of indeterminate or contested value
Optimizing a policy is ultimately a form of social control
financial status
Discovery in recommendations
Does this system enable discovery?
Discovery in recommendations
Which items can an individual discover?
Measure discovery via reachability
[D., Rich, Recht. FAccT '20]Ā Ā Ā Ā
[Curmei, D., Recht. ICML '21] Ā Ā Ā
Definition: An individual can discover item \(i\) if they can take an action \(\mathbf a\) so that item \(i\) is recommended
Measure discovery via reachability
[D., Rich, Recht. FAccT '20]Ā Ā Ā Ā
User \(u\) can discover item \(i\) if they can take an action \(\mathbf a\) so that item \(i\) is recommended
Convex condition as long as:
 linear preference models
 top-1 selection rules
\(\exists~~\mathbf a \in \mathcal A(u) ~~\text{s.t.}~~ \mathrm{\pi}(u, \mathbf{a}) = i \)
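A brute-force version of this audit can be sketched for a toy linear preference model with an assumed linear dependence of the user factor on ratings; every name and number below is hypothetical, chosen only to illustrate the reachability check.

```python
import numpy as np

# Hypothetical audit: top-1 recommendation with a linear preference model,
# score(u, i) = q_i . p_u(a), where the user factor responds linearly to
# the user's next ratings a (p0, M, and the rating grid are assumptions).
rng = np.random.default_rng(2)
n_items, d = 30, 3
Q = rng.normal(size=(n_items, d))      # item factors q_i

p0 = rng.normal(size=d)                # user factor before new ratings
M = rng.normal(size=(d, 2))            # assumed linear response to 2 ratings
grid = np.linspace(1, 5, 9)            # allowed rating values

reachable = set()
for r1 in grid:
    for r2 in grid:
        scores = Q @ (p0 + M @ np.array([r1, r2]))
        reachable.add(int(np.argmax(scores)))   # top-1 recommendation

print(len(reachable), "of", n_items, "items reachable")
```

Enumerating the action set and recording which items can ever be the argmax gives exactly the reachable set for this user.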
Auditing discovery
[D., Rich, Recht. FAccT '20]Ā Ā Ā Ā
Motivating questions:
 Does the system provide discovery to new users? To old users?
 How is this affected by the learned preference model?
MF
top-1
rate next items
Amount of discovery
Auditing discovery
[D., Rich, Recht. FAccT '20]Ā Ā Ā Ā
It is impossible:
 for an individual to discover more than a limited number of movies
 for some movies to be recommended to any user at all
Amount of discovery
Discovery
Safety
Wellbeing
DMMRT, FoCM '19
DMMRT, NeurIPS '18
DTMR, ACC '19
DMRY, L4DC '20
DR20, arXiv '20
DTCRA, CoRL '20
TDDRYA20, arXiv '20
LDRSH, ICML '18
RSDLBHB, ICML '20
KDZGCRJ, arXiv '20
DRR, FAccT '20
ADGLZ, ISTAS '20
DDGK, FAT/ML '18
PDRW, BOE '19
DGLZ, IEEE TTS '20
Future work: ensuring safety
Principled & robust data-driven control with guarantees
 from complex observations
 for nonlinear systems
online calibration for rich sensing modalities
adaptivity to friction and contact forces
Future work: ensuring discovery
 Design principles for recommendation systems
 Relationship to strategic behavior and markets
Future work: articulating values
Integrating datadriven automation into important domains requires ensuring safety, discovery, equity, wellbeing, and more
Many challenges in formally defining these properties as technical specifications as well as in ensuring them in dynamic and uncertain systems
Thank you for your attention!
And thanks to my advisor,
my committee,
Ben Recht
Moritz Hardt
Francesco Borrelli
Claire Tomlin
my collaborators,
my colleagues, friends, and family