Reliable Machine Learning in Feedback Systems
Sarah Dean
Dissertation Talk
July 27, 2021
Machine learning is a promising tool for processing complex information
camera image \(\to\) steering angle
position, velocity \(\to\) acceleration
historical movie ratings \(\to\) new movie rating
Dangers of static predictions in a dynamic world
 Catastrophic failure
 Inequality and bias
 Addiction
Promises of data-driven solutions
 Safety
 Equity
 Discovery
My approach: data-driven and robust
Ensure reliable outcomes by encoding values through the lens of reachability
Where could the system go?
ML
control
optimization
data and uncertainty quantification
robust reachability condition
design a policy
Reasoning about reachability
Where the system goes (trajectory) depends on:
 where the system is (state),
 which actions are chosen (policy/controller),
 and how the system changes (dynamics)
 in how the system changes (dynamics),
 in where the system is (state),
 depending on which actions are chosen (policy/controller)
Reasoning about uncertainty
Reliable outcomes via reachability
Safety
Discovery
system must remain in safe region
system must be able to reach many regions
Talk outline
 Safety with unknown dynamics
 Safety with complex observations
 Discovery in recommendations
Safety with unknown dynamics
How much data do we have to collect from a system to safely control it?
Optimal control (reinforcement learning) problem
Tasks can be modeled as optimal control problems
\(\displaystyle \min_{\pi}~~ \mathrm{cost}(\pi)\)
\(\displaystyle \min_{\pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)\)
\(~~~\mathrm{s.t.}~~ u_t = \pi_t(x_{0:t})\)
\(~~~~~~~~~~x_{t+1} = \mathrm{dynamics}_t(x_t,u_t, w_t)\)
\(~~~~~~~~~~x_t\in\mathcal X~~\text{for all}~ t~\text{and all}~w_t\in\mathcal W\)
state, action/input
\(\mathrm{dynamics}\)
\(\pi\)
\(x_t\)
\(u_t\)
\(\pi_\star\)
disturbance
Sample complexity
Goals:
 Guarantee safety
 Achieve good performance
suboptimality: \(\mathrm{cost}(\widehat \pi)\) vs. \(\mathrm{cost}( \pi_\star)\)
How much data do we have to collect from a system to safely control it?
\(\mathrm{dynamics}\)
\(x_t\)
\(u_t\)
\(\widehat\pi\)
Example: double integrator dynamics
1. Collect data
2. Estimate model
3. Robust control
\(u_t\)
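As a rough illustration of the first two steps (an assumed discretization and toy noise levels, not the talk's exact experiment), a double integrator can be excited and its dynamics recovered by least squares:

```python
import numpy as np

# Toy sketch of steps 1-2 (assumed values): double integrator with
# state x = [position, velocity] and input u = acceleration.
dt = 0.1
A_true = np.array([[1.0, dt], [0.0, 1.0]])
B_true = np.array([[0.0], [dt]])

rng = np.random.default_rng(0)
T = 500
x = np.zeros(2)
X, U, X_next = [], [], []
for _ in range(T):
    u = rng.normal(size=1)           # exciting input
    w = 0.01 * rng.normal(size=2)    # process noise w_t
    x_new = A_true @ x + B_true @ u + w
    X.append(x); U.append(u); X_next.append(x_new)
    x = x_new

# Least squares: X_next ~ [X U] Theta with Theta = [A B]^T
Z = np.hstack([np.array(X), np.array(U)])          # shape (T, 3)
Theta, *_ = np.linalg.lstsq(Z, np.array(X_next), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]
eps_A = np.linalg.norm(A_hat - A_true)             # estimation error
print(eps_A)
```

The estimation error shrinks roughly at a \(1/\sqrt{T}\) rate as more data is collected.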
Problem setting: LQR
Linear dynamics and quadratic costs: the linear quadratic regulator (LQR)
Unknown dynamics: \(x_{t+1}=Ax_t+Bu_t+w_t\)
Cost: \(\sum_{t=0}^T x_t^\top Qx_t + u_t^\top R u_t\)
Constraints: \(F x_{t}\leq b\)
Our goal is to design a linear controller from data:
\(u_t=\widehat{\mathbf K}(x_{0:t})\)
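For intuition, the nominal version of this problem, with known dynamics and no state constraints, has a classical solution via the discrete algebraic Riccati equation. A minimal sketch with assumed toy double-integrator values (the talk's setting adds unknown \((A,B)\) and polytope constraints on top of this):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Nominal, unconstrained LQR with known dynamics (assumed toy values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

# Solve the discrete algebraic Riccati equation and form u_t = K x_t
P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The closed loop A + BK is stable (spectral radius below 1)
rho = max(abs(np.linalg.eigvals(A + B @ K)))
print(rho)
```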
Safe data-driven control in 3 steps
 Collect data
 Fit model and characterize errors
 Robust control
run system for \(T\) steps
least squares estimation
\((\widehat A,\widehat B)\) and \((\varepsilon_A,\varepsilon_B)\)
synthesize \(\mathbf{\widehat K}\) via convex optimization using estimates
Main result (informal)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)\)
[D., Mania, Matni, Recht, Tu, FoCM '19; D., Tu, Matni, Recht, ACC '19]
safe excitation
Safe data-driven control in 3 steps
Main result (informal)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)\)
DMMTR, FoCM '19: First general sample complexity bound with guaranteed stability when \(\mathcal X = \mathbb R^n\)
Previous work:
 Classic system identification provides asymptotic guarantees
 Fiechter (1997) makes strong stability assumptions
 Abbasi-Yadkori & Szepesvari (2011) study a computationally intractable adaptive method under a stability assumption
DTMR, ACC '19: Sample complexity and guaranteed safety
when \(\mathcal X\) is a polytope, given initial coarse estimates \(\widehat A_0\) and \(\widehat B_0\)
Ingredients:
1. Statistical learning rate
2. Robust constraint-satisfying control
3. Suboptimality analysis
Step 1:
[D., Tu, Matni, Recht, ACC '19]Ā Ā Ā Ā Ā
Least squares estimation:
\((\widehat A, \widehat B) \in \underset{(A,B)}{\arg\min} \sum_{t=0}^T \|Ax_t +B u_t - x_{t+1}\|^2 \)
Learning rate (error characterization):
For stabilizing linear control and large enough \(T\), we have w.p. \(1-\delta\)
\(\Big\|\begin{bmatrix} \Delta_A \\ \Delta_B\end{bmatrix}\Big\| \lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})}\)
\(u_t = {\color{goldenrod}\mathbf{K}_0}(x_{0:t})+ {\color{teal}\eta_t}\)
Collect data from system with safe excitation
Step 2:
Step 3:
process noise and excitation
\(\mathbf K\)
uncertain dynamics
\(\mathbf \Delta\)
Controller must keep system safe despite
System level synthesis
so we parametrize the controller using system level synthesis (SLS) (Anderson et al., ARC 2019):
instead of a loop, the system looks like a line
\((A,B)\)
\(\mathbf{K}\)
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\mathbf{\Phi}\)
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)
Optimal control problem is nonconvex in \(\mathbf K\)
Robust control with SLS
\( \underset{\mathbf u }{\min}~~\displaystyle\lim_{T\to\infty}\mathbb{E}\left[ \frac{1}{T}\sum_{t=0}^T x_t^\top Q x_t + u_t^\top R u_t\right]\)
\(\text{s.t.}~~x_{t+1} = Ax_t + Bu_t + w_t\)
\(~~~~~~~~x_{t} \in\mathcal X~~\forall~\|w_t\|\leq \sigma_w\)
\(~~~~~~~~u_t = \sum_{k=0}^t{\color{Goldenrod} K_k} x_{t-k}\)
We parametrize the controller using system level synthesis (SLS) (Anderson et al., ARC 2019):
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)
Equivalent problem with quadratic costs, linear dynamics, and tightened polytope constraints:
\( \underset{\color{teal}\mathbf{\Phi}}{\min}~~\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix}{\color{teal} \mathbf{\Phi}} \right\|_{\mathcal{H}_2}^2\)
\(\text{s.t.}~~ {\color{teal}\mathbf\Phi }\in\mathrm{Affine}(A, B)\)
\(~~~~~~~~{\color{teal}\mathbf\Phi }\in\mathrm{Polytope}_{\sigma_w}(\mathcal X)\)
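A minimal numerical sanity check of the SLS parametrization, with assumed toy matrices and an assumed stabilizing gain (not the talk's synthesis): for static feedback, the closed-loop system responses satisfy the affine achievability constraint.

```python
import numpy as np

# For static feedback u = K x on x+ = A x + B u + w, the closed-loop
# responses Phi_x[k] = (A + BK)^k and Phi_u[k] = K (A + BK)^k (k >= 0)
# satisfy Phi_x[0] = I and Phi_x[k+1] = A Phi_x[k] + B Phi_u[k].
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -2.0]])          # assumed stabilizing gain
A_cl = A + B @ K

H = 20
Phi_x = [np.linalg.matrix_power(A_cl, k) for k in range(H)]
Phi_u = [K @ P for P in Phi_x]

# Check the affine (achievability) constraint at every step
for k in range(H - 1):
    assert np.allclose(Phi_x[k + 1], A @ Phi_x[k] + B @ Phi_u[k])
assert np.allclose(Phi_x[0], np.eye(2))
print("affine constraint holds for", H - 1, "steps")
```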
Robust control with SLS
\(\widehat A+\)\(\Delta_A\)
\(\widehat B+\)\(\Delta_B\)
\(\mathbf{K}\)
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \widehat{\mathbf \Phi}_x\\ \widehat{\mathbf \Phi}_u\end{bmatrix}(I+\mathbf\Delta)^{-1}\mathbf w \)
There is a closedform translation:
SLS makes apparent the effect of estimation errors in the dynamics
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\widehat{\mathbf{\Phi}}\)
\(\mathbf{\Delta}\)
Robust control with SLS
\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{firebrick} \gamma}}{\arg\min}~~\frac{1}{1-{\color{firebrick}\gamma}}\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}^2\)
\(\qquad\qquad\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)
\(\qquad\qquad~~~~~\mathbf\Phi \in\mathrm{Polytope}_{\sigma_w,{\color{firebrick} \gamma}}(\mathcal X)\)
\(\qquad\qquad~~~~~\left\|\begin{bmatrix}\varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|\leq{\color{firebrick}\gamma}\)
Convex optimization problem for fixed \(\gamma\)
[D., Tu, Matni, Recht, ACC '19]Ā Ā Ā Ā Ā
Use this robust constraint-satisfying controller for collecting data (step 1) and robust control (step 3)
Robust safety and suboptimality
[D., Tu, Matni, Recht, ACC '19]Ā Ā Ā Ā Ā
Safety and Suboptimality:
As long as the robust problem is feasible, \(\widehat{\mathbf K} = \widehat{\mathbf \Phi}_u\widehat{\mathbf \Phi}_x^{-1}\) keeps the system safe.
If \(\varepsilon_A, \varepsilon_B\) are small enough,
\(\displaystyle \frac{\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)}{\mathrm{cost}(\mathbf K_\star)}\lesssim \left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty} \)
Design controller with robust system level synthesis in terms of \(\widehat A,\widehat B\) and \(\varepsilon_A,\varepsilon_B\)
Safe data-driven control in 3 steps
Ingredients:
1. Statistical learning rate
2. Robust constraint-satisfying control
3. Suboptimality analysis
Main result (informal)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{robustness}(\mathbf K_\star)\)
Safety with unknown dynamics
Data-driven controller design:
 ensures safety & optimizes performance
 learns unknown linear dynamics
 with finite sample guarantees
[D., Mania, Matni, Recht, Tu, NeurIPS '18]
Method extends to adaptive setting
[D., Mania, Matni, Recht, Tu, FoCM '19]
[D., Tu, Matni, Recht, ACC '19]
Talk outline
 Safety with unknown dynamics
 Safety with complex observations
 Discovery in recommendations
Safety with complex observations
How to remain safe with imperfect perception?
Problem setting: perceptionbased control
Known linear dynamics
\(x_{t+1} =Ax_t+Bu_t+w_t\)
\(z_t = q(Cx_t)\)
Complex observations (unknown appearance map)
\(z = q(Cx)\)
\( y = p(z) \)
\(y = Cx + e(x) \)
\(y_t = p(z_t)=Cx_t+e_t\)
Virtual sensor:
Outputfeedback optimal control
\(y_t = Cx_t+ e_t\)
\(\min ~~\textrm{cost}(x_0, u_0, x_1,\dots)\)
\(\text{s.t.}\)
\(\pi_\star(\mathbf z) = \mathbf K p_\star(\mathbf z)\)
\(\widehat \pi (\mathbf z)= \mathbf K p(\mathbf z)\)
Suboptimality is bounded if errors are bounded
\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \leq \left\|\begin{bmatrix}\mathbf \Phi_{xe}\\ \mathbf \Phi_{ue}\end{bmatrix}\right\| \|\mathbf e\|\)
\(x_{t+1} =Ax_t+Bu_t+w_t\)
\(\mathbf K = \arg\)
The optimal controller uses a perfect perception map
The certainty equivalent controller
\((A,B,C)\)
\(\mathbf{K}\)
\(\bf y\)
\(\bf u\)
\(\bf w\)
\(\bf e\)
\(\bf x\)
\(\bf u\)
\(\begin{bmatrix} \mathbf \Phi_{xw} & \mathbf \Phi_{xe} \\ \mathbf \Phi_{uw} & \mathbf \Phi_{ue} \end{bmatrix}\)
\(=p_\star(z_t)\)
Uniform convergence
Learn perception map \(p(z)\) via nonparametric regression from uniformly sampled training data \(\{(z_i, y^\mathrm{train}_i)\}_{i=1}^T\)
[Plot: nonparametric (kernel) regression: training data, kernel bumps, and prediction as functions of \(z\)]
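One way to sketch such a nonparametric regressor is Nadaraya-Watson kernel smoothing; the estimator, the toy appearance map, and all numbers below are illustrative assumptions, not necessarily the choices made in the work.

```python
import numpy as np

rng = np.random.default_rng(1)

def q(cx):
    """Toy unknown appearance map: an invertible nonlinearity."""
    return np.tanh(cx)

T = 400
y_train = rng.uniform(-2, 2, size=T)   # uniformly sampled labels y = Cx
z_train = q(y_train)                   # observed measurements z = q(Cx)

def p_hat(z, h=0.05):
    """Learned perception map: kernel-weighted average of training labels."""
    w = np.exp(-(z_train - z) ** 2 / (2 * h ** 2))
    return (w @ y_train) / w.sum()

# Virtual sensor: y = p_hat(q(Cx)) approximately recovers Cx,
# so the perception error e(x) is small
cx = 0.7
err = abs(p_hat(q(cx)) - cx)
print(err)
```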
Main result (informal)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \lesssim\) \(rL_q L_p \left(\frac{\mathsf{dim}^2\sigma^4}{T}\right)^{\frac{1}{\mathsf{dim}+4}} \left\|\begin{bmatrix} \mathbf \Phi_{xe}\\ \mathbf \Phi_{ue} \end{bmatrix}\right\|\)
Assume:
 bounded radius of operation
 \(p_\star\) and \(q\) are continuous
Talk outline
 Safety with unknown dynamics
 Safety with complex observations
 Discovery in recommendations
Feedback in automated decision systems
Fairness: equality criteria on decisions
financial status
lending decision
academic history
admission decision
[Liu, D., Simchowitz, Rolf, Hardt. ICML '18]
[Rolf, Simchowitz, D., Liu, Björkegren, Hardt, Blumenstock. ICML '20]
Wellbeing: impact of decisions
Two-step mechanism
financial status
lending decision
Compared to physical dynamics, social outcomes
 have limited predictability
 present difficulties of measurement
 are of indeterminate or contested value
Optimizing a policy is ultimately a form of social control
financial status
Discovery in recommendations
Does this system enable discovery?
Discovery in recommendations
Which items can an individual discover?
Measure discovery via reachability
[D., Rich, Recht. FAccT '20]Ā Ā Ā Ā
[Curmei, D., Recht. ICML '21] Ā Ā Ā
Definition: An individual can discover item \(i\) if they can take an action \(\mathbf a\) so that item \(i\) is recommended
Measure discovery via reachability
[D., Rich, Recht. FAccT '20]Ā Ā Ā Ā
User \(u\) can discover item \(i\) if they can take an action \(\mathbf a\) so that item \(i\) is recommended
Convex condition as long as:
 linear preference models
 top-1 selection rules
\(\exists~~\mathbf a \in \mathcal A(u) ~~\text{s.t.}~~ \mathrm{\pi}(u, \mathbf{a}) = i \)
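A brute-force version of this audit can be sketched for a toy linear preference model with an assumed linear dependence of the user factor on ratings; every name and number below is hypothetical, chosen only to illustrate the reachability check.

```python
import numpy as np

# Hypothetical audit: top-1 recommendation with a linear preference model,
# score(u, i) = q_i . p_u(a), where the user factor responds linearly to
# the user's next ratings a (p0, M, and the rating grid are assumptions).
rng = np.random.default_rng(2)
n_items, d = 30, 3
Q = rng.normal(size=(n_items, d))      # item factors q_i

p0 = rng.normal(size=d)                # user factor before new ratings
M = rng.normal(size=(d, 2))            # assumed linear response to 2 ratings
grid = np.linspace(1, 5, 9)            # allowed rating values

reachable = set()
for r1 in grid:
    for r2 in grid:
        scores = Q @ (p0 + M @ np.array([r1, r2]))
        reachable.add(int(np.argmax(scores)))   # top-1 recommendation

print(len(reachable), "of", n_items, "items reachable")
```

Enumerating the action set and recording which items can ever be the argmax gives exactly the reachable set for this user.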
Auditing discovery
[D., Rich, Recht. FAccT '20]Ā Ā Ā Ā
Motivating questions:
 Does the system provide discovery to new users? To old users?
 How is this affected by the learned preference model?
MF
top-1
rate next items
Amount of discovery
Auditing discovery
[D., Rich, Recht. FAccT '20]Ā Ā Ā Ā
It is impossible:
 for an individual to discover more than a limited number of movies
 for some movies to be recommended to any user at all
Amount of discovery
Discovery
Safety
Wellbeing
DMMRT, FoCM '19
DMMRT, NeurIPS '18
DTMR, ACC '19
DMRY, L4DC '20
DR20, arXiv '20
DTCRA, CoRL '20
TDDRYA20, arXiv '20
LDRSH, ICML '18
RSDLBHB, ICML '20
KDZGCRJ, arXiv '20
DRR, FAccT '20
ADGLZ, ISTAS '20
DDGK, FAT/ML '18
PDRW, BOE '19
DGLZ, IEEE TTS '20
Future work: ensuring safety
Principled & robust data-driven control with guarantees
 from complex observations
 for nonlinear systems
online calibration for rich sensing modalities
adaptivity to friction and contact forces
Future work: ensuring discovery
 Design principles for recommendation systems
 Relationship to strategic behavior and markets
Future work: articulating values
Integrating datadriven automation into important domains requires ensuring safety, discovery, equity, wellbeing, and more
Many challenges in formally defining these properties as technical specifications as well as in ensuring them in dynamic and uncertain systems
Thank you for your attention!
And thanks to my advisor,
my committee,
Ben Recht
Moritz Hardt
Francesco Borrelli
Claire Tomlin
my collaborators,
my colleagues, friends, and family