Sarah Dean
asst prof in CS at Cornell
Dissertation Talk
July 27, 2021
velocity,
steering angle,
acceleration
velocity
\(\to\)
camera image
position
historical movie ratings
new movie rating
Ensure reliable outcomes by encoding values through lens of reachability
Where could the system go?
data and uncertainty quantification
robust reachability condition
design a policy
Where the system goes
where the system is,
(trajectory)
(state)
and which actions are chosen
(policy/controller)
how the system changes,
(dynamics)
depends on:
in how the system changes
(dynamics)
in where the system is
(state)
dependent on which actions are chosen
(policy/controller)
system must remain in safe region
system must be able to reach many regions
How much data do we have to collect from a system to safely control it?
Tasks can be modeled as optimal control problems
\(\displaystyle \min_{\pi}~~ \mathrm{cost}(\pi)\)
\(\displaystyle \min_{\pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)\)
\(~~~\mathrm{s.t.}~~ u_t = \pi_t(x_{0:t})\)
\(~~~~~~~~~~x_{t+1} = \mathrm{dynamics}_t(x_t,u_t, w_t)\)
\(~~~~~~~~~~x_t\in\mathcal X~~\text{for all}~ t~\text{and all}~w_t\in\mathcal W\)
[Diagram labels: \(\mathrm{dynamics}\), policy \(\pi\), state \(x_t\), action/input \(u_t\), disturbance, optimal policy \(\pi_\star\)]
How much data do we have to collect from a system to safely control it?
\(\mathrm{dynamics}\)
\(x_t\)
\(u_t\)
\(\widehat\pi\)
1. Collect data
2. Estimate model
3. Robust control
\(u_t\)
Linear dynamics and quadratic costs: the linear quadratic regulator (LQR)
Unknown dynamics: \(x_{t+1}=Ax_t+Bu_t+w_t\)
Cost: \(\sum_{t=0}^T x_t^\top Qx_t + u_t^\top R u_t\)
Constraints: \(F x_{t}\leq b\)
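A minimal simulation sketch of this setup may help fix the notation. It rolls out the linear dynamics under a static state-feedback law \(u_t = Kx_t\), accumulates the quadratic cost, and checks the polytope constraint \(Fx_t \leq b\) along the way. The double-integrator matrices, the gain `K`, the box constraint, and the uniform disturbance model are illustrative assumptions, not values from the talk.

```python
import numpy as np

def rollout_cost(A, B, K, Q, R, F, b, x0, T, sigma_w, seed=0):
    """Simulate x_{t+1} = A x_t + B u_t + w_t with u_t = K x_t.

    Returns the cost sum_{t=0}^T x_t'Qx_t + u_t'Ru_t and a flag saying
    whether F x_t <= b held at every visited state.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    cost, safe = 0.0, True
    for _ in range(T + 1):
        u = K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        safe = safe and bool(np.all(F @ x <= b))
        # Assumed disturbance model: bounded, uniform in [-sigma_w, sigma_w].
        w = rng.uniform(-sigma_w, sigma_w, size=x.shape)
        x = A @ x + B @ u + w
    return cost, safe

# Hypothetical example: double integrator with a hand-picked stabilizing gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -1.5]])
Q, R = np.eye(2), np.eye(1)
F = np.vstack([np.eye(2), -np.eye(2)])  # box constraint |x_i| <= 5
b = 5.0 * np.ones(4)

cost, safe = rollout_cost(A, B, K, Q, R, F, b, x0=[1.0, 0.0], T=100, sigma_w=0.01)
print(f"cost = {cost:.3f}, constraints satisfied = {safe}")
```

In the talk's setting \(A\) and \(B\) are unknown, so a rollout like this is only available through interaction with the real system.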
Our goal is to design a linear controller from data:
\(u_t=\widehat{\mathbf K}(x_{0:t})\)
run system for \(T\) steps
least squares estimation
\((\widehat A,\widehat B)\) and \((\varepsilon_A,\varepsilon_B)\)
synthesize \(\mathbf{\widehat K}\) via convex optimization using estimates
As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{sensitivity}(\mathbf K_\star)\)
[D., Mania, Matni, Recht, Tu, FoCM '19; D., Tu, Matni, Recht, ACC '19]
safe excitation
DMMRT, FoCM '19: First general sample complexity bound with guaranteed stability when \(\mathcal X = \mathbb R^n\)
Previous work:
DTMR, ACC '19: Sample complexity and guaranteed safety
when \(\mathcal X\) is a polytope, initial coarse estimates \(\widehat A_0\) and \(\widehat B_0\)
Ingredients:
1. Statistical learning rate
2. Robust constraint-satisfying control
3. Sub-optimality analysis
[D., Tu, Matni, Recht, ACC '19]
Least squares estimation:
\((\widehat A, \widehat B) \in \underset{(A,B)}{\arg\min} \sum_{t=0}^T \|Ax_t +B u_t - x_{t+1}\|^2 \)
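The estimation step can be sketched in a few lines of NumPy. The example below collects one trajectory with inputs \(u_t = K_0 x_t + \eta_t\) and solves the least squares problem above by stacking the regressors \([x_t, u_t]\). The helper names `collect_data` and `estimate_dynamics`, the system matrices, the gain `K0`, and the uniform noise and excitation distributions are all assumptions made for illustration.

```python
import numpy as np

def collect_data(A, B, K0, T, sigma_eta, sigma_w, seed=0):
    """Run the system for T steps with u_t = K0 x_t + eta_t, starting at x_0 = 0."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    xs, us = np.zeros((T + 1, n)), np.zeros((T, m))
    for t in range(T):
        eta = rng.uniform(-sigma_eta, sigma_eta, size=m)  # bounded excitation
        w = rng.uniform(-sigma_w, sigma_w, size=n)        # bounded disturbance
        us[t] = K0 @ xs[t] + eta
        xs[t + 1] = A @ xs[t] + B @ us[t] + w
    return xs, us

def estimate_dynamics(xs, us):
    """Least squares fit of x_{t+1} ~ A x_t + B u_t from one trajectory."""
    Z = np.hstack([xs[:-1], us])                  # rows are [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, xs[1:], rcond=None)
    n = xs.shape[1]
    return Theta[:n].T, Theta[n:].T               # (A_hat, B_hat)

# Hypothetical system and initial stabilizing gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K0 = np.array([[-1.0, -1.5]])

xs, us = collect_data(A, B, K0, T=500, sigma_eta=0.5, sigma_w=0.01)
A_hat, B_hat = estimate_dynamics(xs, us)
print("error in A:", np.linalg.norm(A_hat - A, 2))
print("error in B:", np.linalg.norm(B_hat - B, 2))
```

Increasing `sigma_eta` relative to `sigma_w`, or increasing `T`, should shrink the printed errors, which is the qualitative behavior the learning rate describes.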
Learning rate (error characterization):
For stabilizing linear control and large enough \(T\), we have w.p. \(1-\delta\)
\(\Big\|\begin{bmatrix} \Delta_A \\ \Delta_B\end{bmatrix}\Big\| \lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})}\)
\(u_t = {\color{goldenrod}\mathbf{K}_0}(x_{0:t})+ {\color{teal}\eta_t}\)
Collect data from system with safe excitation
\(\mathbf K\)
\(\mathbf \Delta\)
so we parametrize controller using system level synthesis (SLS) (Anderson et al., ARC 2019)
instead of a loop,
system looks like a line
[Diagram labels: system \((A,B)\), controller \(\mathbf{K}\), signals \(\bf x\), \(\bf u\), \(\bf w\)]
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)
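To make the system response concrete, here is a small numerical check for the special case of a static gain \(u_t = Kx_t\) with \(x_0 = 0\). In that case the response elements are \(\Phi_x[k] = (A+BK)^{k-1}\) and \(\Phi_u[k] = K\Phi_x[k]\), and the state is a convolution of these with past disturbances. The matrices, the gain, and the helper name `system_response` are hypothetical; the talk's controllers are dynamic, so this is only a sketch of the idea.

```python
import numpy as np

def system_response(A, B, K, horizon):
    """Response elements for static feedback u = K x.

    Phi_x[k-1] holds (A + B K)^(k-1), the map from w_{t-k} to x_t;
    Phi_u[k-1] holds K (A + B K)^(k-1), the map from w_{t-k} to u_t.
    """
    n = A.shape[0]
    Phi_x = [np.eye(n)]
    for _ in range(horizon - 1):
        Phi_x.append((A + B @ K) @ Phi_x[-1])
    Phi_u = [K @ P for P in Phi_x]
    return Phi_x, Phi_u

# Hypothetical system and stabilizing gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -1.5]])

T = 20
rng = np.random.default_rng(0)
ws = rng.uniform(-0.1, 0.1, size=(T, 2))

# Closed-loop simulation ("the loop").
x = np.zeros(2)
for t in range(T):
    x = A @ x + B @ (K @ x) + ws[t]

# Same state from the system response ("the line").
Phi_x, Phi_u = system_response(A, B, K, T)
x_conv = sum(Phi_x[k - 1] @ ws[T - k] for k in range(1, T + 1))
u_conv = sum(Phi_u[k - 1] @ ws[T - k] for k in range(1, T + 1))

print("states agree:", np.allclose(x, x_conv))
print("inputs agree:", np.allclose(K @ x, u_conv))
```

The point of the reparametrization is that the cost and constraints become functions of \(\mathbf\Phi_x, \mathbf\Phi_u\) directly, rather than of \(\mathbf K\) through matrix powers.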
Optimal control problem is nonconvex in \(\mathbf K\)
\(u_t = \sum_{k=0}^t{\color{Goldenrod} K_k} x_{t-k}\)
\( \underset{\mathbf u }{\min}\) \(\displaystyle\lim_{T\to\infty}\mathbb{E}\left[ \frac{1}{T}\sum_{t=0}^T x_t^\top Q x_t + u_t^\top R u_t\right]\)
\(\text{s.t.}~~x_{t+1} = Ax_t + Bu_t + w_t\)
We parametrize controller using system level synthesis (SLS) (Anderson et al., ARC 2019)
\(x_{t} \in\mathcal X~~\forall~\|w_t\|\leq \sigma_w\)
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_x\\ \mathbf \Phi_u\end{bmatrix}\mathbf w \)
Equivalent problem with quadratic costs, linear dynamics, and tightened polytope constraints
\( {\color{teal}\mathbf\Phi }\in\mathrm{Polytope}_{\sigma_w}(\mathcal X)\)
\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix}{\color{teal} \mathbf{\Phi}} \right\|_{\mathcal{H}_2}^2\)
\(\text{s.t.}~~ {\color{teal}\mathbf\Phi }\in\mathrm{Affine}(A, B)\)
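For reference, the affine constraint can be written out explicitly. The block below states it in the standard SLS form from Anderson et al., which we assume is what \(\mathrm{Affine}(A,B)\) abbreviates here; the time-domain indexing \(\Phi_x[k], \Phi_u[k]\) of the response elements is notation introduced for this sketch.

```latex
\[
\begin{bmatrix} zI - A & -B \end{bmatrix}
\begin{bmatrix} \mathbf\Phi_x \\ \mathbf\Phi_u \end{bmatrix} = I
\quad\Longleftrightarrow\quad
\Phi_x[1] = I, \qquad
\Phi_x[k+1] = A\,\Phi_x[k] + B\,\Phi_u[k] \quad (k \geq 1).
\]
\[
\text{Any such } \mathbf\Phi \text{ is achieved by the controller }
\mathbf K = \mathbf\Phi_u \mathbf\Phi_x^{-1}.
\]
```

Because the constraint is linear in \(\mathbf\Phi\) and the \(\mathcal H_2\) cost is a convex quadratic, the reparametrized problem is convex even though the original problem is nonconvex in \(\mathbf K\).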
\(\widehat A+\)\(\Delta_A\)
\(\widehat B+\)\(\Delta_B\)
\(\mathbf{K}\)
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\begin{bmatrix} \mathbf x\\ \mathbf u\end{bmatrix} = \begin{bmatrix} \widehat{\mathbf \Phi}_x\\ \widehat{\mathbf \Phi}_u\end{bmatrix}(I+\mathbf\Delta)^{-1}\mathbf w \)
There is a closed-form translation:
SLS makes apparent the effect of estimation errors in the dynamics
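A short derivation shows where the factor \((I+\mathbf\Delta)^{-1}\) comes from. It assumes the sign convention \(A = \widehat A + \Delta_A\), \(B = \widehat B + \Delta_B\) and a response \(\widehat{\mathbf\Phi}\) that satisfies the affine constraint for the estimates \((\widehat A, \widehat B)\); the explicit formula for \(\mathbf\Delta\) below follows from that convention and is a sketch rather than a quotation from the talk.

```latex
\[
\begin{aligned}
\begin{bmatrix} zI - A & -B \end{bmatrix}
\begin{bmatrix} \widehat{\mathbf\Phi}_x \\ \widehat{\mathbf\Phi}_u \end{bmatrix}
&= \begin{bmatrix} zI - \widehat A & -\widehat B \end{bmatrix}
\begin{bmatrix} \widehat{\mathbf\Phi}_x \\ \widehat{\mathbf\Phi}_u \end{bmatrix}
- \begin{bmatrix} \Delta_A & \Delta_B \end{bmatrix}
\begin{bmatrix} \widehat{\mathbf\Phi}_x \\ \widehat{\mathbf\Phi}_u \end{bmatrix} \\
&= I + \mathbf\Delta,
\qquad
\mathbf\Delta := -\left(\Delta_A \widehat{\mathbf\Phi}_x + \Delta_B \widehat{\mathbf\Phi}_u\right).
\end{aligned}
\]
\[
\text{Hence }
\begin{bmatrix} zI - A & -B \end{bmatrix}
\begin{bmatrix} \widehat{\mathbf\Phi}_x \\ \widehat{\mathbf\Phi}_u \end{bmatrix}
(I + \mathbf\Delta)^{-1} = I,
\text{ provided } (I+\mathbf\Delta)^{-1} \text{ exists and is stable, e.g. when } \|\mathbf\Delta\| < 1.
\]
```

So \(\widehat{\mathbf\Phi}(I+\mathbf\Delta)^{-1}\) satisfies the affine constraint for the true \((A,B)\), and it corresponds to the same controller \(\widehat{\mathbf K} = \widehat{\mathbf\Phi}_u\widehat{\mathbf\Phi}_x^{-1}\). This is why bounding \(\|\mathbf\Delta\|\) by some \(\gamma < 1\) is enough to control the true closed-loop behavior.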
\(\bf x\)
\(\bf u\)
\(\bf w\)
\(\mathbf{\Delta}\)
\( \widehat{\mathbf\Phi} = \underset{\mathbf{\Phi}, {\color{firebrick} \gamma}}{\arg\min}\) \(\frac{1}{1-\gamma}\)\(\left\| \begin{bmatrix} Q^{1/2} &\\& R^{1/2}\end{bmatrix} \mathbf{\Phi} \right\|_{\mathcal{H}_2}^2\)
\(\qquad\qquad\text{s.t.}~ {\mathbf\Phi }\in\mathrm{Affine}(\widehat A, \widehat B)\)
\(\qquad\qquad\mathbf\Phi \in\mathrm{Polytope}_{\sigma_w,{\color{firebrick} \gamma}}(\mathcal X)\)
\(\qquad\qquad\left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix}\mathbf \Phi\right\|\leq\gamma\)
Convex optimization problem for fixed \(\gamma\)
[D., Tu, Matni, Recht, ACC '19]
Use this robust constraint-satisfying controller for collecting data (step 1) and robust control (step 3)
[D., Tu, Matni, Recht, ACC '19]
Safety and Suboptimality:
As long as the robust problem is feasible, \(\widehat{\mathbf K} = \widehat{\mathbf \Phi}_u\widehat{\mathbf \Phi}_x^{-1}\) keeps the system safe.
If \(\varepsilon_A, \varepsilon_B\) are small enough,
\(\displaystyle \frac{\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)}{\mathrm{cost}(\mathbf K_\star)}\lesssim \left\|\begin{bmatrix} \varepsilon_A & \\ & \varepsilon_B\end{bmatrix} \mathbf \Phi_\star\right\|_{\mathcal H_\infty} \)
Design controller with robust system level synthesis in terms of \(\widehat A,\widehat B\) and \(\varepsilon_A,\varepsilon_B\)
Ingredients:
1. Statistical learning rate
2. Robust constraint-satisfying control
3. Sub-optimality analysis
As long as \(T\) is large enough, with probability at least \(1-\delta\),
the system remains safe during learning and operation and
\(\mathrm{cost}(\widehat{\mathbf K}) - \mathrm{cost}({\mathbf K}_\star)\lesssim \frac{\mathrm{size~of~noise}}{\mathrm{size~of~excitation}} \sqrt{\frac{\mathrm{dim}}{T} \log(\frac{\mathrm{dim}}{\delta})} \cdot\mathrm{robustness}(\mathbf K_\star)\)
Data-driven controller design:
[D., Mania, Matni, Recht, Tu, NeurIPS '18]
Method extends to adaptive setting
[D., Mania, Matni, Recht, Tu, FoCM '19]
[D., Tu, Matni, Recht, ACC '19]
How to remain safe with imperfect perception?
Known linear dynamics
\(x_{t+1} =Ax_t+Bu_t+w_t\)
\(z_t = q(Cx_t)\)
Complex observations (unknown appearance map)
\(z = q(Cx)\)
\( y = p(z) \)
\(y = Cx + e(x) \)
\(y_t = p(z_t)=Cx_t+e_t\)
Virtual sensor:
\(y_t = Cx_t+ e_t\)
\(\min ~~\textrm{cost}(x_0, u_0, x_1,\dots)\)
\(\text{s.t.}\)
\(\pi_\star(\mathbf z) = \mathbf K p_\star(\mathbf z)\)
\(\widehat \pi (\mathbf z)= \mathbf K p(\mathbf z)\)
Suboptimality is bounded if errors are bounded
\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \leq \left\|\begin{bmatrix}\mathbf \Phi_{xe}\\ \mathbf \Phi_{ue}\end{bmatrix}\right\| \|\mathbf e\|\)
\(x_{t+1} =Ax_t+Bu_t+w_t\)
\(\mathbf K = \arg\)
The optimal controller uses a perfect perception map
The certainty equivalent controller
\((A,B,C)\)
\(\mathbf{K}\)
\(\bf y\)
\(\bf u\)
\(\bf w\)
\(\bf e\)
\(\bf x\)
\(\bf u\)
\(=p_\star(z_t)\)
Learn perception map \(p(z)\) via nonparametric regression from uniformly sampled training data \(\{(z_i, y^\mathrm{train}_i)\}_{i=1}^T\)
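One common nonparametric regressor is the Nadaraya-Watson kernel estimator, which predicts a weighted average of training labels with weights given by "bumps" centered at the training observations. The sketch below assumes a Gaussian kernel, a scalar state with \(C = 1\), and a toy appearance map \(z = (\cos x, \sin x)\); none of these choices come from the talk, and the function name `nadaraya_watson` is ours.

```python
import numpy as np

def nadaraya_watson(z_train, y_train, z_query, bandwidth):
    """Kernel-weighted average of training labels at a query observation.

    z_train: (T, d) observations, y_train: (T, p) labels, z_query: (d,).
    """
    sq_dist = np.sum((z_train - z_query) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))  # Gaussian bumps
    total = weights.sum()
    if total == 0.0:
        # No training point is close enough to carry weight.
        return np.zeros(y_train.shape[1])
    return weights @ y_train / total

# Toy data: states sampled uniformly, observed through an assumed map q.
rng = np.random.default_rng(0)
T = 500
x_train = rng.uniform(-1.0, 1.0, size=T)
z_train = np.column_stack([np.cos(x_train), np.sin(x_train)])
y_train = (x_train + 0.01 * rng.standard_normal(T))[:, None]  # noisy labels of Cx

x_query = 0.3
z_query = np.array([np.cos(x_query), np.sin(x_query)])
y_hat = nadaraya_watson(z_train, y_train, z_query, bandwidth=0.05)
print(f"true y = {x_query}, predicted y = {y_hat[0]:.3f}")
```

The prediction error at a query plays the role of \(e_t\) in the virtual sensor model \(y_t = Cx_t + e_t\); denser training data near the query allows a smaller bandwidth and a smaller error.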
data
bumps
prediction
\(z\)
As long as \(T\) is large enough, with probability at least \(1-\delta\),
\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \lesssim\) \(rL_q L_p \left(\frac{\mathsf{dim}^2\sigma^4}{T}\right)^{\frac{1}{\mathsf{dim}+4}} \left\|\begin{bmatrix} \mathbf \Phi_{xe}\\ \mathbf \Phi_{ue} \end{bmatrix}\right\|\)
Assume:
Fairness: equality criteria on decisions
financial status
lending decision
academic history
admission decision
[Liu, D., Simchowitz, Rolf, Hardt. ICML '18]
[Rolf, Simchowitz, D., Liu, Björkegren, Hardt, Blumenstock. ICML '20]
Wellbeing: impact of decisions
financial status
lending decision
Compared to physical dynamics, social outcomes
Optimizing a policy is ultimately a form of social control
financial status
Does this system enable discovery?
Which items can an individual discover?
[D., Rich, Recht. FAccT '20]
[Curmei, D., Recht. ICML '21]
Definition: An individual can discover item \(i\) if they can take an action \(\mathbf a\) so that item \(i\) is recommended
[D., Rich, Recht. FAccT '20]
User \(u\) can discover item \(i\) if they can take an action \(\mathbf a\) so that item \(i\) is recommended
Convex condition as long as
linear preference models
top-1 selection rules
\(\exists~~\mathbf a \in \mathcal A(u) ~~\text{s.t.}~~ \mathrm{\pi}(u, \mathbf{a}) = i \)
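To see why these two assumptions give a convex condition, suppose the predicted score of item \(j\) for user \(u\) is affine in the action, \(s_j(u,\mathbf a) = c_j^\top \mathbf a + d_j\), and the top-1 rule recommends the highest-scoring item. The symbols \(s_j\), \(c_j\), and \(d_j\) are introduced here for illustration and do not appear in the talk.

```latex
\[
\pi(u, \mathbf a) = i
\quad\Longleftrightarrow\quad
(c_i - c_j)^\top \mathbf a + (d_i - d_j) > 0 \quad \text{for all } j \neq i .
\]
\[
\text{Item } i \text{ is discoverable}
\quad\Longleftrightarrow\quad
\exists\, \mathbf a \in \mathcal A(u) \text{ satisfying these linear inequalities.}
\]
```

When \(\mathcal A(u)\) is a convex set such as a box of allowed ratings, this is a convex feasibility problem, and for polytopic \(\mathcal A(u)\) it can be checked with a linear program.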
[D., Rich, Recht. FAccT '20]
Motivating questions:
MF
top-1
rate next items
Amount of discovery
[D., Rich, Recht. FAccT '20]
Amount of discovery
DMMRT, FoCM '19
DMMRT, NeurIPS '18
DTMR, ACC '19
DMRY, L4DC '20
DR20, arXiv '20
DTCRA, CoRL '20
TDDRYA20, arXiv '20
LDRSH, ICML '18
RSDLBHB, ICML '20
KDZGCRJ, arXiv '20
DRR, FAccT '20
ADGLZ, ISTAS '20
DDGK, FAT/ML '18
PDRW, BOE '19
DGLZ, IEEE TTS '20
Principled & robust data-driven control with guarantees
online calibration for rich sensing modalities
adaptivity to friction and contact forces
Design principles for recommendation systems
Relationship to strategic behavior and markets
Integrating data-driven automation into important domains requires ensuring safety, discovery, equity, wellbeing, and more
Many challenges in formally defining these properties as technical specifications as well as in ensuring them in dynamic and uncertain systems
Ben Recht
Moritz Hardt
Francesco Borrelli
Claire Tomlin