Certainty Equivalent Perception-Based Control

Sarah Dean and Benjamin Recht

UC Berkeley

L4DC, 8 June 2021

Motivation

image/svg+xml
image/svg+xml

Machine learning is a promising avenue for incorporating rich sensing modalities

Can we make strong guarantees in these settings?

backup slide about EKF/modelling vs. end-to-end

Perception-based optimal control

\(\displaystyle \min_{\pi\in\Pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)\)

 

\(~~~\mathrm{s.t.}~~ u_t = \pi_t(z_{0:t})\)

\(~~~~~~~~~~x_{t+1} = \mathrm{dynamics}_t(x_t,u_t)\)

\(~~~~~~~~~~z_t = \mathrm{observation}_t(Cx_t)~\)

observation

?

?

\(\mathrm{dynamics~\&}\)\(\mathrm{observation}\)
 

\(\pi\)

\(z_t\)

\(u_t\)

\(x_t\)

output

\(y_t=Cx_t\)

Observation-feedback optimal control problem

Perception-based optimal control

\(\displaystyle \min_{\pi\in\Pi} ~~\mathrm{cost}(\widehat x_0,u_0,\widehat x_1,\dots)\)

 

\(~~~\mathrm{s.t.}~~ u_t = \pi_t(\widehat x_{0:t})\)

\(~~~~~~~~~~\widehat x_{t+1} = \mathrm{dynamics}_t(\widehat x_t,u_t)\)

\(~~~~~~~~~~\widehat x_t =\mathrm{EKF}(z_{0:t})~\)

\(\mathrm{dynamics,}\)\(\mathrm{obs, EKF}\)
 

\(\pi\)

\(\widehat x_t\)

\(u_t\)

\(x_t\)

Classic approach: physical models and filtering

Perception-based optimal control

\(\displaystyle \min_{\pi\in\Pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)\)

 

\(~~~\mathrm{s.t.}~~ u_t = \widehat \pi_t(z_{0:t})\)

 

?

 

\(\mathrm{dynamics~\&}\)\(\mathrm{observation}\)
 

\(\widehat \pi\)

\(z_t\)

\(u_t\)

\(x_t\)

End-to-end approach: learn everything from data

Perception-based optimal control

\(\displaystyle \min_{\pi\in\Pi} ~~\mathrm{cost}(x_0,u_0,x_1,\dots)\)

 

\(~~~\mathrm{s.t.}~~ u_t = \pi_t(y_{0:t})\)

\(~~~~~~~~~~x_{t+1} = \mathrm{dynamics}_t(x_t,u_t)\)

\(~~~~~~~~~~y_t = \mathrm{perception}(\mathrm{observation}_t(Cx_t))~\)

\(\mathrm{dynamics~\&}\)\(\mathrm{obs,~percept.}\)
 

\(\pi\)

\(y_t\)

\(u_t\)

\(x_t\)

Our focus: learned perception map

Problem setting

\(\text{s.t.}~~x_{t+1} = {A }x_t+ {B} u_t\)

Robust reference tracking with linear dynamics and nonlinear partial observation

\(z_{t} =g(Cx_t)\)

\(\displaystyle\mathrm{cost} = \sup_{\substack{t\geq 0\\\mathbf x^\mathrm{ref} \in \mathcal R\\ \|x_0\|\leq \sigma_0}}\left\|\begin{bmatrix} Q (x_t - x_t^\mathrm{ref})\\ Ru_t \end{bmatrix}\right\|_\infty\)

Assumption 1:

\(A,B,C\) and \(Q,R\) are known and well posed

Assumption 2:

\(\mathcal R\) encodes a bounded radius of operation

Assumption 3:

Invertible \(h(g(y)) = y\) and \(g,h\) continuous

\(\displaystyle \min_{\pi}\)

\(\displaystyle \min_{\mathbf K}\)

\(u_{t} =\pi(z_{0:t}, x^\mathrm{ref}_{0:t})\)

Assumption 4:

Noisy training signal \(y^\mathrm{train}_{t} =Cx_t+\eta_t\)

\(y_t = h(z_t) = Cx_t\)

\(u_t = \mathbf K(y_{0:t}, x^\mathrm{ref}_{0:t})\)

Certainty equivalent controller \(\widehat \pi(z_{0:t}, x^\mathrm{ref}_{0:t}) = \mathbf K_\star (\widehat h(z_{0:t}), x^\mathrm{ref}_{0:t}) \)
where \(\widehat h\) is learned from data

Transform to linear output feedback problem with \(h\)



 

\(\pi_\star(z_{0:t}, x^\mathrm{ref}_{0:t}) = \mathbf K_\star (h(z_{0:t}), x^\mathrm{ref}_{0:t}) \)

Assumption 3 applies when:

  1. each position results in a unique image
  2. small changes to position result in small changes to image
image/svg+xml
image/svg+xml

\(\mathrm{dynamics~\&}\)\(\mathrm{observation}\)
 

\(\mathbf K\)

\(z_t\)

\(u_t\)

\(x_t\)

\(y_t\)

\(\mathrm{linear}\)
\(\mathrm{dynamics}\)

 

\(\mathbf K\)

\(y_t\)

\(u_t\)

\(x_t\)

\(h\)

Main Result (Informal)

\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \lesssim\) \( L \) \(r_\star\) \(s_\star\) \( \left(\frac{\sigma}{T}\right)\)\({}^{\frac{1}{p+4}}\)

depending on the continuity of \(g\) and \(h\), the radius of operation, the sensitivity of the optimal controller, the sensor noise, amount of data, and the dimension of the output

Ingredients

The certainty-equivalent controller has bounded suboptimality w.h.p.

1. Uniform convergence of \(\widehat h\)

  • sampling scheme

2. Closed-loop performance

Related Work

Classic controls: Weiner system identification

  • SISO (Greblicki et al. 97) or restrictive assumptions (Westwick & Verhaegen 96; Janczak et al. 07)
  • Focus on ID rather than control performance

Recent work:

Block MDP (Misra et al. 2020) and Rich LQR (Mhamedi et al. 2020) settings

  • Sample complexity has polynomial dependence
  • Finite state space or horizon sidestep stability concerns

Necessity of pointwise bounds

Example: 1D unstable linear system with arbitrary linear controller

\(x_{t+1} = a x_t + u_t\qquad u_t = \mathbf{K}(x^\mathrm{ref}_{0:t},\widehat h(z_{0:t}))\)

near perfect perception map:    \(\widehat h(g(x)) = \begin{cases} 0 & x = \bar x,~ |x|>r\\ x &\text{otherwise} \end{cases}\)

There exists a reference signal contained in \([-r,r]\) that causes the system to pass through \(\bar x\) and subsequently go unstable

\(\bar x\)

\(r\)

\(-r\)

\(t\)

Nonparametric regression

Nadarya Watson Regression: from training data \(\{(z_t, y_t^\mathrm{train})\}_{t=0}^T\)

predictions are weighted averages,

\(\displaystyle \widehat h(z) = \sum_{t=0}^T \frac{k_\gamma(z_t, z)}{\sum_{\ell=0}^T k_\gamma(z_\ell, z)} y_t^\mathrm{train} \)

Theorem (uniform convergence): Suppose training data uniformly sampled from \( \{y\mid \|y\|_\infty\leq r\}\) and bandwidth \(\gamma \propto T^{\frac{1}{p+4}}\). Whenever the system state contained in \(\{x\mid\|Cx\|_\infty \leq r\}\), then w.h.p.

\(\|h(z) - \widehat h(z)\|_\infty \lesssim rL_g L_h \left(\frac{p^2\sigma_\eta^4}{T}\right)^{\frac{1}{p+4}}\)

bandwidth

Nonparametric regression

Nadarya Watson Regression: from training data \(\{(z_t, y_t^\mathrm{train})\}_{t=0}^T\)

\(\displaystyle \widehat h(z) = \sum_{t=0}^T \frac{k_\gamma(z_t, z)}{s_T(z)} y_t^\mathrm{train} \)

\(=\sum_{t=0}^T k_\gamma(z_t, z)\)

The kernel function has the form \(\kappa\left(\frac{\rho(z_t, z)}{\gamma}\right)\) for

  1. \(\gamma\in\mathbb{R}_+\) bandwidth
  2. metric \(\rho:\mathbb{R}^q\to \mathbb{R}_+\)
  3. \(\kappa:\mathbb{R}_+\to[0,1]\) Lipschitz and bounded support

Sampling with linear control

Drive system to uniform samples  \(y_\ell^\mathrm{ref}\) using training output \(y_t^\mathrm{train}\)

\(\mathbf K\)

\(y^\mathrm{ref}_\ell \sim \mathrm{Unif}\{|y|\leq  r\}\)

\(\mathrm{dynamics~\&}\)\(\mathrm{observation}\)
 

\(z_t\)

\(u_t\)

\(x_t\)

\(y_t^\mathrm{train}\)

How to achieve uniform sampling?

Closed-loop guarantees

How does imperfect perception affect system evolution?

Define errors  \(e_t = \widehat h(z_t) - h(z_t) = \widehat h(z_t) - Cx_t\)

\(\displaystyle u_t = \sum_{k=0}^t K_k^y \widehat h(z_{t-k}) + K_k^\mathrm{ref} x^\mathrm{ref}_{t-k}\)

\(\displaystyle x_{t+1}=Ax_t+Bu_t\)

\(\displaystyle u_t = \sum_{k=0}^t K_k^y Cx_{t-k} + K_k^y Ce_{t-k} + K_k^\mathrm{ref} x^\mathrm{ref}_{t-k}\)

\(x_t = \sum_{k=0}^t \Phi_{xe}(k) e_{t-k} + \Phi_{xr}(k) x^\mathrm{ref}_{t-k}\)

Linearly.

\(u_t = \sum_{k=0}^t \Phi_{ue}(k) e_{t-k} + \Phi_{ur}(k) x^\mathrm{ref}_{t-k}\)

Proposition: Suppose that perception errors are uniformly bounded by \(\varepsilon_h\) and let \(\mathbf \Phi\) be system response associated with \(\mathbf K_\star\). Then,

\(\mathrm{cost}(\widehat\pi) \leq \mathrm{cost}(\pi_\star) + \varepsilon_h ~\left\|\begin{bmatrix} Q\mathbf \Phi_{xe}\\ R\mathbf \Phi_{ue} \end{bmatrix}\right\|_{\mathcal L_1} \)

Main Result

\(\mathrm{cost}(\widehat\pi) - \mathrm{cost}(\pi_\star) \lesssim\) \(rL_g L_h \left(\frac{p^2\sigma_\eta^4}{T}\right)^{\frac{1}{p+4}}\) \(\left\|\begin{bmatrix} Q\mathbf \Phi_{xn}\\ R\mathbf \Phi_{un} \end{bmatrix}\right\|_{\mathcal L_1} \)

The certainty-equivalent controller has bounded suboptimality w.h.p.

Ingredients

1. Uniform convergence of \(\widehat h\)

bounded errors

2. Closed-loop performance

propagation of errors

Simulation Experiments

Simplified UAV model: 2D double integrator

\(x_{t+1} = \begin{bmatrix}1 & 0.1 & & \\ 0 & 1 & & \\ & & 1 & 0.1 \\ & & 0 & 1\end{bmatrix} x_t +\begin{bmatrix}0 &  \\ 1 & \\ & 0 \\ & 1 \end{bmatrix} u_t \)

\(y_t = \begin{bmatrix} 1 & 0 & & \\ & & 1 & 0\end{bmatrix} x_t\)

\(z_t\) from CARLA simulator

Data-driven perception

Data collected with linear control and periodic reference signal:

Nadarya Watson (NW) with kernel \(k_\gamma(z, z_t) = \mathbf{1}\{\|z-z_t\|_2 \leq \gamma\}\)

Kernel Ridge Regression (KRR) with radial basis functions

Visual Odometry (VO) matches \(z\) to some \(z_t\) in database of labelled training images, uses homography between images to estimate pose

Simultaneous Localization and Mapping (SLAM) like VO with memory: adds new observations to database online, and initializes estimates based on previous timestep

Perception errors

classic nonparametric methods look similar

memoryless classic computer vision is similar, if noisier/wider

very different!

building obstructs  view

Thanks for your attention!

Certainty-Equivalent Perception-Based Control
Sarah Dean and Benjamin Recht

 

Read more at arxiv.org/abs/2008.12332
Code at github.com/modestyachts/certainty_equiv_perception_control
 

Certainty Equivalent Perception-Based Control

By Sarah Dean

Certainty Equivalent Perception-Based Control

  • 1,610