Sarah Dean
Assistant Professor in CS at Cornell
L4DC, 8 June 2021
Machine learning is a promising avenue for incorporating rich sensing modalities
Can we make strong guarantees in these settings?
backup slide about EKF/modelling vs. end-to-end
min_{π∈Π} cost(x_0, u_0, x_1, …)
s.t. u_t = π_t(z_{0:t})
     x_{t+1} = dynamics_t(x_t, u_t)
     z_t = observation_t(C x_t)
[Block diagram: unknown dynamics & observation produce z_t; the policy π maps z_t to u_t; the output of interest is y_t = C x_t.]
Observation-feedback optimal control problem
min_{π∈Π} cost(x_0, u_0, x_1, …)
s.t. u_t = π_t(x̂_{0:t})
     x_{t+1} = dynamics_t(x_t, u_t)
     x̂_t = EKF(z_{0:t})
[Block diagram: dynamics, observation, and EKF produce the state estimate x̂_t; the policy π maps x̂_t to u_t.]
Classic approach: physical models and filtering
min_{π∈Π} cost(x_0, u_0, x_1, …)
s.t. u_t = π_t(z_{0:t})
     (dynamics and observation unknown)
[Block diagram: unknown dynamics & observation produce z_t; a learned policy π maps z_t directly to u_t.]
End-to-end approach: learn everything from data
min_{π∈Π} cost(x_0, u_0, x_1, …)
s.t. u_t = π_t(y_{0:t})
     x_{t+1} = dynamics_t(x_t, u_t)
     y_t = perception(observation_t(C x_t))
[Block diagram: dynamics & observation composed with learned perception produce y_t; the policy π maps y_t to u_t.]
Our focus: learned perception map
Robust reference tracking with linear dynamics and nonlinear partial observation
s.t. x_{t+1} = A x_t + B u_t
     z_t = g(C x_t)
cost = sup_{t≥0, x^ref∈R, ∥x_0∥≤σ_0} ∥[Q(x_t − x_t^ref); R u_t]∥_∞
Assumption 1: A, B, C and Q, R are known and well posed
Assumption 2: the reference set R encodes a bounded radius of operation
Assumption 3: g is invertible, with h(g(y)) = y, and both g and h are continuous
Assumption 4: noisy training signal y_t^train = C x_t + η_t

Transform to linear output feedback problem with h:
min_π with u_t = π(z_{0:t}, x_{0:t}^ref)  ⟺  min_K with u_t = K(y_{0:t}, x_{0:t}^ref), where y_t = h(z_t) = C x_t
Optimal controller: π⋆(z_{0:t}, x_{0:t}^ref) = K⋆(h(z_{0:t}), x_{0:t}^ref)
Certainty-equivalent controller: π̂(z_{0:t}, x_{0:t}^ref) = K⋆(ĥ(z_{0:t}), x_{0:t}^ref), where ĥ is learned from data
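As an illustrative sketch of the certainty-equivalent loop u_t = K⋆(ĥ(z_t), x_t^ref): the scalar system, cubic observation g, and static gain K below are made-up examples, and a static gain stands in for the general history-dependent K⋆.

```python
import numpy as np

# Hypothetical sketch of the certainty-equivalent control loop: the learned
# perception map h_hat plays the role of the inverse of g, and a fixed
# output-feedback gain K acts on its output. All concrete values are
# illustrative, not taken from the paper.

def ce_control_loop(A, B, C, g, h_hat, K, x0, x_ref, T):
    """Simulate u_t = K (h_hat(z_t) - C x_ref_t) with z_t = g(C x_t)."""
    x = x0
    traj = [x0]
    for t in range(T):
        z = g(C @ x)                 # nonlinear observation of the output C x_t
        y = h_hat(z)                 # certainty-equivalent output estimate
        u = K @ (y - C @ x_ref[t])   # static output feedback on the estimate
        x = A @ x + B @ u            # linear dynamics
        traj.append(x)
    return np.array(traj)
```

With a perfect inverse (h_hat = g^{-1}), the loop behaves exactly like linear output feedback, which is the point of the transform above.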
Assumption 3 applies when:
[Block diagram: composing the nonlinear loop with h — dynamics & observation → z_t → h → y_t → K → u_t — is equivalent to linear output feedback: linear dynamics → y_t → K → u_t.]
cost(π̂) − cost(π⋆) ≲ L r⋆ s⋆ (σ/T)^{1/(p+4)}
depending on the continuity of g and h, the radius of operation, the sensitivity of the optimal controller, the sensor noise, the amount of data, and the dimension of the output
Ingredients
The certainty-equivalent controller has bounded suboptimality w.h.p.
1. Uniform convergence of ĥ
2. Closed-loop performance
Classic controls: Wiener system identification
Recent work:
Block MDP (Misra et al. 2020) and Rich LQR (Mhammedi et al. 2020) settings
Example: 1D unstable linear system with arbitrary linear controller
x_{t+1} = a x_t + u_t,   u_t = K(x_{0:t}^ref, ĥ(z_{0:t}))
near-perfect perception map: ĥ(g(x)) = 0 if x = x̄ or |x| > r, and ĥ(g(x)) = x otherwise
There exists a reference signal contained in [−r,r] that causes the system to pass through xˉ and subsequently go unstable
[Plot: the state tracks references within [−r, r] until passing through x̄, then diverges.]
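This failure mode can be reproduced in a few lines. The constants a, r, x̄ and the deadbeat tracking controller below are illustrative choices, not taken from the talk:

```python
import numpy as np

# Sketch of the 1D instability example. The perception map is exact
# everywhere except at a single point x_bar and outside [-r, r], where it
# returns 0. All constants are made up for illustration.
a, r, x_bar = 2.0, 1.0, 0.5

def h_hat(x):
    """'Near-perfect' perception: exact except at x_bar and outside [-r, r]."""
    return 0.0 if np.isclose(x, x_bar) or abs(x) > r else x

# Deadbeat tracking on the perceived state: with perfect perception,
# x_{t+1} = a x_t - a x_t + ref_{t+1} = ref_{t+1} exactly.
ref = [0.0, 0.25, x_bar, 0.25, 0.0, 0.0, 0.0, 0.0]
x, traj = 0.0, [0.0]
for t in range(len(ref) - 1):
    u = -a * h_hat(x) + ref[t + 1]
    x = a * x + u
    traj.append(x)
```

Once the reference drives the state through x̄, the controller sees 0, overshoots past r, and from then on the perception map returns 0 forever, so the unstable open-loop dynamics take over.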
Nadaraya–Watson regression: from training data {(z_t, y_t^train)}_{t=0}^T
predictions are weighted averages,
ĥ(z) = Σ_{t=0}^T [ k_γ(z_t, z) / Σ_{ℓ=0}^T k_γ(z_ℓ, z) ] y_t^train
Theorem (uniform convergence): Suppose the training data is uniformly sampled from {y : ∥y∥_∞ ≤ r} and the bandwidth satisfies γ ∝ T^{−1/(p+4)}. Whenever the system state is contained in {x : ∥C x∥_∞ ≤ r}, then w.h.p.
∥ĥ(z) − h(z)∥_∞ ≲ r L_g L_h (p² σ_η⁴ / T)^{1/(p+4)}
Nadaraya–Watson regression: from training data {(z_t, y_t^train)}_{t=0}^T
ĥ(z) = Σ_{t=0}^T [ k_γ(z_t, z) / s_T(z) ] y_t^train,  where s_T(z) = Σ_{t=0}^T k_γ(z_t, z)
The kernel function has the form κ(ρ(z_t, z)/γ) for a metric ρ
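A minimal sketch of the estimator with the box kernel k_γ(z, z_t) = 1{∥z − z_t∥_2 ≤ γ} used in the experiments; the function name and structure are illustrative, not the paper's implementation:

```python
import numpy as np

# Nadaraya-Watson regression with the box kernel: predictions are
# weighted averages of training labels over the gamma-ball around z.

def nw_predict(z, Z_train, Y_train, gamma):
    """h_hat(z) = sum_t k(z_t, z) y_t / sum_l k(z_l, z)."""
    w = (np.linalg.norm(Z_train - z, axis=1) <= gamma).astype(float)
    s = w.sum()
    if s == 0:
        return np.zeros(Y_train.shape[1])  # no neighbors: no information
    return (w @ Y_train) / s
```

Note the role of the bandwidth γ: too small and queries have no neighbors, too large and the average smooths over distant labels; the theorem's γ ∝ T^{−1/(p+4)} balances the two.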
Drive the system to uniformly sampled references y_ℓ^ref, collecting training outputs y_t^train
[Block diagram: controller K drives dynamics & observation to references y_ℓ^ref ∼ Unif{|y| ≤ r}, collecting training outputs y_t^train along with observations z_t.]
How to achieve uniform sampling?
How does imperfect perception affect system evolution? Linearly.
Define errors e_t = ĥ(z_t) − h(z_t) = ĥ(z_t) − C x_t
u_t = Σ_{k=0}^t K_k^y ĥ(z_{t−k}) + K_k^ref x_{t−k}^ref
x_{t+1} = A x_t + B u_t
u_t = Σ_{k=0}^t K_k^y C x_{t−k} + K_k^y e_{t−k} + K_k^ref x_{t−k}^ref
x_t = Σ_{k=0}^t Φ_xe(k) e_{t−k} + Φ_xr(k) x_{t−k}^ref
u_t = Σ_{k=0}^t Φ_ue(k) e_{t−k} + Φ_ur(k) x_{t−k}^ref
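The linearity claim can be sanity-checked numerically on a made-up scalar system with a static output-feedback gain (all constants illustrative): the direct simulation should match the convolution with the closed-loop response Φ.

```python
import numpy as np

# Scalar check that perception errors e_t enter the closed loop linearly:
# x_t = sum_k Phi_xe(k) e_{t-k} + Phi_xr(k) ref_{t-k}.
a, b, ky, kr = 1.1, 1.0, -0.6, 0.6   # closed-loop pole acl = a + b*ky = 0.5
T = 30
rng = np.random.default_rng(0)
e = rng.normal(scale=0.1, size=T)    # perception errors
ref = np.ones(T)

# Direct simulation with u_t = ky*(x_t + e_t) + kr*ref_t,
# i.e. the perceived output is h_hat(z_t) = x_t + e_t.
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a * x[t] + b * (ky * (x[t] + e[t]) + kr * ref[t])

# Convolution with the closed-loop system response:
# Phi_xe(0) = 0 and Phi_xe(k) = acl^{k-1} * b * ky for k >= 1.
acl = a + b * ky
Phi_xe = np.array([0.0] + [acl**k * b * ky for k in range(T)])
Phi_xr = np.array([0.0] + [acl**k * b * kr for k in range(T)])
x_conv = np.array([
    sum(Phi_xe[k] * e[t - k] + Phi_xr[k] * ref[t - k] for k in range(1, t + 1))
    for t in range(T + 1)
])
```

Superposition is what makes the proposition below possible: bounded e_t translates into a cost increase weighted by the L1 norm of the response Φ.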
Proposition: Suppose perception errors are uniformly bounded by ε_h and let Φ be the system response associated with K⋆. Then,
cost(π̂) ≤ cost(π⋆) + ε_h ∥[Q Φ_xe; R Φ_ue]∥_{L1}
cost(π̂) − cost(π⋆) ≲ r L_g L_h (p² σ_η⁴ / T)^{1/(p+4)} ∥[Q Φ_xe; R Φ_ue]∥_{L1}
The certainty-equivalent controller has bounded suboptimality w.h.p.
Ingredients
1. Uniform convergence of ĥ → bounded errors
2. Closed-loop performance → propagation of errors
Simplified UAV model: 2D double integrator
x_{t+1} = [1 0.1 0 0; 0 1 0 0; 0 0 1 0.1; 0 0 0 1] x_t + [0 0; 1 0; 0 0; 0 1] u_t
y_t = [1 0 0 0; 0 0 1 0] x_t
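Assembled in code (a sketch; the block-diagonal double-integrator structure with dt = 0.1 is reconstructed from the slide, with positions observed and velocities hidden):

```python
import numpy as np

# 2D double-integrator UAV model: per-axis state (position, velocity),
# discretized with dt = 0.1; only the two positions are observed.
dt = 0.1
Ax = np.array([[1.0, dt], [0.0, 1.0]])    # single-axis double integrator
Bx = np.array([[0.0], [1.0]])
A = np.block([[Ax, np.zeros((2, 2))], [np.zeros((2, 2)), Ax]])
B = np.block([[Bx, np.zeros((2, 1))], [np.zeros((2, 1)), Bx]])
C = np.array([[1.0, 0, 0, 0], [0, 0, 1.0, 0]])  # observe the two positions
```

The CARLA camera image z_t then plays the role of g(C x_t): a nonlinear observation of position only.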
zt from CARLA simulator
Data collected with linear control and periodic reference signal:
Nadaraya–Watson (NW) with kernel k_γ(z, z_t) = 1{∥z − z_t∥_2 ≤ γ}
Kernel Ridge Regression (KRR) with radial basis functions
Visual Odometry (VO) matches z to some zt in database of labelled training images, uses homography between images to estimate pose
Simultaneous Localization and Mapping (SLAM) like VO with memory: adds new observations to database online, and initializes estimates based on previous timestep
[Result plots: the classic nonparametric methods (NW, KRR) look similar; memoryless classic computer vision (VO) is similar, if noisier/wider; SLAM looks very different; errors concentrate where a building obstructs the view.]
Certainty-Equivalent Perception-Based Control
Sarah Dean and Benjamin Recht
Read more at arxiv.org/abs/2008.12332
Code at github.com/modestyachts/certainty_equiv_perception_control