Prof. Sarah Dean, assistant professor in CS at Cornell
[Diagram: the policy $\pi_t:\mathcal S\to\mathcal A$ receives the observation $s_t$, returns the action $a_t$, and accumulates data $\{(s_t,a_t,c_t)\}$.]
Goal: select actions $a_t$ to bring the environment to low-cost states while avoiding unsafe states.
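A state $s$ is safe if $s\in\mathcal S_\mathrm{safe}$.
A trajectory of states $(s_0,\dots,s_t)$ is safe if $s_k\in\mathcal S_\mathrm{safe}$ for all $0\le k\le t$.
A system $s_{t+1}=F(s_t)$ is safe if some $\mathcal S_\mathrm{inv}\subseteq\mathcal S_\mathrm{safe}$ is invariant and $s_0\in\mathcal S_\mathrm{inv}$.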
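The three definitions can be checked numerically on a toy example. The sketch below assumes a scalar state and an interval safe set $[-1, 1]$; the two systems `contracting` and `expanding` are hypothetical and chosen only to illustrate the difference between starting in an invariant set and merely starting in a safe state.

```python
def is_safe_state(s, lo=-1.0, hi=1.0):
    """Membership test for an assumed interval safe set S_safe = [lo, hi]."""
    return lo <= s <= hi

def is_safe_trajectory(states, lo=-1.0, hi=1.0):
    """A trajectory is safe if every state along it is safe."""
    return all(is_safe_state(s, lo, hi) for s in states)

def rollout(F, s0, steps):
    """Simulate the autonomous system s_{t+1} = F(s_t) from s0."""
    states = [s0]
    for _ in range(steps):
        states.append(F(states[-1]))
    return states

# Hypothetical system for which [-1, 1] is invariant: |0.9 s| <= |s|.
def contracting(s):
    return 0.9 * s

# Hypothetical system that starts in S_safe but eventually leaves it.
def expanding(s):
    return 1.5 * s

safe_traj = rollout(contracting, 0.8, 20)
unsafe_traj = rollout(expanding, 0.2, 20)
print(is_safe_trajectory(safe_traj), is_safe_trajectory(unsafe_traj))
```

The second rollout begins at a safe state yet violates the constraint after a few steps, which is why the system-level definition asks for an invariant subset rather than a safe initial state alone.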
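$a_t = K_t s_t$
$$\min_{a}\ \sum_{t=0}^{T} s_t^\top Q s_t + a_t^\top R a_t$$
$$\text{s.t.}\quad s_{t+1} = A s_t + B a_t,$$
$$s_t\in\mathcal S_\mathrm{safe},\quad a_t\in\mathcal A_\mathrm{safe}$$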
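$$\begin{bmatrix} \mathbf s \\ \mathbf a \end{bmatrix} = \begin{bmatrix} \Phi_s \\ \Phi_a \end{bmatrix} \mathbf w, \qquad \mathbf w = \begin{bmatrix} s_0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
$$\min_{\Phi}\ \left\| \begin{bmatrix} \bar Q^{1/2} & \\ & \bar R^{1/2} \end{bmatrix} \begin{bmatrix} \Phi_s \\ \Phi_a \end{bmatrix} \mathbf w \right\|_2^2$$
$$\text{s.t.}\quad \begin{bmatrix} I - Z\bar A & -Z\bar B \end{bmatrix} \begin{bmatrix} \Phi_s \\ \Phi_a \end{bmatrix} = I,$$
$$\Phi_s \mathbf w \in \mathcal S_\mathrm{safe}^{T},\quad \Phi_a \mathbf w \in \mathcal A_\mathrm{safe}^{T}$$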
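To make the parameterization concrete, the following sketch builds $\Phi_s$ and $\Phi_a$ for a fixed static feedback gain and checks both the affine constraint and the resulting trajectory. Everything numeric here is an assumption for illustration: the matrices `A` and `B`, the gain `K`, the horizon `T`, and the convention that $Z$ is the block down-shift operator with $\bar A$, $\bar B$ block diagonal. It does not solve the constrained optimization; it only verifies that one feasible $\Phi$ reproduces the closed-loop rollout.

```python
import numpy as np

# Assumed example data (not from the slides).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -1.0]])     # hypothetical static feedback gain
T = 5                            # time steps t = 0, ..., T
n, m = B.shape
N = T + 1

# Block-diagonal lifts of A, B, K and the block down-shift operator Z.
A_bar = np.kron(np.eye(N), A)
B_bar = np.kron(np.eye(N), B)
K_bar = np.kron(np.eye(N), K)
Z = np.kron(np.eye(N, k=-1), np.eye(n))

# System responses induced by a = K s.
Phi_s = np.linalg.inv(np.eye(N * n) - Z @ (A_bar + B_bar @ K_bar))
Phi_a = K_bar @ Phi_s

# Affine constraint [I - Z A_bar, -Z B_bar] [Phi_s; Phi_a] = I.
lhs = np.hstack([np.eye(N * n) - Z @ A_bar, -Z @ B_bar]) @ np.vstack([Phi_s, Phi_a])
constraint_ok = np.allclose(lhs, np.eye(N * n))

# Trajectory from w = [s0, 0, ..., 0].
s0 = np.array([1.0, 0.0])
w = np.concatenate([s0, np.zeros(T * n)])
s_traj = (Phi_s @ w).reshape(N, n)
a_traj = (Phi_a @ w).reshape(N, m)

# Direct closed-loop simulation for comparison.
s = s0.copy()
sim = [s]
for _ in range(T):
    s = (A + B @ K) @ s
    sim.append(s)
matches = np.allclose(s_traj, np.array(sim))
print(constraint_ok, matches)
```

Because the states and actions are linear in $\Phi$, safety constraints on $\Phi_s \mathbf w$ and $\Phi_a \mathbf w$ stay convex whenever the safe sets are convex, which is the appeal of optimizing over $\Phi$ rather than over the gain directly.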
Claim: Suppose that for all $t$, the policy satisfies
$$\pi(s_t)=\arg\min_{a}\ \|a-\pi^\star_\mathrm{unc}(s_t)\|_2^2 \quad\text{s.t.}\quad C(F(s_t,a))\le \gamma\, C(s_t)$$
[Figure labels: size of $s$, size of $a$, safety constraint, $C(s)=0$]
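A minimal sketch of the filtering step in the claim, under assumptions that are not in the slides: scalar dynamics $F(s,a)=s+a$, a safety function $C(s)=s-s_{\max}$ with the convention that $s$ is safe when $C(s)\le 0$, and $0\le\gamma\le 1$. With these choices the constraint reduces to an upper bound on $a$, so the projection has a closed form.

```python
def safety_filter(s, a_unc, s_max=1.0, gamma=0.5):
    """Solve min_a (a - a_unc)^2  s.t.  C(F(s, a)) <= gamma * C(s)
    for the assumed F(s, a) = s + a and C(s) = s - s_max."""
    # C(s + a) <= gamma * C(s)  <=>  a <= (1 - gamma) * (s_max - s)
    a_bound = (1.0 - gamma) * (s_max - s)
    # Projecting onto a half-line: keep a_unc if feasible, else clip to the bound.
    return min(a_unc, a_bound)

s = 0.0
states = [s]
for _ in range(10):
    # Hypothetical unconstrained policy that always pushes toward the unsafe side.
    a = safety_filter(s, a_unc=2.0)
    s = s + a
    states.append(s)
print(states)
```

In this toy case the filtered system approaches $s_{\max}$ but never crosses it, while actions that already satisfy the constraint pass through unchanged.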
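Instead of optimizing for open loop control...
$$\min_{a_0,\dots,a_T}\ \sum_{t=0}^{T} c(s_t,a_t)$$
$$\text{s.t.}\quad s_0 \text{ given},\quad s_{t+1}=F(s_t,a_t),$$
$$s_t\in\mathcal S_\mathrm{safe},\quad a_t\in\mathcal A_\mathrm{safe}$$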
...re-optimize to close the loop
model predicts the trajectory during planning
Also called Model Predictive Control
Figure from slides by Borrelli, Jones, Morari
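The plan-then-do loop can be sketched in a few lines. The example below is illustrative only: it assumes a scalar model `F`, a quadratic `cost`, a finite action set, and an interval safe set, and it plans by brute-force enumeration rather than with a real optimizer. The model is also used as the true environment, so prediction and reality coincide.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)      # assumed finite action set A_safe
S_MIN, S_MAX = -5.0, 5.0        # assumed interval safe set S_safe

def F(s, a):
    """Assumed scalar model, also used here as the true environment."""
    return s + a

def cost(s, a):
    """Assumed stage cost c(s, a)."""
    return s ** 2 + 0.1 * a ** 2

def plan(s0, horizon):
    """Enumerate a_0..a_H and return the cheapest sequence whose predicted
    states s_0..s_H all stay in the safe set."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon + 1):
        s, total, feasible = s0, 0.0, True
        for a in seq:
            if not (S_MIN <= s <= S_MAX):
                feasible = False
                break
            total += cost(s, a)
            s = F(s, a)         # model predicts the next state
        if feasible and total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq

def mpc_policy(s, horizon=3):
    """Plan over the horizon, then apply only the first planned action."""
    return plan(s, horizon)[0]

s = 4.0
trajectory = [s]
for t in range(8):
    a = mpc_policy(s)           # Plan
    s = F(s, a)                 # Do
    trajectory.append(s)
print(trajectory)
```

Re-planning from the newly observed state at every step is what closes the loop: if the environment deviated from the model, the next plan would start from where the system actually ended up.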
[Figure: timeline alternating Plan and Do steps over time; at each step the state $s_t$ is observed and the action $a_t$ is applied.]
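$$\min_{a_0,\dots,a_H}\ \sum_{k=0}^{H} c(s_k,a_k)$$
$$\text{s.t.}\quad s_0 \text{ given},\quad s_{k+1}=F(s_k,a_k),$$
$$s_k\in\mathcal S_\mathrm{safe},\quad a_k\in\mathcal A_\mathrm{safe}$$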
We can:
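$\pi(s_t)=u_0^\star(s_t)$
$$\min_{u_0,\dots,u_H}\ \sum_{k=0}^{H} c(x_k,u_k)$$
$$\text{s.t.}\quad x_0=s_t,\quad x_{k+1}=F(x_k,u_k),$$
$$x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe}$$
Notation: distinguish real states and actions $s_t$ and $a_t$ from the planned optimization variables $x_k$ and $u_k$.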
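$$[u_0^\star,\dots,u_H^\star](s_t)=\arg\min_{u_0,\dots,u_H}\ \sum_{k=0}^{H} c(x_k,u_k)$$
$$\text{s.t.}\quad x_0=s_t,\quad x_{k+1}=F(x_k,u_k),$$
$$x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe}$$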
[Diagram: the MPC policy maps the observed state $s_t$ to the action $a_t=u_0^\star(s_t)$.]
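Infinite Horizon LQR Problem
$$\min\ \lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} s_t^\top Q s_t + a_t^\top R a_t \quad\text{s.t.}\quad s_{t+1}=As_t+Ba_t$$
We know that $a_t^\star=\pi^\star(s_t)$ where $\pi^\star(s)=Ks$ and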
Finite LQR Problem
$$\min\ \sum_{k=0}^{H} x_k^\top Q x_k + u_k^\top R u_k \quad\text{s.t.}\quad x_0=s,\quad x_{k+1}=Ax_k+Bu_k$$
MPC Policy $a_t=u_0^\star(s_t)$ where
$u_0^\star(s)=K_0 s$ and
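For reference, the gains in both problems follow from the standard dynamic-programming (Riccati) derivation. The expressions below are not recovered from the slides; they are the usual textbook forms, stated under the assumptions $Q\succeq 0$, $R\succ 0$, $H\ge 1$, and with the indexing implied by the finite problem above, where the last stage cost is at $k=H$ and there is no separate terminal cost.

```latex
% Finite horizon: backward recursion for the cost-to-go x^T P_k x
P_H = Q, \qquad
P_k = Q + A^\top P_{k+1} A
      - A^\top P_{k+1} B \,(R + B^\top P_{k+1} B)^{-1} B^\top P_{k+1} A,
\quad k = H-1, \dots, 0

% First-step gain used by the MPC policy
K_0 = -(R + B^\top P_1 B)^{-1} B^\top P_1 A

% Infinite horizon: P is the fixed point of the same recursion
P = Q + A^\top P A - A^\top P B \,(R + B^\top P B)^{-1} B^\top P A, \qquad
K = -(R + B^\top P B)^{-1} B^\top P A
```

Under the usual stabilizability and detectability conditions the recursion converges to the fixed point, so $K_0$ approaches $K$ as the horizon $H$ grows, while a short horizon can give a noticeably different gain.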
The state is position & velocity $s=[\theta,\omega]$ with
$$s_{t+1}=\begin{bmatrix} 1 & 0.1 \\ 0 & 1 \end{bmatrix} s_t + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a_t$$
Goal: stay near origin and be energy efficient
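The unconstrained MPC policy for this example can be sketched with a backward Riccati recursion. The system matrices come from the text above; the cost weights `Q` and `R`, the horizons, and the initial state are assumptions chosen for illustration, and `mpc_gain` and `simulate` are hypothetical helper names.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # from the example
B = np.array([[0.0], [1.0]])             # from the example
Q = np.eye(2)                            # assumed state cost ("stay near origin")
R = np.array([[1.0]])                    # assumed action cost ("energy efficient")

def mpc_gain(A, B, Q, R, H):
    """First-step gain K_0 of the finite-horizon LQR problem with stage
    costs at k = 0..H and no extra terminal cost (requires H >= 1)."""
    P = Q.copy()                                  # P_H = Q
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(H):                            # yields K_{H-1}, ..., K_0
        G = R + B.T @ P @ B
        K = -np.linalg.solve(G, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K     # Riccati update
    return K

def simulate(K, s0, steps=100):
    """Roll out the closed loop s_{t+1} = A s_t + B K s_t."""
    s = np.array(s0, dtype=float)
    traj = [s]
    for _ in range(steps):
        a = K @ s                                 # a_t = u_0*(s_t) = K_0 s_t
        s = A @ s + B @ a
        traj.append(s)
    return np.array(traj)

K_short = mpc_gain(A, B, Q, R, H=2)
K_long = mpc_gain(A, B, Q, R, H=50)
traj = simulate(K_long, [1.0, 0.0])
print(K_short, K_long)
```

Comparing `K_short` with `K_long` shows how the planning horizon changes the policy: with only a couple of steps of lookahead the planner barely sees the effect of velocity on position, so it reacts weakly to position error.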
Figures from slides by Goulart, Borrelli
References: Predictive Control by Borrelli, Bemporad, Morari