Prof. Sarah Dean, Assistant Professor in CS at Cornell
[Diagram: the agent-environment loop. The policy $\pi_t : S \to A$ maps the observed state $s_t$ to an action $a_t$, and the agent accumulates data $\{(s_t, a_t, c_t)\}$ over time.]

Goal: select actions $a_t$ to bring the environment to low-cost states while avoiding unsafe states.
[Diagram: receding-horizon timeline. The state $s$ evolves over time as the controller alternates Plan and Do steps.]
The MPC policy applies the first action of an $H$-step plan computed from the current state:
$$\pi_{\text{MPC}}(s_t) = u_0^\star(s_t), \qquad [u_0^\star, \dots, u_{H-1}^\star](s_t) = \arg\min_{u_0, \dots, u_{H-1}} \; \sum_{k=0}^{H-1} c(x_k, u_k)$$
$$\text{s.t.} \quad x_0 = s_t, \quad x_{k+1} = F(x_k, u_k), \quad x_k \in S_{\text{safe}}, \; u_k \in A_{\text{safe}}$$
Notation: distinguish the real states and actions $s_t$ and $a_t$ from the planned optimization variables $x_k$ and $u_k$.
[Diagram: from the current state $s_t$, the controller plans over horizon $H$ and applies only the first action, $a_t = u_0^\star(s_t)$.]
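A minimal sketch of this receding-horizon loop in Python. Everything concrete here is assumed for illustration: the scalar dynamics, the quadratic stage cost, and the helper names `plan` and `pi_mpc` are my own choices, and the safety constraints are omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-ins (assumed, not from the slides):
F = lambda x, u: 0.9 * x + u           # dynamics x_{k+1} = F(x_k, u_k)
c = lambda x, u: x**2 + 0.1 * u**2     # stage cost c(x_k, u_k)
H = 10                                 # planning horizon

def plan(s):
    """Solve the H-step open-loop problem starting from state s."""
    def total_cost(us):
        x, J = s, 0.0
        for u in us:
            J += c(x, u)
            x = F(x, u)
        return J
    return minimize(total_cost, np.zeros(H)).x   # u_0*, ..., u_{H-1}*

def pi_mpc(s):
    return plan(s)[0]   # apply only the first planned action

# Closed loop: "Plan" then "Do", replanning from every new state.
s = 5.0
for t in range(20):
    a = pi_mpc(s)
    s = F(s, a)
    print(t, round(s, 4))
```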
Example: the state is position & velocity, $s = [\theta, \omega]$, with dynamics $s_{t+1} = \begin{bmatrix} 1 & 0.1 \\ 0 & 1 \end{bmatrix} s_t + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a_t$. Goal: stay near the origin and be energy efficient.
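A sketch of the MPC policy for this example using cvxpy. The slides do not specify the cost or safe sets, so the quadratic weights $Q, R$ (encoding "near the origin" and "energy efficient") and the box sets $S_{\text{safe}}, A_{\text{safe}}$ below are assumptions.

```python
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # dynamics from the example
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)        # assumed quadratic costs
H = 20

def pi_mpc(s):
    x = cp.Variable((2, H + 1))          # planned states x_0, ..., x_H
    u = cp.Variable((1, H))              # planned actions u_0, ..., u_{H-1}
    cost, constr = 0, [x[:, 0] == s]
    for k in range(H):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        constr += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                   cp.abs(x[:, k]) <= 1.0,    # assumed box set S_safe
                   cp.abs(u[:, k]) <= 0.5]    # assumed box set A_safe
    cp.Problem(cp.Minimize(cost), constr).solve()
    return u.value[:, 0]                      # a_t = u_0*(s_t)

s = np.array([0.5, 0.0])
for t in range(30):
    a = pi_mpc(s)
    s = A @ s + B @ a
```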
MPC with a terminal cost and terminal set:
$$\min_{u_0, \dots, u_{H-1}} \; \sum_{k=0}^{H-1} c(x_k, u_k) + c_H(x_H) \quad \text{s.t.} \quad x_0 = s, \; x_{k+1} = F(x_k, u_k), \; x_k \in S_{\text{safe}}, \; u_k \in A_{\text{safe}}, \; x_H \in S_H$$
MPC with a terminal equality constraint:
$$\min_{u_0, \dots, u_{H-1}} \; \sum_{k=0}^{H-1} c(x_k, u_k) \quad \text{s.t.} \quad x_0 = s, \; x_{k+1} = F(x_k, u_k), \; x_k \in S_{\text{safe}}, \; u_k \in A_{\text{safe}}, \; x_H = 0$$
Recursive feasibility: feasible at $s_t$ $\implies$ feasible at $s_{t+1}$.

Proof sketch: let $(u_0^\star, \dots, u_{H-1}^\star)$ be a feasible plan at $s_t$ with planned states $x_0^\star, \dots, x_H^\star$ and $x_H^\star = 0$. After applying $a_t = u_0^\star$, the shifted plan $(u_1^\star, \dots, u_{H-1}^\star, 0)$ is feasible at $s_{t+1} = F(s_t, u_0^\star)$: it retraces $x_1^\star, \dots, x_H^\star$, and since $0$ is a fixed point with $0 \in S_{\text{safe}}$ and $0 \in A_{\text{safe}}$, appending the zero action keeps the state at the origin and re-satisfies the terminal constraint $x_H = 0$.
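The same shifted plan also bounds the optimal cost at the next state, a fact used in the stability proof below (assuming $c(0, 0) = 0$, which the argument implicitly uses):

$$J^\star(s_{t+1}) \;\le\; \underbrace{\sum_{k=1}^{H-1} c(x_k^\star, u_k^\star) + c(0, 0)}_{\text{cost of the shifted plan}} \;=\; J^\star(s_t) - c(s_t, u_0^\star)$$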
Definition: A Lyapunov function $V : S \to \mathbb{R}$ for $F$ and $s_{\text{eq}} = 0$ is continuous and satisfies (i) $V(0) = 0$ and $V(s) > 0$ for all $s \neq 0$ (positive definite), and (ii) $V(F(s)) < V(s)$ for all $s \neq 0$ (strictly decreasing along trajectories).
Theorem (1.2, 1.4): Suppose that $F$ is locally Lipschitz, $s_{\text{eq}} = 0$ is a fixed point, and $V$ is a Lyapunov function for $F$, $s_{\text{eq}}$. Then, $s_{\text{eq}} = 0$ is asymptotically stable.
Proof: by the bound above, $J^\star(s)$ is positive definite and strictly decreasing along the closed loop, so it is a Lyapunov function. Therefore, the closed-loop dynamics $F(\cdot, \pi_{\text{MPC}}(\cdot))$ are asymptotically stable.
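A numerical spot check of this argument on the linear example, with the terminal constraint $x_H = 0$; the costs $Q, R$ and the helper name `solve_mpc` are assumptions carried over from the earlier sketch. The assertion verifies that $J^\star$ decreases along the closed loop.

```python
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, H = np.eye(2), 0.1 * np.eye(1), 20

def solve_mpc(s):
    x, u = cp.Variable((2, H + 1)), cp.Variable((1, H))
    cost, constr = 0, [x[:, 0] == s, x[:, H] == 0]  # terminal constraint x_H = 0
    for k in range(H):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        constr += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k]]
    prob = cp.Problem(cp.Minimize(cost), constr)
    prob.solve()
    return u.value[:, 0], prob.value                # (a_t, J*(s_t))

s, J_prev = np.array([0.5, 0.0]), np.inf
for t in range(15):
    a, J = solve_mpc(s)
    assert J <= J_prev + 1e-6   # J* acts as a Lyapunov function
    s, J_prev = A @ s + B @ a, J
```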
For MPC with a terminal cost and terminal set:
$$\min_{u_0, \dots, u_{H-1}} \; \sum_{k=0}^{H-1} c(x_k, u_k) + c_H(x_H) \quad \text{s.t.} \quad x_0 = s, \; x_{k+1} = F(x_k, u_k), \; x_k \in S_{\text{safe}}, \; u_k \in A_{\text{safe}}, \; x_H \in S_H$$

Assumptions: $S_H \subseteq S_{\text{safe}}$, and for every $x \in S_H$ there exists $u \in A_{\text{safe}}$ such that (i) $F(x, u) \in S_H$ (the terminal set is control invariant) and (ii) $c_H(F(x, u)) - c_H(x) \le -c(x, u)$ (the terminal cost decreases at least as fast as the stage cost accrues).

Recursive feasibility: feasible at $s_t$ $\implies$ feasible at $s_{t+1}$.

Proof sketch: shift the optimal plan by one step and append the action $u$ guaranteed by the assumptions at $x_H^\star \in S_H$; the appended step remains in $S_H \subseteq S_{\text{safe}}$, so the shifted plan is feasible at $s_{t+1}$.

Proof of stability: by assumption (ii), the shifted plan shows that $J^\star(s_{t+1}) \le J^\star(s_t) - c(s_t, u_0^\star)$, so $J^\star(s)$ is positive definite and strictly decreasing. Therefore, the closed-loop dynamics $F(\cdot, \pi_{\text{MPC}}(\cdot))$ are asymptotically stable.
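Written out, the decrease step in this proof chains the shifted-plan bound with assumption (ii):

$$J^\star(s_{t+1}) \;\le\; J^\star(s_t) - c(s_t, u_0^\star) \;+\; \underbrace{c(x_H^\star, u) + c_H(F(x_H^\star, u)) - c_H(x_H^\star)}_{\le \, 0 \text{ by (ii)}}$$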
Based on the unconstrained LQR policy, where $P = \mathrm{DARE}(A, B, Q, R)$ and $K = -(B^\top P B + R)^{-1} B^\top P A$.
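Computing $P$ and $K$ with SciPy's standard Riccati solver; the system matrices are from the earlier example and the costs are the same assumed $Q, R$.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)

P = solve_discrete_are(A, B, Q, R)                   # P = DARE(A, B, Q, R)
K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)   # K = -(B'PB + R)^{-1} B'PA
print(K)                                             # LQR policy: a = K s
```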
Constrained LQR Problem:
$$\min \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} s_t^\top Q s_t + a_t^\top R a_t \quad \text{s.t.} \quad s_{t+1} = A s_t + B a_t, \; G_s s_t \le b_s, \; G_a a_t \le b_a$$

MPC Policy:
$$\min \; \sum_{k=0}^{H-1} x_k^\top Q x_k + u_k^\top R u_k + x_H^\top P x_H \quad \text{s.t.} \quad x_0 = s, \; x_{k+1} = A x_k + B u_k, \; G_s x_k \le b_s, \; G_a u_k \le b_a, \; x_H \in S_H$$
This satisfies the assumptions: take $c_H(x) = x^\top P x$ and $u = Kx$; the DARE gives the exact decrease $c_H((A + BK)x) - c_H(x) = -(x^\top Q x + x^\top K^\top R K x)$, and $S_H$ can be chosen as a constraint-admissible set that is invariant under the closed loop $x \mapsto (A + BK)x$.
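A sketch of this constrained MPC policy in cvxpy. The polytope data $G_s, b_s, G_a, b_a$ below are hypothetical (boxes $|s_i| \le 1$, $|a| \le 0.5$), and for simplicity $S_H$ is taken to be $\{0\}$, one valid control-invariant choice; with that choice the terminal cost term vanishes and is kept only to mirror the formulation.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, H = np.eye(2), 0.1 * np.eye(1), 15
P = solve_discrete_are(A, B, Q, R)             # terminal cost from the DARE

# Hypothetical polytopes: |s_i| <= 1 and |a| <= 0.5 in G x <= b form.
Gs, bs = np.vstack([np.eye(2), -np.eye(2)]), np.ones(4)
Ga, ba = np.array([[1.0], [-1.0]]), 0.5 * np.ones(2)

def pi_mpc(s):
    x, u = cp.Variable((2, H + 1)), cp.Variable((1, H))
    cost = cp.quad_form(x[:, H], P)            # terminal cost x_H' P x_H
    constr = [x[:, 0] == s, x[:, H] == 0]      # S_H = {0} for simplicity
    for k in range(H):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        constr += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                   Gs @ x[:, k] <= bs, Ga @ u[:, k] <= ba]
    cp.Problem(cp.Minimize(cost), constr).solve()
    return u.value[:, 0]

a = pi_mpc(np.array([0.5, 0.0]))
```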
In the unconstrained case, LQR is exactly MPC with $H = 1$ and the correct terminal cost!

LQR Problem (constraints dropped):
$$\min \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} s_t^\top Q s_t + a_t^\top R a_t \quad \text{s.t.} \quad s_{t+1} = A s_t + B a_t$$

MPC Policy:
$$\min \; \sum_{k=0}^{H-1} x_k^\top Q x_k + u_k^\top R u_k + x_H^\top P x_H \quad \text{s.t.} \quad x_0 = s, \; x_{k+1} = A x_k + B u_k$$
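A quick numerical check of this claim, with the same assumed matrices as above: the $H = 1$ MPC action under terminal cost $P$ matches the LQR action $K s$.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)
P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

s = np.array([1.0, -0.5])
u = cp.Variable(1)
x1 = A @ s + B @ u                                # one-step prediction
cost = cp.quad_form(u, R) + cp.quad_form(x1, P)   # H = 1 with terminal cost P
cp.Problem(cp.Minimize(cost)).solve()
print(u.value, K @ s)                             # the two actions agree
```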
Recap: MPC replans from each observed state $s_t$ and applies $a_t = u_0^\star(s_t)$:
$$\min_{u_0, \dots, u_{H-1}} \; \sum_{k=0}^{H-1} c(x_k, u_k) + c_H(x_H) \quad \text{s.t.} \quad x_0 = s_t, \; x_{k+1} = F(x_k, u_k), \; x_k \in S_{\text{safe}}, \; u_k \in A_{\text{safe}}, \; x_H \in S_H$$
References:
Predictive Control for Linear and Hybrid Systems by Borrelli, Bemporad, and Morari.
[RB17] Learning Model Predictive Control for Iterative Tasks.
[DSA+20] Fairness Is Not Static.
[FLD21] Algorithmic Fairness and the Situated Dynamics of Justice.