Sarah Dean, assistant professor in CS at Cornell
Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
1. Recap
2. Linear Control
3. Linear Quadratic Regulator
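$\mathcal{M} = \{\mathcal{S}, \mathcal{A}, c, f, H\}$

$$\min_{\pi} \; \sum_{t=0}^{H-1} c_t(s_t, a_t) + c_H(s_H) \quad \text{s.t.} \quad s_{t+1} = f(s_t, a_t), \;\; a_t = \pi_t(s_t)$$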
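To make the objective concrete, here is a minimal sketch of evaluating it for one fixed policy by rolling the deterministic dynamics forward. The function name `rollout_cost` and the scalar example (dynamics $s' = s + a$, cost $s^2 + a^2$, policy $a = -0.5s$) are illustrative assumptions, not part of the lecture.

```python
def rollout_cost(f, policy, stage_cost, terminal_cost, s0, H):
    """Sum of c_t(s_t, a_t) for t = 0..H-1 plus c_H(s_H), with a_t = pi_t(s_t)."""
    s, total = s0, 0.0
    for t in range(H):
        a = policy(t, s)            # a_t = pi_t(s_t)
        total += stage_cost(t, s, a)
        s = f(s, a)                 # s_{t+1} = f(s_t, a_t)
    return total + terminal_cost(s)

# Hypothetical scalar problem, chosen only for illustration.
cost = rollout_cost(
    f=lambda s, a: s + a,
    policy=lambda t, s: -0.5 * s,
    stage_cost=lambda t, s, a: s**2 + a**2,
    terminal_cost=lambda s: s**2,
    s0=1.0,
    H=2,
)
print(cost)
```

Solving the optimal control problem means searching over policies for the one that makes this number as small as possible.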
Trajectory is determined by the eigenstructure of A

[Figures: eigenvalue locations in the complex plane $\mathbb{C}$ (axes $\Re(\lambda)$, $\Im(\lambda)$) shown next to trajectories in the $(s_1, s_2)$ plane, for the cases below]
Real eigenvalues: $0<\lambda_2<\lambda_1<1$; $0<\lambda_2<1<\lambda_1$; $1<\lambda_2<\lambda_1$
Complex eigenvalues $\lambda=\alpha\pm i\beta$: $0<\alpha^2+\beta^2<1$; $1<\alpha^2+\beta^2$
Repeated eigenvalue $\lambda_1=\lambda_2=\lambda$: $0<\lambda<1$; $\lambda>1$
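The qualitative cases can be reproduced numerically. The sketch below simulates $s_{t+1} = A s_t$ for a few hand-picked matrices and records the norm of the final state; the specific matrices, the initial state, and the horizon of 100 steps are assumptions chosen for illustration.

```python
import numpy as np

def simulate(A, s0, T):
    """Return the states s_0, ..., s_T of s_{t+1} = A s_t as a (T+1, n) array."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(T):
        traj.append(A @ traj[-1])
    return np.array(traj)

theta = 0.3  # hypothetical rotation angle for the complex-eigenvalue case
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Hypothetical matrices, one per qualitative case.
cases = {
    "real, both inside unit circle": np.diag([0.9, 0.5]),
    "real, one outside (saddle)": np.diag([1.1, 0.5]),
    "real, both outside": np.diag([1.2, 1.1]),
    "complex, modulus 0.9 (spiral in)": 0.9 * rotation,
}

s0 = np.array([1.0, 1.0])
final_norms = {name: np.linalg.norm(simulate(A, s0, 100)[-1])
               for name, A in cases.items()}
for name, norm in final_norms.items():
    print(f"{name}: ||s_100|| = {norm:.3e}")
```

Plotting the columns of `simulate(A, s0, T)` against each other would give trajectory pictures in the $(s_1, s_2)$ plane.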
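Theorem: Let $\{\lambda_i\}_{i=1}^{n} \subset \mathbb{C}$ be the eigenvalues of $A$.
Then for $s_{t+1} = A s_t$, the equilibrium $s_{eq} = 0$ is
asymptotically stable if and only if $\max_i |\lambda_i| < 1$, and unstable if $\max_i |\lambda_i| > 1$.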
Proof
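If $A$ is diagonalizable, then any $s_0$ can be written as a linear combination of eigenvectors, $s_0 = \sum_{i=1}^{n_s} \alpha_i v_i$, where $n_s = n$ is the state dimension
By definition, $A v_i = \lambda_i v_i$
Therefore, $s_t = A^t s_0 = \sum_{i=1}^{n_s} \alpha_i \lambda_i^t v_i$
Thus $s_t \to 0$ for every $s_0$ if and only if all $|\lambda_i| < 1$; and if any $|\lambda_i| > 1$, then $\|s_t\| \to \infty$ whenever the corresponding $\alpha_i \neq 0$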
Proof in the non-diagonalizable case is out of scope, but it follows using the Jordan Normal Form
We call $\max_i |\lambda_i| = 1$ "marginally (un)stable"
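The eigenvector expansion in the proof can be checked numerically. This is a small sketch assuming a hypothetical diagonalizable matrix `A` and initial state `s0`; it compares $\sum_i \alpha_i \lambda_i^t v_i$ against the direct computation $A^t s_0$.

```python
import numpy as np

# Hypothetical diagonalizable matrix and initial state, for illustration only.
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
s0 = np.array([1.0, -2.0])
t = 10

lam, V = np.linalg.eig(A)        # columns of V are the eigenvectors v_i
alpha = np.linalg.solve(V, s0)   # coefficients so that s0 = sum_i alpha_i v_i

s_t_eig = V @ (alpha * lam**t)                    # sum_i alpha_i lam_i^t v_i
s_t_direct = np.linalg.matrix_power(A, t) @ s0    # A^t s0

print(s_t_eig, s_t_direct)
```

Both eigenvalues here have modulus below 1, so both expressions shrink toward zero as `t` grows.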
2. Linear Control
Full dynamics depend on actions: $s_{t+1} = A s_t + B a_t$
Linear policy defined by $a_t = K s_t$: $s_{t+1} = A s_t + B K s_t = (A + BK) s_t$
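Since the closed loop is again a linear system, its stability can be read off the eigenvalues of $A + BK$. The sketch below assumes the position-velocity matrices `A` and `B` that appear in the LQR example later in these slides (as read from the flattened notation) and a hypothetical gain `K` picked by hand.

```python
import numpy as np

# Position-velocity dynamics (assumed reading of the later LQR example).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
K = np.array([[-0.5, -1.0]])   # hypothetical gain, not from the lecture

def spectral_radius(M):
    """Largest eigenvalue modulus, max_i |lambda_i|."""
    return np.max(np.abs(np.linalg.eigvals(M)))

rho_open = spectral_radius(A)             # no control: a_t = 0
rho_closed = spectral_radius(A + B @ K)   # with a_t = K s_t

print(rho_open, rho_closed)
```

With this choice the closed-loop eigenvalues are $0.5 \pm 0.5i$, so the policy moves the spectral radius from 1 to about 0.71.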
3. Linear Quadratic Regulator
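Special case of the optimal control problem with quadratic costs and linear dynamics:

$$\min_{\pi} \; \sum_{t=0}^{H-1} \left( s_t^\top Q s_t + a_t^\top R a_t \right) + s_H^\top Q s_H \quad \text{s.t.} \quad s_{t+1} = A s_t + B a_t, \;\; a_t = \pi_t(s_t)$$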
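Example: $Q = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $R = \lambda$

One step:

$$\min_{a} \; s^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s + (s')^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s' + \lambda a^2 \quad \text{s.t.} \quad s' = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} s + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a$$

$$\min_{a} \; \left( \begin{bmatrix} 1 & 0 \end{bmatrix} s \right)^2 + \left( \begin{bmatrix} 1 & 1 \end{bmatrix} s \right)^2 + \lambda a^2 \implies a^\star = 0$$

Two steps:

$$\min_{a_0, a_1} \; s_0^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s_0 + s_1^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s_1 + s_2^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s_2 + \lambda a_0^2 + \lambda a_1^2$$

$$\text{s.t.} \quad s_1 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} s_0 + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a_0, \quad s_2 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} s_1 + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a_1$$

$$\min_{a_0} \; s_0^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s_0 + \left( \begin{bmatrix} 1 & 0 \end{bmatrix} s_1 \right)^2 + \left( \begin{bmatrix} 1 & 1 \end{bmatrix} s_1 \right)^2 + \lambda a_0^2 \quad \text{s.t.} \quad s_1 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} s_0 + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a_0$$

$$\min_{a_0} \; s_0^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s_0 + s_1^\top \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} s_1 + \lambda a_0^2 \quad \text{s.t.} \quad s_1 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} s_0 + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a_0$$

$$\min_{a_0} \; s_0^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s_0 + \left( \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} s_0 + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a_0 \right)^\top \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \left( \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} s_0 + \begin{bmatrix} 0 \\ 1 \end{bmatrix} a_0 \right) + \lambda a_0^2$$

$$\min_{a_0} \; s_0^\top \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} s_0 + s_0^\top \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix} s_0 + 2 s_0^\top \begin{bmatrix} 1 \\ 2 \end{bmatrix} a_0 + a_0^2 + \lambda a_0^2$$

$$\min_{a_0} \; s_0^\top \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} s_0 + 2 s_0^\top \begin{bmatrix} 1 \\ 2 \end{bmatrix} a_0 + (1 + \lambda) a_0^2 \implies a_0^\star = -\frac{\begin{bmatrix} 1 & 2 \end{bmatrix} s_0}{1 + \lambda}$$

Result: $a_0^\star = -\frac{\begin{bmatrix} 1 & 2 \end{bmatrix} s_0}{1 + \lambda}$ and $a_1^\star = 0$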
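The closed-form answer of the two-step example can be sanity-checked by brute force. The sketch below assumes the example's matrices as read from the flattened notation ($A$ with rows $[1\ 1]$ and $[0\ 1]$, $B = [0\ 1]^\top$), plus a hypothetical weight `lam` and initial state `s0`, and compares a grid search over $(a_0, a_1)$ with $a_0^\star = -[1\ 2]s_0/(1+\lambda)$, $a_1^\star = 0$.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([0.0, 1.0])
Q = np.array([[1.0, 0.0],
              [0.0, 0.0]])
lam = 0.5                      # hypothetical action-cost weight
s0 = np.array([1.0, -0.3])     # hypothetical initial (position, velocity)

def two_step_cost(a0, a1):
    """Cost of the two-step problem for scalar actions a0, a1."""
    s1 = A @ s0 + B * a0
    s2 = A @ s1 + B * a1
    return (s0 @ Q @ s0 + s1 @ Q @ s1 + s2 @ Q @ s2
            + lam * a0**2 + lam * a1**2)

# Closed-form minimizers from the derivation.
a0_star = -np.array([1.0, 2.0]) @ s0 / (1.0 + lam)
a1_star = 0.0

# Brute-force check on a grid with spacing 0.02.
grid = np.linspace(-2.0, 2.0, 201)
costs = np.array([[two_step_cost(a0, a1) for a1 in grid] for a0 in grid])
i, j = np.unravel_index(np.argmin(costs), costs.shape)
a0_grid, a1_grid = grid[i], grid[j]

print(a0_star, a0_grid, a1_grid)
```

The grid minimizer should agree with the closed form up to the grid spacing, and no grid point should achieve a lower cost than $(a_0^\star, a_1^\star)$.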
Reformulating for optimal control, our general purpose dynamic programming algorithm is:
DP: starting from $V_H^\star(s) = c_H(s)$, for $t = H-1, \dots, 0$:

$$V_t^\star(s) = \min_{a} \left[ c(s, a) + V_{t+1}^\star(f(s, a)) \right]$$
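Theorem: For $t = 0, \dots, H-1$, the optimal value function is quadratic and the optimal policy is linear:

$$V_t^\star(s) = s^\top P_t s \quad \text{and} \quad \pi_t^\star(s) = K_t s$$

where the matrices are defined as $P_H = Q$ and

$$P_t = Q + A^\top P_{t+1} A - A^\top P_{t+1} B \left( R + B^\top P_{t+1} B \right)^{-1} B^\top P_{t+1} A$$

$$K_t = -\left( R + B^\top P_{t+1} B \right)^{-1} B^\top P_{t+1} A$$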
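A minimal sketch of the backward pass, assuming the standard discrete-time Riccati update written in the code comments. The function name `lqr_backward` is hypothetical, and the test problem reuses the two-step example's matrices (as read from the flattened notation) with an assumed weight $\lambda = 0.5$.

```python
import numpy as np

def lqr_backward(A, B, Q, R, H):
    """Finite-horizon LQR via backward recursion from P_H = Q.

    Returns gains K[0..H-1] and value matrices P[0..H], with
    V_t(s) = s^T P[t] s and pi_t(s) = K[t] s.
    """
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = Q
    for t in range(H - 1, -1, -1):
        BtP = B.T @ P[t + 1]
        # K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
        K[t] = -np.linalg.solve(R + BtP @ B, BtP @ A)
        # P_t = Q + A^T P_{t+1} A + A^T P_{t+1} B K_t
        P[t] = Q + A.T @ P[t + 1] @ A + A.T @ P[t + 1] @ B @ K[t]
    return K, P

# Two-step example from the slides, with a hypothetical lambda = 0.5.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[0.5]])

K, P = lqr_backward(A, B, Q, R, H=2)
print(K[0], K[1])
```

On this problem the recursion returns $K_1 = 0$ and $K_0 = -[1\ 2]/(1+\lambda)$, matching the hand derivation of the example.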
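$\pi_t^\star(s) = \begin{bmatrix} \gamma_t^{pos} & \gamma_t^{vel} \end{bmatrix} s$
[Figure: gains $\gamma^{pos}$ and $\gamma^{vel}$ plotted against $t$; axis marks at $-1$ and $H$]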