Prof. Sarah Dean (asst prof in CS at Cornell)
MW 2:45-4pm
255 Olin Hall
1. Recap: Local LQR
2. Iterative LQR
3. PID Control
4. Limitations to Control
Theorem: For $t=0,\dots,H-1$, the optimal value function is quadratic and the optimal policy is linear: $V_t^\star(s)=s^\top P_t s$ and $\pi_t^\star(s)=K_t s$,
where the matrices $P_t$ and $K_t$ are defined recursively, starting from $P_H=Q$.
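The recursion behind the theorem can be sketched in code. The version below is the standard finite-horizon Riccati recursion, under assumptions that are not stated on this slide: linear dynamics $s_{t+1}=As_t+Ba_t$, stage cost $s^\top Q s + a^\top R a$, and terminal cost $s^\top Q s$. The function name `lqr_backward` and the example matrices are hypothetical.

```python
import numpy as np

def lqr_backward(A, B, Q, R, H):
    """Backward Riccati recursion (assumed standard form).

    Returns P[0..H] and K[0..H-1] with V_t(s) = s^T P[t] s and pi_t(s) = K[t] s.
    """
    P = [None] * (H + 1)
    K = [None] * H
    P[H] = Q  # terminal condition P_H = Q from the theorem
    for t in range(H - 1, -1, -1):
        Pn = P[t + 1]
        M = R + B.T @ Pn @ B
        # K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
        K[t] = -np.linalg.solve(M, B.T @ Pn @ A)
        # P_t = Q + A^T P_{t+1} A + A^T P_{t+1} B K_t
        P[t] = Q + A.T @ Pn @ A + A.T @ Pn @ B @ K[t]
    return P, K

# Hypothetical example: discretized position/velocity system.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
H = 20
P, K = lqr_backward(A, B, Q, R, H)

# Roll out the linear policy and accumulate the cost it actually incurs.
s = np.array([1.0, 0.0])
s0 = s.copy()
total_cost = 0.0
for t in range(H):
    a = K[t] @ s
    total_cost += s @ Q @ s + a @ R @ a
    s = A @ s + B @ a
total_cost += s @ Q @ s  # terminal cost
```

Under these assumptions, `total_cost` should match `s0 @ P[0] @ s0`, which is the quadratic value function evaluated at the initial state.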
minimize over $\pi$: $\sum_{t=0}^{H-1} c(s_t,a_t)$
s.t. $s_{t+1}=f(s_t,a_t)$, $a_t=\pi_t(s_t)$
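For a symmetric matrix $Q\in\mathbb{R}^{n\times n}$ the eigendecomposition is $Q=\sum_{i=1}^{n} v_i v_i^\top \sigma_i$.
To make this PSD, we replace $Q\leftarrow\sum_{i=1}^{n} v_i v_i^\top\left(\max\{0,\sigma_i\}+\lambda\right)$.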
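The eigenvalue clipping above can be written in a few lines of NumPy. This is a minimal sketch; the function name `make_psd` and the default value of `lam` are assumptions made for illustration.

```python
import numpy as np

def make_psd(Q, lam=1e-3):
    """Clip negative eigenvalues of a symmetric matrix to 0, then add lam."""
    Q = 0.5 * (Q + Q.T)                   # symmetrize against round-off
    sigma, V = np.linalg.eigh(Q)          # Q = sum_i sigma_i v_i v_i^T
    sigma_fixed = np.maximum(sigma, 0.0) + lam
    # (V * sigma_fixed) scales column i of V by sigma_fixed[i]
    return (V * sigma_fixed) @ V.T

# Example: eigenvalues of this matrix are -1 and 3.
Q_indef = np.array([[1.0, 2.0], [2.0, 1.0]])
Q_fixed = make_psd(Q_indef, lam=0.1)
```

With `lam > 0` every eigenvalue of the result is at least `lam`, so the matrix is positive definite as well as PSD.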
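$\pi_t^\star(s)=\begin{bmatrix}\gamma_t^{\mathrm{pos}} & \gamma_t^{\mathrm{vel}}\end{bmatrix}s=\gamma_t^{\mathrm{pos}}(\mathrm{pos}-x)+\gamma_t^{\mathrm{vel}}\,\mathrm{vel}$
[Figure: $\gamma^{\mathrm{pos}}$ and $\gamma^{\mathrm{vel}}$ versus $t$ up to $H$; action $a_t$.]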
2. Iterative LQR
minimize over $\pi$: $\sum_{t=0}^{H-1} c(s_t,a_t)$
s.t. $s_{t+1}=f(s_t,a_t)$, $a_t=\pi_t(s_t)$, $s_0\sim\mu_0$
Linearize around a trajectory. What trajectory? Iterate!
Black lines: $\tau_{i-1}$; red arrows: the trajectory if the linearization were exact; blue dashed lines: $\tau_i$.
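The "linearize around a trajectory" step can be sketched as follows. This covers only the linearization of the dynamics along one rollout, using finite differences; a full iterative LQR loop would also approximate the cost and solve a local LQR problem at each iteration, which is not shown. The function names and the pendulum-like dynamics `f` are hypothetical.

```python
import numpy as np

def jacobians(f, s, a, eps=1e-5):
    """Central finite-difference Jacobians of f with respect to s and a."""
    ns, na = len(s), len(a)
    A = np.zeros((ns, ns))
    B = np.zeros((ns, na))
    for i in range(ns):
        d = np.zeros(ns)
        d[i] = eps
        A[:, i] = (f(s + d, a) - f(s - d, a)) / (2 * eps)
    for j in range(na):
        d = np.zeros(na)
        d[j] = eps
        B[:, j] = (f(s, a + d) - f(s, a - d)) / (2 * eps)
    return A, B

def linearize_along(f, s0, actions):
    """Roll out `actions` from s0 and linearize f at every visited (s_t, a_t)."""
    states, models = [s0], []
    for a in actions:
        models.append(jacobians(f, states[-1], a))
        states.append(f(states[-1], a))
    return states, models

# Hypothetical nonlinear dynamics (pendulum-like, Euler step of 0.1).
def f(s, a, dt=0.1):
    theta, omega = s
    return np.array([theta + dt * omega,
                     omega + dt * (-np.sin(theta) + a[0])])

actions = [np.array([0.0]) for _ in range(5)]
states, models = linearize_along(f, np.array([0.0, 0.0]), actions)
A0, B0 = models[0]
```

Each pair in `models` gives a local model $s_{t+1}\approx f(\bar s_t,\bar a_t)+A_t(s_t-\bar s_t)+B_t(a_t-\bar a_t)$ around the rollout; iterating means repeating this around each newly computed trajectory.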
3. PID Control
[Figure: error versus $t$.]
4. Limitations to Control
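[Figure: example with states 0 and 1 and actions $a\in\{\text{stay},\text{switch}\}$; edges labeled $a=\text{stay}$ and $a=\text{switch}$.]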
Definition:
Theorem: Given finite $\mathcal{S},\mathcal{A}$ and transition function $P$, construct a directed graph with vertices $V=\mathcal{S}$ and an edge from $s$ to $s'$ if $P(s'\mid s,a)>0$ for some $a\in\mathcal{A}$.
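The graph construction in the theorem can be sketched in code. The sketch only builds the graph and computes which states are reachable from a given start state by following edges. The array layout `P[s, a, s2]`, the function names, and the deterministic stay/switch transitions used in the example are all assumptions made for illustration.

```python
import numpy as np
from collections import deque

def transition_graph(P):
    """P[s, a, s2] = P(s2 | s, a). Returns adj with adj[s, s2] = True
    when some action moves s to s2 with positive probability."""
    return (P > 0).any(axis=1)

def reachable_from(adj, start):
    """Breadth-first search over the directed graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for s2 in np.flatnonzero(adj[s]):
            s2 = int(s2)
            if s2 not in seen:
                seen.add(s2)
                queue.append(s2)
    return seen

# Hypothetical two-state example: action 0 = stay, action 1 = switch.
P = np.zeros((2, 2, 2))
for s in range(2):
    P[s, 0, s] = 1.0        # stay keeps the state
    P[s, 1, 1 - s] = 1.0    # switch flips the state
adj = transition_graph(P)
reach0 = reachable_from(adj, 0)
```

In this example every state is reachable from every other, since the switch action provides an edge in both directions.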
Proof:
Theorem: The linear dynamics $s_{t+1}=As_t+Ba_t$ are controllable if the controllability Gramian $C=\begin{bmatrix}B & AB & A^2B & \dots & A^{n_s-1}B\end{bmatrix}$ is full rank: $\mathrm{rank}(C)=n_s$.
For the example $s_{t+1}=\begin{bmatrix}2 & 0\\ 0 & 1\end{bmatrix}s_t+\begin{bmatrix}0\\ 1\end{bmatrix}a_t$
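The rank condition can be checked numerically for this example. The sketch reads the example as $A=\mathrm{diag}(2,1)$ and $B=(0,1)^\top$, which is an assumption about the slide's matrix layout; the function name `controllability_matrix` is hypothetical.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(ns-1) B] column-wise."""
    ns = A.shape[0]
    blocks = [B]
    for _ in range(ns - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = controllability_matrix(A, B)
rank = np.linalg.matrix_rank(C)
```

Here `C` is `[[0, 0], [1, 1]]`, so `rank` is 1, which is less than $n_s=2$. Under this reading the rank condition fails: the action never enters the first coordinate of the state.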
Proof:
To get from $s$ to $s'$ we can simply take the actions:
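One standard way to construct such actions is sketched below. It is not necessarily the formula used in the lecture. Unrolling the dynamics for $n_s$ steps gives $s_{n_s}=A^{n_s}s_0+\sum_{t=0}^{n_s-1}A^{n_s-1-t}Ba_t$, so when $C$ has full rank the actions can be found by solving a linear system in $C$. The function name `steering_actions` and the example system are hypothetical.

```python
import numpy as np

def steering_actions(A, B, s, s_target):
    """Actions a_0, ..., a_{ns-1} that move s to s_target in ns steps
    (exact when [B, AB, ..., A^(ns-1) B] has full row rank)."""
    ns, na = B.shape
    blocks = [B]
    for _ in range(ns - 1):
        blocks.append(A @ blocks[-1])
    C = np.hstack(blocks)
    gap = s_target - np.linalg.matrix_power(A, ns) @ s
    u, *_ = np.linalg.lstsq(C, gap, rcond=None)
    # Block k of u multiplies A^k B, i.e. it is a_{ns-1-k}; reverse to time order.
    return u.reshape(ns, na)[::-1]

# Hypothetical controllable system (position/velocity with unit time step).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
s_start = np.array([1.0, 0.0])
s_goal = np.array([0.0, 0.0])
acts = steering_actions(A, B, s_start, s_goal)

# Simulate to confirm the target is reached.
s_sim = s_start.copy()
for a in acts:
    s_sim = A @ s_sim + B @ a
```

For this system the computed actions are $a_0=-1$ and $a_1=1$, and the simulated state lands on the target after two steps.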