Model-Based RL:
LQR, iLQR
Curriculum
- Linear Quadratic Regulator
- iterative LQR (iLQR)
- The case of unknown dynamics
LQR - motivation
We aim to optimize
$\max_{a_0, \dots, a_T} \sum_{t=0}^{T} r(s_t, a_t)$ subject to $s_{t+1} = f(s_t, a_t)$
(for now, assume deterministic dynamics)
What if r and f are known and are "nice":
- we can find $\max_a r(s, a)$ analytically
- the composition of r and f is also "nice"
Then we can express the optimal last action $a_T$ as a function of $s_T$:
$a_T^*(s_T) = \arg\max_{a} r(s_T, a)$
Now we can write the value of the last time step:
$V(s_T) = r(s_T, a_T^*(s_T))$
LQR - motivation
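Going one step backward:
$Q(s_{T-1}, a_{T-1}) = r(s_{T-1}, a_{T-1}) + V(f(s_{T-1}, a_{T-1}))$, so $a_{T-1}^*(s_{T-1}) = \arg\max_a Q(s_{T-1}, a)$ and $V(s_{T-1}) = \max_a Q(s_{T-1}, a)$
Apply this recursively until $t = 0$, where the state $s_0$ is known!
Then we can go forward:
$a_0 = a_0^*(s_0)$, $s_1 = f(s_0, a_0)$, $a_1 = a_1^*(s_1)$, $\dots$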
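Using the rule of "nice" compositions, every $Q$, $a^*$, and $V$ above has a closed form. Linear dynamics with quadratic rewards keep everything "nice": a quadratic of a linear map is again quadratic, and maximizing a concave quadratic over $a$ leaves a quadratic in $s$.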
Linear dynamics, Quadratic rewards
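Dynamics: $s_{t+1} = F_t \begin{bmatrix} s_t \\ a_t \end{bmatrix} + f_t$
Rewards: $r_t(s_t, a_t) = \frac{1}{2} \begin{bmatrix} s_t \\ a_t \end{bmatrix}^\top R_t \begin{bmatrix} s_t \\ a_t \end{bmatrix} + \begin{bmatrix} s_t \\ a_t \end{bmatrix}^\top r_t$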
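Here is a minimal sketch of the two model pieces as plain functions. It assumes NumPy and stacks $[s; a]$ into one vector. The function names and the 1-D numbers are hypothetical.

```python
import numpy as np

# Sketch: `dynamics` and `reward` are hypothetical helper names.
def dynamics(F, f, s, a):
    """Linear dynamics s_{t+1} = F_t [s_t; a_t] + f_t."""
    x = np.concatenate([s, a])
    return F @ x + f

def reward(R, r, s, a):
    """Quadratic reward 0.5 [s; a]^T R_t [s; a] + [s; a]^T r_t."""
    x = np.concatenate([s, a])
    return 0.5 * x @ R @ x + x @ r

# Hypothetical 1-D example: s' = s + a, reward = -(s^2 + a^2) / 2
F = np.array([[1.0, 1.0]])
f = np.zeros(1)
R = -np.eye(2)
r = np.zeros(2)
s, a = np.array([2.0]), np.array([-1.0])
print(dynamics(F, f, s, a))  # [1.]
print(reward(R, r, s, a))    # -2.5
```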
LQR - backward pass
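The last Q (no reward follows step $T$, so $Q_T = R_T$ and $q_T = r_T$):
$Q(s_T, a_T) = \frac{1}{2} \begin{bmatrix} s_T \\ a_T \end{bmatrix}^\top Q_T \begin{bmatrix} s_T \\ a_T \end{bmatrix} + \begin{bmatrix} s_T \\ a_T \end{bmatrix}^\top q_T$, with $Q_T = \begin{bmatrix} Q_{T,ss} & Q_{T,sa} \\ Q_{T,as} & Q_{T,aa} \end{bmatrix}$, $q_T = \begin{bmatrix} q_{T,s} \\ q_{T,a} \end{bmatrix}$
Equate the gradient to zero:
$\nabla_{a_T} Q(s_T, a_T) = Q_{T,as} s_T + Q_{T,aa} a_T + q_{T,a} = 0$
Get the optimal last time-step behavior (a maximum when $Q_{T,aa}$ is negative definite):
$a_T = -Q_{T,aa}^{-1}(Q_{T,as} s_T + q_{T,a}) = K_T s_T + k_T$, where $K_T = -Q_{T,aa}^{-1} Q_{T,as}$, $k_T = -Q_{T,aa}^{-1} q_{T,a}$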
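Below is a minimal NumPy sketch of the last time-step solution. It assumes the state occupies the first `n` entries of $[s; a]$ and that $Q_T$ is symmetric with a negative-definite action block. `last_step_policy` is a hypothetical name. The sketch solves linear systems instead of forming $Q_{T,aa}^{-1}$ explicitly, which is the usual numerically safer choice.

```python
import numpy as np

# Sketch: `last_step_policy` is a hypothetical helper, not a library function.
def last_step_policy(Q, q, n):
    """Return (K_T, k_T) so that a_T = K_T s_T + k_T maximizes
    0.5 [s; a]^T Q [s; a] + [s; a]^T q over a.

    Q must be symmetric with a negative-definite action block Q[n:, n:];
    n is the state dimension (the first n entries of [s; a])."""
    Q_as, Q_aa = Q[n:, :n], Q[n:, n:]
    q_a = q[n:]
    # Solve Q_aa K = -Q_as and Q_aa k = -q_a rather than inverting Q_aa.
    K = -np.linalg.solve(Q_aa, Q_as)
    k = -np.linalg.solve(Q_aa, q_a)
    return K, k

# Hypothetical scalar example: Q_aa = -2, Q_as = 0.5, q_a = 1.
Q = np.array([[-1.0, 0.5], [0.5, -2.0]])
q = np.array([0.0, 1.0])
K, k = last_step_policy(Q, q, n=1)  # K = [[0.25]], k = [0.5], i.e. a_T = 0.25 s_T + 0.5
```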
LQR - backward pass
(Check this slide's derivations yourself.)
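The last time-step value is also quadratic in $s_T$:
$V(s_T) = \frac{1}{2} s_T^\top V_T s_T + s_T^\top v_T + \text{const}$, where
$V_T = Q_{T,ss} + Q_{T,sa} K_T + K_T^\top Q_{T,as} + K_T^\top Q_{T,aa} K_T$, $v_T = q_{T,s} + Q_{T,sa} k_T + K_T^\top q_{T,a} + K_T^\top Q_{T,aa} k_T$
The Q function at $T-1$ is again quadratic in both $s_{T-1}$ and $a_{T-1}$:
$Q(s_{T-1}, a_{T-1}) = r_{T-1}(s_{T-1}, a_{T-1}) + V\left(F_{T-1} \begin{bmatrix} s_{T-1} \\ a_{T-1} \end{bmatrix} + f_{T-1}\right)$, with $Q_{T-1} = R_{T-1} + F_{T-1}^\top V_T F_{T-1}$, $q_{T-1} = r_{T-1} + F_{T-1}^\top V_T f_{T-1} + F_{T-1}^\top v_T$
Thus you can get an analytical formula for the policy at each time step:
$a_t = -Q_{t,aa}^{-1}(Q_{t,as} s_t + q_{t,a}) = K_t s_t + k_t$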
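One backward step can be sketched as two NumPy helpers. The helper names are hypothetical. The sketch assumes symmetric $Q_t$ and $R_t$ whose action blocks keep $Q_{t,aa}$ negative definite. `value_from_q` turns $(Q_t, q_t)$ into the gains and $(V_t, v_t)$. `q_from_value` then pushes $(V_t, v_t)$ through the linear dynamics to obtain $(Q_{t-1}, q_{t-1})$.

```python
import numpy as np

# Sketch: both helper names are hypothetical.
def value_from_q(Q, q, n):
    """For Q(s, a) = 0.5 x^T Q x + x^T q with x = [s; a], return (K, k, V, v):
    a = K s + k maximizes Q(s, .), and V(s) = 0.5 s^T V s + s^T v + const."""
    Q_ss, Q_sa = Q[:n, :n], Q[:n, n:]
    Q_as, Q_aa = Q[n:, :n], Q[n:, n:]
    q_s, q_a = q[:n], q[n:]
    K = -np.linalg.solve(Q_aa, Q_as)
    k = -np.linalg.solve(Q_aa, q_a)
    V = Q_ss + Q_sa @ K + K.T @ Q_as + K.T @ Q_aa @ K
    v = q_s + Q_sa @ k + K.T @ q_a + K.T @ Q_aa @ k
    return K, k, V, v

def q_from_value(R, r, F, f, V, v):
    """Q_{t-1}(s, a) = r_{t-1}(s, a) + V_t(F [s; a] + f), up to a constant."""
    Q = R + F.T @ V @ F
    q = r + F.T @ V @ f + F.T @ v
    return Q, q
```

Both helpers drop additive constants, which do not affect the optimal actions.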
LQR - algorithm
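Given: $s_0, F_t, f_t, R_t, r_t$ for all $t$
- Calculate $Q_t, q_t$ (and $V_t, v_t$) for all $t$, going backward from $t = T$
- Going forward from $t = 0$, calculate $a_t = -Q_{t,aa}^{-1}(Q_{t,as} s_t + q_{t,a})$
- Then calculate $s_{t+1} = F_t \begin{bmatrix} s_t \\ a_t \end{bmatrix} + f_t$ and repeat the previous step for $t + 1$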
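Here is a self-contained NumPy sketch of the whole algorithm, under the same assumptions as the slides: deterministic linear dynamics, quadratic rewards, and symmetric $R_t$ with a negative-definite action block. `lqr` and `total_reward` are hypothetical names. The backward pass starts from $V = 0$ because nothing is earned after step $T$, so the first iteration yields $Q_T = R_T$ and $q_T = r_T$.

```python
import numpy as np

# Sketch: `lqr` and `total_reward` are hypothetical helper names.
def lqr(s0, Fs, fs, Rs, rs):
    """Deterministic LQR over t = 0..T, where T + 1 = len(Rs).

    Fs[t]: (n, n+m), fs[t]: (n,), Rs[t]: symmetric (n+m, n+m) with a
    negative-definite action block, rs[t]: (n+m,).
    Returns states s_0..s_{T+1} and actions a_0..a_T."""
    n = s0.shape[0]
    V, v = np.zeros((n, n)), np.zeros(n)  # no value after the last step
    gains = [None] * len(Rs)
    # Backward pass: (V_{t+1}, v_{t+1}) -> (Q_t, q_t) -> (K_t, k_t) -> (V_t, v_t)
    for t in reversed(range(len(Rs))):
        Q = Rs[t] + Fs[t].T @ V @ Fs[t]
        q = rs[t] + Fs[t].T @ V @ fs[t] + Fs[t].T @ v
        Q_ss, Q_sa, Q_as, Q_aa = Q[:n, :n], Q[:n, n:], Q[n:, :n], Q[n:, n:]
        K = -np.linalg.solve(Q_aa, Q_as)
        k = -np.linalg.solve(Q_aa, q[n:])
        V = Q_ss + Q_sa @ K + K.T @ Q_as + K.T @ Q_aa @ K
        v = q[:n] + Q_sa @ k + K.T @ q[n:] + K.T @ Q_aa @ k
        gains[t] = (K, k)
    # Forward pass: a_t = K_t s_t + k_t, then s_{t+1} = F_t [s_t; a_t] + f_t
    states, actions = [s0], []
    for t, (K, k) in enumerate(gains):
        a = K @ states[-1] + k
        actions.append(a)
        states.append(Fs[t] @ np.concatenate([states[-1], a]) + fs[t])
    return states, actions

def total_reward(s0, actions, Fs, fs, Rs, rs):
    """Sum of the quadratic rewards along the rollout of `actions`."""
    s, total = s0, 0.0
    for t, a in enumerate(actions):
        x = np.concatenate([s, a])
        total += 0.5 * x @ Rs[t] @ x + x @ rs[t]
        s = Fs[t] @ x + fs[t]
    return total
```

A quick sanity check: because the objective is strictly concave in the action sequence here, perturbing any returned action should lower `total_reward`.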
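LQR - stochastic dynamics
With stochastic dynamics $s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)$, the states actually reached differ from the planned ones, so we plan closed-loop: recompute the action from each observed state instead of executing the whole open-loop plan.
[Figure: closed-loop planning]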
LQR - algorithm (stochastic dynamics)
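Given: $s_0, F_t, f_t, R_t, r_t$ for all $t$
- Calculate $Q_t, q_t$ for all $t$, going backward
- Calculate $a_0 = -Q_{0,aa}^{-1}(Q_{0,as} s_0 + q_{0,a})$ only for $t = 0$
- Apply $a_0$ in the real environment
- Observe $s_1 \sim p(s_1 \mid s_0, a_0)$
- Start from the beginning, with $s_1$ as the new initial state!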
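Here is a minimal sketch of this loop. It makes three assumptions: NumPy; additive Gaussian noise on the linear model as a stand-in for $p(s_{t+1} \mid s_t, a_t)$; and replanning over the remaining finite horizon (the slides do not specify how the horizon is handled). `lqr_gains`, `mpc`, and `noise_std` are hypothetical names.

```python
import numpy as np

# Sketch: `lqr_gains`, `mpc`, and `noise_std` are hypothetical names.
def lqr_gains(Fs, fs, Rs, rs, n):
    """Backward pass over the given steps; returns [(K_t, k_t)] with a_t = K_t s_t + k_t."""
    V, v = np.zeros((n, n)), np.zeros(n)
    gains = []
    for F, f, R, r in zip(reversed(Fs), reversed(fs), reversed(Rs), reversed(rs)):
        Q = R + F.T @ V @ F
        q = r + F.T @ V @ f + F.T @ v
        Q_ss, Q_sa, Q_as, Q_aa = Q[:n, :n], Q[:n, n:], Q[n:, :n], Q[n:, n:]
        K = -np.linalg.solve(Q_aa, Q_as)
        k = -np.linalg.solve(Q_aa, q[n:])
        V = Q_ss + Q_sa @ K + K.T @ Q_as + K.T @ Q_aa @ K
        v = q[:n] + Q_sa @ k + K.T @ q[n:] + K.T @ Q_aa @ k
        gains.append((K, k))
    return gains[::-1]

def mpc(s0, Fs, fs, Rs, rs, noise_std, rng):
    """Closed-loop LQR: at each step, replan over the remaining horizon from the
    observed state, apply only the first action, then observe the next state.
    The 'real environment' is simulated as the linear model plus Gaussian noise."""
    n = s0.shape[0]
    s, states, actions = s0, [s0], []
    for t in range(len(Rs)):
        K, k = lqr_gains(Fs[t:], fs[t:], Rs[t:], rs[t:], n)[0]  # plan from s_t
        a = K @ s + k                                             # first action only
        s = Fs[t] @ np.concatenate([s, a]) + fs[t] + noise_std * rng.standard_normal(n)
        actions.append(a)
        states.append(s)
    return states, actions
```

In this linear-Gaussian setting, the replanned first gain at step $t$ is exactly $(K_t, k_t)$ from a single full backward pass. Replanning therefore reproduces the feedback policy $a_t = K_t s_t + k_t$, and re-solving only pays off when the model is inexact or refitted along the way.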
What if the dynamics are not Linear and the rewards are not Quadratic? :(
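Taylor expansion around a reference trajectory $(\hat s_t, \hat a_t)$:
$f(s_t, a_t) \approx f(\hat s_t, \hat a_t) + \nabla f(\hat s_t, \hat a_t) \begin{bmatrix} s_t - \hat s_t \\ a_t - \hat a_t \end{bmatrix}$
$r(s_t, a_t) \approx r(\hat s_t, \hat a_t) + \nabla r(\hat s_t, \hat a_t) \begin{bmatrix} s_t - \hat s_t \\ a_t - \hat a_t \end{bmatrix} + \frac{1}{2} \begin{bmatrix} s_t - \hat s_t \\ a_t - \hat a_t \end{bmatrix}^\top \nabla^2 r(\hat s_t, \hat a_t) \begin{bmatrix} s_t - \hat s_t \\ a_t - \hat a_t \end{bmatrix}$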
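Simplify a bit (collect terms; the additive constant in the reward does not change the optimal actions, so it is dropped):
$\tilde f_t(s_t, a_t) = F_t \begin{bmatrix} s_t \\ a_t \end{bmatrix} + f_t$
$\tilde r_t(s_t, a_t) = \frac{1}{2} \begin{bmatrix} s_t \\ a_t \end{bmatrix}^\top R_t \begin{bmatrix} s_t \\ a_t \end{bmatrix} + \begin{bmatrix} s_t \\ a_t \end{bmatrix}^\top r_t$
This is exactly the linear-quadratic form that LQR solves.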
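The expansion only needs $\nabla f$ and $\nabla^2 r$ at the reference point. When these are not available in closed form, one option is central finite differences; this is our assumption, and automatic differentiation is the usual alternative. The sketch below assumes NumPy and stacks $x = [s; a]$. The pendulum model, the reward, and all numbers are hypothetical.

```python
import numpy as np

# Sketch: `jacobian`, `hessian`, the model `f`, and the reward `r` are hypothetical.
def jacobian(fn, x, eps=1e-6):
    """Central-difference Jacobian of fn at x (rows: outputs, columns: inputs)."""
    y0 = np.atleast_1d(fn(x))
    J = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.atleast_1d(fn(x + dx)) - np.atleast_1d(fn(x - dx))) / (2 * eps)
    return J

def hessian(fn, x, eps=1e-4):
    """Hessian of a scalar fn: Jacobian of its finite-difference gradient, symmetrized."""
    H = jacobian(lambda z: jacobian(fn, z, eps)[0], x, eps)
    return 0.5 * (H + H.T)

DT = 0.1  # hypothetical time step

def f(x):
    """Hypothetical pendulum step on x = [angle, angular velocity, torque]."""
    th, om, u = x
    return np.array([th + DT * om, om + DT * (-np.sin(th) + u)])

def r(x):
    """Hypothetical non-quadratic reward."""
    th, om, u = x
    return np.cos(th) - 0.1 * u**2

x_hat = np.array([0.5, 0.0, 0.0])  # reference point (s_hat_t, a_hat_t)
F_t = jacobian(f, x_hat)           # approximates grad f(s_hat_t, a_hat_t), shape (2, 3)
R_t = hessian(r, x_hat)            # approximates Hessian of r at (s_hat_t, a_hat_t), shape (3, 3)
delta = np.array([1e-2, -1e-2, 2e-2])
taylor_err = np.abs(f(x_hat + delta) - (f(x_hat) + F_t @ delta)).max()  # O(|delta|^2)
```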
Iterative LQR (iLQR)
Algorithm:
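Initialize $(\hat s_0, \hat a_0, \hat s_1, \dots)$ somehow
1. $F_t = \nabla f(\hat s_t, \hat a_t)$ for all $t$
2. $f_t = \dots$
3. $R_t = \nabla^2 r(\hat s_t, \hat a_t)$ for all $t$
4. $r_t = \dots$
5. $(a_0, s_1, a_1, \dots) = \mathrm{LQR}(s_0, F_t, f_t, R_t, r_t)$
6. $(\hat s_0, \hat a_0, \hat s_1, \hat a_1, \dots) \leftarrow (s_0, a_0, s_1, a_1, \dots)$
7. Go to 1 (repeat until the trajectory stops changing).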
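Here is a compact NumPy sketch of this loop on a hypothetical pendulum: the state is angle and angular velocity, the action is torque, the reward is quadratic, and derivatives are analytic. Two choices are ours. They are common iLQR variants, not something the slides specify:

- The local LQR is solved in deviation coordinates $\delta s_t = s_t - \hat s_t$, $\delta a_t = a_t - \hat a_t$. The offset term then vanishes because the nominal trajectory is a rollout of $f$.
- The forward pass runs the resulting feedback policy on the true dynamics $f$ with a backtracking step size, instead of taking the linearized LQR rollout as-is.

All names and numbers are hypothetical.

```python
import numpy as np

# Sketch: model, reward, weights, and all function names are hypothetical.
DT = 0.1                         # time step
W = np.array([1.0, 0.1, 0.01])   # reward weights on [angle, velocity, torque]
HESS_R = np.diag(-2.0 * W)       # R_t = Hessian of r, constant because r is quadratic

def f(s, a):
    """Pendulum dynamics: s = [angle, angular velocity], a = [torque]."""
    return np.array([s[0] + DT * s[1], s[1] + DT * (-np.sin(s[0]) + a[0])])

def jac_f(s, a):
    """F_t = grad f(s_hat_t, a_hat_t) with respect to the stacked [s; a]."""
    return np.array([[1.0, DT, 0.0], [-DT * np.cos(s[0]), 1.0, DT]])

def r(s, a):
    """Reward: -(angle^2 + 0.1 velocity^2 + 0.01 torque^2)."""
    return -np.sum(W * np.concatenate([s, a]) ** 2)

def grad_r(s, a):
    return -2.0 * W * np.concatenate([s, a])

def rollout(s0, actions):
    states = [s0]
    for a in actions:
        states.append(f(states[-1], a))
    return states

def total_reward(states, actions):
    return sum(r(s, a) for s, a in zip(states, actions))

def ilqr(s0, actions, iters=50):
    n = s0.shape[0]
    actions = [np.array(a, dtype=float) for a in actions]
    states = rollout(s0, actions)  # nominal trajectory (s_hat, a_hat)
    for _ in range(iters):
        # Backward pass: LQR on the local model in deviation coordinates.
        # The offset f_t is zero because the nominal trajectory obeys f exactly.
        V, v = np.zeros((n, n)), np.zeros(n)
        gains = []
        for s, a in zip(reversed(states[:-1]), reversed(actions)):
            F = jac_f(s, a)
            Q = HESS_R + F.T @ V @ F
            q = grad_r(s, a) + F.T @ v
            Q_ss, Q_sa, Q_as, Q_aa = Q[:n, :n], Q[:n, n:], Q[n:, :n], Q[n:, n:]
            K = -np.linalg.solve(Q_aa, Q_as)
            k = -np.linalg.solve(Q_aa, q[n:])
            V = Q_ss + Q_sa @ K + K.T @ Q_as + K.T @ Q_aa @ K
            v = q[:n] + Q_sa @ k + K.T @ q[n:] + K.T @ Q_aa @ k
            gains.append((K, k))
        gains.reverse()
        # Forward pass on the true dynamics with a backtracking step size alpha.
        old = total_reward(states, actions)
        for alpha in 0.5 ** np.arange(10):
            new_s, new_a = [s0], []
            for t, (K, k) in enumerate(gains):
                a = actions[t] + alpha * k + K @ (new_s[-1] - states[t])
                new_a.append(a)
                new_s.append(f(new_s[-1], a))
            if total_reward(new_s, new_a) > old:
                states, actions = new_s, new_a  # new nominal trajectory
                break
        else:
            break  # no step size improved the reward: treat as converged
    return states, actions
```

Without the step size, a full step computed on the local model can overshoot where the linearization is poor and make the trajectory worse. Backtracking is a common safeguard against this.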

MB-RL: LQR, iLQR, DDP
By cydoroga