Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
0. Announcements
1. Review
2. Questions
HW2 due Monday 3/28
5789 Paper Review Assignment (weekly pace suggested)
Today is the last day to drop
Prelim TOMORROW 3/22 at 7:30-9pm in Phillips 101
Closed-book, definition/equation sheet provided
Focus: mainly Unit 1 (known models), though many Unit 2 lectures revisit key concepts
Study Materials: Lecture Notes 1-15, HW0&1
Outline:
Participation point: PollEV.com/sarahdean011
Infinite Horizon Discounted MDP
$M = \{S, A, r, P, \gamma\}$
Finite Horizon MDP
$M = \{S, A, r, P, H, \mu_0\}$
ex - Pac-Man as MDP
Optimal Control Problem
ex - UAV as OCP
examples:
Policy results in a trajectory $\tau = (s_0, a_0, s_1, a_1, \dots)$
[Figure: trajectory diagram $s_0 \to a_0 \to s_1 \to a_1 \to s_2 \to a_2 \to \dots$]
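As a minimal sketch of how a policy induces a trajectory, the snippet below samples τ from a small tabular MDP. The array layout (`P[s, a]` as a next-state distribution, `policy[s]` as an action distribution), the function name `rollout`, and the two-state example are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def rollout(P, policy, s0, horizon, rng):
    """Sample tau = (s0, a0, s1, a1, ...) for `horizon` steps.

    P[s, a] is a probability vector over next states and policy[s] is a
    probability vector over actions (assumed conventions).
    """
    n_states, n_actions, _ = P.shape
    s, trajectory = s0, []
    for _ in range(horizon):
        a = int(rng.choice(n_actions, p=policy[s]))   # a_t ~ pi(s_t)
        trajectory.append((s, a))
        s = int(rng.choice(n_states, p=P[s, a]))      # s_{t+1} ~ P(s_t, a_t)
    return trajectory

# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
policy = np.array([[0.0, 1.0], [0.0, 1.0]])  # always choose "switch"

tau = rollout(P, policy, s0=0, horizon=4, rng=np.random.default_rng(0))
print(tau)
```

Because both the policy and the transitions in this toy example are deterministic, the sampled trajectory simply alternates between the two states.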
Food for thought:
examples:
Recursive Bellman Expectation Equation:
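In the discounted setting, the recursion is usually written as V^π(s) = E_{a∼π(s)}[ r(s,a) + γ E_{s'∼P(s,a)}[ V^π(s') ] ]. For a tabular MDP this is a linear system in V^π, so one way to illustrate it is exact policy evaluation. The sketch below assumes array conventions (`P[s, a, s']`, `r[s, a]`, `policy[s, a]`) and a two-state example that are hypothetical, chosen only for illustration.

```python
import numpy as np

def evaluate_policy(P, r, policy, gamma):
    """Solve V = r_pi + gamma * P_pi V exactly for a tabular MDP."""
    # Average rewards and transitions over the policy's action distribution.
    r_pi = np.einsum("sa,sa->s", policy, r)
    P_pi = np.einsum("sa,sat->st", policy, P)
    n_states = r.shape[0]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Hypothetical MDP: action 0 stays, action 1 switches; reward 1 only in state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
r = np.array([[0.0, 0.0], [1.0, 1.0]])
policy = np.array([[0.0, 1.0], [1.0, 0.0]])  # switch in state 0, stay in state 1

V = evaluate_policy(P, r, policy, gamma=0.5)
print(V)  # value of each state under this policy
```

Here state 1 collects reward 1 forever, so its value is 1/(1 - 0.5) = 2, and state 0 reaches it after one step, giving 0 + 0.5 · 2 = 1.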
Recall: Gardening MDP HW problem
Recall: Gardening MDP HW problem (verifying optimality)
Food for thought: What does Bellman Optimality imply about the advantage function $A^{\pi^*}(s,a)$?
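One way to explore this question numerically is to run value iteration on a small MDP and inspect Q*(s,a) - V*(s). The sketch below does that under assumed array conventions (`P[s, a, s']`, `r[s, a]`) on a hypothetical two-state MDP; the function name `value_iteration` and the iteration count are illustrative choices.

```python
import numpy as np

def value_iteration(P, r, gamma, n_iters=500):
    """Repeat the Bellman optimality update; returns an approximation of Q*."""
    Q = np.zeros_like(r)
    for _ in range(n_iters):
        V = Q.max(axis=1)            # V(s) = max_a Q(s, a)
        Q = r + gamma * (P @ V)      # (S, A, S) @ (S,) -> (S, A)
    return Q

# Hypothetical MDP: action 0 stays, action 1 switches; reward 1 only in state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
r = np.array([[0.0, 0.0], [1.0, 1.0]])

Q_star = value_iteration(P, r, gamma=0.5)
V_star = Q_star.max(axis=1)
advantage = Q_star - V_star[:, None]   # A*(s, a) = Q*(s, a) - V*(s)
print(advantage)
```

In this example every entry of the advantage is at most zero, and it is exactly zero for the greedy action in each state, which is consistent with V*(s) = max_a Q*(s,a).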
ex - UAV
Food for thought: What are dynamics, stability, value under linear policy $a_t = K s_t$?
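As a rough numerical companion to this question, assume linear dynamics s_{t+1} = A s_t + B a_t and a quadratic cost s^T Q s + a^T R a. Substituting a_t = K s_t gives the closed loop s_{t+1} = (A + BK) s_t, which is stable when the spectral radius of A + BK is below 1, and the cost-to-go is then s^T P_K s for a matrix P_K. The matrices below (a double integrator, an arbitrary gain, identity costs) are hypothetical and not the lecture's UAV model.

```python
import numpy as np

# Hypothetical double-integrator dynamics s_{t+1} = A s_t + B a_t.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -1.0]])      # illustrative gain, a_t = K s_t
Q, R = np.eye(2), np.eye(1)       # assumed cost s^T Q s + a^T R a

# Closed-loop dynamics and stability check.
M = A + B @ K
rho = np.max(np.abs(np.linalg.eigvals(M)))   # spectral radius
print("stable:", rho < 1)

# Cost-to-go s^T P_K s via the fixed point P = Q + K^T R K + M^T P M.
stage = Q + K.T @ R @ K
P_K = np.zeros((2, 2))
for _ in range(500):
    P_K = stage + M.T @ P_K @ M

s0 = np.array([1.0, 0.0])
value = s0 @ P_K @ s0             # total cost starting from s0 under a_t = K s_t
print(value)
```

The fixed-point iteration only converges because the closed loop is stable; for an unstable K the accumulated cost grows without bound.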
Finite Horizon LQR: Application of Dynamic Programming
Basis for approximation-based algorithms (local linearization and iLQR)
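The dynamic-programming view of finite-horizon LQR can be sketched as a backward recursion over cost-to-go matrices. The snippet assumes dynamics s_{t+1} = A s_t + B a_t, stage cost s^T Q s + a^T R a, and terminal cost s^T Q s; these conventions, the function name `finite_horizon_lqr`, and the example matrices are assumptions for illustration and may differ from the lecture's notation.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, H):
    """Backward DP for LQR. Returns gains [K_0, ..., K_{H-1}] and P_0.

    The optimal policy is a_t = K_t s_t and the optimal cost from s_0
    is s_0^T P_0 s_0 (under the assumed cost convention).
    """
    P = Q.copy()                      # terminal cost-to-go P_H = Q
    gains = []
    for _ in range(H):                # t = H-1, ..., 0
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        gains.append(K)
    gains.reverse()                   # reorder so gains[t] is K_t
    return gains, P

# Hypothetical double integrator with identity costs.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
H = 10

gains, P0 = finite_horizon_lqr(A, B, Q, R, H)
s0 = np.array([1.0, 0.0])
print("optimal cost from s0:", s0 @ P0 @ s0)
```

Note that the gains are time-varying in the finite-horizon setting. iLQR-style methods apply a recursion of this kind to local linear and quadratic approximations of a nonlinear problem.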