Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
0. Announcements
1. Review
2. Questions
HW2 due Monday 3/28
5789 Paper Review Assignment (weekly pace suggested)
Today is the last day to drop
Prelim TOMORROW 3/22 at 7:30-9pm in Phillips 101
Closed-book, definition/equation sheet provided
Focus: mainly Unit 1 (known models), though many Unit 2 lectures revisit key concepts
Study Materials: Lecture Notes 1-15, HW0&1
Outline:
Participation point: PollEV.com/sarahdean011
Infinite Horizon Discounted MDP
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, r, P, \gamma\}\)
Finite Horizon MDP
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, r, P, H, \mu_0\}\)
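For concreteness, a minimal Python sketch of how these two tuples could be represented in the tabular setting (field names and shapes are illustrative assumptions, not from the lecture):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DiscountedMDP:
    """Infinite horizon discounted MDP: M = {S, A, r, P, gamma}."""
    n_states: int     # |S|, states indexed 0..n_states-1
    n_actions: int    # |A|, actions indexed 0..n_actions-1
    r: np.ndarray     # rewards r(s, a), shape (S, A)
    P: np.ndarray     # transition kernel P[s, a, s'], shape (S, A, S)
    gamma: float      # discount factor in [0, 1)

@dataclass
class FiniteHorizonMDP:
    """Finite horizon MDP: M = {S, A, r, P, H, mu_0}."""
    n_states: int
    n_actions: int
    r: np.ndarray     # rewards r(s, a), shape (S, A)
    P: np.ndarray     # transition kernel P[s, a, s'], shape (S, A, S)
    H: int            # horizon length
    mu0: np.ndarray   # initial state distribution, shape (S,)
```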
ex - Pac-Man as MDP
Optimal Control Problem
ex - UAV as OCP
Policy results in a trajectory \(\tau = (s_0, a_0, s_1, a_1, ... )\)
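Such a trajectory can be sampled by rolling out the policy; a minimal sketch, assuming the tabular FiniteHorizonMDP representation above and a time-indexed stochastic policy (all names are illustrative):

```python
import numpy as np

def rollout(mdp, policy, rng=np.random.default_rng(0)):
    """Sample a trajectory tau = (s_0, a_0, s_1, a_1, ...) of length H.

    policy[t] is an (S, A) array of action probabilities at step t.
    """
    tau = []
    s = rng.choice(mdp.n_states, p=mdp.mu0)               # s_0 ~ mu_0
    for t in range(mdp.H):
        a = rng.choice(mdp.n_actions, p=policy[t][s])     # a_t ~ pi_t(s_t)
        tau.extend([s, a])
        s = rng.choice(mdp.n_states, p=mdp.P[s, a])       # s_{t+1} ~ P(s_t, a_t)
    return tau
```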
Recursive Bellman Expectation Equation:
\(V^\pi(s) = \mathbb{E}_{a \sim \pi(s)}\left[ r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(s,a)}\left[ V^\pi(s') \right] \right]\)
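In the tabular setting this fixed-point equation is linear in \(V^\pi\), so policy evaluation reduces to solving a linear system; a minimal sketch, assuming the DiscountedMDP representation above:

```python
import numpy as np

def policy_evaluation(mdp, pi):
    """Solve V = r_pi + gamma * P_pi V exactly.

    pi: (S, A) array of action probabilities for a stationary policy.
    """
    r_pi = np.einsum("sa,sa->s", pi, mdp.r)       # expected reward per state
    P_pi = np.einsum("sa,sap->sp", pi, mdp.P)     # state transitions under pi
    # Rearranged fixed point: (I - gamma * P_pi) V = r_pi
    return np.linalg.solve(np.eye(mdp.n_states) - mdp.gamma * P_pi, r_pi)
```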
Recall: Gardening MDP HW problem
Recall: Gardening MDP HW problem (verifying optimality)
Food for thought: What does Bellman Optimality imply about advantage function \(A^{\pi^*}(s,a)\)?
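One way to probe this numerically: run value iteration on any small tabular MDP (representation as in the sketches above) and inspect \(A^{\pi^*}(s,a) = Q^*(s,a) - V^*(s)\). A minimal sketch:

```python
import numpy as np

def optimal_advantage(mdp, tol=1e-10):
    """Value iteration, then A*(s,a) = Q*(s,a) - V*(s)."""
    V = np.zeros(mdp.n_states)
    while True:
        Q = mdp.r + mdp.gamma * (mdp.P @ V)   # Q[s,a] = r(s,a) + gamma E[V(s')]
        V_new = Q.max(axis=1)                 # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    # Nonpositive everywhere, zero at optimal actions
    return Q - V[:, None]
```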
ex - UAV
Food for thought: What are dynamics, stability, value under linear policy \(a_t = K s_t\)?
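For intuition: under linear dynamics \(s_{t+1} = A s_t + B a_t\), the policy \(a_t = K s_t\) gives closed-loop dynamics \(s_{t+1} = (A + BK) s_t\), which are stable exactly when the spectral radius of \(A + BK\) is below 1. A quick numerical check (the matrices below are illustrative assumptions, not from the lecture):

```python
import numpy as np

def is_stable(A, B, K):
    """Closed-loop dynamics s_{t+1} = (A + B K) s_t are asymptotically
    stable iff the spectral radius of A + B K is strictly less than 1."""
    return np.max(np.abs(np.linalg.eigvals(A + B @ K))) < 1.0

# Illustrative double-integrator example (values assumed)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[-10.0, -5.0]])
print(is_stable(A, B, K))   # True for this choice of K
```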
Finite Horizon LQR: Application of Dynamic Programming
Basis for approximation-based algorithms (local linearization and iLQR)
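A minimal sketch of the backward dynamic-programming (Riccati) recursion for finite-horizon LQR, assuming dynamics \(s_{t+1} = A s_t + B a_t\) and cost \(\sum_t s_t^\top Q s_t + a_t^\top R a_t\) (standard notation assumed, not lifted from the slides):

```python
import numpy as np

def lqr_backward(A, B, Q, R, H):
    """Backward DP for finite-horizon LQR.

    Returns gains K_0..K_{H-1} with a_t = K_t s_t, and the cost-to-go
    matrix P_0 so that V_0(s) = s^T P_0 s.
    """
    P = Q                                  # terminal cost-to-go P_H = Q
    gains = []
    for _ in range(H):
        # Minimize over a: s'Qs + a'Ra + (As + Ba)' P (As + Ba)
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + K.T @ R @ K + (A + B @ K).T @ P @ (A + B @ K)
        gains.append(K)
    gains.reverse()                        # gains[t] is K_t
    return gains, P
```

The same backward pass is what local linearization and iLQR apply around a nominal trajectory.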