Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
0. Announcements & Recap
1. State Distribution & Transition Matrix
2. Continuous Control
3. Linear Dynamics
4. Stability
(Mostly) in person office hours
Register for participation: PollEV.com/sarahdean011
Waitlist/enrollment questions? cs-course-enroll@cornell.edu
Want to "audit"? Ask (on Ed Discussion) after Add deadline to retain Canvas access
HW0 due 2/14. Start ASAP!
Value Iteration and Policy Iteration
Finite Horizon MDP and Dynamic Programming
\(\mathcal M = \{\mathcal S, \mathcal A, P, r, H,\mu_0\}\)
\(Q_t^*(s,a) =r(s,a)+\mathbb E_{s'\sim P(s,a)}[V_{t+1}^*(s')]\)
initialize Q[0]
for t=0,1,...
Q[t+1] = BellmanOperator(Q[t])
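A minimal NumPy sketch of the finite-horizon backup \(Q_t^*(s,a) = r(s,a)+\mathbb E_{s'\sim P(s,a)}[V_{t+1}^*(s')]\), applied backward from \(t=H-1\). The MDP here (sizes, random P and r) is illustrative, not from the lecture:

```python
import numpy as np

# Illustrative tabular MDP: 3 states, 2 actions, horizon 5 (random instance).
S, A, H = 3, 2, 5
rng = np.random.default_rng(0)
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # P[s, a, s'] = transition probability
r = rng.random((S, A))              # r[s, a] = reward

# Backward dynamic programming: V_H = 0, then
# Q_t(s, a) = r(s, a) + E_{s' ~ P(s, a)}[V_{t+1}(s')].
V = np.zeros(S)
for t in reversed(range(H)):
    Q = r + P @ V          # (S, A): matmul sums over s'
    V = Q.max(axis=1)      # V_t(s) = max_a Q_t(s, a)
    pi = Q.argmax(axis=1)  # greedy (optimal) action at step t

print(V)  # optimal value-to-go from step t = 0
```

Each loop iteration is one application of the Bellman operator, so after H backups V holds the optimal value at the initial step.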
initialize pi[0]
for t=0,1,...
Q[t] = PolicyEvaluation(pi[t])
pi[t+1] = argmax_a(Q[t](:,a))
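The evaluate-then-improve loop above can be sketched in NumPy for a discounted tabular MDP, with exact policy evaluation via a linear solve. The MDP instance and the discount factor are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Illustrative discounted tabular MDP (random instance).
S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # P[s, a, s'] = transition probability
r = rng.random((S, A))              # r[s, a] = reward

pi = np.zeros(S, dtype=int)          # initialize pi[0]
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly.
    P_pi = P[np.arange(S), pi]       # (S, S) transitions under pi
    r_pi = r[np.arange(S), pi]       # (S,) rewards under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # Policy improvement: greedy with respect to Q(s, a).
    Q = r + gamma * (P @ V)
    new_pi = Q.argmax(axis=1)
    if np.array_equal(new_pi, pi):   # policy is stable: optimal
        break
    pi = new_pi
```

With finitely many deterministic policies and monotone improvement at every step, the loop terminates at a fixed point where pi is greedy with respect to its own Q.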