Prof. Sarah Dean

MW 2:45-4pm 110 Hollister Hall

0. Announcements & Recap

1. State Distribution & Transition Matrix

2. Continuous Control

3. Linear Dynamics

4. Stability

(Mostly) in-person office hours

Register for participation: PollEV.com/sarahdean011

Waitlist/enrollment questions? cs-course-enroll@cornell.edu

Want to "audit"? Ask (on Ed Discussion) after the add deadline to retain Canvas access

HW0 due 2/14. Start ASAP!

Value Iteration and Policy Iteration

Finite Horizon MDP and Dynamic Programming

\(\mathcal M = \{\mathcal S, \mathcal A, P, r, H,\mu_0\}\)

\(Q_t^*(s,a) =r(s,a)+\mathbb E_{s'\sim P(s,a)}[V_{t+1}^*(s')]\)
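The recursion above can be run backward from the horizon. The following is a minimal sketch, assuming a tabular MDP stored as nested lists and the terminal condition \(V_H^*=0\); the function name `finite_horizon_dp`, the data layout, and the two-state example are illustrative assumptions, not part of the lecture.

```python
def finite_horizon_dp(P, r, H):
    """Backward induction for a finite-horizon tabular MDP (illustrative sketch).

    P[s][a] is a list of next-state probabilities, r[s][a] is the reward,
    and H is the horizon. Returns Q_t^* and a greedy policy for t = 0..H-1.
    """
    n_states, n_actions = len(r), len(r[0])
    V = [0.0] * n_states  # assumed terminal condition: V_H^* = 0
    Q_star, pi_star = [], []
    for _ in range(H):  # steps t = H-1, H-2, ..., 0
        # Q_t^*(s,a) = r(s,a) + E_{s' ~ P(s,a)}[V_{t+1}^*(s')]
        Q = [[r[s][a] + sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_actions)]
             for s in range(n_states)]
        # Greedy action per state (ties go to the lowest action index).
        pi = [max(range(n_actions), key=Q[s].__getitem__)
              for s in range(n_states)]
        V = [Q[s][pi[s]] for s in range(n_states)]  # V_t^*(s) = max_a Q_t^*(s,a)
        Q_star.insert(0, Q)
        pi_star.insert(0, pi)
    return Q_star, pi_star


# Hypothetical two-state example: state 1 pays reward 1 for staying (action 0);
# from state 0, action 1 moves to state 1.
P_example = [[[1, 0], [0, 1]],
             [[0, 1], [1, 0]]]
r_example = [[0, 0],
             [1, 0]]
Q_star, pi_star = finite_horizon_dp(P_example, r_example, H=3)
```

In this example the greedy policy at the first step moves from state 0 to state 1 and then stays there, since only state 1 yields reward.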

initialize Q[0]
for t=0,1,2,...
    Q[t+1] = BellmanOperator(Q[t])
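The loop above can be sketched in plain Python. This assumes the infinite-horizon discounted setting with a discount factor `gamma`, which does not appear in the tuple \(\mathcal M\) on this slide, and a tabular MDP stored as nested lists; the stopping tolerance and the example MDP are also assumptions made for illustration.

```python
def bellman_operator(Q, P, r, gamma):
    """One application of the Bellman optimality operator to a Q table."""
    n_states, n_actions = len(r), len(r[0])
    V = [max(Q[s]) for s in range(n_states)]  # V(s) = max_a Q(s,a)
    return [[r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(n_actions)]
            for s in range(n_states)]


def value_iteration(P, r, gamma, tol=1e-8, max_iters=10_000):
    """Iterate Q <- BellmanOperator(Q) until successive iterates are within tol."""
    Q = [[0.0] * len(r[0]) for _ in r]  # initialize Q[0] to zeros (an assumption)
    for _ in range(max_iters):
        Q_next = bellman_operator(Q, P, r, gamma)
        gap = max(abs(x - y)
                  for row, row_next in zip(Q, Q_next)
                  for x, y in zip(row, row_next))
        Q = Q_next
        if gap < tol:
            break
    return Q


# Hypothetical two-state example with assumed discount factor 0.5.
P_vi = [[[1, 0], [0, 1]],
        [[0, 1], [1, 0]]]
r_vi = [[0, 0],
        [1, 0]]
Q_vi = value_iteration(P_vi, r_vi, gamma=0.5)
```

Only the most recent table is kept here rather than the whole sequence Q[0], Q[1], ..., since each update depends only on the previous iterate.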

initialize pi[0]
for t=0,1,2,...
    Q[t] = PolicyEvaluation(pi[t])
    pi[t+1] = argmax_a(Q[t](:,a))
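A minimal sketch of this loop follows, again assuming a discounted tabular MDP with discount factor `gamma` (not stated on this slide). Here policy evaluation is approximated by fixed-point iteration on the policy's Bellman equation; an exact linear solve would be an alternative. Function names, tolerances, and the example MDP are illustrative assumptions.

```python
def policy_evaluation(pi, P, r, gamma, tol=1e-10, max_iters=10_000):
    """Approximate Q^pi by iterating the Bellman equation for a fixed policy."""
    n_states, n_actions = len(r), len(r[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        V = [Q[s][pi[s]] for s in range(n_states)]  # V^pi(s) = Q^pi(s, pi(s))
        Q_next = [[r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                   for a in range(n_actions)]
                  for s in range(n_states)]
        gap = max(abs(x - y)
                  for row, row_next in zip(Q, Q_next)
                  for x, y in zip(row, row_next))
        Q = Q_next
        if gap < tol:
            break
    return Q


def policy_iteration(P, r, gamma, max_iters=100):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    n_states, n_actions = len(r), len(r[0])
    pi = [0] * n_states  # initialize pi[0]: always take action 0 (an assumption)
    for _ in range(max_iters):
        Q = policy_evaluation(pi, P, r, gamma)
        # Greedy improvement: pi[t+1](s) = argmax_a Q[t](s, a)
        pi_next = [max(range(n_actions), key=Q[s].__getitem__)
                   for s in range(n_states)]
        if pi_next == pi:
            break
        pi = pi_next
    return pi, Q


# Hypothetical two-state example with assumed discount factor 0.5.
P_pi = [[[1, 0], [0, 1]],
        [[0, 1], [1, 0]]]
r_pi = [[0, 0],
        [1, 0]]
pi_out, Q_out = policy_iteration(P_pi, r_pi, gamma=0.5)
```

The loop stops once the greedy policy no longer changes, which is a common termination check for tabular problems.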
