CS 4/5789: Introduction to Reinforcement Learning

Lecture 5

Prof. Sarah Dean

MW 2:45-4pm
110 Hollister Hall



0. Announcements & Recap

1. State Distribution & Transition Matrix

2. Continuous Control

3. Linear Dynamics

4. Stability


(Mostly) in-person office hours


Register for participation: PollEV.com/sarahdean011


Waitlist/enrollment questions? cs-course-enroll@cornell.edu


Want to "audit"? After the add deadline, ask on Ed Discussion to keep Canvas access.


HW0 due 2/14. Start ASAP!


Value Iteration and Policy Iteration

Finite Horizon MDP and Dynamic Programming

\(\mathcal M = \{\mathcal S, \mathcal A, P, r, H,\mu_0\}\)

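\(Q_t^*(s,a) = r(s,a) + \mathbb E_{s'\sim P(s,a)}[V_{t+1}^*(s')]\), where \(V_t^*(s) = \max_a Q_t^*(s,a)\) and \(\pi_t^*(s) = \arg\max_a Q_t^*(s,a)\), computed backwards for \(t = H-1, \dots, 0\) starting from \(V_H^*(s) = 0\).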
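The recursion for \(Q_t^*\) can be computed backwards from the last step. Below is a minimal sketch, assuming a tabular MDP stored as nested Python lists (`P[s][a][s2]` is the probability of landing in `s2`, `r[s][a]` is the reward), steps indexed \(t = 0, \dots, H-1\), and \(V_H^* = 0\). The function name `finite_horizon_dp` and this data layout are illustrative assumptions, not the course's code; \(\mu_0\) is not needed to compute \(Q^*\).

```python
def finite_horizon_dp(P, r, H):
    """Backward dynamic programming for a finite-horizon tabular MDP (sketch).

    P[s][a][s2] : probability of moving from s to s2 under action a (assumed layout)
    r[s][a]     : reward for taking action a in state s
    H           : horizon; steps are t = 0, ..., H-1
    Returns Q[t][s][a], V[t][s] (with V[H] = 0), and a deterministic policy pi[t][s].
    """
    S, A = len(r), len(r[0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H][s] = 0: no reward after the horizon
    Q = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    pi = [[0] * S for _ in range(H)]
    for t in reversed(range(H)):  # t = H-1, ..., 0
        for s in range(S):
            for a in range(A):
                # Q_t(s,a) = r(s,a) + E_{s' ~ P(s,a)}[V_{t+1}(s')]
                Q[t][s][a] = r[s][a] + sum(p * V[t + 1][s2] for s2, p in enumerate(P[s][a]))
            # Greedy action; max() keeps the first maximizer on ties
            pi[t][s] = max(range(A), key=lambda a: Q[t][s][a])
            V[t][s] = Q[t][s][pi[t][s]]
    return Q, V, pi


# Hypothetical toy MDP: in state 0, action 1 moves to state 1; state 1 pays reward 1 each step.
P = [[[1, 0], [0, 1]], [[0, 1], [0, 1]]]
r = [[0, 0], [1, 1]]
Q, V, pi = finite_horizon_dp(P, r, H=3)
print(V[0], pi[0])  # [2.0, 3.0] [1, 0]
```

Starting in state 0 with three steps left, the optimal plan spends one step moving and then collects reward 1 twice, which gives \(V_0^*(0) = 2\).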

Value Iteration:
initialize Q[0]
for t = 0, 1, 2, ...
    Q[t+1] = BellmanOperator(Q[t])

Policy Iteration:
initialize pi[0]
for t = 0, 1, 2, ...
    Q[t] = PolicyEvaluation(pi[t])
    pi[t+1](s) = argmax_a Q[t](s, a)   for every state s
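The value iteration and policy iteration loops can be sketched in plain Python. This assumes an infinite-horizon MDP with discount factor `gamma` in (0, 1), which the slide does not state, and a tabular layout where `P[s][a][s2]` is a transition probability and `r[s][a]` a reward. The helpers `q_from_v`, `bellman_operator`, and `policy_evaluation` are hypothetical stand-ins for the slide's `BellmanOperator` and `PolicyEvaluation`; policy evaluation here iterates to a tolerance instead of solving the linear system exactly.

```python
def q_from_v(V, P, r, gamma):
    """Q(s,a) = r(s,a) + gamma * sum_{s'} P(s'|s,a) V(s')."""
    return [[r[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
             for a in range(len(r[s]))]
            for s in range(len(r))]


def bellman_operator(Q, P, r, gamma):
    """Bellman optimality operator: plug V(s') = max_a' Q(s',a') into q_from_v."""
    return q_from_v([max(q_s) for q_s in Q], P, r, gamma)


def value_iteration(P, r, gamma, iters=200):
    Q = [[0.0] * len(r_s) for r_s in r]  # Q[0] = 0
    for _ in range(iters):  # Q[t+1] = BellmanOperator(Q[t])
        Q = bellman_operator(Q, P, r, gamma)
    return Q


def policy_evaluation(pi, P, r, gamma, tol=1e-10):
    """Approximate Q^pi for a deterministic policy pi[s] by iterating V <- r_pi + gamma * P_pi V."""
    V = [0.0] * len(r)
    while True:
        V_new = [q_s[pi[s]] for s, q_s in enumerate(q_from_v(V, P, r, gamma))]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            return q_from_v(V, P, r, gamma)


def policy_iteration(P, r, gamma, max_iters=100):
    pi = [0] * len(r)  # pi[0]: always take action 0 (arbitrary initial policy)
    for _ in range(max_iters):
        Q = policy_evaluation(pi, P, r, gamma)  # Q[t] = PolicyEvaluation(pi[t])
        # Greedy improvement; max() keeps the first maximizer, so ties go to the lowest action index
        new_pi = [max(range(len(q_s)), key=q_s.__getitem__) for q_s in Q]
        if new_pi == pi:  # unchanged: pi is greedy w.r.t. its own Q (optimal up to the tolerance)
            break
        pi = new_pi
    return pi, Q


# Hypothetical toy MDP: in state 0, action 1 moves to state 1; state 1 pays reward 1 each step.
P = [[[1, 0], [0, 1]], [[0, 1], [0, 1]]]
r = [[0, 0], [1, 1]]
print([round(max(q_s), 6) for q_s in value_iteration(P, r, gamma=0.9)])  # [9.0, 10.0]
print(policy_iteration(P, r, gamma=0.9)[0])  # [1, 0]
```

Both methods agree on this example: state 1 is worth \(1/(1-0.9) = 10\), and moving there from state 0 is worth \(0.9 \cdot 10 = 9\).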
