Prof. Sarah Dean

MW 2:45-4pm 110 Hollister Hall

0. Announcements & Recap

1. State Distribution & Transition Matrix

2. Continuous Control

3. Linear Dynamics

4. Stability

(Mostly) in person office hours

Register for participation: PollEV.com/sarahdean011

Waitlist/enrollment questions? cs-course-enroll@cornell.edu

Want to "audit"? Ask (on Ed Discussion) after Add deadline to retain Canvas access

HW0 due 2/14. Start ASAP!

Value Iteration and Policy Iteration

Finite Horizon MDP and Dynamic Programming

\(\mathcal M = \{\mathcal S, \mathcal A, P, r, H,\mu_0\}\)

\(Q_t^*(s,a) =r(s,a)+\mathbb E_{s'\sim P(s,a)}[V_{t+1}^*(s')]\)
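The backup above can be run as backward induction over the horizon. A minimal sketch on a tabular MDP, where the arrays `P`, `r`, and the sizes `S`, `A`, `H` are hypothetical example inputs (not from the course):

```python
# Finite-horizon dynamic programming (backward induction) on a tabular MDP.
# Hypothetical example inputs: random transition matrix P and reward r.
import numpy as np

S, A, H = 3, 2, 5                            # |S|, |A|, horizon
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over s'
r = rng.random((S, A))                       # reward r(s, a)

Q = np.zeros((H, S, A))
V = np.zeros((H + 1, S))                     # terminal value V_H = 0
for t in reversed(range(H)):
    # Q_t(s, a) = r(s, a) + E_{s' ~ P(s, a)}[V_{t+1}(s')]
    Q[t] = r + P @ V[t + 1]
    V[t] = Q[t].max(axis=1)                  # V_t(s) = max_a Q_t(s, a)
pi = Q.argmax(axis=2)                        # greedy policy pi_t(s)
```

`P @ V[t + 1]` broadcasts the matrix product over the leading (state, action) axes, so each backup is a single vectorized step.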

initialize Q[0]
for t = 1, 2, ...:
    Q[t+1] = BellmanOperator(Q[t])
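In the discounted infinite-horizon setting, iterating the Bellman operator converges to the fixed point Q*. A minimal sketch, with hypothetical example inputs `P`, `r`, and discount `gamma`:

```python
# Value iteration: repeatedly apply the Bellman optimality operator to Q
# until it stops changing. Inputs below are hypothetical examples.
import numpy as np

S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition probabilities
r = rng.random((S, A))                       # rewards

Q = np.zeros((S, A))
for _ in range(10_000):
    # (T Q)(s, a) = r(s, a) + gamma * E_{s'}[max_a' Q(s', a')]
    Q_next = r + gamma * (P @ Q.max(axis=1))
    if np.abs(Q_next - Q).max() < 1e-10:     # sup-norm stopping criterion
        Q = Q_next
        break
    Q = Q_next
pi = Q.argmax(axis=1)                        # greedy policy from Q
```

The operator is a gamma-contraction in the sup norm, which is why the simple stopping criterion suffices.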

initialize pi[0]
for t = 1, 2, ...:
    Q[t] = PolicyEvaluation(pi[t])
    pi[t+1] = argmax_a Q[t](:, a)
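Policy iteration alternates exact policy evaluation with greedy improvement. A minimal sketch using a linear solve for evaluation; the inputs `P`, `r`, `gamma` are hypothetical examples:

```python
# Policy iteration: evaluate the current policy by solving
# (I - gamma * P_pi) V = r_pi, then improve greedily. Hypothetical inputs.
import numpy as np

S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition probabilities
r = rng.random((S, A))                       # rewards

pi = np.zeros(S, dtype=int)                  # initial policy
while True:
    # policy evaluation: restrict P and r to the actions pi picks
    P_pi = P[np.arange(S), pi]               # (S, S)
    r_pi = r[np.arange(S), pi]               # (S,)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # policy improvement: greedy with respect to Q(s, a)
    Q = r + gamma * (P @ V)
    pi_next = Q.argmax(axis=1)
    if np.array_equal(pi_next, pi):          # policy stable => optimal
        break
    pi = pi_next
```

Because each improvement step is monotone and there are finitely many deterministic policies, the loop terminates at an optimal policy.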
