CS 4/5789: Introduction to Reinforcement Learning
Lecture 17
Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
Agenda
0. Announcements & Recap
1. Motivation & Interactive Demo
2. Formal Setting
3. Balancing Exploration and Exploitation
Announcements
HW2 due Monday 3/28
5789 Paper Review Assignment (weekly pace suggested)
Prelims graded within a week
Recap: Unit 1
- MDPs, Policies, Distributions
- Value and Q functions
- Optimal Policies: VI, PI, DP, and LQR
- Approximate policies & properties like stability, reachability, observations, robustness
Recap: Unit 2
- Model-based RL: tabular & parametric settings
- Learning Q functions: rollout & Bellman-based supervision, Conservative Policy Iteration
- Policy Optimization: Random Search, REINFORCE, Actor-Critic, and Natural PG
Unit 3: Exploration
Exploration in RL is hard!
Example: MountainCar, where reward is given only upon reaching the flag



Multi-Armed Bandit
A simplified setting for studying exploration
Example: online advertising
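A multi-armed bandit strips an MDP down to a single state. Each round, the learner picks one of K arms (for example, one of K ads to display) and observes a random reward drawn from that arm's unknown distribution.

The sketch below makes several assumptions:
- Rewards are Bernoulli (click or no click).
- The names `BernoulliBandit` and `run_uniform_policy` are hypothetical, as are the click-through rates in the usage line.
- The uniformly random arm choice is only a pure-exploration baseline. It is not a method from the lecture.

```python
import random


class BernoulliBandit:
    """Hypothetical K-armed bandit: pulling arm k returns 1 with probability
    means[k], else 0. The learner never sees `means`, only sampled rewards."""

    def __init__(self, means, seed=None):
        self.means = list(means)
        self.rng = random.Random(seed)

    @property
    def num_arms(self):
        return len(self.means)

    def pull(self, k):
        # Reward depends only on the chosen arm; there is no state, unlike an MDP.
        return 1 if self.rng.random() < self.means[k] else 0


def run_uniform_policy(bandit, T, seed=None):
    """Pure-exploration baseline (an assumption, not from the lecture):
    choose an arm uniformly at random for T rounds.
    Returns per-arm pull counts and empirical mean rewards."""
    rng = random.Random(seed)
    counts = [0] * bandit.num_arms
    totals = [0] * bandit.num_arms
    for _ in range(T):
        k = rng.randrange(bandit.num_arms)  # choose an arm
        r = bandit.pull(k)                  # observe only that arm's reward
        counts[k] += 1
        totals[k] += r
    estimates = [
        totals[k] / counts[k] if counts[k] else 0.0
        for k in range(bandit.num_arms)
    ]
    return counts, estimates


# Usage with hypothetical click-through rates for three ads.
ads = BernoulliBandit([0.02, 0.05, 0.03], seed=0)
counts, estimates = run_uniform_policy(ads, T=10_000, seed=1)
```

Uniform exploration eventually estimates every arm's mean. However, it keeps spending about two thirds of its pulls on the two worse ads. A purely greedy rule has the opposite flaw: it can lock onto a bad arm early. That tension is what the exploration/exploitation trade-off refers to.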


Applications of MAB
NYT Caption Contest
Medical Trials