CS 4/5789: Introduction to Reinforcement Learning

Lecture 17

Prof. Sarah Dean

MW 2:45-4pm
110 Hollister Hall



0. Announcements & Recap

1. Motivation & Interactive Demo

2. Formal Setting

3. Balancing Exploration and Exploitation



HW2 due Monday 3/28


5789 Paper Review Assignment (weekly pace suggested)


Prelims graded within a week

Recap: Unit 1

  • MDPs, Policies, Distributions
  • Value and Q functions
  • Optimal Policies: VI, PI, DP, and LQR
  • Approximate policies & properties like stability, reachability, observations, robustness

Recap: Unit 2

  • Model-based RL: tabular & parametric settings
  • Learning Q functions: rollout & Bellman-based supervision, Conservative Policy Iteration
  • Policy Optimization: Random Search, REINFORCE, Actor-Critic, and Natural PG

Unit 3: Exploration

Exploration in RL is hard!

Example: MountainCar, where reward is given only upon reaching the flag

Multi-Armed Bandit

A simplified setting for studying exploration
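To make the interaction loop concrete, here is a minimal sketch of a K-armed bandit. It assumes Bernoulli rewards, where arm k pays 1 with an unknown probability and 0 otherwise. The names `BernoulliBandit` and `run_greedy` are hypothetical and used only for illustration. The agent plays a simple "pull each arm once, then act greedily" baseline, which is not presented as the lecture's method. Its only exploration is the single initial pull of each arm, so a few unlucky early rewards can make it commit to a suboptimal arm. That failure mode is what motivates balancing exploration and exploitation.

```python
import random

class BernoulliBandit:
    """Hypothetical K-armed bandit: arm k pays 1 with probability means[k], else 0."""

    def __init__(self, means, seed=0):
        self.means = list(means)          # true arm means, unknown to the agent
        self.rng = random.Random(seed)    # seeded for reproducibility

    def pull(self, k):
        # Bernoulli reward: 1 with probability means[k]
        return 1 if self.rng.random() < self.means[k] else 0


def run_greedy(bandit, T):
    """Hypothetical baseline: try each arm once, then always pull the arm
    with the highest empirical mean reward (ties go to the lowest index)."""
    K = len(bandit.means)
    counts = [0] * K      # number of pulls per arm
    sums = [0.0] * K      # total reward observed per arm
    total = 0
    for t in range(T):
        if t < K:
            k = t  # initial round-robin: one pull of each arm
        else:
            k = max(range(K), key=lambda a: sums[a] / counts[a])
        r = bandit.pull(k)
        counts[k] += 1
        sums[k] += r
        total += r
    return total, counts
```

For example, `run_greedy(BernoulliBandit([0.2, 0.5, 0.8]), T=100)` returns the total reward collected and how often each arm was pulled. Comparing `counts` across seeds shows that the greedy rule does not always settle on the best arm.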

Applications of MAB

  • Online advertising
  • NYT Caption Contest
  • Medical trials


