CS 4/5789: Introduction to Reinforcement Learning
Lecture 23
Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
Agenda
0. Announcements & Recap
1. DAgger Algorithm
2. Online Learning
3. Analysis with PDL
Announcements
5789 Paper Review Assignment (weekly pace suggested)
HW 3 due Monday 4/25
Final exam Monday 5/16 at 7pm
Behavioral Cloning
Dataset of expert trajectory: \((x, y)\) pairs
Supervised Learning
Policy: \(\pi(x) = y\)
[Figure: expert trajectory vs. learned policy]
No training data of "recovery" behavior
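A minimal sketch of behavioral cloning as supervised learning, under loudly stated assumptions: the scalar system s' = s + a + noise, the linear expert, and the names expert_policy, collect_expert_trajectory, and behavioral_cloning are all hypothetical and chosen only for illustration. It fits a linear policy a = theta * s to (x, y) pairs taken from a single expert trajectory.

```python
import random


def expert_policy(s):
    # Hypothetical expert: linear feedback, assumed only for this example.
    return -0.5 * s


def collect_expert_trajectory(s0, horizon, rng):
    # Roll out the expert in a toy system s' = s + a + noise (an assumption).
    data = []
    s = s0
    for _ in range(horizon):
        a = expert_policy(s)
        data.append((s, a))  # x = state, y = expert action
        s = s + a + rng.gauss(0.0, 0.01)
    return data


def behavioral_cloning(data):
    # Supervised learning: scalar least squares for a = theta * s.
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, y in data)
    return num / den


rng = random.Random(0)
dataset = collect_expert_trajectory(s0=1.0, horizon=50, rng=rng)
theta_bc = behavioral_cloning(dataset)
```

The dataset contains only states the expert visits, so nothing in it tells the learned policy how to act once it drifts away from the expert trajectory.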


query expert
learned policy

and append trajectory
retrain
Idea: interact with expert to ask what they would do
DAgger: Dataset Aggregation
Dataset: \(\mathcal D = \{(x_i, y_i)\}_{i=1}^M\)
Supervised Learning
Policy: \(\pi(x) = y\)
Execute: \(s_0, s_1, s_2, \dots\)
Query Expert: \(\pi^*(s_0), \pi^*(s_1), \dots\)
Aggregate: \((x_i = s_i, y_i = \pi^*(s_i))\)
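The loop above (supervised learning, execute, query expert, aggregate) can be sketched as follows. This is an illustrative sketch, not the lecture's implementation: the scalar dynamics, the saturated expert, the linear policy class, the start state, and seeding the dataset with one expert rollout are all assumptions, and every function name is hypothetical.

```python
import random


def expert_policy(s):
    # Hypothetical expert pi*: saturated feedback, assumed for illustration.
    return -max(-1.0, min(1.0, s))


def fit_policy(data):
    # Supervised learning step: scalar least squares for a = theta * s.
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, y in data)
    return num / den


def rollout(policy, s0, horizon, rng):
    # Execute a policy in the toy system s' = s + a + noise; return visited states.
    states = []
    s = s0
    for _ in range(horizon):
        states.append(s)
        s = s + policy(s) + rng.gauss(0.0, 0.1)
    return states


def dagger(num_iters, horizon, rng, s0=3.0):
    # Assumption: seed the dataset with one expert rollout.
    data = [(s, expert_policy(s)) for s in rollout(expert_policy, s0, horizon, rng)]
    theta = fit_policy(data)
    for _ in range(num_iters):
        # Execute the current learned policy to get s_0, s_1, ...
        states = rollout(lambda s, th=theta: th * s, s0, horizon, rng)
        # Query the expert on the visited states and aggregate.
        data.extend((s, expert_policy(s)) for s in states)
        # Retrain on the aggregated dataset.
        theta = fit_policy(data)
    return theta, data


theta, data = dagger(num_iters=5, horizon=20, rng=random.Random(0))
```

Each iteration labels states drawn from the learned policy's own trajectories, so the aggregated dataset covers the states the learner actually reaches rather than only those the expert visits.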
Ex: Off-road driving
[Pan et al., RSS 18]
Goal: map image to command
Approach: Use Model Predictive Controller as the expert!
\(\pi(\text{image}) =\) (steering, throttle)