CS 4/5789: Introduction to Reinforcement Learning

Lecture 23

Prof. Sarah Dean

MW 2:45-4pm
110 Hollister Hall

Agenda

 

0. Announcements & Recap

1. DAgger Algorithm

2. Online Learning

3. Analysis with PDL

Announcements

 

5789 Paper Review Assignment (weekly pace suggested)

HW 3 due Monday 4/25

 

Final exam Monday 5/16 at 7pm

Behavioral Cloning

Dataset of expert trajectory: \((x, y), \dots\)

Supervised Learning

Policy: \(\pi(x) = y\)
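A minimal sketch of behavioral cloning as supervised learning on expert pairs \((x, y)\). The scalar state, the linear expert `expert_policy`, and the least-squares policy class are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def expert_policy(s):
    # Hypothetical expert: linear feedback that pushes the state toward zero.
    return -0.5 * s

def fit_policy(states, actions):
    # Supervised learning step: least-squares fit of a linear policy a = k * s.
    k, *_ = np.linalg.lstsq(states.reshape(-1, 1), actions, rcond=None)
    return float(k[0])

rng = np.random.default_rng(0)

# Dataset of expert behavior: x = state visited, y = expert action at that state.
expert_states = rng.uniform(-1.0, 1.0, size=50)
expert_actions = expert_policy(expert_states)

# Learned policy gain; the learned policy is pi(s) = k_bc * s.
k_bc = fit_policy(expert_states, expert_actions)
```

Because the assumed expert lies inside the linear policy class, the fit recovers it on the training states. With a richer expert, the learned policy would only be accurate near the states the expert visited.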

[Figure: expert trajectory vs. learned policy]

No training data of "recovery" behavior

Idea: interact with expert to ask what they would do

[Figure: learned policy, query expert and append trajectory, retrain]

DAgger: Dataset Aggregation

Dataset: \(\mathcal D = \{(x_i, y_i)\}_{i=1}^M\)

Supervised Learning

Policy: \(\pi(x) = y\)

Execute: \(s_0, s_1, s_2, \dots\)

Query Expert: \(\pi^*(s_0), \pi^*(s_1), \dots\)

Aggregate: \((x_i = s_i, y_i = \pi^*(s_i))\)
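The execute, query, aggregate, and retrain loop can be sketched as follows. This is a toy illustration under stated assumptions: the scalar dynamics in `step`, the saturating expert `expert_policy`, and the linear policy class are all hypothetical stand-ins, not taken from the lecture.

```python
import numpy as np

def expert_policy(s):
    # Hypothetical expert pi*: saturating feedback toward zero.
    return -np.tanh(s)

def step(s, a, rng):
    # Hypothetical dynamics: next state is s + a plus small noise.
    return s + a + 0.05 * rng.standard_normal()

def rollout(policy, horizon, rng):
    # Execute `policy` from a random initial state; return the visited states.
    s = rng.uniform(-2.0, 2.0)
    states = []
    for _ in range(horizon):
        states.append(s)
        s = step(s, policy(s), rng)
    return np.array(states)

def fit_linear_policy(X, Y):
    # Supervised learning: least-squares fit of a linear policy a = k * s.
    k, *_ = np.linalg.lstsq(X.reshape(-1, 1), Y, rcond=None)
    return lambda s, k=float(k[0]): k * s

def dagger(num_iters=5, horizon=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the dataset with one expert trajectory (behavioral cloning).
    X = rollout(expert_policy, horizon, rng)
    Y = expert_policy(X)
    policy = fit_linear_policy(X, Y)
    for _ in range(num_iters):
        states = rollout(policy, horizon, rng)  # Execute the learned policy
        labels = expert_policy(states)          # Query expert on visited states
        X = np.concatenate([X, states])         # Aggregate (x_i = s_i, ...
        Y = np.concatenate([Y, labels])         # ... y_i = pi*(s_i))
        policy = fit_linear_policy(X, Y)        # Retrain on the full dataset
    return policy, X, Y

policy, X, Y = dagger()
```

The dataset only grows: each iteration adds states visited by the current learned policy, labeled with the expert's actions, so later fits see the states the learner actually reaches.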

Ex: Off-road driving

[Pan et al, RSS 18]

Goal: map image to command

Approach: Use Model Predictive Controller as the expert!

\(\pi(\text{image}) =\) steering, throttle

