CS 4/5789: Introduction to Reinforcement Learning
Lecture 22
Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
Agenda
0. Announcements & Recap
1. Motivation: Imitation Learning
2. Behavioral Cloning
3. Analysis with PDL
Recap
Setting: Markov Decision Process \(\mathcal M = \{\mathcal S, \mathcal A, P, r, \gamma\}\)
Goal: Design policy \(\pi:\mathcal S\to\mathcal A\) with high cumulative reward

Unit 1: All components are known
 Value functions, Bellman equations

Unit 2: \(P, r\) unknown
 Model-based, value-based, policy optimization
Unit 3: Principled exploration (in simple settings)

Unit 4: Extensions and applications
 Today: expert demonstrations
Helicopter Acrobatics (Stanford)
LittleDog Robot (LAIRLab at CMU)
An Autonomous Land Vehicle In A Neural Network [Pomerleau, NIPS ‘88]
Imitation Learning
Expert Demonstrations
Supervised ML Algorithm
Policy \(\pi\)
e.g. SVM, Gaussian Process, Kernel Ridge Regression, Deep Networks
maps states to actions
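The pipeline above (expert demonstrations → supervised ML algorithm → policy) can be sketched in a few lines. This is my own minimal illustration, not code from the lecture: the 1-nearest-neighbor "learner" is a stand-in for any of the supervised methods listed (SVM, Gaussian Process, deep network), and the 1-D states and actions are hypothetical.

```python
# Minimal sketch of the imitation-learning pipeline: expert demonstrations
# feed a supervised learner, which returns a policy mapping states to actions.

def fit_policy(demos):
    """demos: list of (state, expert_action) pairs; returns a policy s -> a."""
    def policy(state):
        # predict the action taken at the nearest demonstrated state
        nearest_state, nearest_action = min(
            demos, key=lambda pair: abs(pair[0] - state))
        return nearest_action
    return policy

# Hypothetical 1-D driving example: expert steers toward the center (state 0).
demos = [(-2.0, +1.0), (-1.0, +0.5), (0.0, 0.0), (1.0, -0.5), (2.0, -1.0)]
pi = fit_policy(demos)
```

Any supervised learner can be dropped in for `fit_policy`; the point is only that the output is a policy \(\pi:\mathcal S\to\mathcal A\) trained on expert state-action pairs.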
Ex: Learning to Drive
Policy \(\pi\)
Input: Camera Image
Output: Steering Angle
Ex: Learning to Drive
Supervised Learning
Policy
Dataset of expert trajectory: labeled pairs \((x, y)\), where \(x\) is a camera image and \(y\) the expert's steering angle; the learned policy outputs \(\pi(x)\) for a new image.
[Figure: example camera images labeled with expert steering angles]
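Behavioral cloning treats this as plain supervised learning: minimize the empirical loss of \(\pi(x)\) against the expert labels \(y\). A minimal sketch under assumed specifics (a linear policy \(\pi(x) = wx\) on 1-D features, squared loss, gradient descent; none of these choices come from the lecture):

```python
# Sketch of behavioral cloning as empirical risk minimization:
# fit pi(x) = w*x to expert (feature, steering-angle) pairs.

def behavioral_cloning(pairs, lr=0.1, steps=200):
    w = 0.0
    for _ in range(steps):
        # gradient of the mean squared error (1/n) * sum_i (w*x_i - y_i)^2
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad
    return lambda x: w * x

# hypothetical data generated by an expert with y = -0.5 * x
pairs = [(1.0, -0.5), (2.0, -1.0), (-1.0, 0.5)]
pi = behavioral_cloning(pairs)
```

In practice \(x\) is a high-dimensional image and the linear map is replaced by a deep network, but the objective is the same supervised one.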
Agenda
0. Announcements & Recap
1. Motivation: Imitation Learning
2. Behavioral Cloning
3. Analysis with PDL
[Figure: the learned policy's trajectory drifts away from the expert trajectory]
No training data of "recovery" behavior
An Autonomous Land Vehicle In A Neural Network [Pomerleau, NIPS ‘88]
“If the network is not presented with sufficient variability in its training exemplars to cover the conditions it is likely to encounter...[it] will perform poorly”
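Pomerleau's observation can be made concrete with a toy simulation (my own assumed dynamics, not from the lecture): the cloned policy matches the expert on states seen in training, but since the expert's data never leaves the lane center, the clone learns no recovery behavior for off-center states.

```python
# Toy illustration of missing "recovery" data in behavioral cloning.

def rollout(policy, horizon, s0):
    """Roll out toy dynamics s' = s + a from s0; return visited states."""
    s, states = s0, []
    for _ in range(horizon):
        s = s + policy(s)
        states.append(s)
    return states

def expert(s):
    return -s  # expert always steers back to the lane center, s = 0

def cloned(s):
    # behavioral clone: correct on training states (|s| <= 0.5), but it
    # never saw off-center states, so it learned no recovery action there
    return -s if abs(s) <= 0.5 else 0.0

# after a one-time disturbance to s0 = 1.0, the expert recovers immediately,
# while the cloned policy never returns to the training distribution
expert_states = rollout(expert, 20, s0=1.0)
cloned_states = rollout(cloned, 20, s0=1.0)
```

This is exactly the failure the quote describes: once the learner leaves the states covered by the training exemplars, its predictions are unconstrained, which motivates the PDL-based analysis that follows.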