CS 4/5789: Introduction to Reinforcement Learning
Lecture 22
Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
Agenda
0. Announcements & Recap
1. Motivation: Imitation Learning
2. Behavioral Cloning
3. Analysis with PDL
Recap
Setting: Markov Decision Process \(\mathcal M = \{\mathcal S, \mathcal A, P, r, \gamma\}\)
Goal: Design policy \(\pi:\mathcal S\to\mathcal A\) with high cumulative reward
- Unit 1: All components are known
  - Value functions, Bellman equations
- Unit 2: \(P, r\) unknown
  - Model-based, value-based, policy optimization
- Unit 3: Principled exploration (in simple settings)
- Unit 4: Extensions and applications
  - Today: expert demonstrations
Motivating examples:
- Helicopter Acrobatics (Stanford)
- LittleDog Robot (LAIRLab at CMU)
- An Autonomous Land Vehicle In A Neural Network [Pomerleau, NIPS ‘88]
Imitation Learning
Expert Demonstrations \(\to\) Supervised ML Algorithm \(\to\) Policy \(\pi\)
- Supervised ML algorithm: e.g., SVM, Gaussian Process, Kernel Ridge Regression, Deep Networks
- Policy \(\pi\): maps states to actions
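A minimal sketch of this pipeline in Python, using scikit-learn's KernelRidge (one of the learners listed above); the states, actions, and dimensions below are invented stand-ins for real expert demonstrations.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical expert demonstrations: visited states and the expert's actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))                    # 500 states, 4 features
actions = states @ np.array([1.0, -0.5, 0.2, 0.0])    # stand-in expert actions

# Behavioral cloning: fit a supervised model to the (state, action) pairs.
model = KernelRidge(kernel="rbf", alpha=1.0)
model.fit(states, actions)

# The learned policy maps states to actions.
def pi(s):
    return model.predict(s.reshape(1, -1))[0]

print(pi(np.zeros(4)))
```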

Ex: Learning to Drive
Policy \(\pi\)
Input: Camera Image
Output: Steering Angle

Ex: Learning to Drive
Supervised learning: a dataset of expert trajectories gives pairs \((x, y)\) of camera images \(x\) and steering angles \(y\).
[Figure: example image-steering pairs from the dataset, and the learned policy applied to a new camera image, \(\pi(x) = y\).]
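A sketch of this supervised learning step with the "Deep Networks" option, assuming PyTorch; the images and steering angles below are random stand-ins for a real driving dataset.

```python
import torch
import torch.nn as nn

# Stand-in driving data: 64x64 grayscale camera images x, steering angles y.
N = 256
x = torch.randn(N, 1, 64, 64)
y = torch.randn(N, 1)

# A small network standing in for the policy: image in, steering angle out.
policy = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Behavioral cloning: regress steering angles on images over the expert data.
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(policy(x), y)
    loss.backward()
    opt.step()

# pi(camera image) = steering angle
steering = policy(torch.randn(1, 1, 64, 64))
```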
Agenda
0. Announcements & Recap
1. Motivation: Imitation Learning
2. Behavioral Cloning
3. Analysis with PDL

[Figure: a rollout of the learned policy drifts away from the expert trajectory; there is no training data of "recovery" behavior.]
An Autonomous Land Vehicle In A Neural Network [Pomerleau, NIPS ‘88]
“If the network is not presented with sufficient variability in its training exemplars to cover the conditions it is likely to encounter...[it] will perform poorly”
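A toy simulation (invented for illustration, not from the lecture) of this failure mode: the cloned policy nearly matches the expert on the expert's states, but once noise pushes it off-distribution it has no recovery behavior and errors compound.

```python
import numpy as np

rng = np.random.default_rng(1)

def expert(s):
    return -s                      # always steers back to the lane center

def cloned(s):
    if abs(s) <= 0.5:              # states covered by the expert's data
        return -0.8 * s            # close imitation on-distribution
    return 0.2 * s                 # off-distribution extrapolation:
                                   # no "recovery" behavior was ever seen

for name, policy in [("expert", expert), ("cloned", cloned)]:
    s = 0.0
    for t in range(100):
        s = s + policy(s) + rng.normal(scale=0.3)  # noisy dynamics
    print(f"{name} final |state|: {abs(s):.2e}")
```

With most seeds, the cloned rollout ends orders of magnitude farther from the lane center than the expert's, even though the two policies nearly agree on the states the expert visits.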