Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
0. Announcements & Recap
1. Motivation: Imitation Learning
2. Behavioral Cloning
3. Analysis with PDL
Setting: Markov Decision Process \(\mathcal M = \{\mathcal S, \mathcal A, P, r, \gamma\}\)
Goal: Design policy \(\pi:\mathcal S\to\mathcal A\) with high cumulative reward
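One standard way to make "high cumulative reward" precise is the expected discounted return, using the discount factor \(\gamma\) from the MDP above:
\[ J(\pi) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t)\right], \qquad a_t = \pi(s_t),\quad s_{t+1} \sim P(\cdot \mid s_t, a_t). \]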
Helicopter Acrobatics (Stanford)
LittleDog Robot (LAIRLab at CMU)
An Autonomous Land Vehicle in a Neural Network [Pomerleau, NIPS ’88]
Expert Demonstrations → Supervised ML Algorithm → Policy \(\pi\)
e.g. SVM, Gaussian Process, Kernel Ridge Regression, Deep Networks
The learned policy maps states to actions (a minimal code sketch follows).
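Below is a minimal sketch of this pipeline, assuming the demonstrations are already available as arrays of states and actions; the names expert_states / expert_actions and the synthetic data are illustrative, and the model is scikit-learn's KernelRidge, one of the model classes listed above.

import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Assumed format: N expert (state, action) pairs as arrays.
# Synthetic placeholder data stands in for real demonstrations.
rng = np.random.default_rng(0)
expert_states = rng.normal(size=(500, 4))    # N x state_dim
expert_actions = rng.normal(size=(500, 1))   # N x action_dim

# Behavioral cloning: supervised regression from states to actions.
model = KernelRidge(kernel="rbf", alpha=1e-3)
model.fit(expert_states, expert_actions)

def policy(state):
    # The learned policy maps a single state to an action.
    return model.predict(state.reshape(1, -1))[0]

action = policy(expert_states[0])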
Policy \(\pi\): Input (Camera Image) → Output (Steering Angle)
Supervised Learning
Policy \(\pi\) is fit to a dataset of expert trajectories, i.e. pairs \((x, y)\) of observations and expert actions.
[Figure: the learned policy maps an input image to an action, \(\pi(x) = y\).]
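In symbols, behavioral cloning is empirical risk minimization over the demonstration pairs; the choice of loss \(\ell\) is an assumption here (squared error is natural for a continuous steering angle):
\[ \hat{\pi} = \arg\min_{\pi \in \Pi} \sum_{i=1}^{N} \ell\big(\pi(x_i),\, y_i\big), \qquad \text{e.g. } \ell(\hat{y}, y) = (\hat{y} - y)^2. \]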
[Figure: expert trajectory vs. rollout of the learned policy, which drifts away over time.]
Small prediction errors move the learned policy into states the expert never visited, and there is no training data of "recovery" behavior for those states.
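This is the distribution-shift problem that the PDL-based analysis in part 3 addresses. A well-known consequence (e.g. Ross & Bagnell, 2010), stated for a finite-horizon version of this setting, is that if the supervised learner errs with probability \(\epsilon\) on the expert's state distribution, the regret of the cloned policy can grow quadratically with the horizon \(H\):
\[ J(\pi^{\text{expert}}) - J(\hat{\pi}) = O(\epsilon H^2), \]
precisely because errors compound in states the expert never demonstrated.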
An Autonomous Land Vehicle in a Neural Network [Pomerleau, NIPS ’88]
“If the network is not presented with sufficient variability in its training exemplars to cover the conditions it is likely to encounter...[it] will perform poorly”