CS 4/5789: Introduction to Reinforcement Learning

Lecture 22

Prof. Sarah Dean

MW 2:45-4pm
110 Hollister Hall

Agenda


0. Announcements & Recap

1. Motivation: Imitation Learning

2. Behavioral Cloning

3. Analysis with PDL

Recap

Setting: Markov Decision Process \(\mathcal M = \{\mathcal S, \mathcal A, P, r, \gamma\}\)
Goal: Design policy \(\pi:\mathcal S\to\mathcal A\) with high cumulative reward


  • Unit 1: All components are known
    • Value functions, Bellman equations
  • Unit 2: \(P, r\) unknown
    • Model-based, value-based, and policy optimization methods
  • Unit 3: Principled exploration (in simple settings)
  • Unit 4: Extensions and applications
    • Today: expert demonstrations

Helicopter Acrobatics (Stanford)

LittleDog Robot (LAIRLab at CMU)

An Autonomous Land Vehicle In A Neural Network [Pomerleau, NIPS ‘88]

Imitation Learning

Pipeline: Expert Demonstrations → Supervised ML Algorithm → Policy \(\pi\)

  • Supervised ML algorithm: e.g., SVM, Gaussian process, kernel ridge regression, deep networks
  • Policy \(\pi\): maps states to actions

Ex: Learning to Drive


Policy \(\pi\)

Input: Camera Image

Output: Steering Angle


Supervised Learning

Dataset of expert trajectories: pairs \((x, y)\), where \(x\) is a camera image and \(y\) is the expert's steering angle.

Learned policy: \(\pi(\text{camera image}) = \text{steering angle}\)
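A rough sketch of this supervised-learning step (assumptions: a linear expert on low-dimensional states stands in for camera images and a deep network; all constants and names here are hypothetical, not from the lecture):

```python
# Behavioral cloning as regression: fit pi to expert (state, action) pairs.
# Hypothetical setup: states x in R^4, scalar actions y from a linear expert.
import numpy as np

rng = np.random.default_rng(0)

# Expert demonstrations (synthetic): y = K_expert x + small noise
K_expert = rng.normal(size=(1, 4))                      # used only to generate data
X = rng.normal(size=(500, 4))                           # states visited by the expert
Y = X @ K_expert.T + 0.01 * rng.normal(size=(500, 1))   # expert actions

# Supervised ML algorithm: least-squares regression
# (a deep network would replace this for camera images)
K_bc, *_ = np.linalg.lstsq(X, Y, rcond=None)

def pi(x):
    """Learned policy: maps a state to an action."""
    return (K_bc.T @ x).item()

print("max parameter error:", np.abs(K_bc.T - K_expert).max())
```

With enough expert data this recovers the expert's mapping on states the expert visits; the next slides show why that is not the same as matching the expert's performance.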

Agenda


0. Announcements & Recap

1. Motivation: Imitation Learning

2. Behavioral Cloning

3. Analysis with PDL

[Figure: the learned policy drifts away from the expert trajectory into states the expert never visited; there is no training data of "recovery" behavior]

An Autonomous Land Vehicle In A Neural Network [Pomerleau, NIPS ‘88]

“If the network is not presented with sufficient variability in its training exemplars to cover the conditions it is likely to encounter...[it] will perform poorly”
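A toy simulation of this failure mode (my sketch, not from the lecture or Pomerleau; the dynamics, noise level, and 0.1 threshold are all hypothetical): the cloned policy matches the expert on the narrow band of states seen in training, but has no recovery data outside it, so noise eventually carries it off-distribution.

```python
# Compounding errors from distribution shift: the cloned policy is perfect
# on training states (|s| <= 0.1) but has no data for recovering from
# larger offsets, so it drifts once noise pushes it out of distribution.
import numpy as np

def expert(s):
    return -s                     # always steers back to the lane center

def cloned(s):
    # matches the expert in-distribution; arbitrary (here: zero) outside,
    # since there were no demonstrations of recovery behavior
    return -s if abs(s) <= 0.1 else 0.0

T = 200
for policy, name in [(expert, "expert"), (cloned, "cloned")]:
    rng = np.random.default_rng(1)                # same noise for both rollouts
    s, worst = 0.0, 0.0
    for _ in range(T):
        s = s + policy(s) + 0.05 * rng.normal()   # dynamics: s' = s + a + w
        worst = max(worst, abs(s))
    print(f"{name}: worst |lane offset| over {T} steps = {worst:.2f}")
```

The per-step supervised error can be tiny on the expert's state distribution while rollout performance is much worse; this is the gap that the PDL-based analysis (item 3 on the agenda) quantifies.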
