CS 4/5789: Introduction to Reinforcement Learning
Lecture 22
Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
Agenda
0. Announcements & Recap
1. Motivation: Imitation Learning
2. Behavioral Cloning
3. Analysis with PDL
Recap
Setting: Markov Decision Process \(\mathcal M = \{\mathcal S, \mathcal A, P, r, \gamma\}\)
Goal: Design policy \(\pi:\mathcal S\to\mathcal A\) with high cumulative reward
- Unit 1: All components are known
  - Value functions, Bellman equations
- Unit 2: \(P, r\) unknown
  - Model-based, value-based, policy optimization
- Unit 3: Principled exploration (in simple settings)
- Unit 4: Extensions and applications
  - Today: expert demonstrations
Motivating examples:
- Helicopter Acrobatics (Stanford)
- LittleDog Robot (LAIRLab at CMU)
- An Autonomous Land Vehicle In A Neural Network [Pomerleau, NIPS ‘88]
Imitation Learning
Expert Demonstrations \(\to\) Supervised ML Algorithm \(\to\) Policy \(\pi\)
- Supervised ML algorithm: e.g., SVM, Gaussian Process, Kernel Ridge Regression, Deep Networks
- Policy \(\pi\): maps states to actions
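A minimal sketch of this pipeline in Python, using scikit-learn's KernelRidge (one of the learners listed above); the states, actions, and dimensions below are invented stand-ins for real expert demonstrations.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical expert demonstrations: visited states and the expert's actions.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))                    # 500 states, 4 features
actions = states @ np.array([1.0, -0.5, 0.2, 0.0])    # stand-in expert actions

# Behavioral cloning: fit a supervised model to the (state, action) pairs.
model = KernelRidge(kernel="rbf", alpha=1.0)
model.fit(states, actions)

# The learned policy maps states to actions.
def pi(s):
    return model.predict(s.reshape(1, -1))[0]

print(pi(np.zeros(4)))
```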

Ex: Learning to Drive
Policy \(\pi\)
Input: Camera Image
Output: Steering Angle

Ex: Learning to Drive
Supervised learning: a dataset of expert trajectories gives pairs \((x, y)\) of camera images \(x\) and steering angles \(y\).
[Figure: example image-steering pairs from the dataset, and the learned policy applied to a new camera image, \(\pi(x) = y\).]
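A sketch of this supervised learning step with the "Deep Networks" option, assuming PyTorch; the images and steering angles below are random stand-ins for a real driving dataset.

```python
import torch
import torch.nn as nn

# Stand-in driving data: 64x64 grayscale camera images x, steering angles y.
N = 256
x = torch.randn(N, 1, 64, 64)
y = torch.randn(N, 1)

# A small network standing in for the policy: image in, steering angle out.
policy = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Behavioral cloning: regress steering angles on images over the expert data.
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(policy(x), y)
    loss.backward()
    opt.step()

# pi(camera image) = steering angle
steering = policy(torch.randn(1, 1, 64, 64))
```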
Agenda
0. Announcements & Recap
1. Motivation: Imitation Learning
2. Behavioral Cloning
3. Analysis with PDL

[Figure: a rollout of the learned policy drifts away from the expert trajectory; there is no training data of "recovery" behavior.]
An Autonomous Land Vehicle In A Neural Network [Pomerleau, NIPS ‘88]
“If the network is not presented with sufficient variability in its training exemplars to cover the conditions it is likely to encounter...[it] will perform poorly”
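A toy simulation (invented for illustration, not from the lecture) of this failure mode: the cloned policy nearly matches the expert on the expert's states, but once noise pushes it off-distribution it has no recovery behavior and errors compound.

```python
import numpy as np

rng = np.random.default_rng(1)

def expert(s):
    return -s                      # always steers back to the lane center

def cloned(s):
    if abs(s) <= 0.5:              # states covered by the expert's data
        return -0.8 * s            # close imitation on-distribution
    return 0.2 * s                 # off-distribution extrapolation:
                                   # no "recovery" behavior was ever seen

for name, policy in [("expert", expert), ("cloned", cloned)]:
    s = 0.0
    for t in range(100):
        s = s + policy(s) + rng.normal(scale=0.3)  # noisy dynamics
    print(f"{name} final |state|: {abs(s):.2e}")
```

With most seeds, the cloned rollout ends orders of magnitude farther from the lane center than the expert's, even though the two policies nearly agree on the states the expert visits.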