CS 4/5789: Lecture 24

CS 4/5789: Introduction to Reinforcement Learning

Lecture 24

Prof. Sarah Dean

MW 2:45-4pm
110 Hollister Hall

Agenda

0. Announcements & Recap

1. Inverse Reinforcement Learning

2. Maximum Entropy Principle

3. Constrained Optimization

Announcements

5789 Paper Review Assignment (weekly pace suggested)

HW 3 due, HW 4 released

PollEV Participation Count Updated on Canvas

My office hours are now Wednesdays 4:15-6pm in Gates 416A

Final exam Monday 5/16 at 7pm

Recap: Behavioral Cloning

Dataset of expert trajectory \(\to\) Supervised Learning \(\to\) Policy

Dataset: pairs \((x, y)\) of expert state \(x\) and expert action \(y\)

Policy: \(\pi(x) = y\)
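The supervised learning step can be sketched for a small tabular problem. This is a minimal illustration, not the lecture's implementation: the function name `behavioral_cloning`, the `default_action` fallback, and the toy dataset are all assumptions made for the example. With a finite set of states, picking the most frequent expert action per state minimizes the 0-1 classification error on the dataset.

```python
from collections import Counter, defaultdict


def behavioral_cloning(dataset, default_action=0):
    """Fit a tabular policy from expert (state, action) pairs by majority vote.

    `default_action` (an assumption of this sketch) is returned for states
    that never appear in the expert data.
    """
    counts = defaultdict(Counter)
    for state, action in dataset:
        counts[state][action] += 1
    # For each observed state, keep the action the expert chose most often.
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}

    def policy(state):
        return table.get(state, default_action)

    return policy


# Hypothetical expert data: (x, y) = (state, expert action).
expert_data = [(0, 1), (1, 1), (2, 0), (0, 1), (2, 0)]
pi_hat = behavioral_cloning(expert_data)
```

Here `pi_hat` reproduces the expert on states 0, 1, and 2, and falls back to `default_action` on any state outside the expert trajectory, which is exactly where BC has no supervision.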

DAgger: Dataset Aggregation

Supervised Learning: fit policy \(\pi\) to the dataset \(\mathcal D = (x_i, y_i)_{i=1}^M\)

Execute: run \(\pi\) to visit states \(s_0, s_1, s_2, \dots\)

Query Expert: \(\pi^*(s_0), \pi^*(s_1), \dots\)

Aggregate: add \((x_i = s_i, y_i = \pi^*(s_i))\) to \(\mathcal D\)
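The execute, query, aggregate loop can be sketched on a toy problem. Everything named here is an assumption for illustration: the five-state chain, the `step` dynamics, the always-move-right `expert`, and the majority-vote `fit_policy` standing in for the supervised learner. The sketch also rolls out the current learned policy directly at every iteration, without mixing in the expert.

```python
from collections import Counter, defaultdict


def fit_policy(dataset, default_action=0):
    """Supervised learning step: majority vote per state (tabular sketch)."""
    counts = defaultdict(Counter)
    for s, a in dataset:
        counts[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda s: table.get(s, default_action)


def rollout(policy, step, s0, horizon):
    """Execute the policy and record the visited states s_0, s_1, ..."""
    states, s = [], s0
    for _ in range(horizon):
        states.append(s)
        s = step(s, policy(s))
    return states


def dagger(expert, step, s0, horizon, iterations, default_action=0):
    dataset = []
    policy = lambda s: default_action  # arbitrary initial policy
    for _ in range(iterations):
        states = rollout(policy, step, s0, horizon)       # Execute
        dataset.extend((s, expert(s)) for s in states)    # Query expert, aggregate
        policy = fit_policy(dataset, default_action)      # Supervised learning
    return policy, dataset


# Hypothetical chain with states 0..4: action 1 moves right, action 0 moves left.
def step(s, a):
    return min(4, s + 1) if a == 1 else max(0, s - 1)


def expert(s):
    return 1  # the assumed expert always moves right


learned, data = dagger(expert, step, s0=0, horizon=5, iterations=5)
```

Each iteration labels the states the learner itself reaches, so the dataset grows to cover one more state of the chain per round until the learned policy matches the expert on all five states.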

BC vs. DAgger

BC: Supervised learning guarantee

\(\mathbb E_{s\sim d^{\pi^*}_\mu}[\mathbf 1\{\widehat \pi(s) \neq \pi^*(s)\}]\leq \epsilon\)

BC: Performance Guarantee

\(V_\mu^{\pi^*} - V_\mu^{\widehat \pi} \leq \frac{2\epsilon}{(1-\gamma)^2}\)

DAgger: Online learning guarantee

\(\mathbb E_{s\sim d^{\pi^t}_\mu}[\mathbf 1\{ \pi^t(s) \neq \pi^*(s)\}]\leq \epsilon\)

DAgger: Performance Guarantee

\(V_\mu^{\pi^*} - V_\mu^{\pi^t} \leq \frac{\max_{s,a}|A^{\pi^*}(s,a)|}{1-\gamma}\epsilon\)
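To see how the two performance guarantees scale, the bounds can be evaluated numerically. This is a small sketch under stated assumptions: the values of \(\epsilon\), \(\gamma\), and the maximum advantage are made up for illustration, and the function names are not from the lecture. If rewards lie in \([0, 1]\), then \(|A^{\pi^*}(s,a)| \leq \frac{1}{1-\gamma}\), which the last line uses as a worst case.

```python
def bc_bound(eps, gamma):
    """Behavioral cloning bound: 2 * eps / (1 - gamma)^2."""
    return 2.0 * eps / (1.0 - gamma) ** 2


def dagger_bound(eps, gamma, max_abs_advantage):
    """DAgger bound: max |A^{pi*}(s, a)| * eps / (1 - gamma)."""
    return max_abs_advantage * eps / (1.0 - gamma)


# Illustrative values (assumptions, not from the lecture).
eps, gamma = 0.01, 0.9

bc = bc_bound(eps, gamma)
dag = dagger_bound(eps, gamma, max_abs_advantage=1.0)
# Worst case when rewards are in [0, 1]: |A| can be as large as 1 / (1 - gamma).
dag_worst = dagger_bound(eps, gamma, max_abs_advantage=1.0 / (1.0 - gamma))
```

With these numbers the BC bound is about 2.0, while the DAgger bound is about 0.1 when the expert's advantages are at most 1 in magnitude. Even in the worst case for the advantage, the DAgger bound (about 1.0) does not exceed the BC bound.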

Agenda

0. Announcements & Recap

1. Inverse Reinforcement Learning

2. Maximum Entropy Principle

3. Constrained Optimization

(Kitani et al., 2012)
