CS 4/5789: Introduction to Reinforcement Learning
Lecture 24
Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
Agenda
0. Announcements & Recap
1. Inverse Reinforcement Learning
2. Maximum Entropy Principle
3. Constrained Optimization
Announcements
5789 Paper Review Assignment (weekly pace suggested)
HW 3 due, HW 4 released
PollEV Participation Count Updated on Canvas
My office hours are now Wednesdays 4:15-6pm in Gates 416A
Final exam Monday 5/16 at 7pm
Recap: Behavioral Cloning
Dataset of expert trajectories \((x, y)\) → Supervised Learning → Policy \(\pi\)
[Figure: the policy \(\pi\) maps an input observation to an action]
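As a minimal sketch of behavioral cloning in a tabular setting, the supervised learning step can be a per-state majority vote over the expert's actions. The demonstration data, the `default_action` fallback for unseen states, and the function names below are illustrative assumptions, not part of the lecture.

```python
from collections import Counter, defaultdict

def behavioral_cloning(dataset, default_action=0):
    """Fit a tabular policy to expert (state, action) pairs by majority vote."""
    counts = defaultdict(Counter)
    for state, action in dataset:
        counts[state][action] += 1
    # For each observed state, keep the action the expert chose most often.
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}

    def policy(state):
        # States never seen in the data fall back to an assumed default action.
        return table.get(state, default_action)

    return policy

# Hypothetical expert demonstrations: x = state, y = expert action.
expert_data = [(0, 1), (1, 1), (2, 0), (0, 1), (2, 0), (1, 0), (1, 1)]
pi_hat = behavioral_cloning(expert_data)
bc_actions = [pi_hat(s) for s in range(4)]
print(bc_actions)
```

State 3 never appears in the data, so the learned policy has no expert information there. This is the gap that motivates DAgger.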


DAgger: Dataset Aggregation
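Dataset \(\mathcal D = \{(x_i, y_i)\}_{i=1}^M\) → Supervised Learning → Policy \(\pi\)
[Figure: the policy \(\pi\) maps an input observation to an action]
Execute: roll out \(\pi\) to visit states \(s_0, s_1, s_2, \ldots\)
Query Expert: \(\pi^*(s_0), \pi^*(s_1), \ldots\)
Aggregate: add \((x_i = s_i, y_i = \pi^*(s_i))\) to \(\mathcal D\) and repeat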
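The execute, query, aggregate loop can be sketched on a toy problem. Everything here is an assumption made for illustration: a 5-state deterministic chain, a hypothetical expert that always moves right, a majority-vote learner, and an initial policy that takes a default action everywhere. It is a simplified sketch rather than the full algorithm as analyzed in lecture.

```python
from collections import Counter, defaultdict

N_STATES, HORIZON = 5, 4

def step(state, action):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    move = 1 if action == 1 else -1
    return min(max(state + move, 0), N_STATES - 1)

def expert(state):
    """Hypothetical expert pi*: always move right."""
    return 1

def fit(dataset, default_action=0):
    """Supervised learning step: majority-vote action per state."""
    counts = defaultdict(Counter)
    for s, a in dataset:
        counts[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda s: table.get(s, default_action)

def rollout(policy, start=0):
    """Execute the policy and return the states s_0, s_1, ... it visits."""
    states, s = [], start
    for _ in range(HORIZON):
        states.append(s)
        s = step(s, policy(s))
    return states

def dagger(n_iters):
    dataset = []
    policy = fit(dataset)  # initial policy: default action everywhere
    for _ in range(n_iters):
        states = rollout(policy)                     # Execute
        dataset += [(s, expert(s)) for s in states]  # Query expert, aggregate
        policy = fit(dataset)                        # Supervised learning
    return policy, dataset

policy, dataset = dagger(n_iters=4)
dagger_states = rollout(policy)
print(dagger_states)
```

Each iteration labels the states the current learner actually reaches, so the dataset grows to cover the learner's own state distribution instead of only the expert's.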
BC vs. DAgger
BC: Supervised learning guarantee
\(\mathbb E_{s\sim d^{\pi^*}_\mu}[\mathbf 1\{\widehat \pi(s) \neq \pi^*(s)\}]\leq \epsilon\)
BC: Performance Guarantee
\(V_\mu^{\pi^*} - V_\mu^{\widehat \pi} \leq \frac{2\epsilon}{(1-\gamma)^2}\)
DAgger: Online learning guarantee
\(\mathbb E_{s\sim d^{\pi^t}_\mu}[\mathbf 1\{ \pi^t(s) \neq \pi^*(s)\}]\leq \epsilon\)
DAgger: Performance Guarantee
\(V_\mu^{\pi^*} - V_\mu^{\pi^t} \leq \frac{\max_{s,a}|A^{\pi^*}(s,a)|}{1-\gamma}\epsilon\)
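To get a feel for the two performance bounds, it can help to plug in numbers. The values \(\epsilon = 0.01\), \(\gamma = 0.9\), and a maximum absolute advantage of 1 are assumptions chosen for illustration only.

```python
def bc_bound(eps, gamma):
    """Behavioral cloning bound: 2 * eps / (1 - gamma)^2."""
    return 2 * eps / (1 - gamma) ** 2

def dagger_bound(eps, gamma, max_abs_advantage):
    """DAgger bound: max |A^{pi*}(s, a)| * eps / (1 - gamma)."""
    return max_abs_advantage * eps / (1 - gamma)

eps, gamma = 0.01, 0.9        # assumed illustrative values
bc = bc_bound(eps, gamma)     # roughly 2.0
dg = dagger_bound(eps, gamma, max_abs_advantage=1.0)  # roughly 0.1
print(f"BC bound: {bc:.3f}, DAgger bound: {dg:.3f}")
```

With these numbers the BC bound is about 20 times larger, reflecting the quadratic versus linear dependence on \(1/(1-\gamma)\). The gap narrows when the expert's advantages are large, since the DAgger bound scales with \(\max_{s,a}|A^{\pi^*}(s,a)|\).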

Example: Image Features (Kitani et al., 2012)