Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
0. Announcements & Recap
1. Inverse Reinforcement Learning
2. Maximum Entropy Principle
3. Constrained Optimization
5789 Paper Review Assignment (weekly pace suggested)
HW 3 due, HW 4 released
PollEV Participation Count Updated on Canvas
My office hours are now Wednesdays 4:15-6pm in Gates 416A
Final exam Monday 5/16 at 7pm
Supervised Learning
Policy
Dataset of expert trajectories
\((x, y)\), ...
\(\pi(x) = y\)
Supervised Learning
Policy
Dataset
\(\mathcal D = \{(x_i, y_i)\}_{i=1}^M\)
\(\pi(x) = y\)
Execute
Query Expert
\(\pi^*(s_0), \pi^*(s_1),...\)
\(s_0, s_1, s_2...\)
Aggregate
\((x_i = s_i, y_i = \pi^*(s_i))\)
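The Execute / Query Expert / Aggregate loop can be sketched as follows. This is a minimal illustration, not the course's reference implementation: the chain MDP, the `expert_policy`, the noisy `step` dynamics, and the majority-vote `fit_policy` learner are all hypothetical stand-ins chosen so the example runs without dependencies.

```python
import random
from collections import Counter, defaultdict

N_STATES = 6        # hypothetical chain MDP with states 0..5
ACTIONS = (-1, +1)  # move left or move right


def expert_policy(s):
    """Hypothetical expert pi*: always move right."""
    return +1


def step(s, a, rng):
    """Assumed noisy dynamics: the chosen move is flipped with probability 0.1."""
    if rng.random() < 0.1:
        a = -a
    return min(max(s + a, 0), N_STATES - 1)


def fit_policy(dataset):
    """Supervised learning step: per-state majority vote over expert labels."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # States never seen in the dataset fall back to an arbitrary default action.
    return lambda s: table.get(s, -1)


def dagger(n_iters=5, horizon=20, seed=0):
    rng = random.Random(seed)
    dataset = []
    policy = lambda s: rng.choice(ACTIONS)  # pi^0: arbitrary initial policy
    for _ in range(n_iters):
        # Execute: roll out the current policy pi^t to collect s_0, s_1, ...
        s, visited = 0, []
        for _ in range(horizon):
            visited.append(s)
            s = step(s, policy(s), rng)
        # Query expert and aggregate: add (x_i = s_i, y_i = pi*(s_i)).
        dataset.extend((s_i, expert_policy(s_i)) for s_i in visited)
        # Supervised learning on the aggregated dataset gives pi^{t+1}.
        policy = fit_policy(dataset)
    return policy, dataset
```

The key design point is that states are collected under the learner's own policy \(\pi^t\), while the labels always come from the expert \(\pi^*\).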
Supervised learning guarantee
\(\mathbb E_{s\sim d^{\pi^*}_\mu}[\mathbf 1\{\widehat \pi(s) \neq \pi^*(s)\}]\leq \epsilon\)
Online learning guarantee
\(\mathbb E_{s\sim d^{\pi^t}_\mu}[\mathbf 1\{ \pi^t(s) \neq \pi^*(s)\}]\leq \epsilon\)
Performance Guarantee
\(V_\mu^{\pi^*} - V_\mu^{\widehat \pi} \leq \frac{2\epsilon}{(1-\gamma)^2}\)
Performance Guarantee
\(V_\mu^{\pi^*} - V_\mu^{\pi^t} \leq \frac{\max_{s,a}|A^{\pi^*}(s,a)|}{1-\gamma}\epsilon\)
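To get a feel for the two performance guarantees, the right-hand sides can be evaluated numerically. The function names and the sample values of \(\epsilon\), \(\gamma\), and \(\max_{s,a}|A^{\pi^*}(s,a)|\) below are illustrative assumptions, not numbers from the lecture.

```python
def bc_bound(eps, gamma):
    """Supervised-learning (behavior cloning) bound: 2 * eps / (1 - gamma)^2."""
    return 2 * eps / (1 - gamma) ** 2


def dagger_bound(eps, gamma, max_abs_advantage):
    """Online-learning bound: max_{s,a} |A^{pi*}(s,a)| * eps / (1 - gamma)."""
    return max_abs_advantage * eps / (1 - gamma)


if __name__ == "__main__":
    eps, gamma = 0.01, 0.9  # assumed classification error and discount factor
    max_adv = 1.0           # assumed value of max |A^{pi*}(s,a)|
    print("supervised bound:", bc_bound(eps, gamma))
    print("online bound:    ", dagger_bound(eps, gamma, max_adv))
```

The first bound scales with \(1/(1-\gamma)^2\), while the second scales with \(1/(1-\gamma)\) times the largest expert advantage, so the second is much tighter whenever a single mistake is cheap to recover from (small \(|A^{\pi^*}|\)).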
(Kitani et al., 2012)