## CS 4/5789: Introduction to Reinforcement Learning

### Lecture 24

Prof. Sarah Dean

MW 2:45-4pm

110 Hollister Hall

## Agenda

0. Announcements & Recap

1. Inverse Reinforcement Learning

2. Maximum Entropy Principle

3. Constrained Optimization

## Announcements

5789 Paper Review Assignment (weekly pace *suggested*)

HW 3 due, HW 4 released

PollEV Participation Count Updated on Canvas

My office hours are now Wednesdays 4:15-6pm in Gates 416A

Final exam Monday 5/16 at 7pm

## Recap: Behavioral Cloning

**Dataset of expert trajectories** \((x, y)\) → **Supervised Learning** → **Policy** \(\pi\)
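The pipeline on this slide can be sketched in a few lines. This is a minimal illustration, assuming a small discrete state space where "supervised learning" is just a majority vote per state; the function name `behavioral_cloning`, the `default_action` fallback, and the toy `expert_data` are hypothetical and not from the lecture.

```python
from collections import Counter, defaultdict


def behavioral_cloning(dataset, default_action=0):
    """Fit a tabular policy from expert (state, action) pairs.

    Assumption: states are hashable and few, so the classifier is a
    per-state majority vote. Unseen states get `default_action`.
    """
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    # For each state seen in the data, keep the most frequent expert action.
    table = {s: counts.most_common(1)[0][0] for s, counts in votes.items()}

    def policy(state):
        return table.get(state, default_action)

    return policy


# Hypothetical expert dataset of (x, y) = (state, expert action) pairs.
expert_data = [(0, 1), (1, 1), (2, 0), (0, 1), (2, 0)]
pi_hat = behavioral_cloning(expert_data)
```

Note that `pi_hat` is only trained on states the expert visited; on any other state it falls back to an arbitrary default, which is the weakness DAgger targets.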

## DAgger: Dataset Aggregation

**Dataset** \(\mathcal D = \{(x_i, y_i)\}_{i=1}^M\) → **Supervised Learning** → **Policy** \(\pi\)

**Execute** the policy to visit states \(s_0, s_1, s_2, \dots\)

**Query Expert** for the actions \(\pi^*(s_0), \pi^*(s_1), \dots\)

**Aggregate** the pairs \((x_i = s_i, y_i = \pi^*(s_i))\) into \(\mathcal D\) and repeat
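The execute / query / aggregate loop can be sketched on a toy problem. Everything below is an illustrative assumption rather than the lecture's setup: a deterministic 6-state chain, a hypothetical expert that walks toward state `GOAL`, a majority-vote learner, and an initial policy that always moves right.

```python
from collections import Counter, defaultdict

N_STATES, GOAL, HORIZON = 6, 3, 8  # hypothetical chain environment


def step(state, action):
    """Deterministic dynamics: action +1 moves right, -1 moves left."""
    return min(max(state + action, 0), N_STATES - 1)


def expert(state):
    """Hypothetical expert pi*: move right below GOAL, otherwise left."""
    return 1 if state < GOAL else -1


def fit(dataset, default_action=1):
    """Supervised learning step: per-state majority vote over labels."""
    votes = defaultdict(Counter)
    for x, y in dataset:
        votes[x][y] += 1
    table = {x: c.most_common(1)[0][0] for x, c in votes.items()}
    return lambda s: table.get(s, default_action)


def rollout(policy, start=0):
    """Execute the policy and return the states s_0, s_1, ... it visits."""
    states, s = [], start
    for _ in range(HORIZON):
        states.append(s)
        s = step(s, policy(s))
    return states


def dagger(n_iters=3):
    dataset = []
    policy = fit(dataset)  # empty data: always takes the default action +1
    for _ in range(n_iters):
        states = rollout(policy)                     # Execute
        dataset += [(s, expert(s)) for s in states]  # Query expert, aggregate
        policy = fit(dataset)                        # Supervised learning
    return policy, dataset


policy, dataset = dagger()
```

In the first iteration the always-right policy overshoots into states 4 and 5, which the expert itself would never reach from state 0. Because the expert is queried on the learner's own states, those states get labels, and the refit policy agrees with the expert everywhere on the chain.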

## BC vs. DAgger

**BC**

Supervised learning guarantee

\(\mathbb E_{s\sim d^{\pi^*}_\mu}[\mathbf 1\{\widehat \pi(s) \neq \pi^*(s)\}]\leq \epsilon\)

Performance guarantee

\(V_\mu^{\pi^*} - V_\mu^{\widehat \pi} \leq \frac{2\epsilon}{(1-\gamma)^2}\)

**DAgger**

Online learning guarantee

\(\mathbb E_{s\sim d^{\pi^t}_\mu}[\mathbf 1\{ \pi^t(s) \neq \pi^*(s)\}]\leq \epsilon\)

Performance guarantee

\(V_\mu^{\pi^*} - V_\mu^{\pi^t} \leq \frac{\max_{s,a}|A^{\pi^*}(s,a)|}{1-\gamma}\epsilon\)
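To get a feel for the two performance bounds, it can help to plug in numbers. The sketch below simply evaluates the two right-hand sides; the values \(\epsilon = 0.01\), \(\gamma = 0.9\), and \(\max_{s,a}|A^{\pi^*}(s,a)| = 1\) are hypothetical choices for illustration, not figures from the lecture.

```python
import math


def bc_bound(eps, gamma):
    """Right-hand side of the BC guarantee: 2 * eps / (1 - gamma)^2."""
    return 2 * eps / (1 - gamma) ** 2


def dagger_bound(eps, gamma, max_abs_advantage):
    """Right-hand side of the DAgger guarantee: max|A| * eps / (1 - gamma)."""
    return max_abs_advantage * eps / (1 - gamma)


# Hypothetical values chosen only for illustration.
eps, gamma, max_adv = 0.01, 0.9, 1.0
bc = bc_bound(eps, gamma)                  # approximately 2.0
dg = dagger_bound(eps, gamma, max_adv)     # approximately 0.1
```

Assuming rewards in \([0, 1]\), the advantage satisfies \(|A^{\pi^*}(s,a)| \leq \frac{1}{1-\gamma}\), so the DAgger bound is never worse than \(\frac{\epsilon}{(1-\gamma)^2}\) and is much smaller when a single wrong action costs little under the expert.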

## Agenda

0. Announcements & Recap

1. Inverse Reinforcement Learning

2. Maximum Entropy Principle

3. Constrained Optimization

(Kitani et al., 2012)

## Example: Image Features
