## CS 4/5789: Introduction to Reinforcement Learning

### Lecture 8

Prof. Sarah Dean

MW 2:45-4pm

110 Hollister Hall

## Agenda

0. Announcements & Recap

1. PID Control

2. Limitations in Action

3. Limitations in Observation

4. Model Mis-specification

## Announcements

Participation grades: Attendance and/or Ed Discussions

HW Deadlines and Extensions

HW1 released Friday (early for Feb break), due 3/7

Office hours after lecture

## Recap

**Local Approximations**

**Local Control**
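As a reminder of what a local approximation looks like, the block below writes out a first-order Taylor expansion of the dynamics around a nominal point \((\bar s, \bar a)\). Matching the result to the symbols \(A, B, v\) used in the iLQR slide is an assumption about the notation; the slides may define them slightly differently.

\[
\begin{aligned}
f(s, a) &\approx f(\bar s, \bar a) + \nabla_s f(\bar s, \bar a)^\top (s - \bar s) + \nabla_a f(\bar s, \bar a)^\top (a - \bar a) \\
        &= A s + B a + v,
\end{aligned}
\]

\[
A = \nabla_s f(\bar s, \bar a)^\top, \qquad
B = \nabla_a f(\bar s, \bar a)^\top, \qquad
v = f(\bar s, \bar a) - A \bar s - B \bar a .
\]

The cost \(c\) would get an analogous second-order expansion around the same point, which is presumably where the quadratic and linear cost terms in the iLQR slide come from.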

## iLQR

initialize \(\bar a_0^0,\dots, \bar a_{H-1}^0\) and \(\bar s_0^0\sim \mu_0\)

generate nominal trajectory \(\tau_0 = \{(\bar s_t^0, \bar a_t^0)\}_{t=0}^{H-1}\) by \(\bar s^0_{t+1} =f(\bar s_t^0, \bar a_t^0)\)

for \(i=0,1,\dots\):

1. \(\{A_t, B_t, v_t, Q_t, R_t, q_t, r_t, c_t\}_{t=0}^{H-1}=\text{Approx}(f, c, \tau_i)\)

2. \(\{K^\star_t, k^\star_t\}_{t=0}^{H-1}=\text{LQR}(\{A_t, B_t, v_t, Q_t, R_t, q_t, r_t, c_t\}_{t=0}^{H-1})\)

3. generate \(\tau_{i+1} = \{(\bar s_t^{i+1}, \bar a_t^{i+1})\}_{t=0}^{H-1}\) by \(\bar s_{t+1}^{i+1} = f(\bar s_{t}^{i+1},\underbrace{ K^\star_t\bar s_{t}^{i+1} + k^\star_t}_{\bar a_t^{i+1}})\)

Linearize *around* a trajectory. What trajectory? Iterate!
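The loop above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions, not the course's reference implementation: the cost is taken to be already quadratic, \(c(s,a) = s^\top Q s + a^\top R a\), so only the dynamics need approximating; Jacobians come from central finite differences rather than analytic derivatives; there is no line search or regularization; and the function names (`jacobians`, `approx`, `lqr`, `rollout`, `ilqr`) and the Euler-discretized pendulum are hypothetical choices made for this example.

```python
import numpy as np

def jacobians(f, s, a, eps=1e-5):
    """Central finite-difference Jacobians of f with respect to s and a."""
    A = np.column_stack([(f(s + eps * e, a) - f(s - eps * e, a)) / (2 * eps)
                         for e in np.eye(len(s))])
    B = np.column_stack([(f(s, a + eps * e) - f(s, a - eps * e)) / (2 * eps)
                         for e in np.eye(len(a))])
    return A, B

def approx(f, states, actions):
    """Affine model s' ~ A_t s + B_t a + v_t around each nominal (s_t, a_t)."""
    model = []
    for s, a in zip(states, actions):
        A, B = jacobians(f, s, a)
        model.append((A, B, f(s, a) - A @ s - B @ a))
    return model

def lqr(model, Q, R):
    """Backward recursion with value function V_t(s) = s'P s + 2 p's + const."""
    n = Q.shape[0]
    P, p = np.zeros((n, n)), np.zeros(n)  # no cost after the final step
    gains = []
    for A, B, v in reversed(model):
        M = R + B.T @ P @ B
        K = -np.linalg.solve(M, B.T @ P @ A)
        k = -np.linalg.solve(M, B.T @ (P @ v + p))
        Acl, vcl = A + B @ K, B @ k + v          # closed-loop affine dynamics
        p = K.T @ R @ k + Acl.T @ (P @ vcl + p)  # uses next-step P and p
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
        gains.append((K, k))
    return gains[::-1]

def rollout(f, s0, policy, H):
    """Roll the true dynamics forward; policy(t, s) returns the action."""
    states, actions = [s0], []
    for t in range(H):
        actions.append(policy(t, states[-1]))
        states.append(f(states[-1], actions[-1]))
    return states[:-1], actions  # H states and H actions

def total_cost(states, actions, Q, R):
    return sum(s @ Q @ s + a @ R @ a for s, a in zip(states, actions))

def ilqr(f, s0, Q, R, H, iters=10):
    # Nominal trajectory from all-zero actions (one possible initialization).
    states, actions = rollout(f, s0, lambda t, s: np.zeros(R.shape[0]), H)
    costs = [total_cost(states, actions, Q, R)]
    for _ in range(iters):
        gains = lqr(approx(f, states, actions), Q, R)
        # New trajectory: run a_t = K_t s_t + k_t on the true dynamics.
        states, actions = rollout(
            f, s0, lambda t, s: gains[t][0] @ s + gains[t][1], H)
        costs.append(total_cost(states, actions, Q, R))
    return states, actions, costs

# Hypothetical test system: Euler-discretized pendulum, state (theta, omega).
dt = 0.1
def pendulum(s, a):
    theta, omega = s
    return np.array([theta + dt * omega,
                     omega + dt * (-np.sin(theta) + a[0])])

Q, R = np.eye(2), 0.1 * np.eye(1)
states, actions, costs = ilqr(pendulum, np.array([1.0, 0.0]), Q, R, H=30)
print("cost with zero actions:", costs[0])
print("cost after iLQR:       ", costs[-1])
```

Because each new trajectory is generated by running the feedback policy on the true dynamics \(f\), the next linearization happens around states the system actually visits. On a system that is already linear the affine model is exact, so the first iteration returns the LQR solution and later iterations leave it unchanged.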
