## CS 4/5789: Introduction to Reinforcement Learning

### Lecture 8

Prof. Sarah Dean

MW 2:45-4pm
110 Hollister Hall

## Agenda

0. Announcements & Recap

1. PID Control

2. Limitations in Action

3. Limitations in Observation

4. Model Mis-specification

## Announcements

Participation grades: Attendance and/or Ed Discussions

HW1 released Friday (early for Feb break), due 3/7

Office hours after lecture

## Recap

Local Approximations

Local Control

## iLQR

initialize $$\bar a_0^0,\dots,\bar a_{H-1}^0$$ and $$\bar s_0^0\sim \mu_0$$

generate nominal trajectory $$\tau_0 = \{(\bar s_t^0, \bar a_t^0)\}_{t=0}^{H-1}$$ by $$\bar s^0_{t+1} =f(\bar s_t^0, \bar a_t^0)$$

for $$i=0,1,\dots$$:

$$\{A_t, B_t, v_t, Q_t, R_t, q_t, r_t, c_t\}_{t=0}^{H-1}=\text{Approx}(f, c, \tau_i)$$

$$\{K^\star_t, k^\star_t\}_{t=0}^{H-1}=\text{LQR}(\{A_t, B_t, v_t, Q_t, R_t, q_t, r_t, c_t\}_{t=0}^{H-1})$$

generate $$\tau_{i+1} = \{(\bar s_t^{i+1}, \bar a_t^{i+1})\}_{t=0}^{H-1}$$ from $$\bar s_0^{i+1} = \bar s_0^0$$ by $$\bar s_{t+1}^{i+1} = f(\bar s_{t}^{i+1},\underbrace{ K^\star_t\bar s_{t}^{i+1} + k^\star_t}_{\bar a_t^{i+1}})$$

Linearize around a trajectory. What trajectory? Iterate!
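The loop above can be sketched in a few lines of Python for a scalar state and scalar action. This is a minimal illustration under stated assumptions, not the course's reference implementation: the names `approx`, `lqr`, `rollout`, and `ilqr` are hypothetical, derivatives are taken by central finite differences, the cost has no state-action cross term, the constant $$c_t$$ is dropped because it does not affect the policy, and there is no line search or regularization (which practical implementations often add). The demo dynamics and cost are also made up for illustration.

```python
import math

def approx(f, c, s, a, h=1e-3):
    """Local model at (s, a): f ~ A*s + B*a + v, c ~ Q*s^2 + R*a^2 + q*s + r*a.
    Assumption: no state-action cross term; the constant c_t is dropped."""
    A = (f(s + h, a) - f(s - h, a)) / (2 * h)
    B = (f(s, a + h) - f(s, a - h)) / (2 * h)
    v = f(s, a) - A * s - B * a
    c_s = (c(s + h, a) - c(s - h, a)) / (2 * h)
    c_a = (c(s, a + h) - c(s, a - h)) / (2 * h)
    c_ss = (c(s + h, a) - 2 * c(s, a) + c(s - h, a)) / h**2
    c_aa = (c(s, a + h) - 2 * c(s, a) + c(s, a - h)) / h**2
    # Expand the second-order Taylor series in absolute (not deviation) coordinates.
    Q, R = c_ss / 2, c_aa / 2
    q, r = c_s - c_ss * s, c_a - c_aa * a
    return A, B, v, Q, R, q, r

def lqr(models):
    """Backward recursion for time-varying affine dynamics and quadratic cost.
    Value function is V_t(s) = P*s^2 + p*s + const; returns [(K_t, k_t)]."""
    P, p = 0.0, 0.0  # no cost-to-go after the final step
    gains = []
    for A, B, v, Q, R, q, r in reversed(models):
        d = R + P * B * B
        K = -(P * B * A) / d
        k = -(2 * P * B * v + p * B + r) / (2 * d)
        Ac, vc = A + B * K, B * k + v  # closed-loop dynamics s' = Ac*s + vc
        P, p = (Q + R * K * K + P * Ac * Ac,
                q + r * K + 2 * R * K * k + 2 * P * Ac * vc + p * Ac)
        gains.append((K, k))
    return gains[::-1]

def rollout(f, c, s0, gains):
    """Run a_t = K_t*s_t + k_t through the true dynamics f; return trajectory and cost."""
    s, traj, cost = s0, [], 0.0
    for K, k in gains:
        a = K * s + k
        traj.append((s, a))
        cost += c(s, a)
        s = f(s, a)
    return traj, cost

def ilqr(f, c, s0, a_init, iters=10):
    # Nominal trajectory: open-loop actions, written as gains with K_t = 0.
    traj, cost = rollout(f, c, s0, [(0.0, a) for a in a_init])
    costs = [cost]
    for _ in range(iters):
        models = [approx(f, c, s, a) for s, a in traj]  # Approx(f, c, tau_i)
        gains = lqr(models)                             # LQR(...)
        traj, cost = rollout(f, c, s0, gains)           # tau_{i+1}
        costs.append(cost)
    return traj, costs

# Hypothetical demo problem: nonlinear scalar dynamics, quadratic cost, H = 20.
f = lambda s, a: s + 0.1 * (math.sin(s) + a)
c = lambda s, a: s**2 + 0.1 * a**2
traj, costs = ilqr(f, c, s0=1.0, a_init=[0.0] * 20, iters=5)
print([round(J, 3) for J in costs])  # cost of the nominal rollout, then of each iterate
```

When `f` is already linear and `c` is already quadratic, the approximation is exact, so the first iteration returns the LQR solution and further iterations leave the trajectory unchanged. For nonlinear problems each iterate is only as good as the local model around the previous trajectory, which is why the procedure is repeated.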
