CS 4/5789: Introduction to Reinforcement Learning

Lecture 8

Prof. Sarah Dean

MW 2:45-4pm
110 Hollister Hall

Agenda

0. Announcements & Recap

1. PID Control

2. Limitations in Action

3. Limitations in Observation

4. Model Mis-specification

Announcements

Participation grades: Attendance and/or Ed Discussions

HW Deadlines and Extensions

HW1 released Friday (early for Feb break), due 3/7

Office hours after lecture

Recap

Local Approximations

Local Control

iLQR

initialize \(\bar a_0^0,\dots, \bar a_{H-1}^0\) and \(\bar s_0^0\sim \mu_0\)

generate nominal trajectory \(\tau_0 = \{(\bar s_t^0, \bar a_t^0)\}_{t=0}^{H-1}\) by \(\bar s^0_{t+1} =f(\bar s_t^0, \bar a_t^0)\)

for \(i=0,1,\dots\):

        \(\{A_t, B_t, v_t, Q_t, R_t, q_t, r_t, c_t\}_{t=0}^{H-1} =\) Approx\((f, c, \tau_i)\)

        \(\{K^\star_t, k^\star_t\}_{t=0}^{H-1} =\) LQR\((\{A_t, B_t, v_t, Q_t, R_t, q_t, r_t, c_t\}_{t=0}^{H-1})\)

        generate \(\tau_{i+1} = \{(\bar s_t^{i+1}, \bar a_t^{i+1})\}_{t=0}^{H-1}\) by \(\bar s_{t+1}^{i+1} = f(\bar s_{t}^{i+1},\underbrace{ K^\star_t\bar s_{t}^{i+1} + k^\star_t}_{\bar a_t^{i+1}})\)

Linearize around a trajectory. What trajectory? Iterate!
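A minimal NumPy sketch of the iLQR loop above. It rests on several assumptions that are not on the slide. Derivatives are taken by central finite differences. The cost approximation uses the form \(c(s,a)\approx s^\top Q s + a^\top R a + q^\top s + r^\top a + c\) and drops the mixed state-action term, since the slide's list has no cross term. There is no terminal cost. The initial state is a fixed `s0` rather than a sample from \(\mu_0\). The loop runs a fixed `n_iters` times, with none of the line search or regularization that practical implementations usually add. The function names `approx`, `lqr`, `rollout`, and `ilqr` are hypothetical.

```python
import numpy as np


def jacobian(fn, x, eps=1e-4):
    """Central finite-difference Jacobian of fn at x, shape (len(fn(x)), len(x))."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        diff = np.atleast_1d(fn(x + dx)) - np.atleast_1d(fn(x - dx))
        cols.append(diff / (2 * eps))
    return np.stack(cols, axis=1)


def approx(f, c, states, actions):
    """Approx(f, c, tau): linearize f and quadratize c around each (s_bar_t, a_bar_t).

    Returns per-step tuples (A, B, v, Q, R, q, r) such that
      f(s, a) ~= A s + B a + v   and   c(s, a) ~= s'Qs + a'Ra + q's + r'a + c_t.
    Assumption: the mixed state-action Hessian block is dropped, and c_t is not
    returned because a constant does not change the optimal policy.
    """
    terms = []
    for s_bar, a_bar in zip(states, actions):
        n = s_bar.size
        z_bar = np.concatenate([s_bar, a_bar])
        J = jacobian(lambda z: f(z[:n], z[n:]), z_bar)
        A, B = J[:, :n], J[:, n:]
        v = f(s_bar, a_bar) - A @ s_bar - B @ a_bar

        def grad(z):
            return jacobian(lambda w: c(w[:n], w[n:]), z)[0]

        g = grad(z_bar)
        hess = jacobian(grad, z_bar)
        hess = 0.5 * (hess + hess.T)
        H_ss, H_aa = hess[:n, :n], hess[n:, n:]
        # Rewrite the second-order Taylor expansion in the s'Qs + q's form.
        Q, R = 0.5 * H_ss, 0.5 * H_aa
        q = g[:n] - H_ss @ s_bar
        r = g[n:] - H_aa @ a_bar
        terms.append((A, B, v, Q, R, q, r))
    return terms


def lqr(terms):
    """Backward Riccati recursion for affine dynamics and cost with linear terms.

    Tracks V_t(s) = s'P_t s + p_t's + const with V_H = 0 (no terminal cost) and
    returns [(K_t, k_t)] so that the minimizing action is a_t = K_t s_t + k_t.
    """
    n = terms[0][0].shape[0]
    P, p = np.zeros((n, n)), np.zeros(n)
    gains = []
    for A, B, v, Q, R, q, r in reversed(terms):
        M = R + B.T @ P @ B  # assumed positive definite
        K = -np.linalg.solve(M, B.T @ P @ A)
        k = -np.linalg.solve(M, B.T @ P @ v + 0.5 * (r + B.T @ p))
        # Both updates below read the old (P, p) from step t+1.
        p = q + A.T @ p + 2 * A.T @ P @ (v + B @ k)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        P = 0.5 * (P + P.T)
        gains.append((K, k))
    return gains[::-1]


def rollout(f, s0, gains):
    """Generate tau_{i+1}: s_{t+1} = f(s_t, K_t s_t + k_t) from the fixed s0."""
    states, actions = [s0], []
    for K, k in gains:
        actions.append(K @ states[-1] + k)
        states.append(f(states[-1], actions[-1]))
    return states[:-1], actions  # (s_t, a_t) for t = 0, ..., H-1


def ilqr(f, c, s0, init_actions, n_iters=5):
    """iLQR loop, run for a fixed number of iterations (assumption)."""
    s0 = np.asarray(s0, dtype=float)
    actions = [np.asarray(a, dtype=float) for a in init_actions]
    states = [s0]
    for a in actions[:-1]:  # nominal trajectory tau_0 from the initial actions
        states.append(f(states[-1], a))
    for _ in range(n_iters):
        terms = approx(f, c, states, actions)
        gains = lqr(terms)
        states, actions = rollout(f, s0, gains)
    return states, actions
```

When \(f\) is linear and \(c\) is quadratic, Approx is exact (up to finite-difference rounding), so a single iteration already returns the LQR solution. Further iterations only change the result when the dynamics or cost are not already linear-quadratic.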