## CS 4/5789: Introduction to Reinforcement Learning

### Lecture 14

Prof. Sarah Dean

MW 2:45-4pm
110 Hollister Hall

## Agenda

0. Announcements & Recap

1. Derivative-Free Optimization

2. Simple Random Search

3. PG with Trajectories

4. PG with Q & A functions

## Announcements

HW2 released tonight, due 3/28
Suggestion: start written portion before prelim

5789 Paper Review Assignment (weekly pace suggested)

Should I mask during lecture? PollEv.com/sarahdean011

Prelim Tuesday 3/22 at 7:30-9pm in Phillips 101

Closed-book; a definition/equation sheet will be provided for reference

Focus: mainly Unit 1 (known models), but many lectures in Unit 2 revisit key concepts

Study Materials: Lecture Notes 1-15, HW0&1

Lecture on Monday 3/21 will be a review

## Recap

Gradient ascent: $$\theta_{t+1} = \theta_t + \alpha \nabla J(\theta_t)$$
Stochastic gradient ascent: $$\theta_{t+1} = \theta_t + \alpha g_t$$ where $$\mathbb E[g_t] = \nabla J(\theta_t)$$
• SGD with sampling in risk minimization: $$\displaystyle \min_\theta \underbrace{\mathbb E[\ell(f_\theta(x),y)]}_{ \mathcal R(\theta) }$$
$$g_t = \nabla_\theta \ell(f_{\theta_t}(x_i),y_i)$$ where $$(x_i,y_i)$$ is sampled i.i.d., so $$\mathbb E[g_t] = \nabla \mathcal R(\theta_t)$$
Not knowing the transitions $$P(s,a)$$ is like not knowing the (whole) loss function!
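
A minimal sketch of the risk-minimization example from the recap, under assumptions that are not from the lecture: a linear model $$f_\theta(x) = \theta^\top x$$, the squared loss, synthetic noiseless data, and the helper names `sample_gradient`, `full_gradient`, and `sgd`. Each step uses the gradient at one uniformly sampled data point, which averages to the gradient of the empirical risk. The update subtracts $$\alpha g_t$$ because this example minimizes, whereas the recap maximizes $$J$$.

```python
import numpy as np

def sample_gradient(theta, x, y):
    """Gradient w.r.t. theta of the squared loss 0.5 * (theta @ x - y)**2 at one sample."""
    return (theta @ x - y) * x

def full_gradient(theta, X, Y):
    """Gradient of the empirical risk, i.e. the average of the per-sample gradients."""
    return X.T @ (X @ theta - Y) / len(Y)

def sgd(X, Y, alpha=0.05, steps=2000, seed=0):
    """Stochastic gradient descent with one uniformly sampled data point per step."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        i = rng.integers(len(Y))                # sample an index uniformly at random
        g = sample_gradient(theta, X[i], Y[i])  # unbiased estimate of the risk gradient
        theta = theta - alpha * g               # minus sign: we minimize the risk
    return theta

# Synthetic data (an assumption for illustration): noiseless linear labels.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 3))
theta_true = np.array([1.0, -2.0, 0.5])
Y = X @ theta_true

# Averaging the per-sample gradients over the data recovers the full gradient.
theta0 = np.ones(3)
avg_sample_grad = np.mean(
    [sample_gradient(theta0, X[i], Y[i]) for i in range(len(Y))], axis=0
)

theta_hat = sgd(X, Y)
```

Here the data are fully known, so `full_gradient` could be computed directly. The sampled version is what carries over when only samples are available.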