CS 4/5789: Introduction to Reinforcement Learning
Lecture 14
Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
Should I mask during lecture?
PollEv.com/sarahdean011
Agenda
0. Announcements & Recap
1. Derivative-Free Optimization
2. Simple Random Search
3. PG with Trajectories
4. PG with Q and Advantage Functions
Announcements
HW2 released tonight, due 3/28
Suggestion: start written portion before prelim
5789 Paper Review Assignment (weekly pace suggested)
Should I mask during lecture? PollEv.com/sarahdean011
Prelim Exam
Prelim Tuesday 3/22 at 7:30-9pm in Phillips 101
Closed-book; a definition/equation sheet will be provided for reference
Focus: mainly Unit 1 (known models), though many lectures in Unit 2 revisit key concepts
Study Materials: Lecture Notes 1-15, HW0 & HW1
Lecture on Monday 3/21 will be a review
Recap
- Gradient Ascent: \(\theta_{t+1} = \theta_t + \alpha \nabla J(\theta_t)\)
- Stochastic Gradient Ascent: \(\theta_{t+1} = \theta_t + \alpha g_t\) where \(\mathbb E[g_t] = \nabla J(\theta_t)\)
- SGD with sampling in risk minimization (see the sketch below): \(\displaystyle \min_\theta \underbrace{\mathbb E[\ell(f_\theta(x),y)]}_{\mathcal R(\theta)}\), with \(g_t = \nabla_\theta \ell(f_\theta(x_i),y_i)\) where \((x_i,y_i)\) are sampled i.i.d., so \(\mathbb E[g_t] = \nabla \mathcal R(\theta)\)
- No gradients without models :(
Not knowing transition \(P(s,a)\) is like not knowing the (whole) loss function!
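To make the recap concrete, here is a minimal sketch (not from the slides) of SGD for risk minimization with a linear model \(f_\theta(x) = \theta^\top x\) and squared loss, so that each per-sample gradient \(g_t\) is an unbiased estimate of \(\nabla \mathcal R(\theta)\). The synthetic data generator `sample_pair` and all hyperparameters are illustrative assumptions, not part of the course material.

```python
import numpy as np

def sample_pair(rng, theta_star, noise=0.1):
    """Draw one (x_i, y_i) i.i.d. from a (synthetic) data distribution."""
    x = rng.normal(size=theta_star.shape[0])
    y = x @ theta_star + noise * rng.normal()
    return x, y

def sgd(num_steps=2000, alpha=0.05, d=5, seed=0):
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=d)      # unknown "true" parameter (for the synthetic data)
    theta = np.zeros(d)                  # initial iterate
    for _ in range(num_steps):
        x_i, y_i = sample_pair(rng, theta_star)
        # g_t = grad_theta of the per-sample loss l(f_theta(x_i), y_i);
        # for squared loss this is (theta^T x_i - y_i) * x_i, and E[g_t] = grad R(theta).
        g_t = (x_i @ theta - y_i) * x_i
        theta = theta - alpha * g_t      # descent step (ascent on J would use +alpha)
    return theta, theta_star

if __name__ == "__main__":
    theta_hat, theta_star = sgd()
    print("estimation error:", np.linalg.norm(theta_hat - theta_star))
```

The key point of the sketch: each update only needs one sampled \((x_i, y_i)\) and the gradient of the per-sample loss, never the full risk \(\mathcal R(\theta)\); this is exactly what breaks in RL, where not knowing the transition \(P(s,a)\) is like not knowing the loss function itself.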