Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
0. Announcements & Recap
1. PG with Q & A functions
2. Trust Regions & KL-Divergence
3. Natural Policy Gradient
HW2 due Monday 3/28
5789 Paper Review Assignment (weekly pace suggested)
Monday 3/21 is the last day to drop
Prelim Tuesday 3/22 at 7:30-9pm in Phillips 101
Closed-book, definition/equation sheet provided
Focus: mainly Unit 1 (known models), though many Unit 2 lectures revisit key concepts
Study Materials: Lecture Notes 1-15, HW0&1
Lecture on Monday 3/21 will be a review
Derivative Free Optimization: Random Search
\(\nabla J(\theta) \approx \frac{1}{2\delta} \left(J(\theta+\delta v) - J(\theta-\delta v)\right) v\)
Example: \(J(\theta) = -\theta^2 - 1\), plotted against \(\theta\)
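As a minimal sketch of the random search estimate on the example \(J(\theta) = -\theta^2 - 1\): the code below averages many two-point estimates at a fixed \(\theta\). The slide does not state how \(v\) is drawn, so the standard normal direction, the value of \(\delta\), the sample count, and the function names are all assumptions made for illustration.

```python
import random

def J(theta):
    # Example objective from the slide: J(theta) = -theta^2 - 1
    return -theta**2 - 1.0

def random_search_gradient(J, theta, delta, rng):
    # Assumption: the direction v is drawn from a standard normal
    v = rng.gauss(0.0, 1.0)
    # Two-point finite difference along v, scaled by v
    return (J(theta + delta * v) - J(theta - delta * v)) / (2.0 * delta) * v

rng = random.Random(0)  # fixed seed so the run is repeatable
theta = 1.0
estimates = [random_search_gradient(J, theta, 0.1, rng) for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate)  # should land near the true gradient, -2 * theta
```

A single estimate is noisy; only the average is close to \(-2\theta\), which is why the estimate is used inside a stochastic ascent loop rather than on its own.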
Derivative Free Optimization: Sampling
\(J(\theta) = \mathbb E_{x\sim P_\theta}[h(x)]\)
\(\nabla J(\theta) \approx \nabla_\theta \log(P_\theta(x))\, h(x)\), with \(x \sim P_\theta\)
Example: \(P_\theta = \mathcal N(\theta, 1)\), \(h(x) = -x^2\)
\(J(\theta) = \mathbb E_{x\sim\mathcal N(\theta, 1)}[-x^2] = -\theta^2 - 1\)
\(\nabla_\theta \log(P_\theta(x))\, h(x) = (x-\theta)\, h(x)\)
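The sampling-based estimate can be checked numerically on the same example. The sketch below assumes \(P_\theta = \mathcal N(\theta, 1)\) and \(h(x) = -x^2\) as on the slide, and uses the fact that \(\nabla_\theta \log P_\theta(x) = x - \theta\) for a unit-variance Gaussian. The function names and sample count are illustrative choices, not part of the lecture.

```python
import random

def h(x):
    # Example function from the slide: h(x) = -x^2
    return -x**2

def score_function_gradient(h, theta, rng):
    # Sample x from P_theta = N(theta, 1)
    x = rng.gauss(theta, 1.0)
    # For N(theta, 1), the gradient of log P_theta(x) in theta is (x - theta)
    return (x - theta) * h(x)

rng = random.Random(0)  # fixed seed so the run is repeatable
theta = 1.0
samples = [score_function_gradient(h, theta, rng) for _ in range(50000)]
mean_gradient = sum(samples) / len(samples)
print(mean_gradient)  # should land near -2 * theta, the gradient of -theta^2 - 1
```

Note that the estimator never differentiates \(h\); it only evaluates \(h\) at sampled points, which is what makes it derivative free.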
Simple Random Search
REINFORCE
Meta-Algorithm: DF-SGA
initialize \(\theta_0\)
for \(t=0,1,...\)
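One way the meta-algorithm could be sketched in code is shown below. The loop body is not given in the lines above, so the update \(\theta_{t+1} = \theta_t + \alpha g_t\) with a constant step size \(\alpha\) is an assumption (the usual stochastic gradient ascent step), as are the names `df_sga` and `estimate_gradient`, the random search estimator with a standard normal direction, and all numeric settings.

```python
import random

def J(theta):
    # Example objective: J(theta) = -theta^2 - 1, maximized at theta = 0
    return -theta**2 - 1.0

def random_search_gradient(J, theta, delta, rng):
    # Two-point estimate along a random direction v (assumed standard normal)
    v = rng.gauss(0.0, 1.0)
    return (J(theta + delta * v) - J(theta - delta * v)) / (2.0 * delta) * v

def df_sga(estimate_gradient, theta0, alpha, num_steps):
    # Assumed loop body: plain stochastic gradient ascent with step size alpha
    theta = theta0
    for _ in range(num_steps):
        g = estimate_gradient(theta)   # derivative-free gradient estimate
        theta = theta + alpha * g      # ascent step
    return theta

rng = random.Random(0)  # fixed seed so the run is repeatable
theta_final = df_sga(
    lambda th: random_search_gradient(J, th, 0.1, rng),
    theta0=2.0,
    alpha=0.05,
    num_steps=500,
)
print(theta_final)  # should be very close to the maximizer theta = 0
```

Passing the estimator as a callable keeps the loop unchanged when a sampling-based estimate is swapped in for random search, though the sampling-based estimate has higher variance on this example and may need a smaller step size.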