Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
0. Announcements & Recap
1. Q Function Approximation
2. Optimization & Gradient Descent
3. Stochastic Gradient Descent
4. Derivative-Free Optimization
HW2 released next Monday
5789 Paper Review Assignment (weekly pace suggested)
OH cancelled today; held instead Thursday 10:30-11:30am
Learning Theory Mentorship Workshop
Application due March 10: https://let-all.com/alt22.html
Prelim Tuesday 3/22 at 7:30-9pm in Phillips 101
Closed-book; a definition/equation sheet will be provided for reference
Focus: mainly Unit 1 (known models), but many lectures in Unit 2 revisit key concepts
Study Materials: Lecture Notes 1-15, HW0&1
Lecture on 3/21 will be a review
Meta-Algorithm for Policy Iteration in Unknown MDP
Supervision with Rollout (MC):
$\mathbb{E}[y_i] = Q^\pi(s_i, a_i)$
$\hat{Q}$ via ERM on $\{(s_i, a_i, y_i)\}_{i=1}^N$
Rollout:
$s_t$
$a_t \sim \pi(s_t)$
$r_t \sim r(s_t, a_t)$
$s_{t+1} \sim P(s_t, a_t)$
$a_{t+1} \sim \pi(s_{t+1})$
...
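The rollout supervision above can be sketched in a few lines; the 3-state MDP, its `step` dynamics and rewards, and the random `policy` below are all hypothetical stand-ins for illustration:

```python
import random

GAMMA = 0.9  # discount factor

def step(s, a):
    """Hypothetical MDP: reward depends on (s, a), deterministic transition."""
    r = 1.0 if (s + a) % 2 == 0 else 0.0
    s_next = (s + a + 1) % 3
    return r, s_next

def policy(s):
    """Hypothetical policy pi: uniform over the two actions."""
    return random.choice([0, 1])

def mc_rollout(s, a, horizon=50):
    """One Monte Carlo rollout from (s, a): returns y = sum_t gamma^t r_t,
    an unbiased estimate of Q^pi(s, a) (up to horizon truncation)."""
    y, discount = 0.0, 1.0
    for _ in range(horizon):
        r, s = step(s, a)
        y += discount * r
        discount *= GAMMA
        a = policy(s)
    return y

# Supervised dataset {(s_i, a_i, y_i)} on which Q-hat is fit via ERM.
data = [(s, a, mc_rollout(s, a)) for s in range(3) for a in (0, 1)]
```

Each label $y_i$ is one sampled discounted return, so averaging many rollouts from the same $(s, a)$ recovers $Q^\pi(s, a)$; ERM then fits $\hat{Q}$ by regression on these targets.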
Supervision with Bellman Exp (TD):
With target $y_t = r_t + \gamma\, Q(s_{t+1}, a_{t+1})$: if $Q = Q^\pi$, then $\mathbb{E}[y_t] = Q^\pi(s_t, a_t)$
One step:
$s_t$
$a_t \sim \pi(s_t)$
$r_t \sim r(s_t, a_t)$
$s_{t+1} \sim P(s_t, a_t)$
$a_{t+1} \sim \pi(s_{t+1})$
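The one-step TD label can be sketched as below; the MDP `step` and `policy` are the same hypothetical stand-ins as before, and the tabular `Q` is initialized to zero only for illustration:

```python
import random

GAMMA = 0.9

# Hypothetical tabular Q for a 3-state, 2-action MDP.
Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}

def step(s, a):
    """Hypothetical dynamics/reward for illustration."""
    return (1.0 if s == a else 0.0), (s + a + 1) % 3

def policy(s):
    return random.choice([0, 1])

def td_target(s, a):
    """One-step Bellman-expectation target: y_t = r_t + gamma * Q(s', a').
    If Q = Q^pi, then E[y_t] = Q^pi(s, a)."""
    r, s_next = step(s, a)
    a_next = policy(s_next)
    return r + GAMMA * Q[(s_next, a_next)]
```

Unlike the full rollout, this label needs only a single transition, at the cost of bootstrapping on the current estimate $Q$.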
Supervision with Bellman Opt (TD):
With target $y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a')$: if $Q = Q^*$, then $\mathbb{E}[y_t] = Q^*(s_t, a_t)$
SARSA and Q-learning are simple tabular algorithms
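In the tabular setting, the two Bellman targets above give exactly these two update rules; the step size `ALPHA`, discount `GAMMA`, and the small state/action sets are illustrative choices:

```python
ALPHA, GAMMA = 0.1, 0.9  # step size and discount (illustrative values)

# Tabular Q for a hypothetical 3-state, 2-action MDP.
Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}

def sarsa_update(Q, s, a, r, s_next, a_next):
    """SARSA: on-policy TD, bootstraps with the action actually taken next
    (Bellman-expectation target)."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions=(0, 1)):
    """Q-learning: off-policy TD, bootstraps with the greedy action
    (Bellman-optimality target)."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

The only difference between the two is the bootstrap term: SARSA uses $Q(s_{t+1}, a_{t+1})$ for the sampled next action, while Q-learning uses $\max_{a'} Q(s_{t+1}, a')$. Q-function approximation (next section) replaces the table `Q` with a parameterized function fit by optimization.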