Prof. Sarah Dean
MW 2:45-4pm
110 Hollister Hall
0. Announcements & Recap
1. Linear Contextual Bandits
2. Interactive Demo
3. LinUCB Algorithm
My office hours today are cancelled
Prelim corrections due tomorrow - please list collaborators
5789 Paper Review Assignment (weekly pace suggested)
HW 3 released tonight, due in 2 weeks
Final exam Monday 5/16 at 7pm
A simplified setting for studying exploration
Explore-then-Commit
Set exploration \(N \approx T^{2/3}\), giving \(R(T) \lesssim T^{2/3}\)
Upper Confidence Bound
For \(t=1,...,T\): pull the arm with the largest upper confidence bound
\(R(T) \lesssim \sqrt{T}\)
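As a rough illustration of the two algorithms recapped above, here is a minimal sketch on Bernoulli arms. The arm means, the horizon, and the bonus \(\sqrt{2\log T / N_a}\) are assumptions chosen for illustration, not values from the lecture, and the function names are hypothetical.

```python
import math
import random


def explore_then_commit(means, T, rng):
    """Pull each arm N ~ T^(2/3) times, then commit to the best empirical arm.

    Assumes T >= len(means) so the exploration phase fits in the horizon.
    """
    K = len(means)
    N = max(1, min(T // K, round(T ** (2 / 3))))
    sums = [0.0] * K
    pulls = []
    for a in range(K):  # exploration phase: N pulls per arm
        for _ in range(N):
            sums[a] += float(rng.random() < means[a])  # Bernoulli reward
            pulls.append(a)
    best = max(range(K), key=lambda a: sums[a] / N)
    pulls.extend([best] * (T - len(pulls)))  # commit phase
    return pulls


def ucb(means, T, rng):
    """Pull the arm with the largest empirical mean plus confidence bonus."""
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    pulls = []
    for t in range(T):
        if t < K:
            a = t  # pull each arm once so every count is positive
        else:
            a = max(
                range(K),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(T) / counts[i]),  # assumed bonus form
            )
        sums[a] += float(rng.random() < means[a])
        counts[a] += 1
        pulls.append(a)
    return pulls


def pseudo_regret(means, pulls):
    """Sum of gaps between the best mean and the mean of each pulled arm."""
    best = max(means)
    return sum(best - means[a] for a in pulls)


means = [0.3, 0.5, 0.7]  # hypothetical arm means
T = 5000
print("ETC regret:", pseudo_regret(means, explore_then_commit(means, T, random.Random(0))))
print("UCB regret:", pseudo_regret(means, ucb(means, T, random.Random(0))))
```

Explore-then-Commit pays for its exploration phase up front whatever the data show, while UCB shifts pulls toward the better arms as the confidence intervals shrink.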
A (less) simplified setting for studying exploration
ex - machine make and model affect rewards, so the context \(x\) is a vector of features describing the machine
Explore-then-Commit
Set exploration \(N \approx T^{2/3}\),
we showed \(R(T) \lesssim T^{2/3}\) using prediction error guarantees \(\mathbb E_{x\sim \mathcal D}[|\widehat \mu_a(x) - \mu_a(x)|]\)
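A minimal sketch of contextual Explore-then-Commit, under assumptions that are not from the slides: a scalar context drawn uniformly from \([-1, 1]\), a hypothetical reward model \(\mu_a(x) = \theta_a x\) with Gaussian noise, and exploration that cycles through the arms. Each \(\widehat \mu_a\) is fit by one-dimensional least squares on the exploration data.

```python
import random


def contextual_etc(thetas, T, rng, noise=0.1):
    """Contextual explore-then-commit with scalar context x ~ Uniform[-1, 1].

    Hypothetical reward model: r = thetas[a] * x + Gaussian noise.
    Returns the fitted slopes and the accumulated pseudo-regret.
    """
    K = len(thetas)
    N = max(1, min(T // K, round(T ** (2 / 3))))
    sxr = [0.0] * K  # running sum of x * r per arm
    sxx = [0.0] * K  # running sum of x * x per arm
    theta_hat = [0.0] * K
    regret = 0.0
    for t in range(T):
        x = rng.uniform(-1.0, 1.0)
        exploring = t < N * K
        if exploring:
            a = t % K  # cycle through arms, ignoring the context
        else:
            # commit to the greedy policy argmax_a mu_hat_a(x)
            a = max(range(K), key=lambda i: theta_hat[i] * x)
        r = thetas[a] * x + rng.gauss(0.0, noise)
        if exploring:
            sxr[a] += x * r
            sxx[a] += x * x
            if sxx[a] > 0:
                theta_hat[a] = sxr[a] / sxx[a]  # 1-D least squares
        regret += max(th * x for th in thetas) - thetas[a] * x
    return theta_hat, regret


theta_hat, regret = contextual_etc([1.0, -1.0], 3000, random.Random(0))
print("estimated slopes:", theta_hat)
print("pseudo-regret:", regret)
```

Because the best arm here depends on the sign of \(x\), the committed policy is a function of the context rather than a single arm.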
For context-dependent confidence bounds, we need to understand
\(\mathbb E[|\widehat \mu_a(x) - \mu_a(x)|\mid x]\)
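To see what a context-dependent bound can look like, suppose (as an assumption for illustration) that rewards are linear, \(\mu_a(x) = \theta_a^\top x\), with estimate \(\widehat \mu_a(x) = \widehat \theta_a^\top x\). Then for any symmetric positive definite matrix \(A\), Cauchy-Schwarz applied to \(A^{1/2}(\widehat\theta_a - \theta_a)\) and \(A^{-1/2}x\) gives

\[
|\widehat \mu_a(x) - \mu_a(x)| = |(\widehat \theta_a - \theta_a)^\top x| \le \|\widehat \theta_a - \theta_a\|_{A} \, \|x\|_{A^{-1}}, \qquad \|v\|_{M} = \sqrt{v^\top M v}.
\]

The first factor measures parameter error and does not depend on \(x\); the second factor depends on the context, so the error bound is wider for some contexts than for others.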