Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
1. Recap Units 1&2
2. Motivation and Demo
3. Multi-Armed Bandit Setting
4. Exploration & Exploitation
In Unit 2, discussed algorithms for:
[Diagram: interaction loop between policy and environment, labeled with action \(a_t\), state \(s_t\), reward \(r_t\), policy \(\pi\), transitions \(P,f\), and experience data \((s_t,a_t,r_t)\); annotated "unknown in Unit 2"]
Example: MountainCar, where reward is given only at the flag
A simplified setting for studying exploration
Multi-Armed Bandits
Online advertising
NYT Caption Contest
Medical Trials
Interactive Coding Demo and PollEV
Uniform \(a_t\sim \mathrm{Unif}(K)\)
Greedy \(a_t=\arg\max_{a\in[K]} r_a\)
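The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the demo from lecture: it assumes Bernoulli rewards, it reads the greedy rule as "pick the arm with the highest average reward observed so far", and it pulls each unpulled arm once before comparing averages. The names `uniform_action`, `greedy_action`, and `run_bandit` are hypothetical.

```python
import random


def uniform_action(K, rng):
    """Uniform: sample an arm uniformly at random from {0, ..., K-1}."""
    return rng.randrange(K)


def greedy_action(reward_sums, pull_counts):
    """Greedy: pick the arm with the highest average observed reward.

    Assumption: arms that have never been pulled are tried first,
    so that every average is well defined.
    """
    for a, n in enumerate(pull_counts):
        if n == 0:
            return a
    means = [s / n for s, n in zip(reward_sums, pull_counts)]
    return max(range(len(means)), key=means.__getitem__)


def run_bandit(true_means, T, strategy, seed=0):
    """Simulate T rounds on a Bernoulli bandit (an assumed reward model)."""
    rng = random.Random(seed)
    K = len(true_means)
    reward_sums = [0.0] * K
    pull_counts = [0] * K
    total_reward = 0.0
    for _ in range(T):
        if strategy == "uniform":
            a = uniform_action(K, rng)
        else:
            a = greedy_action(reward_sums, pull_counts)
        # Reward is 1 with probability true_means[a], else 0.
        r = 1.0 if rng.random() < true_means[a] else 0.0
        reward_sums[a] += r
        pull_counts[a] += 1
        total_reward += r
    return total_reward, pull_counts
```

Calling `run_bandit([0.2, 0.5, 0.8], T=1000, strategy="greedy")` and comparing the returned pull counts with the `"uniform"` run shows the trade-off: uniform never stops exploring, while greedy can lock onto whichever arm looked best early on.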
Explore-then-Commit
\( \mu_{a} \in\left[ \hat \mu_{a} \pm c\sqrt{\frac{\log(K/\delta)}{N}}\right]\)
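A minimal sketch of Explore-then-Commit built around this confidence interval: pull each of the \(K\) arms \(N\) times, form the empirical means \(\hat\mu_a\), then commit to the best-looking arm for the remaining rounds. The constant \(c\) is not specified above, so `c=1.0` is only a placeholder. The `pull` callback and the function names are hypothetical.

```python
import math


def confidence_radius(N, K, delta, c=1.0):
    """Half-width c * sqrt(log(K / delta) / N) of the interval around mu_hat.

    Assumption: c = 1.0 is a placeholder; the source leaves c unspecified.
    """
    return c * math.sqrt(math.log(K / delta) / N)


def explore_then_commit(pull, K, N, T):
    """Explore each arm N times, then commit to the empirically best arm.

    pull(a) is a hypothetical callback returning the reward of arm a.
    """
    if N * K > T:
        raise ValueError("exploration phase N*K must fit within horizon T")
    reward_sums = [0.0] * K
    total_reward = 0.0
    # Exploration phase: N pulls per arm.
    for a in range(K):
        for _ in range(N):
            r = pull(a)
            reward_sums[a] += r
            total_reward += r
    mu_hat = [s / N for s in reward_sums]
    # Commit phase: play the arm with the highest empirical mean.
    best = max(range(K), key=mu_hat.__getitem__)
    for _ in range(T - N * K):
        total_reward += pull(best)
    return best, mu_hat, total_reward
```

A larger \(N\) shrinks the radius, making it more likely that the committed arm is truly the best, but spends more rounds on suboptimal arms during exploration.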