Prof. Sarah Dean
MW 2:55-4:10pm
255 Olin Hall
1. Recap Units 1&2
2. Motivation and Demo
3. Multi-Armed Bandit Setting
4. Exploration & Exploitation
5. Explore-then-Commit
In Unit 2, discussed algorithms for:
[Diagram: agent-environment interaction loop. The policy \(\pi\) selects action \(a_t\) given state \(s_t\) and receives reward \(r_t\); experience data \((s_t,a_t,r_t)\) is collected; the transitions \(P,f\) are unknown in Unit 2.]
1. Recap Units 1&2
2. Motivation and Demo
3. Multi-Armed Bandit Setting
4. Exploration & Exploitation
5. Explore-then-Commit
Example: MountainCar is rewarded only at the flag
Multi-Armed Bandits: a simplified setting for studying exploration
Online advertising
NYT Caption Contest
Medical trials
Interactive Coding Demo and PollEV
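The demo code itself does not appear on the slides; below is a minimal sketch of the kind of simulator such a demo might use, assuming a \(K\)-armed Bernoulli bandit in which arm \(a\) pays reward 1 with unknown probability \(\mu_a\) and 0 otherwise. The class name BernoulliBandit and its interface are illustrative assumptions, not the actual demo code.

```python
import numpy as np

class BernoulliBandit:
    """Simulated K-armed bandit with Bernoulli rewards (illustrative sketch)."""

    def __init__(self, mu, seed=0):
        self.mu = np.asarray(mu, dtype=float)   # true (unknown) mean rewards
        self.K = len(self.mu)
        self.rng = np.random.default_rng(seed)

    def pull(self, a):
        # Reward r_t in {0, 1} with mean mu[a].
        return float(self.rng.random() < self.mu[a])
```

The strategy sketches that follow interact with the bandit only through pull(a).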
1. Recap Units 1&2
2. Motivation and Demo
3. Multi-Armed Bandit Setting
4. Exploration & Exploitation
5. Explore-then-Commit
Uniform: \(a_t\sim \mathrm{Unif}([K])\) (pure exploration)
Greedy: \(a_t=\arg\max_{a\in[K]} \hat\mu_{a}\), the arm with the highest empirical mean reward so far (pure exploitation)
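Below is a sketch of these two baselines, assuming the pull(a) interface from the bandit sketch above and reading Greedy as playing the arm with the largest empirical mean \(\hat\mu_a\) after each arm has been tried once; the helper name run_baseline is hypothetical.

```python
import numpy as np

def run_baseline(pull, K, T, strategy="uniform", seed=0):
    """Run the uniform or greedy baseline for T rounds on a K-armed bandit."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(K)      # number of pulls of each arm
    sums = np.zeros(K)        # cumulative reward from each arm
    rewards = []
    for _ in range(T):
        if strategy == "uniform":
            a = int(rng.integers(K))            # a_t ~ Unif([K])
        elif np.any(counts == 0):
            a = int(np.argmin(counts))          # try each arm once first
        else:
            a = int(np.argmax(sums / counts))   # a_t = argmax_a hat_mu_a
        r = pull(a)
        counts[a] += 1
        sums[a] += r
        rewards.append(r)
    return np.array(rewards)
```

For example, run_baseline(BernoulliBandit([0.3, 0.7]).pull, K=2, T=500, strategy="greedy") plays greedily on a two-armed instance; uniform never stops exploring, while greedy can lock onto the worse arm after an unlucky first pull of the better one.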
1. Recap Units 1&2
2. Motivation and Demo
3. Multi-Armed Bandit Setting
4. Exploration & Exploitation
5. Explore-then-Commit
Explore-then-Commit
With probability at least \(1-\delta\), for every arm \(a\in[K]\): \( \mu_{a} \in\left[ \hat \mu_{a} \pm c\sqrt{\frac{\log(K/\delta)}{N}}\right]\), where \(\hat\mu_a\) is the empirical mean of the \(N\) exploration pulls of arm \(a\).
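A sketch of Explore-then-Commit under the usual reading of these slides: pull each of the \(K\) arms \(N\) times, form the empirical means \(\hat\mu_a\), then commit to \(\hat a=\arg\max_a \hat\mu_a\) for the remaining rounds. The horizon parameter T and the function signature are assumptions for illustration.

```python
import numpy as np

def explore_then_commit(pull, K, T, N):
    """Pull each of K arms N times, then commit to the empirically best arm.

    Assumes K * N <= T; pull(a) returns a stochastic reward for arm a.
    """
    sums = np.zeros(K)
    rewards = []
    # Explore phase: N pulls of every arm.
    for a in range(K):
        for _ in range(N):
            r = pull(a)
            sums[a] += r
            rewards.append(r)
    mu_hat = sums / N                  # empirical means hat_mu_a
    a_hat = int(np.argmax(mu_hat))     # committed arm
    # Commit phase: play a_hat for the remaining T - K*N rounds.
    for _ in range(T - K * N):
        rewards.append(pull(a_hat))
    return np.array(rewards), a_hat
```

If every confidence interval above holds, the committed arm satisfies \(\mu_{a^\star}-\mu_{\hat a}\le 2c\sqrt{\log(K/\delta)/N}\), since \(\hat\mu_{\hat a}\ge\hat\mu_{a^\star}\); larger \(N\) tightens this guarantee but spends more rounds exploring.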