Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
1. Recap: Multi-Armed Bandits
2. Explore-then-Commit
3. UCB Algorithm
4. UCB Analysis
A simplified setting for studying exploration
Multi-Armed Bandits
Explore-then-Commit
$\mu_a \in \big[\hat{\mu}_a \pm c\sqrt{\log(K/\delta)/N}\,\big]$
Explore-then-Commit (Interactive Demo)
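A minimal sketch of Explore-then-Commit in Python. Bernoulli arms, a known horizon, the function name, and the way the $T^{2/3}$ exploration budget is split across arms are illustrative assumptions of this sketch, not details from the slides.

```python
import numpy as np

def explore_then_commit(arm_means, T, N, rng=None):
    """Explore-then-Commit on K Bernoulli arms.

    Pull each arm N times, then commit to the arm with the highest
    empirical mean for the remaining T - K*N rounds.
    """
    if rng is None:
        rng = np.random.default_rng()
    K = len(arm_means)
    rewards = []

    # Explore: N pulls of every arm, recording empirical means.
    estimates = np.zeros(K)
    for a in range(K):
        pulls = rng.binomial(1, arm_means[a], size=N)
        estimates[a] = pulls.mean()
        rewards.extend(pulls.tolist())

    # Commit: play the empirically best arm for the rest of the horizon.
    best = int(np.argmax(estimates))
    for _ in range(T - K * N):
        rewards.append(rng.binomial(1, arm_means[best]))

    return np.array(rewards)

# Exploration budget of roughly T**(2/3) total pulls, split across K = 3 arms.
T = 10_000
N = int(T ** (2 / 3)) // 3
print(explore_then_commit([0.3, 0.5, 0.6], T, N).mean())
```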
UCB
Per-step sub-optimality: $\mu^\star - \mu_{a_t}$, where $a_t$ is the arm pulled at time $t$ and $a^\star$ is the optimal arm.
Claim: the sub-optimality at $t$ is bounded by the width of $a_t$'s confidence interval.
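The claim follows in one line on the event that every confidence interval contains its true mean. Writing $\mathrm{UCB}_t(a)$, $\mathrm{LCB}_t(a)$ for the interval endpoints and $w_t(a)$ for its width (notation introduced here, not from the slides):

$$
\mu^\star = \mu_{a^\star} \le \mathrm{UCB}_t(a^\star) \le \mathrm{UCB}_t(a_t) = \mathrm{LCB}_t(a_t) + w_t(a_t) \le \mu_{a_t} + w_t(a_t),
$$

so $\mu^\star - \mu_{a_t} \le w_t(a_t)$. The second inequality uses that UCB pulls the arm with the largest upper confidence bound; the first and last use that $a^\star$'s and $a_t$'s intervals hold.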
Explore-then-Commit: explore for $N \approx T^{2/3}$ rounds, then commit to the empirically best arm. Regret $R(T) \lesssim T^{2/3}$.
Upper Confidence Bound: for $t = 1, \dots, T$, pull the arm with the largest upper confidence bound. Regret $R(T) \lesssim \sqrt{T}$ (ignoring $K$ and log factors).
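A minimal UCB sketch in Python to go with the comparison above. Bernoulli rewards, the bonus $\sqrt{c \log t / n_a}$, and the constant $c$ are one standard (UCB1-style) choice and assumptions of this sketch, not prescribed by the slides.

```python
import numpy as np

def ucb(arm_means, T, c=2.0, rng=None):
    """Play the arm with the largest upper confidence bound at each round.

    Bernoulli rewards and the bonus sqrt(c * log(t) / n_a) are assumptions
    of this sketch (a UCB1-style index).
    """
    if rng is None:
        rng = np.random.default_rng()
    K = len(arm_means)
    counts = np.zeros(K)   # n_a: number of times each arm has been pulled
    means = np.zeros(K)    # empirical mean reward of each arm
    rewards = []

    for t in range(1, T + 1):
        if t <= K:
            a = t - 1      # initialize by pulling every arm once
        else:
            bonus = np.sqrt(c * np.log(t) / counts)
            a = int(np.argmax(means + bonus))

        r = rng.binomial(1, arm_means[a])
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]   # incremental mean update
        rewards.append(r)

    return np.array(rewards)

print(ucb([0.3, 0.5, 0.6], T=10_000).mean())
```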
Example: online advertising
"Arms" are different job ads: journalism, programming.
But consider different users: a CS major vs. an English major.
Example: online shopping
"Arms" are various products
But what about search queries, browsing history, items in cart?
Example: social media feeds
"Arms" are various posts: images, videos
Personalized to each user based on demographics, behavioral data, etc