On Bias and Exploration
Sven Schmit, joint with Ramesh Johari, Vijay Kamble, and Carlos Riquelme
Recommendations
User preferences
User picks an item
User reports outcome (e.g. rating)
entire population vs. specific population
Goal: find the quality of items efficiently
What happens if we ignore (private) preferences?
Do private preferences help us learn about every item?
Propose simple model for dynamics
Finding quality efficiently
At every time t, new user arrives
K items:
(Private) preference over each item
Recommendation server supplies score
Quality
User selects item
[Ifrach et al. 2013]
Horizontal differentiation
Vertical differentiation
User selects item
User reports value
Server uses this to update scores
Reported value: quality, preference, error
Selection: preference, recommendation
Output K scores, observe 1 outcome
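A minimal sketch of one round of the dynamics above, under stated assumptions: the reported value is taken to be additive (quality + preference + error), the user is taken to pick the item maximizing score + private preference, preferences are Bernoulli(p), and the error is Gaussian. The function name `run_round` and its parameters are hypothetical and are not taken from the slides.

```python
import random

def run_round(qualities, scores, p=0.5, noise_sd=0.1, rng=random):
    """Simulate one user arrival; returns (choice, preference, reported value)."""
    k = len(qualities)
    # ASSUMPTION: each private preference is an independent Bernoulli(p) draw.
    prefs = [1.0 if rng.random() < p else 0.0 for _ in range(k)]
    # ASSUMPTION: the user selects the item maximizing score + own preference
    # (ties go to the lowest index).
    choice = max(range(k), key=lambda i: scores[i] + prefs[i])
    # ASSUMPTION: value = quality + preference + zero-mean Gaussian error.
    value = qualities[choice] + prefs[choice] + rng.gauss(0.0, noise_sd)
    return choice, prefs[choice], value

if __name__ == "__main__":
    rng = random.Random(0)
    print(run_round([0.2, 0.5], [0.0, 0.0], rng=rng))
```

The server outputs all K scores but only sees the single `value` returned for the chosen item, which is the partial-feedback structure of the model.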
(Pseudo)-regret: expected value of best item minus expected value of selected item
Optimal policy
partial feedback
[Cesa-Bianchi, Bubeck 2012]
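The regret notion can be sketched per round as follows. This assumes the expected value of item i for a given user is quality plus that user's preference, so the "best item" depends on the user; `round_regret` and `cumulative_regret` are hypothetical helper names.

```python
def round_regret(qualities, prefs, choice):
    """Expected value of the user's best item minus that of the selected item."""
    # ASSUMPTION: expected value of item i for this user = quality_i + preference_i.
    values = [q + th for q, th in zip(qualities, prefs)]
    return max(values) - values[choice]

def cumulative_regret(qualities, history):
    """Sum per-round regret over a list of (prefs, choice) pairs."""
    return sum(round_regret(qualities, prefs, choice) for prefs, choice in history)
```

For example, with qualities `[0.2, 0.5]` a user with preferences `[1.0, 0.0]` who picks item 0 incurs no regret, while a user with preferences `[0.0, 0.0]` who picks item 0 incurs the quality gap.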
This model does not use covariates that could explain preferences
Data is sparse: limited information per user
Highlight issues introduced by preferences
Order items
Preferences are Bernoulli
Define gap
Relevant regime
[Marlin 2003] [Marlin et al. 2007] [Steck 2010]
Score is average of reported values
where
are the times item i was chosen
Under the simple Bernoulli(p) preferences model:
If [...] for some [...], then linear regret a.s.
Note: Worst arm selected too often
Gap in scores
No gap in scores
Bernoulli preferences
Debias estimates
Request different feedback
How does the item compare to your expectation?
Score is average of unbiased values
where
are the times item i was chosen
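A rough simulation contrasting the two scoring rules, as a sketch under assumptions rather than the authors' experiment: values are additive (quality + preference + error), users pick the item maximizing score + preference, preferences are Bernoulli(p), unseen items get score 0, and "unbiased" feedback is modeled as the reported value minus the user's own preference. All names and parameter values here are hypothetical.

```python
import random

def simulate(qualities, horizon=5000, p=0.5, noise_sd=0.1, debias=False, seed=0):
    """Return (per-item score, per-item selection count) after `horizon` users."""
    rng = random.Random(seed)
    k = len(qualities)
    sums = [0.0] * k
    counts = [0] * k
    for _ in range(horizon):
        # Score = average feedback over the times item i was chosen (0 if unseen).
        scores = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
        prefs = [1.0 if rng.random() < p else 0.0 for _ in range(k)]
        choice = max(range(k), key=lambda i: scores[i] + prefs[i])
        value = qualities[choice] + prefs[choice] + rng.gauss(0.0, noise_sd)
        # ASSUMPTION: debiased feedback subtracts the user's private preference.
        feedback = value - prefs[choice] if debias else value
        sums[choice] += feedback
        counts[choice] += 1
    estimates = [sums[i] / counts[i] if counts[i] else None for i in range(k)]
    return estimates, counts

if __name__ == "__main__":
    q = [0.2, 0.5]
    print("naive   :", simulate(q, debias=False))
    print("debiased:", simulate(q, debias=True))
```

In the naive run the averages absorb the preferences of the users who selected each item, so they overstate quality; in the debiased run they should stay close to the true qualities.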
How can we incentivize myopic users?
Could users explore due to heterogeneous preferences?
Exploration is difficult
confidence bounds & incentives
[Hummel and McAfee 2014] [Slivkins et al 2015] [Frazier et al. 2014] [Papanastasiou et al. 2014]
Bernoulli preferences (K items)
But: lower bound scores
Server takes empirical averages
Then regret bound
Notes:
sub-Gaussian
If
Biased estimates
Unbiased estimates
Normal preferences
But: slow start for new items
Rather than explicitly explore, highlight new items
Users are effective explorers
Private preferences lead to bad outcomes
Ask user for unbiased feedback
Private preferences lead to free exploration
Highlight new items
Choice
Preferences
Recommendations
Feedback