Human Interaction with Recommendation Systems:
On Bias and Exploration
Sven Schmit, joint with Ramesh Johari, Vijay Kamble, and Carlos Riquelme
Recommendations
- User receives recommendations and has private preferences
- User picks an item
- User reports an outcome (e.g. a rating)
- Recommendations reflect the entire population vs. preferences specific to the individual
Example
Goal: find the quality of items efficiently
Bias
What happens if we ignore (private) preferences?
Exploration
Do private preferences help us learn about every item?
Model
Propose a simple model for the dynamics
Outline
Finding quality efficiently
Model
- At every time t, a new user arrives
- K items: item i has quality q_i (vertical differentiation)
- User t has a (private) preference θ_ti over each item i (horizontal differentiation) [Ifrach et al. 2013]
- Recommendation server supplies a score s_ti for each item
- User selects an item based on scores and preferences
Model ctd.
- User selects the item maximizing preference plus recommendation: a_t = argmax_i (θ_ti + s_ti)
- User reports value v_t = q_{a_t} + θ_{t,a_t} + ε_t (quality + preference + error)
- Server uses this to update the scores
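A minimal sketch of one round of these dynamics (the variable names, the Bernoulli preference parameter, and the noise scale are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5
q = rng.normal(size=K)  # item qualities (vertical differentiation)
s = np.zeros(K)         # server scores (estimates of quality)

# One round of the dynamics:
theta = rng.binomial(1, 0.3, size=K)  # private Bernoulli(p) preferences
a = np.argmax(theta + s)              # user picks item: preference + recommendation
eps = rng.normal(scale=0.1)           # reporting noise
v = q[a] + theta[a] + eps             # reported value: quality + preference + error
```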
Performance metric
- Output K scores, observe 1 outcome: partial feedback [Cesa-Bianchi, Bubeck 2012]
- (Pseudo-)regret against the optimal policy:
  R_T = Σ_t E[max_i (q_i + θ_ti)] − E[q_{a_t} + θ_{t,a_t}]
  (expected value of the best item minus expected value of the selected item)
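A hedged Monte Carlo sketch of the per-round pseudo-regret under Bernoulli preferences (the function name and parameters are hypothetical, for illustration only):

```python
import numpy as np

def pseudo_regret_round(q, s, p=0.3, n_mc=100_000, seed=0):
    """Estimate E[max_i (q_i + theta_i)] - E[q_a + theta_a] for one round,
    where a = argmax_i (theta_i + s_i) is the user's choice.
    q, s: length-K arrays of qualities and scores."""
    rng = np.random.default_rng(seed)
    K = len(q)
    theta = rng.binomial(1, p, size=(n_mc, K))
    best = (q + theta).max(axis=1)             # value of the best item per user
    a = (theta + s).argmax(axis=1)             # item each user actually selects
    picked = q[a] + theta[np.arange(n_mc), a]  # value of the selected item
    return (best - picked).mean()
```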
Comment
- This model does not use covariates that could explain preferences
- Data is sparse: limited information per user
- Goal is to highlight issues introduced by preferences
Analysis of a simple model
- Order items by quality: q_1 ≥ q_2 ≥ … ≥ q_K
- Preferences are Bernoulli: θ_ti ~ Bernoulli(p), i.i.d.
- Define the gap Δ_i = q_1 − q_i
- Relevant regime: gaps small enough that preferences can change which item a user selects
Bias
[Marlin 2003] [Marlin et al. 2007] [Steck 2010]
Naive recommendation server
Score is the average of reported values:
s_ti = (1/|T_i(t)|) Σ_{τ ∈ T_i(t)} v_τ
where T_i(t) are the times item i was chosen before time t
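A minimal sketch of this naive server (class name and interface are illustrative, not from the talk):

```python
import numpy as np

class NaiveServer:
    """Score each item by the plain average of the values reported for it."""
    def __init__(self, K):
        self.sums = np.zeros(K)
        self.counts = np.zeros(K)

    def scores(self):
        # Average reported value; 0 until an item has been chosen at least once
        return np.where(self.counts > 0,
                        self.sums / np.maximum(self.counts, 1), 0.0)

    def update(self, item, value):
        self.sums[item] += value
        self.counts[item] += 1
```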
Linear regret almost surely
Under the simple Bernoulli(p) preferences model: if the gaps Δ_i are below a threshold depending on p, then the naive server incurs linear regret almost surely.
Simulations (Bernoulli preferences)
- Worst arm selected too often
- [Plots: scores with a gap vs. no gap in scores]
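A standalone simulation sketch of this effect (the qualities, p, horizon, and noise scale are illustrative assumptions): with naive averaging, the reported-value averages absorb the selection bias, the score gap collapses, and the worst arm keeps being picked by users who happen to prefer it.

```python
import numpy as np

rng = np.random.default_rng(1)
K, p, T = 3, 0.3, 50_000
q = np.array([1.0, 0.8, 0.6])  # ordered qualities; gaps smaller than 1
sums, counts = np.zeros(K), np.zeros(K)
picks = np.zeros(K, dtype=int)

for t in range(T):
    s = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    theta = rng.binomial(1, p, size=K)                    # Bernoulli preferences
    a = int(np.argmax(theta + s + 1e-9 * rng.random(K)))  # random tie-breaking
    v = q[a] + theta[a] + rng.normal(scale=0.1)           # biased report
    sums[a] += v
    counts[a] += 1
    picks[a] += 1

print("scores:", np.where(counts > 0, sums / np.maximum(counts, 1), 0.0))
print("pick fractions:", picks / T)  # worst arm selected too often
```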
Solution
Debias estimates
- Computationally difficult
- Requires distributional assumptions
- Not robust
Request different feedback
- Ask the user for debiased feedback: "How does the item compare to your expectation?"
- Computationally easy
- No assumptions
- Feasible in practice?
Better recommendations
Score is the average of the unbiased values:
s_ti = (1/|T_i(t)|) Σ_{τ ∈ T_i(t)} ṽ_τ
where T_i(t) are the times item i was chosen and ṽ_τ is the debiased feedback
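One way this could look in code (a sketch under the assumption that the report is r = v − (s + θ), i.e., value relative to the user's expectation; the talk's exact mechanism may differ): since v = q + θ + ε, the quantity r + s = q + ε is an unbiased estimate of the quality.

```python
import numpy as np

class UnbiasedServer:
    """Average unbiased quality estimates r + s, where r = v - (s + theta)
    is the user's report relative to their expectation (assumed form)."""
    def __init__(self, K):
        self.sums = np.zeros(K)
        self.counts = np.zeros(K)

    def scores(self):
        return np.where(self.counts > 0,
                        self.sums / np.maximum(self.counts, 1), 0.0)

    def update(self, item, report, score_at_choice):
        # report + score at choice time is an unbiased estimate of q_item
        self.sums[item] += report + score_at_choice
        self.counts[item] += 1
```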
Exploration
Motivation
- How can we incentivize myopic users?
- Could users explore due to heterogeneous preferences?
- Exploration is difficult: confidence bounds & incentives [Hummel and McAfee 2014] [Slivkins et al. 2015] [Frazier et al. 2014] [Papanastasiou et al. 2014]
Assumptions
- Bernoulli(p) preferences (K items), sub-Gaussian noise
- Server takes empirical averages, but lower-bounds the scores
Regret bound
If the assumptions above hold, then the empirical-average server satisfies a regret bound.
Notes:
- problem-dependent bound
- constants depend on the relation between p and K
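A sketch of the lower-bounded empirical averages (the bound value and function name are assumed for illustration): clipping scores from below keeps every item attractive enough that a user who privately prefers it will still pick it, so preferences provide the exploration.

```python
import numpy as np

def lower_bounded_scores(sums, counts, lower_bound=0.0):
    """Empirical averages, clipped from below so no item's score
    falls so far that preferences can never select it."""
    avg = np.where(counts > 0, sums / np.maximum(counts, 1), lower_bound)
    return np.maximum(avg, lower_bound)
```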
Regret simulations (Normal preferences)
- [Plot: regret of biased vs. unbiased estimates]
Take-away
Users are effective explorers.
But: slow start for new items.
Rather than explicitly explore, highlight new items.
Wrapping up
Bias
- Private preferences lead to bad outcomes
- Ask the user for unbiased feedback
Exploration
- Private preferences lead to free exploration
- Highlight new items
Model: choice, preferences, recommendations, feedback
INFORMS 2016 talk