Human Interaction with Recommendation Systems:

On Bias and Exploration

Sven Schmit, joint with Ramesh Johari, Vijay Kamble, and Carlos Riquelme

Recommendations

User preferences

User picks an item

User reports outcome (e.g. rating)

entire population vs specific population

Example

Goal: find the quality of items efficiently

Bias

What happens if we ignore (private) preferences?

Exploration

Do private preferences help us learn about every item?

Model

Propose simple model for dynamics

Outline

Finding quality efficiently

Model

At every time t, new user arrives

K items: i=1,2,\ldots,K

(Private) preference over each item

\theta_{it} \sim F_i

Recommendation server supplies score

q_{it}

Quality

Q_i

User selects item

a_t = \arg\max_i (q_{it} + \theta_{it})

[Ifrach et al. 2013]

Horizontal differentiation

Vertical differentiation
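
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `select_item`, the example scores, and the Bernoulli(0.3) preference distribution are all assumptions chosen for the example.

```python
import numpy as np

def select_item(scores, preferences):
    """Return a_t = argmax_i (q_it + theta_it) for one arriving user."""
    scores = np.asarray(scores, dtype=float)
    preferences = np.asarray(preferences, dtype=float)
    # Assumption: ties go to the lowest index (np.argmax behaviour);
    # the slides do not specify a tie-breaking rule.
    return int(np.argmax(scores + preferences))

rng = np.random.default_rng(0)
K = 3
scores = np.array([0.9, 0.5, 0.1])            # q_it, hypothetical values
preferences = rng.binomial(1, 0.3, size=K)    # theta_it ~ F_i, here Bernoulli(0.3)
chosen = select_item(scores, preferences)
```

A user whose private preference favours a lower-scored item can still pick it, which is what drives both the bias and the exploration effects discussed later.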

Model ctd.

User selects item

a_t = \arg\max_i (q_{it} + \theta_{it})

(recommendation + preference)

User reports value

V_t(a_t) = Q_{a_t} + \theta_{a_t t} + \epsilon_t

(quality + preference + error)

Server uses this to update scores

\vec{q}_{t+1} = f(V_1, V_2, \ldots, V_t)
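
One round of these dynamics might look as follows. This is a sketch under stated assumptions: Bernoulli(p) preferences and Gaussian reporting noise are illustrative choices, and `play_round` is a hypothetical helper name.

```python
import numpy as np

def play_round(quality, scores, rng, p=0.3, noise_sd=0.1):
    """Simulate one user: draw preferences, select an item, report a value."""
    K = len(quality)
    theta = rng.binomial(1, p, size=K).astype(float)  # private preferences (assumed Bernoulli)
    a = int(np.argmax(scores + theta))                # user's choice: score + preference
    eps = rng.normal(0.0, noise_sd)                   # reporting error (assumed Gaussian)
    value = quality[a] + theta[a] + eps               # reported value: quality + preference + error
    return a, value, theta

rng = np.random.default_rng(1)
quality = np.array([1.0, 0.6])   # Q_i, hypothetical
scores = np.zeros(2)             # q_it before any feedback (assumed initial value)
a, value, theta = play_round(quality, scores, rng)
```

The server only sees `a` and `value`; `theta` is returned here purely so the simulation can be inspected.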

Performance metric

Output K scores, observe 1 outcome: partial feedback [Bubeck and Cesa-Bianchi 2012]

(Pseudo-)regret:

R_T = \sum_{t=1}^T \left[ \max_i (Q_i + \theta_{it}) - (Q_{a_t} + \theta_{a_t t}) \right]

First term: expected value of best item

Second term: expected value of selected item

Optimal policy

q_{it} \equiv Q_i
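
The per-round term of the pseudo-regret can be computed directly in a simulation, where preferences are visible. A minimal sketch; `round_regret` is a hypothetical helper name and the numbers are made up.

```python
import numpy as np

def round_regret(quality, theta, chosen):
    """max_i (Q_i + theta_it) - (Q_a + theta_at) for a single user."""
    totals = np.asarray(quality, dtype=float) + np.asarray(theta, dtype=float)
    return float(totals.max() - totals[chosen])

quality = np.array([1.0, 0.6])   # Q_i, hypothetical
theta = np.array([0.0, 1.0])     # this user privately prefers item 2
regret_if_item_1 = round_regret(quality, theta, chosen=0)
regret_if_item_2 = round_regret(quality, theta, chosen=1)

# With scores equal to the true qualities, the user's own choice has zero regret.
optimal_choice = int(np.argmax(quality + theta))
```

Here choosing the higher-quality item is the mistake, because this user's preference makes the other item better for them.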

Comment

This model does not use covariates that could explain preferences

Data is sparse: limited information per user

Highlight issues introduced by preferences

Analysis of simple model

Order items

Q_1 > Q_2 > \ldots > Q_K

Define gap

\Delta = Q_1 - Q_2

Preferences are Bernoulli

\theta_{it} \sim \text{Bernoulli}(p)

Relevant regime

p \lesssim \frac{\log(K)}{2K}

Bias

[Marlin 2003] [Marlin et al. 2007] [Steck 2010]

Naive recommendation server

Score is average of reported values

q_{it} = \frac{1}{|S_{it}|} \sum_{\tau \in S_{it}} V_\tau

where

S_{it} = \{\tau < t : a_\tau = i \}

are the times item i was chosen
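
The naive score is a per-item sample mean over the history of choices and reported values. A small sketch; the `default` score for items never chosen is an assumption, since the average is undefined when S_{it} is empty.

```python
import numpy as np

def naive_scores(choices, values, K, default=0.0):
    """q_it = mean of reported values V_tau over past times tau with a_tau = i."""
    choices = np.asarray(choices)
    values = np.asarray(values, dtype=float)
    scores = np.full(K, float(default))
    for i in range(K):
        mask = choices == i          # S_it: the rounds in which item i was chosen
        if mask.any():
            scores[i] = values[mask].mean()
    return scores

# Hypothetical history: item 0 chosen twice, item 1 once, item 2 never.
history_scores = naive_scores(choices=[0, 1, 0], values=[1.0, 2.0, 3.0], K=3)
```

Because each V_tau includes the preference of a user who chose that item, these averages overstate quality, and more so for items that are chosen rarely.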

Linear regret almost surely

Under the simple Bernoulli(p) preferences model, if

\Delta < \Delta^*(K, p)

then, for some c > 0,

\limsup_t \frac{R_t}{t} \ge c a.s. (linear regret)

Note:

\Delta^*(2, 0.5) = \frac{1}{3}
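
One way to see where the value 1/3 comes from is a fixed-point calculation. This is a heuristic sketch, not the proof from the talk: it assumes K = 2, p = 1/2, that the naive averages converge, and that the limiting score gap d = q_1 - q_2 satisfies 0 < d < 1.

```latex
% With 0 < d < 1, item 2 is chosen only when (\theta_1, \theta_2) = (0, 1);
% item 1 is chosen for (0,0), (1,0) and (1,1), each with probability 1/4.
\begin{align*}
E[\theta_{1t} \mid a_t = 1] &= \tfrac{2}{3}, &
E[\theta_{2t} \mid a_t = 2] &= 1, \\
q_1 &\to Q_1 + \tfrac{2}{3}, &
q_2 &\to Q_2 + 1, \\
d &= (Q_1 - Q_2) + \tfrac{2}{3} - 1 = \Delta - \tfrac{1}{3}.
\end{align*}
% The assumption d > 0 is self-consistent only when \Delta > 1/3.
```

For a smaller gap no strictly positive score gap is self-consistent under this calculation, so the scores cannot reliably separate the two items.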

Simulations

[Figure: simulations with Bernoulli preferences, p = \frac{1}{2}, for \Delta = 0.5 and \Delta = 0.2. Annotations: "Gap in scores", "No gap in scores", "Worst arm selected too often".]
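
A simulation in this spirit can be sketched as below. It is not the code behind the original plots: the horizon, noise level, seed, initial score of 0 for unseen items, and the function name `simulate_naive` are all assumptions. It runs the naive averaging server with two items and Bernoulli(1/2) preferences, and counts how often the user's choice differs from the choice they would make under the true qualities.

```python
import numpy as np

def simulate_naive(delta, p=0.5, T=20000, noise_sd=0.1, seed=0):
    """Fraction of rounds with a suboptimal choice under the naive server (K = 2)."""
    rng = np.random.default_rng(seed)
    quality = np.array([delta, 0.0])      # Q_1 - Q_2 = delta
    sums = np.zeros(2)                    # running sum of reported values per item
    counts = np.zeros(2)                  # number of times each item was chosen
    mistakes = 0
    for _ in range(T):
        # Naive score: average reported value; 0 for items not yet chosen (assumption).
        scores = np.divide(sums, counts, out=np.zeros(2), where=counts > 0)
        theta = rng.binomial(1, p, size=2).astype(float)
        a = int(np.argmax(scores + theta))
        totals = quality + theta
        mistakes += int(totals[a] < totals.max())
        sums[a] += totals[a] + rng.normal(0.0, noise_sd)
        counts[a] += 1
    return mistakes / T

rate_large_gap = simulate_naive(delta=0.5)
rate_small_gap = simulate_naive(delta=0.2)
```

Comparing `rate_small_gap` with `rate_large_gap` gives a rough numerical counterpart to the two panels, with \Delta = 0.2 falling below the threshold \Delta^*(2, 0.5) = \frac{1}{3} and \Delta = 0.5 above it.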

Solution

Debias estimates

Computationally difficult
Distributional assumptions
Not robust

Request different feedback

Ask user for debiased estimates
Computationally easy
No assumptions
Feasible in practice?

How does the item compare to your expectation?

W_t = V_t - (q_{a_t t} + \theta_{a_t t})
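
Substituting the reported value from the model shows why this feedback is free of preference bias. The step below assumes the user's expectation is exactly score plus preference for the chosen item i = a_t, as in the selection rule, and that the user reports honestly.

```latex
\begin{align*}
W_t &= V_t - (q_{a_t t} + \theta_{a_t t}) \\
    &= (Q_{a_t} + \theta_{a_t t} + \epsilon_t) - (q_{a_t t} + \theta_{a_t t}) \\
    &= Q_{a_t} - q_{a_t t} + \epsilon_t .
\end{align*}
```

The private preference cancels, so W_t measures only how far the shown score is from the true quality, plus noise.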

Better recommendations

Score is average of unbiased values

q'_{it} = \frac{1}{|S_{it}|} \sum_{\tau \in S_{it}} (W_\tau + q'_{i\tau})

where

S_{it} = \{\tau < t : a_\tau = i \}

are the times item i was chosen
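
The debiased score adds each reported surprise W_tau back to the score that was shown at that time, then averages. A minimal sketch with hypothetical names (`debiased_scores`, `surprises`, `shown_scores`) and an assumed default score for items never chosen.

```python
import numpy as np

def debiased_scores(choices, surprises, shown_scores, K, default=0.0):
    """q'_it = mean over past selections of item i of (W_tau + q'_{i tau})."""
    choices = np.asarray(choices)
    # Each term W_tau + q'_{i tau} is a noisy but preference-free estimate of Q_i.
    targets = np.asarray(surprises, dtype=float) + np.asarray(shown_scores, dtype=float)
    scores = np.full(K, float(default))
    for i in range(K):
        mask = choices == i
        if mask.any():
            scores[i] = targets[mask].mean()
    return scores

# Hypothetical noise-free history with true qualities Q = (1.0, 0.6):
# item 0 was shown at score 0.0 then 1.0, item 1 at score 0.5.
estimates = debiased_scores(
    choices=[0, 0, 1],
    surprises=[1.0, 0.0, 0.1],
    shown_scores=[0.0, 1.0, 0.5],
    K=2,
)
```

In this toy history the estimates equal the true qualities regardless of which users happened to choose each item.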

Exploration

Motivation

How can we incentivize myopic users?

Could users explore due to heterogeneous preferences?

Exploration is difficult

confidence bounds & incentives

[Hummel and McAfee 2014] [Slivkins et al 2015] [Frazier et al. 2014] [Papanastasiou et al. 2014]

Assumptions

Bernoulli preferences (K items)

Server takes empirical averages

q'_{it}

But: lower bound scores so that

\max_i q'_{it} - \min_i q'_{it} < 1
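
One possible reading of the lower bound is to floor every score just above the best score minus one. The sketch below takes that reading as an assumption: `floor_scores`, `max_spread`, and `min_selection_probability` are hypothetical names, not from the talk. With a spread below 1, a user with preference 1 for item i and 0 for every other item always picks i, so each item is selected with probability at least p(1-p)^{K-1}.

```python
import numpy as np

def floor_scores(scores, max_spread=0.99):
    """Raise low scores so that max - min <= max_spread < 1 (assumed mechanism)."""
    scores = np.asarray(scores, dtype=float)
    return np.maximum(scores, scores.max() - max_spread)

def min_selection_probability(p, K):
    """P(theta_i = 1 and theta_j = 0 for all j != i) under Bernoulli(p) preferences."""
    return p * (1 - p) ** (K - 1)

floored = floor_scores([2.0, 0.3, 1.5])
spread = floored.max() - floored.min()
explore_prob = min_selection_probability(p=0.5, K=2)
```

This is the sense in which heterogeneous preferences give exploration for free: no item is ever starved of feedback.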

Regret bound

If \epsilon_t is sub-Gaussian, then regret bound

E(R_t) \le CK\log(t) + C'K

Notes:

problem dependent bound
constants depend on relation between p and K

Regret simulations

[Figure: regret with biased estimates vs unbiased estimates; Normal preferences, \Delta = 0.4]
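
A comparison of this kind can be reproduced in outline as follows. This is a sketch, not the original experiment: it assumes two items, standard Normal preferences, Gaussian reporting noise, an initial score of 0, and a fixed seed; `simulate_regret` is a hypothetical name.

```python
import numpy as np

def simulate_regret(delta=0.4, debias=False, T=20000, noise_sd=0.1, seed=0):
    """Cumulative pseudo-regret with naive (biased) or debiased feedback, K = 2."""
    rng = np.random.default_rng(seed)
    quality = np.array([delta, 0.0])       # Q_1 - Q_2 = delta
    sums = np.zeros(2)
    counts = np.zeros(2)
    regret = 0.0
    for _ in range(T):
        # Score: running average of feedback; 0 for items not yet chosen (assumption).
        scores = np.divide(sums, counts, out=np.zeros(2), where=counts > 0)
        theta = rng.normal(0.0, 1.0, size=2)   # Normal preferences
        a = int(np.argmax(scores + theta))
        totals = quality + theta
        regret += totals.max() - totals[a]
        value = totals[a] + rng.normal(0.0, noise_sd)   # V_t
        if debias:
            surprise = value - (scores[a] + theta[a])   # W_t
            sums[a] += surprise + scores[a]             # W_t + q'_{a t}
        else:
            sums[a] += value                            # naive: average V_t directly
        counts[a] += 1
    return regret

regret_biased = simulate_regret(debias=False)
regret_unbiased = simulate_regret(debias=True)
```

Plotting the running value of `regret` against t for both settings would give curves comparable in kind to the figure.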

Take-away

Users are effective explorers

But: slow start for new items

Rather than explicitly explore, highlight new items

Wrapping up

Bias

Private preferences lead to bad outcomes

Ask user for unbiased feedback

Exploration

Private preferences lead to free exploration

Highlight new items

Model

Choice

Preferences

Recommendations

Feedback