Item-Based Collaborative Filtering Recommendation Algorithms

matthew sun · mdsun@princeton.edu

citp recsys reading group · 1/12/21

📸 big picture

state-of-the-art for recommendation was user-based k-nearest neighbors (KNN) approaches
- main drawback: issues at scale
main thrust of paper: item-based recommendation algorithms yield error rates as good as or better than user-based ones, at much higher throughput
- 💡 I was intrigued to see the emphasis on performance; many modern machine learning papers focus on achieving SOTA on a particular accuracy metric

📃 introduction

"collaborative filtering"
- works by building a database of preferences for items by users
- new user Neo matched against database to discover neighbors
- items liked by Neo's neighbors are recommended to Neo
two main challenges: scalability & quality
- scalability: must search tens of millions of potential neighbors; there may also be a lot of information about each individual user
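
A minimal sketch of the user-based matching described above, to make the scaling issue concrete. The function name, the toy matrix, the use of 0 for "no rating", and plain cosine similarity between users are all assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def recommend_user_based(R, neo, k=2, top_n=2):
    """Recommend unseen items to user `neo` from the k most similar users.

    Assumption: R is a users x items matrix where 0 means "no rating".
    """
    R = np.asarray(R, dtype=float)
    norms = np.linalg.norm(R, axis=1)
    # Cosine similarity of every user to neo: this scan over all users
    # happens at query time, which is the scalability bottleneck.
    sims = (R @ R[neo]) / (norms * norms[neo] + 1e-12)
    sims[neo] = -np.inf                      # neo is not their own neighbor
    neighbors = np.argsort(sims)[::-1][:k]   # k nearest neighbors
    # Score each item by the similarity-weighted ratings of the neighbors.
    scores = sims[neighbors] @ R[neighbors]
    scores[R[neo] > 0] = -np.inf             # skip items neo already rated
    ranked = np.argsort(scores)[::-1][:top_n]
    return [int(i) for i in ranked if np.isfinite(scores[i])]

# Hypothetical toy data: 4 users x 4 items.
R_toy = [[5, 4, 0, 0],
         [5, 5, 4, 0],
         [4, 5, 5, 1],
         [0, 1, 0, 5]]
print(recommend_user_based(R_toy, neo=0))
```

Every query touches every row of the matrix, so the cost grows with the number of users.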

📃 introduction (cont)

item-based algorithms
- explore relationships between items rather than relationships between users
- avoid bottleneck of having to search database of users for similar users
  - cold start problem?
"because relationships between items are relatively static, item-based algorithms may be able to provide the same quality as user-based algorithms"
- assumes that the measure of item similarity accurately captures something about user preferences

🌐 overview of collaborative filtering

❗ interesting tidbit: this paper (like many older RS papers) assumes the setting of e-commerce
collaborative filtering overview:

\mathcal{U} = \{u_1, u_2,\cdots,u_m\} \\ \mathcal{I} = \{i_1, i_2,\cdots, i_n\}

each user u_i has an item list I_{u_i} = \{\cdots\} (items the user has "expressed opinions about" - i.e., rated, purchased, liked, etc.)
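
One possible way to hold this setup in code, assuming explicit numeric ratings: a users x items matrix with `np.nan` for missing opinions. The matrix values and the helper name `rated_items` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: rows are users u_1..u_m, columns are items i_1..i_n.
# np.nan marks items a user has not expressed an opinion about.
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, 5.0, np.nan],
])
m, n = R.shape  # m users, n items

def rated_items(R, u):
    """Return I_u as a list of column indices of the items user u has rated."""
    return np.flatnonzero(~np.isnan(R[u])).tolist()

print(rated_items(R, 0))
```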

🌐 overview of collaborative filtering (cont)

two distinct tasks
- prediction
  - predicted "opinion value" of user u_i for item i_j
- recommendation
  - list of N items that the active user will like the most (also known as Top-N recommendation)
memory (user-based) vs model (item-based)
- item-based: develop model of user ratings, conditional on interaction with other items

🎁 item-based methods

problem setup: target user u_a and target item i
- select the k most similar items among those the target user has rated: \{i_1, i_2,\cdots, i_k\} \subseteq I_{u_a}
- compute the similarity of each of these items to the target item: \{s_{i_1}, s_{i_2},\cdots, s_{i_k}\}
- prediction computation: produce the prediction \hat{R_{i}} as a weighted average of the target user's ratings on the similar items
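
The selection step can be sketched as follows, assuming an item-item similarity matrix `S` is already available. `S`, the function name, and the toy values are hypothetical; how `S` is filled in depends on the similarity measure chosen.

```python
import numpy as np

def k_most_similar_rated(S, rated, target, k):
    """Among the items in `rated` (I_{u_a}), return the k most similar
    to `target`, together with their similarities s_{i_1}..s_{i_k}."""
    S = np.asarray(S, dtype=float)
    candidates = np.array([j for j in rated if j != target], dtype=int)
    order = np.argsort(S[target, candidates])[::-1][:k]  # descending similarity
    chosen = candidates[order]
    return chosen.tolist(), S[target, chosen].tolist()

# Hypothetical symmetric similarity matrix over 4 items.
S_toy = [[1.0, 0.9, 0.1, 0.5],
         [0.9, 1.0, 0.3, 0.2],
         [0.1, 0.3, 1.0, 0.8],
         [0.5, 0.2, 0.8, 1.0]]
items, sims = k_most_similar_rated(S_toy, rated=[1, 2, 3], target=0, k=2)
print(items, sims)
```

Only the target user's own rated items are candidates, so the search is bounded by the size of I_{u_a} rather than by the number of users.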

🔩 types of similarity computation

cosine similarity

s_{i,j} = \text{cos}(\vec{i},\vec{j})= \frac{\vec{i}\cdot\vec{j}}{\|\vec{i}\|_2 \|\vec{j}\|_2}

correlation-based similarity (Pearson correlation)

s_{i,j} = \frac{\sum_{u\in U}(R_{u,i}-\bar{R_i})(R_{u,j}-\bar{R_j})}{\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R_i})^2} \sqrt{\sum_{u\in U}(R_{u,j}-\bar{R_j})^2}}

- U is defined as the set of all users who rated both items

adjusted cosine similarity

s_{i,j} = \frac{\sum_{u\in U}(R_{u,i}-\bar{R_u})(R_{u,j}-\bar{R_u})}{\sqrt{\sum_{u\in U}(R_{u,i}-\bar{R_u})^2} \sqrt{\sum_{u\in U}(R_{u,j}-\bar{R_u})^2}}

(basically, cosine similarity, but "standardize" each user by subtracting their mean rating)
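
A sketch of the three measures on a small ratings matrix, to show how they can disagree. Assumptions in this sketch: missing ratings are `np.nan`, all three measures are restricted to users who rated both items, item means are taken over those co-rating users, and user means are taken over each user's full set of ratings. Function names and data are hypothetical.

```python
import numpy as np

def _cos(a, b):
    """Plain cosine of two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def _corated(R, i, j):
    """Boolean mask of users who rated both item i and item j (the set U)."""
    return ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])

def cosine_sim(R, i, j):
    U = _corated(R, i, j)
    return _cos(R[U, i], R[U, j])

def pearson_sim(R, i, j):
    U = _corated(R, i, j)
    a, b = R[U, i], R[U, j]
    # Subtract each item's mean rating (computed over U, an assumption).
    return _cos(a - a.mean(), b - b.mean())

def adjusted_cosine_sim(R, i, j):
    U = _corated(R, i, j)
    # Subtract each user's mean rating over everything that user rated.
    user_mean = np.nanmean(R, axis=1)[U]
    return _cos(R[U, i] - user_mean, R[U, j] - user_mean)

# Hypothetical data: every user rates item 0 above and item 1 below
# their own average, but generous and harsh raters differ in scale.
R_sim = np.array([[5.0, 3.0, 4.0],
                  [3.0, 1.0, 2.0],
                  [4.0, 2.0, np.nan]])
print(cosine_sim(R_sim, 0, 1), pearson_sim(R_sim, 0, 1), adjusted_cosine_sim(R_sim, 0, 1))
```

On this toy matrix, cosine and Pearson both report the two items as highly similar, while adjusted cosine reports them as opposites, because it removes each user's rating scale rather than each item's.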

🧪 prediction computation

I_S: set of all items determined to be similar to the target item

weighted sum

P_{u,i}=\frac{\sum_{j\in I_S}(s_{i,j}\cdot R_{u,j})}{\sum_{j\in I_S}|s_{i,j}|}

regression

P_{u,i}=\frac{\sum_{j\in I_S}(s_{i,j}\cdot R'_{u,j})}{\sum_{j\in I_S}|s_{i,j}|}

R'_N=\alpha R_i + \beta + \epsilon

(same weighted sum, but each raw rating R_{u,j} is replaced by an approximation R'_{u,j} from a linear regression model)
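
A small worked instance of the weighted-sum formula, assuming the similar items and their similarities have already been selected. The function name and the numbers are hypothetical; the regression variant would pass approximated ratings R'_{u,j} in place of the raw ones.

```python
import numpy as np

def weighted_sum_prediction(sims, ratings):
    """P_{u,i}: similarity-weighted average of user u's ratings on the
    items in I_S. Returns None when all similarities are zero."""
    sims = np.asarray(sims, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    denom = np.abs(sims).sum()
    if denom == 0:
        return None
    return float(sims @ ratings / denom)

# Two similar items with similarities 0.9 and 0.5, rated 4 and 2 by the user:
# (0.9 * 4 + 0.5 * 2) / (0.9 + 0.5) = 4.6 / 1.4
p = weighted_sum_prediction([0.9, 0.5], [4, 2])
print(round(p, 3))
```

Dividing by the sum of absolute similarities keeps the prediction within the range of the user's own ratings when all similarities are non-negative.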

📊 experimental evaluation

Pre-compute similarity between items
- For each item, compute the k most similar items, where k << n (k = model size)
- Might this suffer from issues of staleness?
Data set: MovieLens
- 43k users, 3.5k movies
- Only consider users with 20+ movie ratings
  - Might this bias the evaluation?
- Multiple different train/test splits
Evaluation metric: MAE

\text{MAE} = \frac{\sum_{i=1}^N |p_i-q_i|}{N}

where p_i,q_i is a prediction/rating pair
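
Two pieces of this slide sketched in code, under assumptions: `build_model` keeps, for each item, the indices of its k most similar other items from a hypothetical precomputed similarity matrix, and `mae` implements the metric above. Names and toy values are illustrative only.

```python
import numpy as np

def build_model(S, k):
    """For each item, keep the indices of its k most similar other items.
    The result has shape (n, k) with k << n, instead of the full n x n matrix."""
    S = np.array(S, dtype=float)      # copy, so the caller's matrix is untouched
    np.fill_diagonal(S, -np.inf)      # an item is not its own neighbor
    return np.argsort(S, axis=1)[:, ::-1][:, :k]

def mae(predictions, ratings):
    """Mean absolute error over N prediction/rating pairs (p_i, q_i)."""
    p = np.asarray(predictions, dtype=float)
    q = np.asarray(ratings, dtype=float)
    return float(np.mean(np.abs(p - q)))

# Hypothetical symmetric similarity matrix over 4 items.
S_model = [[1.0, 0.9, 0.1, 0.5],
           [0.9, 1.0, 0.3, 0.2],
           [0.1, 0.3, 1.0, 0.8],
           [0.5, 0.2, 0.8, 1.0]]
model = build_model(S_model, k=2)
print(model.tolist())
print(mae([4, 3, 5], [5, 3, 3]))
```

Because the model is built offline, it reflects the ratings available at build time; that is where the staleness question comes from.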

📊 experimental evaluation

10-fold cross validation for every train/test split
Compare item-based to Pearson nearest neighbor algorithm
Performed comparisons of similarity computation, training/test ratio, and neighborhood size
- Adjusted cosine similarity resulted in lowest MAE
- Selected 80/20 training/test split
  - Compared weighted sum to regression-based approach
- Selected 30 as optimal neighborhood size
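
A minimal sketch of one random training/test split at a given ratio, assuming ratings are stored as (user, item, rating) triples. The function name, the seed handling, and the toy triples are assumptions; repeating the call with different seeds would give the repeated random splits used for averaging.

```python
import numpy as np

def split_ratings(ratings, train_fraction=0.8, seed=0):
    """Randomly split a list of (user, item, rating) triples into
    training and test sets at the given training fraction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(ratings))
    cut = int(round(train_fraction * len(ratings)))
    train = [ratings[i] for i in idx[:cut]]
    test = [ratings[i] for i in idx[cut:]]
    return train, test

# Hypothetical toy ratings: 10 triples.
toy_ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(5) for i in range(2)]
train, test = split_ratings(toy_ratings, train_fraction=0.8)
print(len(train), len(test))
```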

📊 experimental evaluation (cont.)

user-user vs item-item at various neighborhood sizes (80/20 training/test split)

📊 experimental evaluation (cont.)

Throughput decreases as training set size increases
- Recommendation time may be misleading (at smaller x, recommendations are made on more test cases)

💬 discussion

item-item provides better quality predictions ("however, the improvement is not significantly large")
- as measured by MAE
  - when do differences in MAE stop providing useful signal?
- only for users with 20+ movie ratings
"...the item neighborhood is fairly static...which results in very high online performance"
- no runtime comparison to user-user system
