Noah Ó Donnaile | November 2018
https://slides.com/noahodonnaile/recommender-systems
Recommender systems use the opinions of a community of users to help individuals in that community more effectively identify content of interest from a potentially overwhelming set of choices
Resnick and Varian, 1997
Recommender systems are information filtering systems that deal with the problem of information overload by filtering vital information fragment out of large amount of dynamically generated information according to user’s preferences, interest, or observed behavior about item
F.O. Isinkaye et al., 2015
What does Rabble need to recommend?
In order of importance, in my opinion:
Memory-based algorithms approach the collaborative filtering problem by using the entire database. A memory-based system tries to find users that are similar to the active user (i.e. the user we want to make predictions for), and uses their preferences to predict ratings for the active user.
Edited from: Computer Science Comprehensive Exercise, Carleton College
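As a rough illustration of the memory-based idea, the sketch below predicts a rating for the active user as a similarity-weighted average of other users' ratings. The toy `ratings` data, the function names and the choice of cosine similarity over co-rated items are all assumptions made for this example, not something taken from the slides.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "d": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(active, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for user, theirs in ratings.items():
        if user == active or item not in theirs:
            continue
        sim = cosine(ratings[active], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None

# alice has not rated "d"; bob (very similar to alice) weighs more than carol.
print(predict("alice", "d", ratings))
```

Note that every prediction scans the whole ratings table, which is what "using the entire database" means in practice.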
Memory-based collaborative filtering can use a number of different measures to calculate the similarity between users.
Eg. Euclidean distance, Pearson correlation coefficient, cosine-based vector similarity, Jaccard similarity coefficient. k-Nearest Neighbours then uses such a measure to pick the most similar users.
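Three of the measures named above can be sketched in a few lines of plain Python. The function names are made up for this example; the inputs are assumed to be two equal-length lists of ratings for the same items (Euclidean, Pearson) or two collections of item ids (Jaccard).

```python
from math import sqrt

def euclidean_distance(x, y):
    """Straight-line distance between two rating vectors (smaller = more similar)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient, in the range [-1, 1]."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant vector has no variance, so correlation is undefined; return 0.
    return cov / (std_x * std_y) if std_x and std_y else 0.0

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two item sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

print(euclidean_distance([0, 0], [3, 4]))
print(pearson([1, 2, 3], [2, 4, 6]))
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))
```

Pearson subtracts each user's mean rating, so it is less sensitive than Euclidean distance to users who rate everything high or everything low. Jaccard ignores rating values and only looks at which items were interacted with.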
Model-based recommendation systems involve building a model based on the dataset of ratings.
We extract some information from the dataset, and use that as a "model" to make recommendations (without having to use the complete dataset every time).
Probability problem: The problem of predicting a rating for a user-item pair is seen as the problem of predicting the probability of the rating being a particular value. See: Bayesian networks, clustering.
Linear algebra: Consider the matrices of users and ratings available to us and perform linear algebra operations on them. Eg. Singular Value Decomposition.
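A minimal sketch of the linear algebra approach, assuming NumPy is available: take the singular value decomposition of a small user-item matrix and keep only the largest singular values. The matrix `R`, the helper `low_rank_approximation` and the choice of filling missing ratings with 0 are assumptions for illustration; real systems handle missing ratings more carefully.

```python
import numpy as np

# Hypothetical user x item rating matrix (rows: users, columns: items).
# Missing ratings are crudely filled with 0 here, purely for illustration.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def low_rank_approximation(matrix, k):
    """Rebuild `matrix` from only its k largest singular values."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-2 reconstruction is the compact "model"; its entries can be read
# as smoothed rating estimates for every user-item pair.
R_hat = low_rank_approximation(R, k=2)
print(np.round(R_hat, 2))
```

Once the factors are computed, predictions come from the small factor matrices rather than from the complete dataset.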
Content-based recommender systems grew out of the idea of using the content of each item to make recommendations, and of trying to solve the problems of collaborative filtering (eg. cold start, sparsity, lack of transparency, etc.).
In a system tailored for text, a bag-of-words method (eg. TF-IDF) can be used to find terms that are common between two items but rare across the dataset as a whole. Sharing such terms suggests that the two items are related.
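The sketch below shows one common TF-IDF variant (term frequency normalised by document length, multiplied by log(N / document frequency)) and compares documents with cosine similarity. The toy corpus, the whitespace tokenizer and the function names are assumptions for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document id -> text.
docs = {
    "d1": "matrix factorization for recommender systems",
    "d2": "matrix factorization with gradient descent",
    "d3": "a recipe to bake sourdough bread",
}

def tf_idf(docs):
    """Return {doc_id: {term: weight}} using tf * log(N / df)."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for d, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[d] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = tf_idf(docs)
print(cosine(vectors["d1"], vectors["d2"]))  # share "matrix", "factorization"
print(cosine(vectors["d1"], vectors["d3"]))  # no terms in common
```

A term that appears in every document gets weight log(1) = 0, so it contributes nothing to the similarity.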
A recommender system should recommend things people would not find otherwise, ie. it should drive incremental sales. Content-based systems often just find extremely similar items that the user already knew about, or is not interested in.
"If a customer is looking at the product details page for Harry Potter and the Chamber of Secrets, and your recommender shows Prisoner of Azkaban, and the customer buys it, the data scientists back at Random House HQ should not be high-fiving.
It's a safe bet that that customer already knew there were more than two books in the series and would have bought Prisoner of Azkaban anyway"
Content-based recommenders are not good at capturing inter-dependencies or complex behaviours.
Eg. I like articles on ML only when they include practical application along with the theory, and not just theory. A content-based recommender will miss this.
Over-specialisation
A content-based filtering system will not select an item unless the user's previous behaviour provides evidence for it. Additional techniques have to be added to give the system the capability to make suggestions outside the scope of what the user has already shown interest in.
Related: Creation of an echo-chamber.
Online methods (ie. asking users how useful a recommendation was) are infeasible. Offline methods (run on historical data) must be used.
Accuracy metrics measure how much predicted ratings differ from real ratings. Eg: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), etc.
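The three error measures can be sketched as below, assuming two equal-length lists of held-out real ratings and the corresponding predictions. The function names and sample values are made up for the example.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error: penalises large errors more heavily than MAE."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: MSE expressed in the original rating units."""
    return sqrt(mse(actual, predicted))

actual = [4, 3, 5, 2]       # hypothetical held-out ratings
predicted = [3, 3, 4, 4]    # hypothetical model output
print(mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

Lower is better for all three. RMSE is never smaller than MAE on the same data.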
Decision-support metrics judge how relevant a ranked set of recommendations is for a user. Eg: precision, recall, F1, Breese score/weighted recall, etc.
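As a sketch of the decision-support metrics, the function below computes precision, recall and F1 for the top k items of a ranked recommendation list, given the set of items the user actually found relevant. The name `precision_recall_f1`, the cutoff `k` and the sample data are assumptions for illustration; the Breese score is not covered here.

```python
def precision_recall_f1(recommended, relevant, k):
    """Precision, recall and F1 for the first k items of a ranked list.

    recommended: item ids ordered from most to least recommended
    relevant:    set of item ids the user actually found relevant
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

recommended = ["a", "b", "c", "d", "e"]   # hypothetical ranked output
relevant = {"a", "c", "f"}                # hypothetical ground truth
print(precision_recall_f1(recommended, relevant, k=3))
```

Precision asks how many of the recommended items were relevant; recall asks how many of the relevant items were recommended. F1 is their harmonic mean.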