Recommender System
CPSC 8650 - Data Mining
Team 11
Bhargav Golla
Chris Gropp
What is a Recommender System?
- Making recommendations based on previous activity
- Amazon Product recommendations
- Netflix Challenge
- Collaborative filtering technique: search a large group of users and find the small subset whose tastes are similar to a given user's
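The collaborative filtering idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the toy ratings, the names `cosine_similarity` and `nearest_users`, and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated; 0.0 if none are shared."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_users(ratings, target, k=2):
    """Return the k users whose ratings are most similar to the target's."""
    scored = [
        (cosine_similarity(ratings[target], other_ratings), other)
        for other, other_ratings in ratings.items()
        if other != target
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [user for _score, user in scored[:k]]

# Hypothetical toy data: {user: {movie: rating}}
toy = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Casablanca": 1},
    "cat": {"Alien": 1, "Brazil": 2, "Casablanca": 5},
    "dan": {"Alien": 4, "Brazil": 4, "Casablanca": 2},
}
neighbours = nearest_users(toy, "ann", k=2)
```

Here `neighbours` holds the small subset of users whose tastes are closest to "ann"; their ratings would then drive the recommendations.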
What are we doing?
- Build a recommender system using the standard libraries available
- Build a recommender system using similarity functions and compare it with the standard recommender built previously
What are we gaining?
A clear understanding of how a recommender system works
Dataset
GroupLens' MovieLens datasets of sizes 100K and 1M ratings
Chosen because they have the required attributes (user ratings) and, being widely used by research teams, are already preprocessed
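As a rough sketch of how the rating files could be read into memory: the MovieLens 100K ratings file uses tab-separated fields (user id, item id, rating, timestamp), while the 1M ratings file uses `::` as the separator. The function name `parse_ratings` and the sample rows below are illustrative assumptions, not part of the project.

```python
from collections import defaultdict

def parse_ratings(lines, sep="\t"):
    """Build {user_id: {item_id: rating}} from raw rating lines.

    sep="\t" matches the 100K layout; sep="::" matches the 1M layout.
    """
    ratings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        user_id, item_id, rating, _timestamp = line.split(sep)
        ratings[int(user_id)][int(item_id)] = float(rating)
    return dict(ratings)

# Illustrative sample rows in each layout (not real results)
sample_100k = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "196\t302\t4\t881250000",
]
sample_1m = ["1::1193::5::978300760", "1::661::3::978302109"]

by_user_100k = parse_ratings(sample_100k)
by_user_1m = parse_ratings(sample_1m, sep="::")
```

With a real file, the same function could be fed an open file handle, since iterating over a file yields its lines.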
Tools Used
- PredictionIO which is built on Apache Mahout, MongoDB and Hadoop
- MATLAB
Results with standard recommender
With 100K dataset,
Asynchronous loading - Around 6.65 minutes
Synchronous loading - Around 7 minutes
With 1M dataset,
Asynchronous loading - Around 53.5 minutes
Synchronous loading - Around 57.5 minutes
Results with MATLAB recommender
With 100K dataset, time taken for recommendations: 8 minutes
What do we have?
A Python CLI script that prompts for the ID of the user who needs recommendations and shows the top 10 movie recommendations for that user.
A MATLAB script that generates the top 10 recommendations for a user using either Euclidean distance similarity or the Pearson correlation coefficient as the similarity function
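For reference, the two similarity functions named above could look like the following. This is a Python sketch written for illustration, not a port of the MATLAB script: the function names, the `1 / (1 + distance)` conversion from distance to similarity, and the similarity-weighted `predict` helper are assumptions.

```python
from math import sqrt

def euclidean_similarity(a, b):
    """Map Euclidean distance over co-rated items into (0, 1]; 0.0 if no overlap."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    distance = sqrt(sum((a[i] - b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)

def pearson_similarity(a, b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    shared = set(a) & set(b)
    n = len(shared)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in shared) / n
    mean_b = sum(b[i] for i in shared) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    var_a = sum((a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((b[i] - mean_b) ** 2 for i in shared)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / sqrt(var_a * var_b)

def predict(ratings, target, item, similarity):
    """Similarity-weighted average of other users' ratings for one item."""
    total = weight = 0.0
    for other, other_ratings in ratings.items():
        if other == target or item not in other_ratings:
            continue
        sim = similarity(ratings[target], other_ratings)
        if sim > 0:  # ignore dissimilar users
            total += sim * other_ratings[item]
            weight += sim
    return total / weight if weight else None

# Hypothetical toy data: {user: {item: rating}}
toy = {
    1: {"A": 5, "B": 3, "C": 4},
    2: {"A": 5, "B": 3, "C": 4, "D": 5, "E": 2},
    3: {"A": 1, "B": 5, "C": 2, "D": 1, "E": 5},
}
pearson_guess = predict(toy, 1, "D", pearson_similarity)
euclidean_guess = predict(toy, 1, "D", euclidean_similarity)
```

In this toy data, user 2 agrees with user 1 on every shared item and user 3 disagrees, so both predictions for item "D" lean toward user 2's rating. Pearson drops user 3 entirely (negative correlation), while the Euclidean variant still gives user 3 a small weight.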
Future Work
In the MATLAB recommender:
Remove items the user has already rated from the recommendations
Test the MATLAB recommender on the 1M dataset
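The first item above, dropping already-rated items, amounts to a filter applied before taking the top N. A minimal Python sketch follows (the actual change would be made in the MATLAB script); `top_unrated`, the score dictionary, and the movie titles are hypothetical.

```python
def top_unrated(scores, already_rated, n=10):
    """Return up to n items with the highest predicted score,
    skipping anything the user has already rated."""
    candidates = [
        (score, item)
        for item, score in scores.items()
        if item not in already_rated
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _score, item in candidates[:n]]

# Hypothetical predicted scores for one user
scores = {"Alien": 4.8, "Brazil": 4.5, "Casablanca": 4.9, "Dune": 3.0}
rated = {"Casablanca", "Dune"}
recommendations = top_unrated(scores, rated, n=10)
```

Without the filter, "Casablanca" would top the list even though the user has already seen and rated it.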
In general:
Compare the recommendations produced by the two recommenders
Make our code transposable to allow item-based recommendation
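One way to picture the transposition step: if ratings are stored as user-to-item mappings, flipping them to item-to-user mappings lets the same similarity code compare items instead of users. The sketch below assumes a nested-dictionary layout, which may differ from how the project stores its ratings matrix; the name `transpose` is illustrative.

```python
def transpose(ratings):
    """Flip {user: {item: rating}} into {item: {user: rating}}."""
    flipped = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            flipped.setdefault(item, {})[user] = rating
    return flipped

# Hypothetical toy data
by_user = {
    "ann": {"Alien": 5, "Brazil": 4},
    "bob": {"Alien": 3},
}
by_item = transpose(by_user)
```

Any user-based similarity function applied to `by_item` then measures how alike two movies are, based on the users who rated both.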