Recommender System

CPSC 8650 - Data Mining

Team 11

Bhargav Golla

Chris Gropp

What is a Recommender System?

Making recommendations based on previous activity
Amazon Product recommendations
Netflix Challenge
Collaborative Filtering technique: Search a large group of users and filter out a small subset with tastes similar to a given user
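The "search a large group, keep a small similar subset" step can be sketched in a few lines of Python. This is only an illustration, not the project's code: the names `agreement` and `nearest_users` are hypothetical, and the toy similarity (1 / (1 + mean absolute rating difference) over shared items) is an assumption chosen for brevity.

```python
from typing import Callable, Dict, List, Tuple

UserRatings = Dict[str, float]          # item -> rating
Ratings = Dict[str, UserRatings]        # user -> {item -> rating}


def agreement(a: UserRatings, b: UserRatings) -> float:
    """Toy similarity (assumption): 1 / (1 + mean absolute difference) on shared items."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # no common items, so no evidence of similar taste
    mean_abs_diff = sum(abs(a[i] - b[i]) for i in shared) / len(shared)
    return 1.0 / (1.0 + mean_abs_diff)


def nearest_users(
    ratings: Ratings,
    target: str,
    k: int = 2,
    sim: Callable[[UserRatings, UserRatings], float] = agreement,
) -> List[Tuple[str, float]]:
    """Return the k users most similar to `target`, best match first."""
    scored = [
        (sim(ratings[target], other), user)
        for user, other in ratings.items()
        if user != target
    ]
    scored.sort(reverse=True)
    return [(user, score) for score, user in scored[:k]]
```

Any similarity function with the same two-argument shape can be passed as `sim`, so the neighbourhood search stays separate from the choice of measure.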

What are we doing?

Build a recommender system using standard libraries available
Build our own recommender system using similarity functions and compare it with the standard recommender built previously

What are we gaining?

A clear understanding of how a recommender system works

Dataset

GroupLens' MovieLens datasets of sizes 100K and 1M

Chosen because they have the required attributes (user ratings) and, being widely used by research teams, are already preprocessed
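As a rough sketch of how the rating files can be read into memory: the 100K ratings file (`u.data`) is tab-separated and the 1M ratings file (`ratings.dat`) uses `::`, both with the fields user ID, movie ID, rating, timestamp. The function name `parse_ratings` and the nested-dictionary layout are assumptions made for illustration, and the sample lines in the usage note are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_ratings(lines: Iterable[str], sep: str = "\t") -> Dict[int, Dict[int, float]]:
    """Build {user_id: {movie_id: rating}} from MovieLens-style rating lines.

    Use sep="\\t" for the 100K file and sep="::" for the 1M file.
    The timestamp field is read but ignored.
    """
    ratings: Dict[int, Dict[int, float]] = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, rating, _timestamp = line.split(sep)
        ratings[int(user)][int(movie)] = float(rating)
    return dict(ratings)
```

For example, `parse_ratings(["196\t242\t3\t881250949"])` yields `{196: {242: 3.0}}`; with a real file, pass the open file object in place of the list.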

Tools Used

PredictionIO, which is built on Apache Mahout, MongoDB and Hadoop
MATLAB

Results with standard recommender

With 100K dataset,

Asynchronous loading - Around 6.65 minutes

Synchronous loading - Around 7 minutes

With 1M dataset,

Asynchronous loading - Around 53.5 minutes

Synchronous loading - Around 57.5 minutes

Results with MATLAB recommender

With 100K dataset, time taken for recommendations: 8 minutes

What do we have?

A Python CLI script that asks for the ID of the user who needs recommendations and shows the top 10 movie recommendations for that user.

A MATLAB script to generate top 10 recommendations for a user using either Euclidean Distance Similarity or Pearson Correlation Coefficient similarity function
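The two similarity functions named above can be sketched in Python as follows. This is not the MATLAB script itself: mapping Euclidean distance to a similarity with 1 / (1 + d), the similarity-weighted average used for scoring, and the function names are all assumptions made for illustration. The sketch also skips movies the target user has already rated.

```python
import math
from typing import Callable, Dict, List, Tuple

UserRatings = Dict[str, float]  # movie -> rating


def euclidean_similarity(a: UserRatings, b: UserRatings) -> float:
    """1 / (1 + Euclidean distance) over shared movies (assumed mapping)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dist = math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared))
    return 1.0 / (1.0 + dist)


def pearson_similarity(a: UserRatings, b: UserRatings) -> float:
    """Pearson correlation coefficient over shared movies, in [-1, 1]."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n < 2:
        return 0.0  # correlation is undefined for fewer than two points
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # one user gave identical ratings, no variation to correlate
    return cov / math.sqrt(vx * vy)


def recommend(
    ratings: Dict[str, UserRatings],
    target: str,
    sim: Callable[[UserRatings, UserRatings], float],
    n: int = 10,
) -> List[Tuple[str, float]]:
    """Top-n unseen movies, scored by a similarity-weighted average rating."""
    totals: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for user, theirs in ratings.items():
        if user == target:
            continue
        s = sim(ratings[target], theirs)
        if s <= 0:
            continue  # ignore dissimilar or unrelated users
        for movie, rating in theirs.items():
            if movie in ratings[target]:
                continue  # skip movies the target has already rated
            totals[movie] = totals.get(movie, 0.0) + s * rating
            weights[movie] = weights.get(movie, 0.0) + s
    ranked = sorted(((totals[m] / weights[m], m) for m in totals), reverse=True)
    return [(movie, score) for score, movie in ranked[:n]]
```

Passing `euclidean_similarity` or `pearson_similarity` as `sim` switches the measure without touching the ranking logic.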

Future Work

In the MATLAB recommender:

Remove items the user has already rated from the recommendations

Test the MATLAB recommender on the 1M dataset

In general:

Compare the recommendations produced by the two recommenders

Make our code transpose-able to allow item-based recommendation
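One way to picture the "transpose" idea, assuming ratings are held as a nested dictionary of user -> {movie -> rating} (an assumption; the project's actual data layout is not shown): flipping the nesting gives movie -> {user -> rating}, after which the same similarity code compares movies instead of users. The function name `transpose` is hypothetical.

```python
from typing import Dict, Hashable

Nested = Dict[Hashable, Dict[Hashable, float]]


def transpose(ratings: Nested) -> Nested:
    """Flip {user: {movie: rating}} into {movie: {user: rating}}."""
    flipped: Nested = {}
    for user, movies in ratings.items():
        for movie, rating in movies.items():
            flipped.setdefault(movie, {})[user] = rating
    return flipped
```

Applying `transpose` twice returns the original structure, so the user-based and item-based views can share one data set.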