Avraam Mavridis
@avraamakis
- Use historical data on purchasing behavior
- Find similar users
- Predict users' interest based on a similarity metric
User Interaction Matrix
Let the matrix V be the product of the matrices W and H:

V = WH

Matrix multiplication can be viewed as computing each column vector of V as a linear combination of the column vectors of W, with the coefficients supplied by the corresponding column of H.
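A minimal sketch of finding such a factorization with NumPy, using the classic multiplicative update rules for non-negative matrix factorization (the ratings matrix, the rank k, and the iteration count are made up for illustration):

```python
import numpy as np

# V (users x items) is approximated by W (users x k) times H (k x items)
V = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

k = 2  # number of latent features
rng = np.random.default_rng(0)
W = rng.random((V.shape[0], k))
H = rng.random((k, V.shape[1]))

for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)  # update coefficients
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)  # update basis vectors

print(np.round(W @ H, 2))  # W @ H approximates V
```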
The goal of the recommendation system is to predict the blanks in the utility matrix
- We can ask users to rate items (but users are usually unwilling to respond)
- We can make inferences from users' behavior
| | Inception | Interstellar | Memento | Insomnia | Matrix | Quay | Prestige |
|---|---|---|---|---|---|---|---|
| Bob | 4 | | | 5 | 1 | | |
| Alice | 5 | 5 | 4 | | | | |
| Dinos | | | | 2 | 4 | 5 | |
| Nick | | | 3 | | | | 3 |
Jaccard Index
| | Inception | Interstellar | Memento | Insomnia | Matrix | Quay | Prestige |
|---|---|---|---|---|---|---|---|
| Bob | 4 | | | 5 | 1 | | |
| Alice | 5 | 5 | 4 | | | | |
| Dinos | | | | 2 | 4 | 5 | |
| Nick | | | 3 | | | | 3 |
The Jaccard index of two users is the size of the intersection of their rated-movie sets divided by the size of the union. Bob and Alice have intersection size 1 and union size 5, so their Jaccard index is 1/5 = 0.2.
In comparison, Bob and Dinos have intersection size 2 and union size 4, so their Jaccard index is 2/4 = 0.5.
Thus Bob appears closer to Dinos than to Alice...
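The same computation in Python, a minimal sketch using each user's set of rated movies from the table above:

```python
def jaccard_index(a, b):
    # Size of the intersection divided by the size of the union
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

bob = {"Inception", "Insomnia", "Matrix"}
alice = {"Inception", "Interstellar", "Memento"}
dinos = {"Insomnia", "Matrix", "Quay"}

print(jaccard_index(bob, alice))  # 0.2
print(jaccard_index(bob, dinos))  # 0.5
```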
Euclidean / Manhattan / Minkowski
Euclidean distance
```python
import math

def euclidean_distance(moviesUser1, moviesUser2):
    # Ratings keyed by movie id, for quick lookup
    ratings1 = {m['movieId']: float(m['rating']) for m in moviesUser1}
    ratings2 = {m['movieId']: float(m['rating']) for m in moviesUser2}
    # Only movies rated by both users contribute to the distance
    common = ratings1.keys() & ratings2.keys()
    distance = math.sqrt(sum((ratings1[key] - ratings2[key]) ** 2
                             for key in common))
    # Turn the distance into a similarity score in (0, 1];
    # identical ratings give 1
    return 1 / (1 + distance)
```
Euclidean distance

Sample rows in `userId,movieId,rating` format (the fields match the keys used in the code above):

```
999,1953,4.0
999,2105,4.0
999,2150,3.0
999,2193,2.0
999,2294,2.0
999,2455,2.5
999,3671,3.0
999,2968,1.0
```
Euclidean distance

A second sample, identical except for the last row (movie 2968 rated 3.0 instead of 1.0):

```
999,1953,4.0
999,2105,4.0
999,2150,3.0
999,2193,2.0
999,2294,2.0
999,2455,2.5
999,3671,3.0
999,2968,3.0
```
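One possible way to feed such rows into euclidean_distance (the parse_ratings helper is hypothetical; the rows are assumed to be in userId,movieId,rating order, as in the MovieLens ratings.csv):

```python
def parse_ratings(lines):
    # Hypothetical helper: turn "userId,movieId,rating" rows into dicts
    return [{'movieId': m, 'rating': r}
            for _, m, r in (line.split(',') for line in lines)]

# Two users who agree on every movie except 2968 (rated 1.0 vs 3.0)
user_a = parse_ratings(["999,1953,4.0", "999,2105,4.0", "999,2968,1.0"])
user_b = parse_ratings(["999,1953,4.0", "999,2105,4.0", "999,2968,3.0"])
print(euclidean_distance(user_a, user_b))  # 1 / (1 + 2.0) ≈ 0.33
```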
Cosine Similarity
Cosine similarity is the cosine of the angle between two n-dimensional vectors in an n-dimensional space: the dot product of the two vectors divided by the product of their lengths (or magnitudes),

cos(θ) = (A · B) / (‖A‖ ‖B‖)
Cosine Similarity
| | Inception | Interstellar | Memento | Insomnia | Matrix | Quay | Prestige |
|---|---|---|---|---|---|---|---|
| Bob | 4 | | | 5 | 1 | | |
| Alice | 5 | 5 | 4 | | | | |
| Dinos | | | | 2 | 4 | 5 | |
| Nick | | | 3 | | | | 3 |
Bob vs Dinos
= (5·2 + 1·4) / (√(4² + 5² + 1²) · √(2² + 4² + 5²)) = 14 / (√42 · √45) ≈ 0.322

Bob vs Alice
= (4·5) / (√(4² + 5² + 1²) · √(5² + 5² + 4²)) = 20 / (√42 · √66) ≈ 0.38
Cosine Similarity
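A minimal sketch of the same computation in Python, treating each user's row of the utility matrix as a vector with missing ratings counted as 0:

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

#        Inception, Interstellar, Memento, Insomnia, Matrix, Quay, Prestige
bob   = [4, 0, 0, 5, 1, 0, 0]
alice = [5, 5, 4, 0, 0, 0, 0]
dinos = [0, 0, 0, 2, 4, 5, 0]

print(round(cosine_similarity(bob, dinos), 3))  # 0.322
print(round(cosine_similarity(bob, alice), 2))  # 0.38
```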
Pearson correlation coefficient
The Pearson correlation coefficient is a measure of the linear correlation between two variables. It gives a number between -1 and 1, where -1 is a total negative linear correlation, 0 is no linear correlation, and 1 is a total positive linear correlation.
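A minimal sketch (the example vectors are made up):

```python
import math

def pearson(x, y):
    # Pearson correlation of two equal-length rating vectors
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: total positive correlation
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: total negative correlation
```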
- Use metadata of the items
- Use a classifier
- Recommend items to the user based on the class of items they are interested in (a toy sketch follows)
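A toy sketch of that idea; the catalog, the genre metadata, and the recommend helper are all hypothetical:

```python
def recommend(user_ratings, catalog, min_rating=4):
    # Classes (here: genres) of the items the user rated highly
    liked = {catalog[item]['genre']
             for item, rating in user_ratings.items() if rating >= min_rating}
    # Recommend unseen items that fall into one of those classes
    return [item for item, meta in catalog.items()
            if meta['genre'] in liked and item not in user_ratings]

catalog = {'Inception': {'genre': 'sci-fi'},
           'Matrix': {'genre': 'sci-fi'},
           'Memento': {'genre': 'thriller'}}
print(recommend({'Inception': 5}, catalog))  # ['Matrix']
```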
TFIDF (term frequency–inverse document frequency)
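TF-IDF weighs a term by how frequently it occurs in one document, discounted by how common it is across all documents, so distinctive terms dominate an item's profile. A minimal sketch of the classic weighting (the toy documents are made up):

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists; returns one {term: weight} dict per doc
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{term: (count / len(doc)) * math.log(n / df[term])
             for term, count in Counter(doc).items()}
            for doc in docs]

docs = [["space", "time", "dream"], ["dream", "heist", "dream"]]
print(tf_idf(docs))
```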
PROBLEMS
- Data and metadata about the items are required, and are not always available
- It doesn't share information across users
NEURAL NETWORK MODEL
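One common formulation (an assumption here, not spelled out in the slides) learns user and item embeddings whose dot product approximates the observed ratings. A minimal NumPy sketch of that idea, with made-up ratings, sizes, and hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
# (user, item, rating) triples; values made up for illustration
ratings = [(0, 0, 4.0), (0, 3, 5.0), (0, 4, 1.0),
           (1, 0, 5.0), (1, 1, 5.0), (1, 2, 4.0)]

k, lr = 2, 0.05                          # embedding size, learning rate
U = rng.normal(scale=0.1, size=(2, k))   # user embeddings
V = rng.normal(scale=0.1, size=(7, k))   # item embeddings

for _ in range(1000):                    # plain SGD on squared error
    for u, i, r in ratings:
        err = r - U[u] @ V[i]
        u_vec = U[u].copy()              # keep the pre-update user vector
        U[u] += lr * err * V[i]
        V[i] += lr * err * u_vec

print(U[0] @ V[0])  # close to the observed rating 4.0
```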
SUMMARY
- Understand your domain
- Pay attention to metadata
- Mix various models
- Evaluate the results (tricky)
Ethical Considerations
Thank you