Ahcène Boubekki
Ulf Brefeld
Cláudio L. Lucchesi
Wolfgang Stille
DIPF - Leuphana University
Leuphana University
UFMS, Brazil
ULB - TU Darmstadt
Hello everyone,
I will talk about how to propagate capacities in a graph in order to make recommendations.
This is joint work with ...
<SPACE>
Let's build a Problem
<SPACE>
5 items:
|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | - | 1 | 3 | 4 | 0 |
| B | 1 | - | 3 | 0 | 5 |
| C | 3 | 3 | - | 0 | 0 |
| D | 4 | 0 | 0 | - | 5 |
| E | 0 | 5 | 0 | 5 | - |
Introduction
Item-based Collaborative Filtering
[Figure: Adjacency Matrix and Item Graph: nodes A-E connected by edges with weights 1, 3, 3, 4, 5, 5]
How to recommend E?
What weight?
Let's suppose we want to build a recommender system for 5 items
The first thing to look at is the adjacency matrix
<SPACE>
The same information can be seen on the item-graph
<SPACE>
If you run an item-based CF on this graph to recommend an item from A
<SPACE>
you would only be able to recommend B, C and D.
What about E?
<SPACE>
How can we recommend E?
And what would be its similarity with A?
The missing edges are characteristic of a cold-start situation, either of the whole data set or of specific items.
<SPACE>
Basically, we want to tackle the cold-start problem by building a transitive closure of the graph, but how do we weight the new edges?
<SPACE>
This is the Weight Propagation problem.
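To make the problem concrete, here is a minimal sketch of item-based scoring on the toy adjacency matrix above (my own illustration, not the system from the talk); note that E never shows up among the candidates for A:

```python
# Minimal item-based scoring on the toy adjacency matrix (illustration only).
adj = {
    "A": {"B": 1, "C": 3, "D": 4},
    "B": {"A": 1, "C": 3, "E": 5},
    "C": {"A": 3, "B": 3},
    "D": {"A": 4, "E": 5},
    "E": {"B": 5, "D": 5},
}

def recommend(source, k=3):
    # Rank the direct neighbours of `source` by edge weight (similarity).
    neighbours = adj.get(source, {})
    return sorted(neighbours, key=neighbours.get, reverse=True)[:k]

print(recommend("A"))  # ['D', 'C', 'B'] -- E is not reachable in one hop
```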
<SPACE>
Weight Propagation
[Figure: a path A-D-E-F-G with edge weights 5, 4, 2, 2. Addition: A-E = 9. Multiplication: A-E = 20, A-F = 40, A-G = 80.]
Consider this situation with 3 items.
The first idea would be to add the weights.
No, you don't do that: these are similarities, not distances, so you cannot add them.
<SPACE>
Why not multiply them?
Because if you consider new nodes always further from A,
<SPACE>
the products will be bigger
<SPACE>
and bigger.
In the end, items that are very far away will have a very high similarity to A and will be ranked very high.
We don't want this.
<SPACE>
Weight Propagation
[Figure: Cosine Multiplication: the same path A-D-E-F-G with cosine-normalized weights .5, .4, .2, .2; the products are A-E = .2, A-F = .04, A-G = .008.]
Normalizing the weights, for example using cosine, does not help.
It just leads to the opposite situation,
<SPACE>
where far items are assigned a very low weight,
<SPACE>
and hence will never appear in a top-N recommendation.
We don't want this either.
Our solution comes from network theory. It is called the capacity.
<SPACE>
Weight Propagation
[Figure: Capacity: in the item graph (nodes A-E), the path A-D-E with edge weights 4 and 5 has capacity 4. The path may not be unique.]
The capacity of a path is defined as the lowest weight of the path's edges.
In our case it is 4.
However
<SPACE>
there might not be a unique path connecting two nodes.
<SPACE>
Here there are three
<SPACE>
We review two points of view to handle this issue
<SPACE>
Weight Propagation
[Figure: the item graph again (nodes A-E, edge weights 1, 3, 3, 4, 5, 5), annotated with the BCSP and MaxCap values 3 and 4.]
The first idea is to search for a balance between capacity and path length
This is what Bi-Criterion Shortest Path
<SPACE>
(or BCSP) does. It was studied by Malucelli et al. It requires computing all the paths connecting the two nodes, which is a very tedious task.
The approach we chose is simpler. It is called the MaxCapacity
<SPACE>
The max capacity is defined as the biggest capacity among all the paths connecting the two nodes.
<SPACE>
In theory, this would also require computing all the paths.
In practice, this can be avoided.
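Written out (the notation here is mine, not from the slides: P(a,b) is the set of paths from a to b, and w(u,v) the edge weight), the two quantities are:

```latex
% Capacity of a path p, and max capacity between two nodes a and b
\operatorname{cap}(p) = \min_{(u,v) \in p} w(u,v),
\qquad
\operatorname{MaxCap}(a,b) = \max_{p \in P(a,b)} \operatorname{cap}(p)
```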
<SPACE>
F. Malucelli, P. Cremonesi, and B. Rostami. An application of bicriterion shortest paths to collaborative filtering.
In Proceedings of the Federated Conference on Computer Science and Information Systems (FedCSIS), 2012.
How to compute max Capacities?
<SPACE>
We propose 2 completely different algorithms
<SPACE>
The first one makes use of buckets
<SPACE>
The second one transforms the item graph into a tree
more precisely: a forest
<SPACE>
Let's start with the bucket-based one.
<SPACE>
Algorithms : Buckets
[Figure: the item graph (nodes A-E, edge weights 1, 3, 3, 4, 5, 5) next to the sequence of sub-graphs G_0 to G_5; G_5 is empty, and G_4 is the first sub-graph missing A, so MaxCap(A, E) = 4.]
The idea is to build a sequence of sub-graphs G_alpha
<SPACE>
containing the edges with a weight strictly bigger than alpha.
In our case this will be 6 sub-graphs
<SPACE>
G_0 to G_5
<SPACE>
G_0 is a copy of the item graph,
<SPACE>
G_1 doesn't have the edge between A and B but still has all the vertices.
<SPACE> The same for G_2 <SPACE>
G_3 loses the node C
<SPACE> G_4 loses A <SPACE>
And G_5 is empty as 5 is the biggest weight.
When we update the weight of AB to set it to 3,
<SPACE>
we just add the edge to G_1 <SPACE>and G_2.
Computing the MaxCapacity between A and E
<SPACE>
consists of <SPACE> looking for the first sub-graph missing at least one of the nodes.
<SPACE>
Here it is G_4, so the MaxCapacity between A and E is 4
<SPACE>
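A minimal sketch of this bucket idea, assuming integer weights and testing connectivity inside each sub-graph (this is my own illustration, not the exact implementation from the paper):

```python
# Bucket-based MaxCapacity sketch (illustration only; integer weights assumed).
class Buckets:
    def __init__(self, max_weight):
        # subgraphs[alpha] is G_alpha: it keeps the edges with weight strictly bigger than alpha.
        self.subgraphs = [dict() for _ in range(max_weight + 1)]

    def update(self, u, v, w):
        # Add (or raise the weight of) the edge u-v: it belongs to every G_alpha with alpha < w.
        for alpha in range(w):
            g = self.subgraphs[alpha]
            g.setdefault(u, set()).add(v)
            g.setdefault(v, set()).add(u)

    def max_cap(self, u, v):
        # The first G_alpha in which u and v are no longer connected gives MaxCap(u, v) = alpha.
        for alpha, g in enumerate(self.subgraphs):
            if not self._connected(g, u, v):
                return alpha
        return len(self.subgraphs)

    def _connected(self, g, u, v):
        # Iterative graph search restricted to one sub-graph.
        if u not in g or v not in g:
            return False
        seen, stack = {u}, [u]
        while stack:
            node = stack.pop()
            for nxt in g.get(node, ()):
                if nxt == v:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

# Toy item graph from the slides:
b = Buckets(max_weight=5)
for u, v, w in [("A", "B", 1), ("A", "C", 3), ("A", "D", 4),
                ("B", "C", 3), ("B", "E", 5), ("D", "E", 5)]:
    b.update(u, v, w)
print(b.max_cap("A", "E"))  # 4
```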
Algorithms : Tree
def Dijkstra(Graph, source):

    create vertex set Q
    # Initialization
    for each vertex v in Graph:
        dist[v] ← INFINITY
        prev[v] ← UNDEFINED
        add v to Q
    # Distance from source to source
    dist[source] ← 0

    while Q is not empty:
        # Node with the least distance will be selected first
        u ← vertex in Q with min dist[u]
        remove u from Q

        # where v is still in Q
        for each neighbor v of u:
            alt ← dist[u] + length(u, v)
            # If a shorter path to v has been found
            if alt < dist[v]:
                dist[v] ← alt
                prev[v] ← u

    return dist[], prev[]

(Pseudocode from https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)
Modifications for MaxCapacity (the replacements highlighted on the slide):
    dist[v] ← INFINITY               becomes   cap[v] ← -INFINITY
    dist[source] ← 0                 becomes   cap[source] ← INFINITY
    vertex in Q with min dist[u]     becomes   vertex in Q with max cap[u]   (biggest capacity)
    alt ← dist[u] + length(u, v)     becomes   alt ← min( cap[u] , weight(u,v) )
    if alt < dist[v]: dist[v] ← alt  becomes   if alt > cap[v]: cap[v] ← alt
    return dist[]                    becomes   return cap[]
This approach was developed to be able to use Dijkstra, so first we show how it can be modified to compute the MaxCapacity between two nodes.
First, in the initialization, the capacities are set
<SPACE>
to -Inf instead of +Inf
<SPACE>
The capacity from the source to the source is set to +inf
The next node to be extended is the one that has the biggest capacity from the source instead of the least distance
<SPACE>
the capacity of the path from the source to a neighbor
<SPACE>
is the minimum of the capacity of the current node and of the weight of the link to the neighbor.
<SPACE>
If the value is bigger than the stored one, it is updated.
<SPACE>
We finish by returning capacities instead of distances
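Put together, a runnable sketch of this max-capacity variant of Dijkstra could look as follows (my own Python rendering of the modifications above, using a heap for the max-capacity extraction):

```python
import heapq

def max_capacity_dijkstra(graph, source):
    # graph: dict mapping node -> {neighbor: edge weight (similarity)}.
    # Returns cap[v], the capacity of the best (max-capacity) path from source to v.
    cap = {v: float("-inf") for v in graph}      # -INFINITY instead of +INFINITY
    cap[source] = float("inf")                   # capacity from the source to itself
    heap = [(-cap[source], source)]              # max-heap simulated with negated capacities
    done = set()
    while heap:
        _, u = heapq.heappop(heap)               # node with the biggest capacity first
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            alt = min(cap[u], w)                 # alt = min( cap[u] , weight(u, v) )
            if alt > cap[v]:                     # keep the biggest capacity found so far
                cap[v] = alt
                heapq.heappush(heap, (-alt, v))
    return cap

# Toy item graph from the slides:
graph = {
    "A": {"B": 1, "C": 3, "D": 4},
    "B": {"A": 1, "C": 3, "E": 5},
    "C": {"A": 3, "B": 3},
    "D": {"A": 4, "E": 5},
    "E": {"B": 5, "D": 5},
}
print(max_capacity_dijkstra(graph, "A"))  # {'A': inf, 'B': 4, 'C': 3, 'D': 4, 'E': 4}
```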
The algorithm we propose aims at optimizing Dijkstra by building trees on-the-fly.
<SPACE>
Algorithms : Tree
Transactions read in order: AB, ED, AD, AD, EB (with weights including 1, 5 and 2)
[Figure: the forest built on-the-fly from these transactions: nodes A-E, their root pointers, and the tree edge weights updated step by step.]
At the beginning, every node is the root of its own tree.
While reading the transactions, there are three cases to handle, which we will review.
<SPACE>
First we read AB. The trees are disconnected,
<SPACE>so we connect them
<SPACE>B is not a root anymore
<SPACE> The same for ED
The order of the roots is not very important at this point
<SPACE> But later <SPACE>
it can influence the depth of the final tree <SPACE>
Now we want to update the weight of an existing edge. If the new weight is smaller, nothing happens <SPACE>
if it is bigger, we simply modify the value <SPACE>
<SPACE> Adding EB would create a cycle <SPACE>
To handle this we compute the path between E and B, <SPACE>
and look for the edge with the smallest weight.
If the weight of the edge to be added is smaller, we leave things as they are <SPACE>
if it is bigger, <SPACE>
the new edge is added <SPACE>
and the smallest edge is removed <SPACE>
If there are several edges with the smallest weight, one of them is removed. Again, this can be optimized to keep the tree tidy.
<SPACE>
The MaxCapacities can now be computed using a tree search.
Keep in mind that even if the resulting tree can differ, the MaxCapacities remain the same.
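A minimal sketch of these three cases, written as an incrementally maintained maximum spanning forest (my own illustration; I assume each call to update passes the current weight of the pair, and MaxCap is read off as the smallest weight on the unique tree path):

```python
# Tree-based MaxCapacity sketch (illustration only).
class MaxCapForest:
    def __init__(self):
        self.adj = {}                                  # node -> {neighbor: weight}, tree edges only

    def _path(self, u, v):
        # Path between u and v inside the forest (unique if it exists), found by an iterative search.
        if u not in self.adj or v not in self.adj:
            return None
        stack, seen = [(u, [u])], {u}
        while stack:
            node, path = stack.pop()
            if node == v:
                return path
            for nxt in self.adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, path + [nxt]))
        return None

    def update(self, u, v, w):
        self.adj.setdefault(u, {})
        self.adj.setdefault(v, {})
        if v in self.adj[u]:                           # case 2: the edge is already in the tree
            if w > self.adj[u][v]:                     # only weight increases change anything
                self.adj[u][v] = self.adj[v][u] = w
            return
        path = self._path(u, v)
        if path is None:                               # case 1: two disjoint trees, just connect them
            self.adj[u][v] = self.adj[v][u] = w
            return
        # case 3: adding u-v would create a cycle; find the weakest edge on the existing path
        a, b = min(zip(path, path[1:]), key=lambda e: self.adj[e[0]][e[1]])
        if w > self.adj[a][b]:                         # replace it only if the new edge is heavier
            del self.adj[a][b], self.adj[b][a]
            self.adj[u][v] = self.adj[v][u] = w

    def max_cap(self, u, v):
        # MaxCap = smallest weight along the unique tree path between u and v.
        path = self._path(u, v)
        if path is None:
            return None
        return min(self.adj[a][b] for a, b in zip(path, path[1:]))

f = MaxCapForest()
for u, v, w in [("A", "B", 1), ("E", "D", 5), ("A", "D", 2)]:
    f.update(u, v, w)
print(f.max_cap("A", "E"))  # 2: the tree path A-D-E has weights 2 and 5
```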
What is the difference between the two algorithms?
<SPACE>
Let's compare them in a controlled environment
<SPACE>
Synthetic Experiments
Settings:
50k transactions
21k items
700k edges
for each transaction (u, v, w):
    Update(u, v, w)
    Request MaxCap(u, v)
CC = Co-occurrence
MCb = Bucket-based Max Capacity
MCt = Tree-based Max Capacity
The dataset is made of 50k transactions between 21k items. At the end, the item graph contains up to 70k edges. The protocol consists of:
* reading a transaction
* updating the buckets or tree
* finally requesting the MaxCap between the two nodes
<SPACE>
The evolution of the number of edges is almost linear.
We compare the two algorithms to CC that just builds the adjacency matrix.
MCb refers to the bucket approach, and MCt to the tree-based one.
<SPACE>
The first results concern memory usage. The buckets consume a lot of memory compared to the others.
Note that in a production system, the adjacency matrix is not needed to compute the MaxCap, so we could subtract the yellow line from the two others. MCt would then be almost 0.
<SPACE>
Learning and evaluation times.
The bucket approach is slower to learn but appears faster when evaluating the capacities. Note the scale difference: in both cases the biggest gap is about 5 seconds.
Even if evaluation with the tree approach is slower, it allows the use of real-valued weights such as cosine. Buckets cannot handle that, as it would theoretically require an infinite number of buckets.
<SPACE>
Before continuing with the experiments,
I would like to talk about an issue that we have not mentioned yet.
<SPACE>
Ties
<SPACE>
[Figure: the item graph again (nodes A-E, edge weights 1, 3, 3, 4, 5, 5)]
Max Capacity and Ties
Max Capacity
|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | - | 4 | 3 | 4 | 4 |
| B | 4 | - | 3 | 5 | 5 |
| C | 3 | 3 | - | 3 | 3 |
| D | 4 | 5 | 3 | - | 5 |
| E | 4 | 5 | 3 | 5 | - |

MaxCap: Rec (A) = B D E C or B E D C or D B E C …
MaxCap + dist: Rec (A) = D E B C
!! TIES !!

Max Capacity (row A):
|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | - | 4 | 3 | 4 | 4 |

Tree Distance (row A):
|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | - | 3 | 1 | 1 | 2 |
For our running example, this is the MaxCapacity Matrix.
We want to recommend items from A.
<SPACE>
The problem is that we have ties
<SPACE>
In what order should we sort items with the same MaxCapacity?
Should the recommendation be BDEC? BEDC? DBEC?
<SPACE>
We propose to handle this problem using the tree-based approach
<SPACE>
We add as a second criterion the distance in the tree.
<SPACE>
This reduces the number of ties, and in our example we end up with a unique ranking.
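As a small illustration of this tie-breaking (the numbers are the rows of the two matrices above; the code is my own sketch):

```python
# Rank the candidates for A by MaxCap (descending), breaking ties by tree distance (ascending).
max_cap   = {"B": 4, "C": 3, "D": 4, "E": 4}   # row A of the Max Capacity matrix
tree_dist = {"B": 3, "C": 1, "D": 1, "E": 2}   # row A of the Tree Distance matrix

rec = sorted(max_cap, key=lambda item: (-max_cap[item], tree_dist[item]))
print(rec)  # ['D', 'E', 'B', 'C']
```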
<SPACE>
Max Capacity and Cold-start
MovieLens 1M
for each (user, movie, rating, date) chronologically:
    if movie rated between 1 and 20 times:
        if first rating of user:
            SKIP
        else:
            Rec(user, movie, rating)
    if 1000 recommendations have been computed:
        STOP
Let's go back to the experimental results and look at how the different algorithms behave in a cold-start situation simulated on the MovieLens dataset.
<SPACE>
The cold-start situation is obtained by reading few transactions or ratings and computing recommendations of movies with few ratings (here between 1 and 20).
Ties are handled by considering that the expected item is in the middle of the tie-block
We stop after 1000 recommendations, as BCSP was too slow.
Cross-validation is performed by running the protocol on 6 different parts of the dataset.
<SPACE>
Firstly, BCSP and MC behave similarly with an advantage for MC.
Note that the poor results of Cosine are systematic in cold-start situations.
MC+dist outperforms the other baselines on both criteria, and also in terms of incertitude.
<SPACE>
Max Capacity and Cold-start
Incertitude = size of the tie-blocks
We define the incertitude of a recommendation as the number of items sharing the same similarity as the expected item.
In the plot we see that using a second criterion dramatically decreases the incertitude of the recommendation.
Even though cosine uses real numbers, the average incertitude of its recommendations is higher.
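One possible reading of this protocol in code (my interpretation, not the authors' script): the expected item is placed in the middle of its tie-block, and the incertitude is the size of that block.

```python
# Tie-aware rank and incertitude of one recommendation (illustrative interpretation).
def tie_aware_rank_and_incertitude(scores, expected):
    # scores: item -> similarity to the query item; expected: the held-out item.
    s = scores[expected]
    better    = sum(1 for v in scores.values() if v > s)
    tie_block = sum(1 for v in scores.values() if v == s)
    rank = better + (tie_block + 1) / 2        # expected item sits in the middle of its tie-block
    return rank, tie_block

scores = {"B": 4, "C": 3, "D": 4, "E": 4}      # MaxCap row for A in the toy example
print(tie_aware_rank_and_incertitude(scores, "E"))  # (2.0, 3)
```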
<SPACE>
Summary
Double Rank : Cooc + Popularity