Neural Collaborative Filtering
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua
National University of Singapore
WWW'17
Motivation (1/2)
- Deep learning has received relatively little attention in recommender systems compared to other research domains.
- Previous works apply deep learning techniques only to model auxiliary information:
  - Textual descriptions of items
  - Acoustic features of music
  - Visual content of images
- Few works focus on modeling the user-item interaction itself with deep learning techniques.
Motivation (2/2)
- Matrix factorization (MF) models the user-item interaction via a simple inner product.
- However, the inner product may not be sufficient for modeling user-item interactions:
  - The inner product is no more than a linear function of the latent factors.
  - It treats every latent dimension as equally important.
- Neural networks have been proven capable of approximating any continuous function.
- Idea: learn the user-item interaction function with a deep neural network on implicit feedback.
- Example (Figure 1 in the paper): with ground-truth similarities sim(2, 3) > sim(1, 2) > sim(1, 3), no placement of user 4's latent vector can satisfy sim(4, 1) > sim(4, 3) > sim(4, 2), illustrating the limited expressiveness of the inner product.
Methodology (1/13)
- Neural Collaborative Filtering (NCF) Framework
  - Generalized Matrix Factorization (GMF): linearity
  - Multi-Layer Perceptron (MLP): non-linearity
Methodology (2/13)
- General Framework
  - Input layer: one-hot encoded user and item vectors
  - Embedding layer: a fully-connected layer that projects each one-hot vector to a dense latent vector
  - Neural collaborative filtering layers: map the embeddings to the prediction score (see the sketch below)
    - The dimension of the last hidden layer X (termed predictive factors) determines the model's capability.
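A minimal sketch of the input and embedding layers using the tf.keras functional API (the authors' repo is in Keras, but the layer names, `num_users`, `num_items`, and the factor size here are illustrative assumptions, not the paper's code):

```python
from tensorflow.keras import layers

num_users, num_items, num_factors = 6040, 3706, 16  # illustrative sizes

# Input layer: integer user/item IDs, equivalent to one-hot vectors
# feeding a fully-connected embedding layer.
user_input = layers.Input(shape=(1,), dtype="int32", name="user_input")
item_input = layers.Input(shape=(1,), dtype="int32", name="item_input")

# Embedding layer: projects each one-hot ID to a dense latent vector.
user_latent = layers.Flatten()(
    layers.Embedding(num_users, num_factors, name="user_embedding")(user_input))
item_latent = layers.Flatten()(
    layers.Embedding(num_items, num_factors, name="item_embedding")(item_input))
```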
Methodology (3/13)
- General Framework
  - Output layer: the predicted score ŷ_ui
    - Pointwise loss (this work)
    - Pairwise loss
  - User-item interaction function (see the formula below)
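- In the paper's notation, the framework learns ŷ_ui = f(Pᵀ v_u, Qᵀ v_i | P, Q, Θ_f), where P and Q are the user and item embedding matrices, v_u and v_i are the one-hot input vectors, and Θ_f are the parameters of the interaction function f.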
Methodology (4/13)
- General Framework
  - Objective function (pointwise)
    - Previous work: regression with a weighted squared loss (see the formula below)
      - Assumption: ratings are generated from a Gaussian distribution.
      - This assumption may not tally well with binary implicit feedback.
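- For reference, the weighted squared loss from prior work has the form L_sqr = Σ_{(u,i) ∈ Y ∪ Y⁻} w_ui (y_ui − ŷ_ui)², where Y is the set of observed interactions, Y⁻ the sampled negative instances, and w_ui the weight of training instance (u, i).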
Methodology (5/13)
- General Framework
  - Objective function (pointwise)
    - Binary cross-entropy loss (log loss); see the formula below
      - Addresses recommendation with implicit feedback as a binary classification problem.
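- Treating y_ui as a binary label with predicted probability ŷ_ui, the objective is L = − Σ_{(u,i) ∈ Y} log ŷ_ui − Σ_{(u,j) ∈ Y⁻} log(1 − ŷ_uj).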
Methodology (6/13)
- Generalized Matrix Factorization (GMF)
  - MF can be interpreted as a special case of the NCF framework.
  - Setting
    - Activation function: identity function (i.e. f(x) = x)
    - Weight vector: uniform vector of ones (i.e. <1, 1, ..., 1>)
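- Concretely, GMF predicts ŷ_ui = a_out(hᵀ(p_u ⊙ q_i)), where ⊙ is the element-wise product of the user and item latent vectors; with the identity a_out and h = <1, 1, ..., 1>, this reduces to ŷ_ui = p_uᵀ q_i, which is exactly the MF model.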
Methodology (7/13)
- Generalized Matrix Factorization (GMF)
  - A generalized and extended MF (see the sketch below)
    - Activation function: the sigmoid function, to introduce non-linearity.
    - Weight vector h: learned from data with the log loss, allowing varying importance of latent dimensions.
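A minimal GMF sketch continuing the earlier snippet (the `Dense(1)` kernel plays the role of the learned weight vector h; the layer name is an illustrative assumption):

```python
from tensorflow.keras import Model

# Element-wise product p_u ⊙ q_i of the user and item latent vectors.
gmf_vector = layers.Multiply()([user_latent, item_latent])

# The Dense kernel is the learned weight vector h; sigmoid is a_out.
gmf_output = layers.Dense(1, activation="sigmoid", name="gmf_prediction")(gmf_vector)

gmf_model = Model([user_input, item_input], gmf_output)
gmf_model.compile(optimizer="adam", loss="binary_crossentropy")
```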
Methodology (8/13)
- Multi-Layer Perceptron (MLP)
  - Simple concatenation of the user embedding and item embedding does not account for any user-item interaction.
  - Adding hidden layers on the concatenated vector may be a better choice.
Methodology (9/13)
- Multi-Layer Perceptron (MLP)
  - Activation function: ReLU
    - Non-saturated
    - Sparse activation: well suited to sparse data and helps prevent overfitting.
  - Network structure: tower structure (see the sketch below)
    - Halve the layer size for each successive higher layer.
    - Higher layers with fewer hidden units can learn more abstract features of the data.
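A sketch of the MLP branch with a three-layer ReLU tower, continuing the earlier snippets (the layer sizes are illustrative, and separate MLP embeddings `mlp_user_latent` / `mlp_item_latent`, built as in the first snippet, are assumed):

```python
# Concatenate the user and item embeddings, then apply the ReLU tower.
mlp_vector = layers.Concatenate()([mlp_user_latent, mlp_item_latent])

# Tower structure: halve the layer size at each successive layer.
for units in (64, 32, 16):
    mlp_vector = layers.Dense(units, activation="relu")(mlp_vector)

mlp_output = layers.Dense(1, activation="sigmoid", name="mlp_prediction")(mlp_vector)
mlp_model = Model([user_input, item_input], mlp_output)
```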
Methodology (10/13)
- Fusion of GMF and MLP
  - Goal: design a structure that allows GMF and MLP to mutually reinforce each other.
  - Straightforward solution: let GMF and MLP share the same embedding layer.
    - Con: the two models must then use the same embedding size, which may limit the fused model's performance.
Methodology (11/13)
- Fusion of GMF and MLP
  - Proposed solution: Neural Matrix Factorization (NeuMF); see the sketch below
    - Let GMF and MLP learn separate embeddings, and combine the two models by concatenating their last hidden layers.
    - Use back-propagation to learn the model parameters.
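A NeuMF sketch that fuses the two branches from the snippets above (layer names remain illustrative assumptions):

```python
# Concatenate the GMF element-wise product vector with the MLP's last
# hidden layer, then predict with a single sigmoid unit.
neumf_vector = layers.Concatenate()([gmf_vector, mlp_vector])
neumf_output = layers.Dense(1, activation="sigmoid",
                            name="neumf_prediction")(neumf_vector)

neumf_model = Model([user_input, item_input], neumf_output)
neumf_model.compile(optimizer="adam", loss="binary_crossentropy")
```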
Methodology (12/13)
- Fusion of GMF and MLP
  - Pre-training (see the sketch below)
    - Train GMF and MLP with random initializations until convergence.
    - Use their model parameters as the initialization for the corresponding parts of NeuMF's parameters.
    - A hyper-parameter α trades off between the two pre-trained models when combining their output-layer weights: h ← [α h_GMF; (1 − α) h_MLP].
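A sketch of the output-layer initialization under this scheme, assuming GMF and MLP were each trained as the models built above (the layer names are illustrative, and the bias handling here is my assumption, not specified by the paper):

```python
import numpy as np

alpha = 0.5  # trade-off between the two pre-trained models

# Kernels of the single-unit prediction layers play the role of h.
h_gmf, b_gmf = gmf_model.get_layer("gmf_prediction").get_weights()
h_mlp, b_mlp = mlp_model.get_layer("mlp_prediction").get_weights()

# h <- [alpha * h_GMF ; (1 - alpha) * h_MLP]
h_new = np.concatenate([alpha * h_gmf, (1.0 - alpha) * h_mlp], axis=0)
b_new = alpha * b_gmf + (1.0 - alpha) * b_mlp
neumf_model.get_layer("neumf_prediction").set_weights([h_new, b_new])
```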
Methodology (13/13)
- Fusion of GMF and MLP
  - Learning rate adaptation
    - Pre-training: Adaptive Moment Estimation (Adam)
      - Adapts the learning rate by performing smaller updates for frequent parameters and larger updates for infrequent ones.
    - Training NeuMF: vanilla SGD
      - Momentum-based updates are unsuitable after initializing from pre-trained parameters, since Adam's saved momentum information is not carried over.
Experiments (1/7)
- Datasets (MovieLens 1M and Pinterest in the paper)
  - Binarize the explicit rating data into implicit data.
  - Remove users with #(feedbacks) < 20.
- Evaluation
  - Leave-one-out evaluation
  - Randomly sample 100 unobserved items for each user and rank the test item among them (see the sketch below).
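A sketch of this per-user protocol, assuming a two-input tf.keras model like those built in the Methodology snippets (the function name and `k` are illustrative); HR@10 and NDCG@10, defined on the next slide, are then averaged over all users:

```python
import numpy as np

def evaluate_one_user(model, user, test_item, negatives, k=10):
    """Rank the held-out test item among the 100 sampled negatives."""
    items = np.array(negatives + [test_item])
    users = np.full(len(items), user, dtype="int32")
    scores = model.predict([users, items], verbose=0).ravel()
    top_k = items[np.argsort(-scores)[:k]]

    if test_item in top_k:
        pos = int(np.where(top_k == test_item)[0][0])
        return 1, 1.0 / np.log2(pos + 2)   # HR@k, NDCG@k
    return 0, 0.0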
Experiments (2/7)
- Evaluation metrics
  - HR@10
  - NDCG@10
- Competitors
  - ItemPop
  - ItemKNN
  - Bayesian Personalized Ranking (BPR)
  - element-wise Alternating Least Squares (eALS)
    - Optimizes a weighted squared loss.
    - Treats all unobserved feedback as negative and weights it non-uniformly by item popularity.
Experiments (3/7)
- Implementation
  - Open-sourced on GitHub; built with Keras.
- Parameter settings
  - Negative sampling: 4 negative instances per positive instance (see the sketch below).
  - Embedding size: 16
  - Batch size: {128, 256, 512, 1024}
  - Learning rate: {0.0001, 0.0005, 0.001, 0.005}
  - Predictive factors: {8, 16, 32, 64}
  - #(hidden layers): 3
  - Trade-off parameter α for pre-training: 0.5
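A sketch of this negative sampling scheme for building the training instances of one epoch (function and variable names are illustrative assumptions):

```python
import random

def sample_training_instances(user_pos_items, num_items, num_neg=4):
    """Pair each observed interaction with num_neg sampled negatives."""
    users, items, labels = [], [], []
    for u, pos_items in user_pos_items.items():
        for i in pos_items:
            users.append(u); items.append(i); labels.append(1)
            for _ in range(num_neg):
                j = random.randrange(num_items)
                while j in pos_items:      # resample observed items
                    j = random.randrange(num_items)
                users.append(u); items.append(j); labels.append(0)
    return users, items, labels
```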
Experiments (4/7)
- NCF outperforms the state-of-the-art implicit collaborative filtering methods.
  - For BPR and eALS, the number of predictive factors equals the number of latent factors.
- GMF outperforms BPR.
  - This shows the effectiveness of the classification-aware log loss for the recommendation task.
Experiments (5/7)
- Pre-training is important for NCF.
Experiments (6/7)
- Log loss with negative sampling works for recommendation tasks.
- The sampling ratio of the pointwise objective is more flexible than that of the pairwise objective:
  - Pointwise: an arbitrary number of negatives per positive instance
  - Pairwise: only one negative per positive instance
- However, setting the sampling ratio too aggressively may hurt performance.
Experiments (7/7)
- Deep learning is helpful for recommendation tasks.
  - Stacking more layers is beneficial to performance.
    - However, the authors only show results for #(hidden layers) <= 4.
  - Transforming the latent factors with hidden layers is essential.
    - Simple concatenation (MLP-0 in the table) yields poor results.
Conclusion and Future Work
- This work opens up a new avenue of research possibilities for recommendation based on deep learning.
- A general framework (NCF) and three instantiations (GMF, MLP, and NeuMF) are proposed to solve the key collaborative filtering task.
- Future work
  - Modeling auxiliary information
  - Efficient online recommendation