Neural Collaborative Filtering
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua
National University of Singapore
WWW'17
Motivation (1/2)
- Deep learning has received relatively little attention in recommender systems compared to other research domains.
- Previous works apply deep learning techniques only to model auxiliary information:
  - Textual descriptions of items
  - Acoustic features of music
  - Visual content of images
- Few works focus on modeling the user-item interaction itself with deep learning techniques.
Motivation (2/2)
- Matrix factorization (MF) models the user-item interaction via a simple inner product.
- However, the inner product may not be sufficient for modeling user-item interactions:
  - The inner product is no more than a linear function of the latent factors.
  - It treats every latent dimension as equally important.
- Neural networks have been proven capable of approximating any continuous function.
- Idea: learn the user-item interaction function with a deep neural network on implicit feedback.
- Example (Figure 1 in the paper): with ground-truth similarities sim(2, 3) > sim(1, 2) > sim(1, 3), no placement of user 4's latent vector can satisfy sim(4, 1) > sim(4, 3) > sim(4, 2), illustrating the limited expressiveness of the inner product.
Methodology (1/13)
- Neural Collaborative Filtering (NCF) Framework
  - Generalized Matrix Factorization (GMF): linearity
  - Multi-Layer Perceptron (MLP): non-linearity
Methodology (2/13)
- General Framework
  - Input layer: one-hot encoded user and item vectors
  - Embedding layer: a fully-connected layer that projects each one-hot vector to a dense latent vector
  - Neural collaborative filtering layers: map the embeddings to the prediction score (see the sketch below)
    - The dimension of the last hidden layer X (termed predictive factors) determines the model's capability.
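A minimal sketch of the input and embedding layers using the tf.keras functional API (the authors' repo is in Keras, but the layer names, `num_users`, `num_items`, and the factor size here are illustrative assumptions, not the paper's code):

```python
from tensorflow.keras import layers

num_users, num_items, num_factors = 6040, 3706, 16  # illustrative sizes

# Input layer: integer user/item IDs, equivalent to one-hot vectors
# feeding a fully-connected embedding layer.
user_input = layers.Input(shape=(1,), dtype="int32", name="user_input")
item_input = layers.Input(shape=(1,), dtype="int32", name="item_input")

# Embedding layer: projects each one-hot ID to a dense latent vector.
user_latent = layers.Flatten()(
    layers.Embedding(num_users, num_factors, name="user_embedding")(user_input))
item_latent = layers.Flatten()(
    layers.Embedding(num_items, num_factors, name="item_embedding")(item_input))
```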
Methodology (3/13)
- General Framework
  - Output layer: the predicted score ŷ_ui
    - Pointwise loss (this work)
    - Pairwise loss
  - User-item interaction function (see the formula below)
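- In the paper's notation, the framework learns ŷ_ui = f(Pᵀ v_u, Qᵀ v_i | P, Q, Θ_f), where P and Q are the user and item embedding matrices, v_u and v_i are the one-hot input vectors, and Θ_f are the parameters of the interaction function f.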
Methodology (4/13)
- General Framework
  - Objective function (pointwise)
    - Previous work: regression with a weighted squared loss (see the formula below)
      - Assumption: ratings are generated from a Gaussian distribution.
      - This assumption may not tally well with binary implicit feedback.
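- For reference, the weighted squared loss from prior work has the form L_sqr = Σ_{(u,i) ∈ Y ∪ Y⁻} w_ui (y_ui − ŷ_ui)², where Y is the set of observed interactions, Y⁻ the sampled negative instances, and w_ui the weight of training instance (u, i).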
Methodology (5/13)
- General Framework
  - Objective function (pointwise)
    - Binary cross-entropy loss (log loss); see the formula below
      - Addresses recommendation with implicit feedback as a binary classification problem.
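- Treating y_ui as a binary label with predicted probability ŷ_ui, the objective is L = − Σ_{(u,i) ∈ Y} log ŷ_ui − Σ_{(u,j) ∈ Y⁻} log(1 − ŷ_uj).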
Methodology (6/13)
- Generalized Matrix Factorization (GMF)
  - MF can be interpreted as a special case of the NCF framework.
  - Setting
    - Activation function: identity function (i.e. f(x) = x)
    - Weight vector: uniform vector of ones (i.e. <1, 1, ..., 1>)
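- Concretely, GMF predicts ŷ_ui = a_out(hᵀ(p_u ⊙ q_i)), where ⊙ is the element-wise product of the user and item latent vectors; with the identity a_out and h = <1, 1, ..., 1>, this reduces to ŷ_ui = p_uᵀ q_i, which is exactly the MF model.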
Methodology (7/13)
- Generalized Matrix Factorization (GMF)
  - A generalized and extended MF (see the sketch below)
    - Activation function: the sigmoid function, to introduce non-linearity.
    - Weight vector h: learned from data with the log loss, allowing varying importance of latent dimensions.
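A minimal GMF sketch continuing the earlier snippet (the `Dense(1)` kernel plays the role of the learned weight vector h; the layer name is an illustrative assumption):

```python
from tensorflow.keras import Model

# Element-wise product p_u ⊙ q_i of the user and item latent vectors.
gmf_vector = layers.Multiply()([user_latent, item_latent])

# The Dense kernel is the learned weight vector h; sigmoid is a_out.
gmf_output = layers.Dense(1, activation="sigmoid", name="gmf_prediction")(gmf_vector)

gmf_model = Model([user_input, item_input], gmf_output)
gmf_model.compile(optimizer="adam", loss="binary_crossentropy")
```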
Methodology (8/13)
- Multi-Layer Perceptron (MLP)
  - Simple concatenation of the user embedding and item embedding does not account for any user-item interaction.
  - Adding hidden layers on the concatenated vector may be a better choice.
Methodology (9/13)
- Multi-Layer Perceptron (MLP)
  - Activation function: ReLU
    - Non-saturated
    - Sparse activation: well suited to sparse data and helps prevent overfitting.
  - Network structure: tower structure (see the sketch below)
    - Halve the layer size for each successive higher layer.
    - Higher layers with fewer hidden units can learn more abstract features of the data.
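A sketch of the MLP branch with a three-layer ReLU tower, continuing the earlier snippets (the layer sizes are illustrative, and separate MLP embeddings `mlp_user_latent` / `mlp_item_latent`, built as in the first snippet, are assumed):

```python
# Concatenate the user and item embeddings, then apply the ReLU tower.
mlp_vector = layers.Concatenate()([mlp_user_latent, mlp_item_latent])

# Tower structure: halve the layer size at each successive layer.
for units in (64, 32, 16):
    mlp_vector = layers.Dense(units, activation="relu")(mlp_vector)

mlp_output = layers.Dense(1, activation="sigmoid", name="mlp_prediction")(mlp_vector)
mlp_model = Model([user_input, item_input], mlp_output)
```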
Methodology (10/13)
- Fusion of GMF and MLP
  - Goal: design a structure that allows GMF and MLP to mutually reinforce each other.
  - Straightforward solution: let GMF and MLP share the same embedding layer.
    - Con: the two models must then use the same embedding size, which may limit the fused model's performance.
Methodology (11/13)
- Fusion of GMF and MLP
  - Proposed solution: Neural Matrix Factorization (NeuMF); see the sketch below
    - Let GMF and MLP learn separate embeddings, and combine the two models by concatenating their last hidden layers.
    - Use back-propagation to learn the model parameters.
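A NeuMF sketch that fuses the two branches from the snippets above (layer names remain illustrative assumptions):

```python
# Concatenate the GMF element-wise product vector with the MLP's last
# hidden layer, then predict with a single sigmoid unit.
neumf_vector = layers.Concatenate()([gmf_vector, mlp_vector])
neumf_output = layers.Dense(1, activation="sigmoid",
                            name="neumf_prediction")(neumf_vector)

neumf_model = Model([user_input, item_input], neumf_output)
neumf_model.compile(optimizer="adam", loss="binary_crossentropy")
```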
Methodology (12/13)
- Fusion of GMF and MLP
  - Pre-training (see the sketch below)
    - Train GMF and MLP with random initializations until convergence.
    - Use their model parameters as the initialization for the corresponding parts of NeuMF's parameters.
    - A hyper-parameter α trades off between the two pre-trained models when combining their output-layer weights: h ← [α h_GMF; (1 − α) h_MLP].
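A sketch of the output-layer initialization under this scheme, assuming GMF and MLP were each trained as the models built above (the layer names are illustrative, and the bias handling here is my assumption, not specified by the paper):

```python
import numpy as np

alpha = 0.5  # trade-off between the two pre-trained models

# Kernels of the single-unit prediction layers play the role of h.
h_gmf, b_gmf = gmf_model.get_layer("gmf_prediction").get_weights()
h_mlp, b_mlp = mlp_model.get_layer("mlp_prediction").get_weights()

# h <- [alpha * h_GMF ; (1 - alpha) * h_MLP]
h_new = np.concatenate([alpha * h_gmf, (1.0 - alpha) * h_mlp], axis=0)
b_new = alpha * b_gmf + (1.0 - alpha) * b_mlp
neumf_model.get_layer("neumf_prediction").set_weights([h_new, b_new])
```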
Methodology (13/13)
- Fusion of GMF and MLP
  - Learning rate adaptation
    - Pre-training: Adaptive Moment Estimation (Adam)
      - Adapts the learning rate by performing smaller updates for frequent parameters and larger updates for infrequent ones.
    - Training NeuMF: vanilla SGD
      - Momentum-based updates are unsuitable after initializing from pre-trained parameters, since Adam's saved momentum information is not carried over.
Experiments (1/7)
- Datasets (MovieLens 1M and Pinterest in the paper)
  - Binarize the explicit rating data into implicit data.
  - Remove users with #(feedbacks) < 20.
- Evaluation
  - Leave-one-out evaluation
  - Randomly sample 100 unobserved items for each user and rank the test item among them (see the sketch below).
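A sketch of this per-user protocol, assuming a two-input tf.keras model like those built in the Methodology snippets (the function name and `k` are illustrative); HR@10 and NDCG@10, defined on the next slide, are then averaged over all users:

```python
import numpy as np

def evaluate_one_user(model, user, test_item, negatives, k=10):
    """Rank the held-out test item among the 100 sampled negatives."""
    items = np.array(negatives + [test_item])
    users = np.full(len(items), user, dtype="int32")
    scores = model.predict([users, items], verbose=0).ravel()
    top_k = items[np.argsort(-scores)[:k]]

    if test_item in top_k:
        pos = int(np.where(top_k == test_item)[0][0])
        return 1, 1.0 / np.log2(pos + 2)   # HR@k, NDCG@k
    return 0, 0.0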
Experiments (2/7)
- Evaluation metrics
  - HR@10
  - NDCG@10
- Competitors
  - ItemPop
  - ItemKNN
  - Bayesian Personalized Ranking (BPR)
  - element-wise Alternating Least Squares (eALS)
    - Optimizes a weighted squared loss.
    - Treats all unobserved feedback as negative and weights it non-uniformly by item popularity.
Experiments (3/7)
- Implementation
  - Open-sourced on GitHub; built with Keras.
- Parameter settings
  - Negative sampling: 4 negative instances per positive instance (see the sketch below).
  - Embedding size: 16
  - Batch size: {128, 256, 512, 1024}
  - Learning rate: {0.0001, 0.0005, 0.001, 0.005}
  - Predictive factors: {8, 16, 32, 64}
  - #(hidden layers): 3
  - Trade-off parameter α for pre-training: 0.5
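A sketch of this negative sampling scheme for building the training instances of one epoch (function and variable names are illustrative assumptions):

```python
import random

def sample_training_instances(user_pos_items, num_items, num_neg=4):
    """Pair each observed interaction with num_neg sampled negatives."""
    users, items, labels = [], [], []
    for u, pos_items in user_pos_items.items():
        for i in pos_items:
            users.append(u); items.append(i); labels.append(1)
            for _ in range(num_neg):
                j = random.randrange(num_items)
                while j in pos_items:      # resample observed items
                    j = random.randrange(num_items)
                users.append(u); items.append(j); labels.append(0)
    return users, items, labels
```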
Experiments (4/7)
- NCF outperforms the state-of-the-art implicit collaborative filtering methods.
  - For BPR and eALS, the number of predictive factors equals the number of latent factors.
- GMF outperforms BPR.
  - This shows the effectiveness of the classification-aware log loss for the recommendation task.
Experiments (5/7)
- Pre-training is important for NCF.
Experiments (6/7)
- Log loss with negative sampling works for recommendation tasks.
- The sampling ratio of the pointwise objective is more flexible than that of the pairwise objective:
  - Pointwise: an arbitrary number of negatives per positive instance
  - Pairwise: only one negative per positive instance
- However, setting the sampling ratio too aggressively may hurt performance.
Experiments (7/7)
- Deep learning is helpful for recommendation tasks.
  - Stacking more layers is beneficial to performance.
    - However, the authors only show results for #(hidden layers) <= 4.
  - Transforming the latent factors with hidden layers is essential.
    - Simple concatenation (MLP-0 in the table) yields poor results.
Conclusion and Future Work
- This work opens up a new avenue of research possibilities for recommendation based on deep learning.
- A general framework (NCF) and three instantiations (GMF, MLP, and NeuMF) are proposed to solve the key collaborative filtering task.
- Future work
  - Modeling auxiliary information
  - Efficient online recommendation