Neural Collaborative Filtering

Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua
National University of Singapore

WWW'17

Motivation (1/2)

  • The exploration of deep learning techniques in recommender systems has received relatively little scrutiny compared to other research domains.
  • Previous works only apply deep learning techniques to model auxiliary information.
    • Textual descriptions of items
    • Acoustic features of music
    • Visual content of images
  • Few works focus on modeling user-item interaction using deep learning techniques.

Motivation (2/2)

  • Matrix factorization models user-item interaction via a simple inner product.
  • However, inner product may not be sufficient for modeling user-item interaction.
    • Inner product is no more than a linear function.
    • The importance of every latent dimension is the same.
    • Neural networks have been proven capable of approximating any continuous function.
  • This work learns the user-item interaction function with a deep neural network on implicit feedback.

[Figure from the paper: with user similarities sim(2, 3) > sim(1, 2) > sim(1, 3), no placement of user 4's latent vector can satisfy sim(4, 1) > sim(4, 3) > sim(4, 2), illustrating the limitation of the inner product in a low-dimensional latent space.]

Methodology (1/13)

  • Neural Collaborative Filtering (NCF) Framework
    • Generalized Matrix Factorization (GMF)
      • Linearity
    • Multi-Layer Perceptron (MLP)
      • Non-linearity

Methodology (2/13)

  • General Framework
    • Input layer
      • One-hot encoded user/item ID vector
    • Embedding layer
      • Fully-connected
    • Neural collaborative filtering layer
      • The dimension of the last hidden layer X (termed predictive factors) determines the model’s capability.

Methodology (3/13)

  • General Framework
    • Output layer
      • Pointwise Loss (this work)
      • Pairwise Loss
    • User-item interaction function
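
  For reference, the paper writes the interaction function as a composition of X neural collaborative filtering layers over the embedded user and item vectors, where $v_u^U$ and $v_i^I$ are the one-hot input vectors and $P$, $Q$ the embedding matrices:

    $\hat{y}_{ui} = f(P^{T} v_u^{U},\, Q^{T} v_i^{I} \mid P, Q, \Theta_f) = \phi_{out}\big(\phi_{X}(\cdots \phi_{1}(P^{T} v_u^{U},\, Q^{T} v_i^{I}) \cdots)\big)$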

Methodology (4/13)

  • General Framework
    • Objective function (Pointwise)
      • Previous work: Regression with weighted squared loss
        • Assumption
          • Ratings are assumed to be generated from a Gaussian distribution.
        • It may not tally well with implicit feedback.
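
  The weighted squared loss used by such regression-based work has the form

    $L_{sqr} = \sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} w_{ui}\, (y_{ui} - \hat{y}_{ui})^2$

  where $w_{ui}$ is the weight of training instance $(u, i)$; this corresponds to a Gaussian assumption on the observations.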

Methodology (5/13)

  • General Framework
    • Objective function (Pointwise)
      • Binary cross-entropy loss (Log loss)
        • Address recommendation with implicit feedback as a binary classification problem.
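
  Treating $y_{ui} \in \{0, 1\}$ as a label and $\hat{y}_{ui} \in (0, 1)$ as its predicted probability, the objective minimized in the paper is the binary cross-entropy over observed interactions $\mathcal{Y}$ and sampled negatives $\mathcal{Y}^-$:

    $L = -\sum_{(u,i) \in \mathcal{Y}} \log \hat{y}_{ui} \;-\; \sum_{(u,j) \in \mathcal{Y}^-} \log (1 - \hat{y}_{uj})$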

Methodology (6/13)

  • Generalized Matrix Factorization (GMF)
    • MF can be interpreted as a special case of the NCF framework.
    • Setting
      • Activation function: identity function (i.e. f(x) = x)
      • Weight vector: uniform vector of 1 (i.e. <1, 1, ..., 1>)
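
  Concretely, the first NCF layer takes the element-wise product of the embeddings and the output layer applies a weight vector $h$ and activation $a_{out}$; the settings above recover plain MF:

    $\hat{y}_{ui} = a_{out}\big(h^{T}(p_u \odot q_i)\big) \;\xrightarrow{\; a_{out}(x) = x,\; h = \mathbf{1} \;}\; \hat{y}_{ui} = p_u^{T} q_i$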

Methodology (7/13)

  • Generalized Matrix Factorization (GMF)
    • A generalized and extended MF
      • Activation function
        • Use the sigmoid function as the output activation to introduce non-linearity.
      • Weight vector h
        • Learned from data with the log loss.
        • Allow varying importance of latent dimensions.
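
  A minimal sketch of this generalized GMF in PyTorch (the paper's released code is in Keras; class and parameter names here are illustrative, not the authors'):

    import torch
    import torch.nn as nn

    class GMF(nn.Module):
        def __init__(self, num_users, num_items, dim=16):
            super().__init__()
            self.user_emb = nn.Embedding(num_users, dim)   # P: user latent vectors
            self.item_emb = nn.Embedding(num_items, dim)   # Q: item latent vectors
            self.h = nn.Linear(dim, 1, bias=False)         # learned weight vector h

        def forward(self, user_ids, item_ids):
            p_u = self.user_emb(user_ids)                  # (batch, dim)
            q_i = self.item_emb(item_ids)                  # (batch, dim)
            return torch.sigmoid(self.h(p_u * q_i))        # a_out = sigmoid over h^T (p_u ⊙ q_i)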

Methodology (8/13)

  • Multi-Layer Perceptron (MLP)
    • Simple concatenation of user embedding and item embedding does not account for any user-item interaction.
    • Adding hidden layers may be a better choice.

Methodology (9/13)

  • Multi-Layer Perceptron (MLP)
    • Activation function
      • ReLU
        • Non-saturated
        • Sparse activation
          • Well suited for sparse data and less prone to overfitting.
    • Network structure
      • Tower structure
        • Halve the layer size for each successive higher layer.
        • Higher layers with fewer hidden units can learn more abstract features of the data.
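
  A minimal sketch of the MLP branch with the tower structure, assuming illustrative layer sizes 32 -> 16 -> 8 (the paper halves the size per layer; the exact sizes depend on the number of predictive factors):

    import torch
    import torch.nn as nn

    class MLPBranch(nn.Module):
        def __init__(self, num_users, num_items, dim=16, layers=(32, 16, 8)):
            super().__init__()
            self.user_emb = nn.Embedding(num_users, dim)
            self.item_emb = nn.Embedding(num_items, dim)
            blocks, in_size = [], 2 * dim                  # input is the concatenation [p_u ; q_i]
            for out_size in layers:                        # 32 -> 16 -> 8: halve each layer (tower)
                blocks += [nn.Linear(in_size, out_size), nn.ReLU()]
                in_size = out_size
            self.tower = nn.Sequential(*blocks)
            self.out = nn.Linear(layers[-1], 1)

        def forward(self, user_ids, item_ids):
            z = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
            return torch.sigmoid(self.out(self.tower(z))) # probability that the interaction is observed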

Methodology (10/13)

  • Fusion of GMF and MLP
    • Goal
      • Design a structure that allows GMF and MLP to mutually reinforce each other.
    • Straightforward solution
      • Let GMF and MLP share the same embedding layer.
        • Cons
          • They must use the same embedding size.

Methodology (11/13)

  • Fusion of GMF and MLP
    • Proposed solution
      • Neural Matrix Factorization (NeuMF)
        • Let GMF and MLP learn separate embeddings and combine them by concatenating their last hidden layers.
        • Use back-propagation to learn model parameters.
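
  A minimal sketch of NeuMF in the same style (separate embeddings per branch; the final layer acts on the concatenation of the GMF vector and the MLP's last hidden vector; names and sizes are illustrative, not the authors' code):

    import torch
    import torch.nn as nn

    class NeuMF(nn.Module):
        def __init__(self, num_users, num_items, gmf_dim=16, mlp_dim=16, layers=(32, 16, 8)):
            super().__init__()
            # separate embeddings for the two branches
            self.gmf_user = nn.Embedding(num_users, gmf_dim)
            self.gmf_item = nn.Embedding(num_items, gmf_dim)
            self.mlp_user = nn.Embedding(num_users, mlp_dim)
            self.mlp_item = nn.Embedding(num_items, mlp_dim)
            blocks, in_size = [], 2 * mlp_dim
            for out_size in layers:
                blocks += [nn.Linear(in_size, out_size), nn.ReLU()]
                in_size = out_size
            self.mlp_tower = nn.Sequential(*blocks)
            # prediction layer over the concatenation of both branches' last hidden vectors
            self.out = nn.Linear(gmf_dim + layers[-1], 1)

        def forward(self, user_ids, item_ids):
            gmf_vec = self.gmf_user(user_ids) * self.gmf_item(item_ids)
            mlp_vec = self.mlp_tower(torch.cat([self.mlp_user(user_ids),
                                                self.mlp_item(item_ids)], dim=-1))
            return torch.sigmoid(self.out(torch.cat([gmf_vec, mlp_vec], dim=-1)))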

Methodology (12/13)

  • Fusion of GMF and MLP
    • Pre-training
      • Train GMF and MLP with random initializations until convergence.
      • Use their model parameters as the initialization for the corresponding parts of NeuMF’s parameters.
      • A trade-off hyper-parameter balances the two pre-trained models when initializing the fused output layer.
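
  The paper combines the two pre-trained output-layer vectors with a trade-off hyper-parameter $\alpha$ (set to 0.5 in the experiments):

    $h \leftarrow \begin{bmatrix} \alpha\, h^{GMF} \\ (1-\alpha)\, h^{MLP} \end{bmatrix}$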

Methodology (13/13)

  • Fusion of GMF and MLP
    • Learning rate adaptation
      • Pre-training
        • Adaptive Moment Estimation (Adam)
          • Adapt the learning rate by performing smaller updates for frequent and larger updates for infrequent parameters.
      • Training the fused NeuMF
        • Vanilla SGD, because Adam's momentum state is not carried over from the pre-trained parameters.
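
  In PyTorch terms this corresponds to something like the following (a sketch assuming the GMF / MLPBranch / NeuMF classes from the earlier sketches; the authors' implementation is in Keras):

    import torch

    # illustrative sizes; assumes the GMF, MLPBranch, and NeuMF sketches above
    num_users, num_items = 1000, 2000
    gmf, mlp = GMF(num_users, num_items), MLPBranch(num_users, num_items)
    neumf = NeuMF(num_users, num_items)

    # pre-training: Adam adapts the step size per parameter
    opt_gmf = torch.optim.Adam(gmf.parameters(), lr=1e-3)
    opt_mlp = torch.optim.Adam(mlp.parameters(), lr=1e-3)

    # training the fused NeuMF: vanilla SGD, since Adam's momentum state
    # does not carry over from the pre-trained parameters
    opt_neumf = torch.optim.SGD(neumf.parameters(), lr=1e-3)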

Experiments (1/7)

  • Datasets
    [Table from the paper: statistics of the MovieLens (1M) and Pinterest datasets]
    • Binarize the rating data to implicit data.
    • Remove users with #(feedbacks) < 20.
  • Evaluation
    • Leave-one-out evaluation
    • Randomly sample 100 unobserved items for each user and rank the test item among them.
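
  A sketch of this leave-one-out protocol together with HR@10 and NDCG@10 (the metrics listed on the next slide); model_score and the item bookkeeping are hypothetical helpers:

    import math
    import random

    def evaluate_user(model_score, test_item, observed_items, all_items, k=10, n_neg=100):
        """Leave-one-out evaluation for one user.

        model_score(item) -> float; test_item is the held-out positive;
        observed_items are the user's training interactions.
        """
        candidates = [i for i in all_items if i != test_item and i not in observed_items]
        negatives = random.sample(candidates, n_neg)            # 100 unobserved items
        ranked = sorted(negatives + [test_item], key=model_score, reverse=True)
        rank = ranked.index(test_item) + 1                      # 1-based rank of the test item
        hr = 1.0 if rank <= k else 0.0                          # HR@k
        ndcg = 1.0 / math.log2(rank + 1) if rank <= k else 0.0  # NDCG@k with one relevant item
        return hr, ndcg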

Experiments (2/7)

  • Evaluation metrics
    • HR @ 10
    • NDCG @ 10
  • Competitors
    • ItemPop
    • ItemKNN
    • Bayesian Personalized Ranking (BPR)
    • element-wise Alternating Least Squares (eALS)
      • Optimize a weighted squared loss.
      • Treat all unobserved feedback as negative and weight it non-uniformly by item popularity.

Experiments (3/7)

  • Implementation
  • Parameter Settings
    • Negative sampling: 4 negative instances per positive instance (see the sketch after this list).
    • Embedding size: 16
    • Batch size: {128, 256, 512, 1024}
    • Learning rate: {0.0001, 0.0005, 0.001, 0.005}
    • Predictive factors: {8, 16, 32, 64}
    • #(hidden layers): 3
    • Trade-off parameter for pre-training: 0.5
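
  A sketch of the negative sampling step referenced above (function name and data layout are illustrative):

    import random

    def sample_training_instances(interactions, num_items, num_neg=4):
        """interactions: observed (user, item) pairs from implicit feedback."""
        observed = set(interactions)
        instances = []
        for user, item in interactions:
            instances.append((user, item, 1))                   # positive instance
            for _ in range(num_neg):                            # 4 negatives per positive
                j = random.randrange(num_items)
                while (user, j) in observed:
                    j = random.randrange(num_items)
                instances.append((user, j, 0))                  # sampled negative instance
        return instances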

Experiments (4/7)

  • NCF outperforms the state-of-the-art implicit collaborative filtering methods.
    • For BPR and eALS, the number of predictive factors is equal to the number of latent factors.
    • GMF outperforms BPR.
      • This shows the effectiveness of the classification-aware log loss for the recommendation task.

Experiments (5/7)

  • Pre-training is important for NCF.

Experiments (6/7)

  • Log loss with negative sampling does work for recommendation tasks.
    • Negative sampling for the pointwise objective is more flexible than for the pairwise objective.
      • Pointwise: arbitrary number per positive instance
      • Pairwise: only one per positive instance
    • Setting the sampling ratio too aggressively may hurt performance.

Experiments (7/7)

  • Deep learning is helpful for recommendation tasks.
    • Stacking more layers is beneficial to performance.
    • But the authors only show results of #(hidden layers) <= 4...
  • Transforming latent factors with hidden layers is essential.
    • Simple concatenation (MLP-0 in the table) yields poor results.

Conclusion and Future Works

  • This work opens up a new avenue of research possibilities for recommendation based on deep learning.
    • A general framework, NCF, and three instantiations of it are proposed to address the core collaborative filtering problem.
      • GMF, MLP and NeuMF
  • Future work
    • Model auxiliary information
    • Efficient online recommendation
