Neural Collaborative Filtering

- Addressed Problem
- Problem Formulation
- Goal
- Related Work
  - Two types of objective functions
  - Matrix Factorization
- Neural Collaborative Filtering
  - Generalized Matrix Factorization (GMF)
  - Pre-training
These notes only organize my impressions from reading the paper; using the original text or a translation of it without permission would not be appropriate. To avoid copyright issues, I have chosen to publish this post as visible to myself only.
Addressed Problem
The paper develops a neural-network-based approach to collaborative filtering, with an emphasis on modeling implicit feedback. Two forms of feedback are distinguished:
- explicit feedback (e.g., ratings and reviews)
- implicit feedback, which reflects users' preferences indirectly through behaviors such as watching videos, purchasing products, and clicking items
Implicit feedback can be tracked automatically, without any extra effort from users, which greatly simplifies collection for content providers and makes it particularly attractive.
Problem Formulation
Let M denote the number of users and N the number of items. Y \in \mathbb{R}^{M \times N} denotes the user-item interaction matrix.
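Following the paper, each entry of Y is defined from implicit feedback as:

y_{ui} = \begin{cases} 1, & \text{if the interaction between user } u \text{ and item } i \text{ is observed,} \\ 0, & \text{otherwise.} \end{cases}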

A value of 1 for y_{ui} represents an interaction between user u and item i; however, it does not imply that user u actually likes item i. Likewise, a value of 0 does not necessarily indicate that user u dislikes item i.
Note: although observed entries at least reflect users' interest in items, unobserved entries may merely be missing data; there is a natural scarcity of negative feedback.
The recommendation problem with implicit feedback is therefore formulated as estimating the scores of the unobserved entries in Y, which are then used to rank the items.
Goal
Learn a function f that predicts \hat{y}_{ui}, the interaction score between user u and item i, from the model parameters \Theta. The mapping f from parameters to predicted scores is called the interaction function, and in this work it is realized by a neural network.
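In symbols, the goal is to learn:

\hat{y}_{ui} = f(u, i \mid \Theta)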
Related Work
Two types of objective functions
Pointwise loss is a natural extension of the extensive work on explicit feedback. Pointwise learning typically follows a regression framework, minimizing the squared difference between the estimated rating \hat{y}_{ui} and the actual rating y_{ui}. The pointwise squared loss is L_{sqr}= \sum\limits_{(u,i) \in \mathcal{Y}\cup\mathcal{Y}^-} w_{ui}( \hat{y}_{ui}- y_{ui})^2, where \mathcal{Y} denotes the set of observed interactions, \mathcal{Y}^- denotes all (or a sampled subset of) unobserved interactions, and the weight w_{ui} assigns an importance to each training instance (u,i).
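As an illustration, here is a minimal NumPy sketch of this weighted squared loss; the flat-array layout and variable names are my own assumptions, not from the paper:

```python
import numpy as np

def pointwise_sqr_loss(y_hat, y, w):
    """Weighted pointwise squared loss, summed over observed and
    sampled unobserved (u, i) pairs stacked into flat arrays."""
    return np.sum(w * (y_hat - y) ** 2)

# Toy usage: three observed entries (y = 1) and two sampled negatives (y = 0).
y     = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
y_hat = np.array([0.9, 0.7, 0.8, 0.2, 0.4])
w     = np.ones_like(y)  # uniform instance weights w_ui
print(pointwise_sqr_loss(y_hat, y, w))  # 0.34
```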
Pairwise loss: the central idea is that observed entries should be ranked higher than unobserved ones. Instead of minimizing the difference between \hat{y}_{ui} and y_{ui}, pairwise learning separates the two sets of entries by maximizing the margin between an observed entry \hat{y}_{ui} and an unobserved entry \hat{y}_{uj}.
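A well-known instance is the BPR loss, which realizes this margin maximization through the sigmoid function \sigma:

L_{pair} = -\sum_{(u,i,j)} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})

where the sum runs over triplets (u, i, j) with item i observed and item j unobserved for user u.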
Proposed Loss
In the following, we introduce a probabilistic model for learning the pointwise NCF with a particular emphasis on leveraging the binary nature of implicit data.
(Aim to maximize the log-likelihood function)
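Concretely, treating y_{ui} as a Bernoulli outcome with success probability \hat{y}_{ui}, maximizing the log-likelihood is equivalent to minimizing the binary cross-entropy (log loss):

L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} \left( y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log (1 - \hat{y}_{ui}) \right)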

Matrix Factorization
MF associates each user and each item with a real-valued vector of latent features.
For each user u, we define p_u, which represents the latent vector assigned to them.
Similarly, for each item i, we denote its corresponding latent vector as q_i.
Let K denote the dimension of the latent space.
The predicted rating \hat{y}_{ui} is computed as:
\hat{y}_{ui} = f(u,i|p_u,q_i) = p_u^T q_i = \sum_{k=1}^{K} p_{uk} q_{ik}
Drawback:
MF can be viewed as a linear model of latent factors, which limits its expressiveness.

To illustrate, the Jaccard coefficient is used as the ground-truth similarity between two users that MF needs to recover.
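Concretely, letting \mathcal{R}_u denote the set of items that user u has interacted with, the Jaccard similarity between users u and v is:

s_{uv} = \frac{|\mathcal{R}_u \cap \mathcal{R}_v|}{|\mathcal{R}_u \cup \mathcal{R}_v|}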

Focus first on the first three rows (users) in Figure 1a. We have s_{23} = 0.66 > s_{12} = 0.5 > s_{13} = 0.4; the geometric relations among the corresponding latent vectors are shown in Figure 1b. Now consider a new user u_4, whose input is the dashed row in Figure 1a. Since s_{41} = 0.6 > s_{43} = 0.4 > s_{42} = 0.2, u_4 is most similar to u_1, then u_3, then u_2. However, if an MF model places p_4 as close as possible to p_1 (the dashed vector in Figure 1b), p_4 ends up closer to p_2 than to p_3, incurring a large ranking loss.
NEURAL COLLABORATIVE FILTERING

- Since this work focuses on the pure collaborative filtering setting, only the identities of a user and an item are used as input features, each transformed into a binarized sparse vector via one-hot encoding. With such a generic input representation, the method can easily be adjusted to address the cold-start problem by using content features to represent users and items.
- Above the input layer sits the embedding layer: a fully connected layer that projects the sparse representation onto a dense vector.
- The user embedding and item embedding are then fed into a multi-layer neural architecture (a minimal sketch follows below).
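To make this pipeline concrete, here is a minimal PyTorch sketch of the embedding stack and MLP tower; the layer sizes, names, and activation choices are illustrative assumptions, not the paper's tuned configuration:

```python
import torch
import torch.nn as nn

class NCFSketch(nn.Module):
    """Sketch: one-hot user/item IDs -> dense embeddings -> MLP -> score."""
    def __init__(self, num_users, num_items, dim=16, hidden=(32, 16, 8)):
        super().__init__()
        # An embedding lookup is equivalent to multiplying a one-hot
        # vector by a fully connected projection matrix.
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        layers, in_dim = [], 2 * dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(in_dim, 1)

    def forward(self, u, i):
        # Concatenate the two embeddings and pass them through the tower.
        z = torch.cat([self.user_emb(u), self.item_emb(i)], dim=-1)
        return torch.sigmoid(self.out(self.mlp(z))).squeeze(-1)

model = NCFSketch(num_users=1000, num_items=500)
scores = model(torch.tensor([0, 1]), torch.tensor([42, 7]))  # batch of 2 pairs
```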
Generalized Matrix Factorization (GMF)

GMF (the left-hand branch of the framework) takes the same form as MF, except that the latent vectors now come from the fully connected embedding layer.
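Formally, GMF computes:

\hat{y}_{ui} = a_{out}\left( h^T (p_u \odot q_i) \right)

where \odot is the element-wise product, h is the weight vector of the output layer, and a_{out} is an activation function. With a_{out} set to the identity and h fixed to a vector of ones, this reduces exactly to the MF prediction given earlier.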
Pre-training
- The initialization procedure plays a crucial role in determining the convergence behavior and overall performance of deep learning models.
- The paper proposes to initialize NeuMF using the pre-trained parameters of GMF and MLP.
- GMF and MLP are first trained from random initializations until convergence; their learned parameters are then used to initialize the corresponding parts of NeuMF's architecture.
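When fusing the two pre-trained parts, the output-layer weights are concatenated with a trade-off hyper-parameter \alpha:

h \leftarrow \begin{bmatrix} \alpha h^{GMF} \\ (1-\alpha) h^{MLP} \end{bmatrix}

where h^{GMF} and h^{MLP} are the pre-trained output weights; the paper sets \alpha = 0.5, giving the two models equal weight.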
