Neural Collaborative Filtering

- Addressed Problem
- Problem Formulation
- Goal
- Related Work
  - Two types of objective functions
  - Matrix Factorization
- Neural Collaborative Filtering
  - Generalized Matrix Factorization (GMF)
  - Pre-training
These notes only organize my impressions from reading the paper; using the original text or a translation of it without permission would not be appropriate. To avoid copyright issues, I have chosen to publish this post as visible to myself only.
Addressed Problem
The paper develops a neural-network-based approach to collaborative filtering, with an emphasis on modeling implicit feedback. Two forms of feedback are distinguished:
- explicit feedback (e.g., ratings and reviews)
- implicit feedback, which reflects users' preferences indirectly through behaviors such as watching videos, purchasing products, and clicking items
Implicit feedback can be tracked automatically, without any extra effort from users, which greatly simplifies collection for content providers and makes it particularly attractive.
Problem Formulation
Let M denote the number of users and N the number of items. Y \in \mathbb{R}^{M \times N} denotes the user-item interaction matrix.
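Following the paper, each entry of Y is defined from implicit feedback as:

y_{ui} = \begin{cases} 1, & \text{if the interaction between user } u \text{ and item } i \text{ is observed,} \\ 0, & \text{otherwise.} \end{cases}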

A value of 1 for y_{ui} represents an interaction between user u and item i; however, it does not imply that user u actually likes item i. Likewise, a value of 0 does not necessarily indicate that user u dislikes item i.
Note: although observed entries at least reflect users' interest in items, unobserved entries may merely be missing data; there is a natural scarcity of negative feedback.
The recommendation problem with implicit feedback is therefore formulated as estimating the scores of the unobserved entries in Y, which are then used to rank the items.
Goal
Learn a function f that predicts \hat{y}_{ui}, the interaction score between user u and item i, from the model parameters \Theta. The mapping f from parameters to predicted scores is called the interaction function, and in this work it is realized by a neural network.
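In symbols, the goal is to learn:

\hat{y}_{ui} = f(u, i \mid \Theta)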
Related Work
Two types of objective functions
Pointwise loss is a natural extension of the extensive work on explicit feedback. Pointwise learning typically follows a regression framework, minimizing the squared difference between the estimated rating \hat{y}_{ui} and the actual rating y_{ui}. The pointwise squared loss is L_{sqr}= \sum\limits_{(u,i) \in \mathcal{Y}\cup\mathcal{Y}^-} w_{ui}( \hat{y}_{ui}- y_{ui})^2, where \mathcal{Y} denotes the set of observed interactions, \mathcal{Y}^- denotes all (or a sampled subset of) unobserved interactions, and the weight w_{ui} assigns an importance to each training instance (u,i).
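As an illustration, here is a minimal NumPy sketch of this weighted squared loss; the flat-array layout and variable names are my own assumptions, not from the paper:

```python
import numpy as np

def pointwise_sqr_loss(y_hat, y, w):
    """Weighted pointwise squared loss, summed over observed and
    sampled unobserved (u, i) pairs stacked into flat arrays."""
    return np.sum(w * (y_hat - y) ** 2)

# Toy usage: three observed entries (y = 1) and two sampled negatives (y = 0).
y     = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
y_hat = np.array([0.9, 0.7, 0.8, 0.2, 0.4])
w     = np.ones_like(y)  # uniform instance weights w_ui
print(pointwise_sqr_loss(y_hat, y, w))  # 0.34
```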
Pairwise loss: the central idea is that observed entries should be ranked higher than unobserved ones. Instead of minimizing the difference between \hat{y}_{ui} and y_{ui}, pairwise learning separates the two sets of entries by maximizing the margin between an observed entry \hat{y}_{ui} and an unobserved entry \hat{y}_{uj}.
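A well-known instance is the BPR loss, which realizes this margin maximization through the sigmoid function \sigma:

L_{pair} = -\sum_{(u,i,j)} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj})

where the sum runs over triplets (u, i, j) with item i observed and item j unobserved for user u.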
Proposed Loss
In the following, we introduce a probabilistic model for learning the pointwise NCF with a particular emphasis on leveraging the binary nature of implicit data.
(Aim to maximize the log-likelihood function)
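Concretely, treating y_{ui} as a Bernoulli outcome with success probability \hat{y}_{ui}, maximizing the log-likelihood is equivalent to minimizing the binary cross-entropy (log loss):

L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} \left( y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log (1 - \hat{y}_{ui}) \right)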

Matrix Factorization
MF associates each user and each item with a real-valued vector of latent features.
For each user u, we define p_u, which represents the latent vector assigned to them.
Similarly, for each item i, we denote its corresponding latent vector as q_i.
Let K denote the dimension of the latent space.
The predicted rating \hat{y}_{ui} is computed as:
\hat{y}_{ui} = f(u,i|p_u,q_i) = p_u^T q_i = \sum_{k=1}^{K} p_{uk} q_{ik}
Drawback:
MF can be viewed as a linear model of latent factors, which limits its expressiveness.

To illustrate, the Jaccard coefficient is used as the ground-truth similarity between two users that MF needs to recover.
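Concretely, letting \mathcal{R}_u denote the set of items that user u has interacted with, the Jaccard similarity between users u and v is:

s_{uv} = \frac{|\mathcal{R}_u \cap \mathcal{R}_v|}{|\mathcal{R}_u \cup \mathcal{R}_v|}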

Focus first on the first three rows (users) in Figure 1a. We have s_{23} = 0.66 > s_{12} = 0.5 > s_{13} = 0.4; the geometric relations among the corresponding latent vectors are shown in Figure 1b. Now consider a new user u_4, whose input is the dashed row in Figure 1a. Since s_{41} = 0.6 > s_{43} = 0.4 > s_{42} = 0.2, u_4 is most similar to u_1, then u_3, then u_2. However, if an MF model places p_4 as close as possible to p_1 (the dashed vector in Figure 1b), p_4 ends up closer to p_2 than to p_3, incurring a large ranking loss.
NEURAL COLLABORATIVE FILTERING

- Since this work focuses on the pure collaborative filtering setting, only the identities of a user and an item are used as input features, each transformed into a binarized sparse vector via one-hot encoding. With such a generic input representation, the method can easily be adjusted to address the cold-start problem by using content features to represent users and items.
- Above the input layer sits the embedding layer: a fully connected layer that projects the sparse representation onto a dense vector.
- The user embedding and item embedding are then fed into a multi-layer neural architecture (a minimal sketch follows below).
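To make this pipeline concrete, here is a minimal PyTorch sketch of the embedding stack and MLP tower; the layer sizes, names, and activation choices are illustrative assumptions, not the paper's tuned configuration:

```python
import torch
import torch.nn as nn

class NCFSketch(nn.Module):
    """Sketch: one-hot user/item IDs -> dense embeddings -> MLP -> score."""
    def __init__(self, num_users, num_items, dim=16, hidden=(32, 16, 8)):
        super().__init__()
        # An embedding lookup is equivalent to multiplying a one-hot
        # vector by a fully connected projection matrix.
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        layers, in_dim = [], 2 * dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        self.mlp = nn.Sequential(*layers)
        self.out = nn.Linear(in_dim, 1)

    def forward(self, u, i):
        # Concatenate the two embeddings and pass them through the tower.
        z = torch.cat([self.user_emb(u), self.item_emb(i)], dim=-1)
        return torch.sigmoid(self.out(self.mlp(z))).squeeze(-1)

model = NCFSketch(num_users=1000, num_items=500)
scores = model(torch.tensor([0, 1]), torch.tensor([42, 7]))  # batch of 2 pairs
```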
Generalized Matrix Factorization (GMF)

GMF (the left-hand branch of the framework) takes the same form as MF, except that the latent vectors now come from the fully connected embedding layer.
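Formally, GMF computes:

\hat{y}_{ui} = a_{out}\left( h^T (p_u \odot q_i) \right)

where \odot is the element-wise product, h is the weight vector of the output layer, and a_{out} is an activation function. With a_{out} set to the identity and h fixed to a vector of ones, this reduces exactly to the MF prediction given earlier.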
Pre-training
- The initialization procedure plays a crucial role in determining the convergence behavior and overall performance of deep learning models.
- The paper proposes to initialize NeuMF using the pre-trained parameters of GMF and MLP.
- GMF and MLP are first trained from random initializations until convergence; their learned parameters are then used to initialize the corresponding parts of NeuMF's architecture.
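When fusing the two pre-trained parts, the output-layer weights are concatenated with a trade-off hyper-parameter \alpha:

h \leftarrow \begin{bmatrix} \alpha h^{GMF} \\ (1-\alpha) h^{MLP} \end{bmatrix}

where h^{GMF} and h^{MLP} are the pre-trained output weights; the paper sets \alpha = 0.5, giving the two models equal weight.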
