
2020_TKDE_DiffNet++_A Neural Influence and Interest Diffusion Network for Social Recommendation



Paper download: https://arxiv.org/abs/2002.00844v2; venue: TKDE; year: 2020; authors:


Le Wu, Junwei Li, Peijie Sun, Richang Hong, Yong Ge, and Meng Wang

Datasets: described in the main text

Code:

Other:

Articles by others

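Brief summary of the contributions: last year the authors published DiffNet, which considers only the social network. This year they published DiffNet++, which additionally considers the user-item interest network.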

  • (1) We propose DiffNet++, an improved algorithm of DiffNet that models the neural influence diffusion and interest diffusion in a unified framework.

  • (2) By reformulating social recommendation as a heterogeneous graph with the social network and the interest network as input, DiffNet++ advances DiffNet by injecting both the higher-order user latent interest reflected in the user-item graph and the higher-order user influence reflected in the user-user graph into user embedding learning.

  • (3) This is achieved by iteratively aggregating each user’s embedding from three aspects:

    • the user’s previous embedding,
    • the influence aggregation of social neighbors from the social network,
    • and the interest aggregation of item neighbors from the user-item interest network.
  • (4) Furthermore, we design a multi-level attention network that learns how to attentively aggregate user embeddings from these three aspects.

Abstract

  • (1) Social recommendation has emerged to leverage social connections among users for predicting users’ unknown preferences, which could alleviate the data sparsity issue in collaborative filtering based recommendation.
  • (2) Early approaches relied on each user’s first-order social neighbors’ interests for better user modeling, and failed to model the social influence diffusion process from the global social network structure.
  • (3) Recently, we proposed a preliminary work of a neural influence Diffusion Network (i.e., DiffNet) for social recommendation [43]. DiffNet models the recursive social diffusion process for each user, such that the influence diffusion hidden in the higher-order social network is captured in the user embedding process. Despite the superior performance of DiffNet, we argue that, as users play a central role in both the user-user social network and the user-item interest network, only modeling the influence diffusion process in the social network would neglect the latent collaborative interests of users hidden in the user-item interest network.
  • (4) To this end, in this paper, we propose DiffNet++, an improved algorithm of DiffNet that models the neural influence diffusion and interest diffusion in a unified framework.
    • By reformulating social recommendation as a heterogeneous graph with the social network and the interest network as input, DiffNet++ advances DiffNet by injecting both the higher-order user latent interest reflected in the user-item graph and the higher-order user influence reflected in the user-user graph into user embedding learning.
    • This is achieved by iteratively aggregating each user’s embedding from three aspects:
      • the user’s previous embedding,
      • the influence aggregation of social neighbors from the social network,
      • and the interest aggregation of item neighbors from the user-item interest network.

Furthermore, we design a multi-level attention network that learns how to attentively aggregate user embeddings from these three aspects.

Extensive experimental results on four real-world datasets show the effectiveness of our proposed model. We release the source code at https://github.com/PeiJieSun/diffnet.

Index Terms

Recommender systems, graph neural networks, social recommendation, influence diffusion, interest diffusion

1 INTRODUCTION

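(1) Collaborative filtering (CF) based recommender systems learn user embeddings from users' historical interest behaviors on items. They have attracted wide attention in both academia and industry [37], [32]. However, user behavior data is limited in most cases, so CF-based algorithms suffer from the data sparsity issue [1]. With the development of social media, users build social relationships and share their item preferences on these platforms. Social influence theory holds that connected users influence each other, which leads to similar preferences. This has driven the development of social recommendation, whose goal is to leverage users' social relationships to alleviate the data sparsity issue and improve recommendation performance [19], [20], [14], [43].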

(2) Users play a central role on social platforms, where they show both user-user social behaviors and user-item interest behaviors. The key to social recommendation is therefore to learn user embeddings that capture both kinds of behavior.

By treating the user-item interest network as a user-item matrix, CF-based models have long relied on matrix factorization to project both users and items into a low-dimensional latent space. Most social recommender systems advance these CF models by leveraging the user-user matrix in one of two ways:
- enhancing each user's embedding learning with her social neighbors' records, or
- regularizing the user embedding learning process with her social neighbors.

For example, SocialMF and SR add social regularization terms based on social neighbors to the optimization function. TrustSVD incorporates the influence of social neighbors' decisions as additional terms when modeling a user's embedding. In summary, these approaches only consider each user's first-order social neighbors.

[Figure 1: Users in the user-user social network and the user-item interest network]

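(3) Although these social recommendation models improve performance, we argue that current social recommendation models are still far from satisfactory.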

As shown in Figure 1, users play a central role in two behavior networks: the user-user social network and the user-item interest network.

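  • First, the social network contains a global, recursive social diffusion process. Users naturally form a social graph with a higher-order structure. Each user is influenced by her first-order social neighbors (direct friends or acquaintances) and also by the higher-order ego-centric social network structure (indirect friends). For example, user u1 does not follow u5 directly. However, there are two second-order paths, u1→u2→u5 and u1→u4→u5, so u1 may still be strongly influenced by u5. Reducing the social network structure to first-order social neighbors would therefore fail to capture these higher-order social influence effects in recommendation.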

To better model users, we formulate users' two kinds of behaviors as a heterogeneous network made of two graphs, a user-user social graph and a user-item interest graph. We then study how to exploit this heterogeneous graph structure for social recommendation.

Graph Convolutional Networks (GCNs) have achieved remarkable success in learning graph structures, thanks to their theoretical elegance, practical flexibility, and high performance [5], [8], [22]. GCNs propagate node features recursively through the graph. At each iteration, a node's representation is updated by convolutionally aggregating the representations of its neighbors. As a result, GCNs capture up to the K-th order graph structure after K iterations [47].
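The claim that K iterations capture K-th order structure can be checked on the toy social graph from the introduction (u1 → u2 → u5 and u1 → u4 → u5). The sketch below rests on two illustrative assumptions:
- aggregation is a plain mean over a node and the users it follows,
- there are no weight matrices and no nonlinearity.

It is therefore not the GCN of any cited paper. It only shows that u5's signal reaches u1 after the second propagation round, and not before.

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # node -> neighbors it aggregates from


def k_hop_receptive_field(adj: Graph, node: int, k: int) -> Set[int]:
    """Nodes whose features can reach `node` after k rounds of neighbor aggregation."""
    reached = {node}
    for _ in range(k):
        # One propagation round: every reached node pulls in its neighbors.
        reached = reached | {nb for n in reached for nb in adj[n]}
    return reached


def propagate(adj: Graph, feats: Dict[int, float], k: int) -> Dict[int, float]:
    """k rounds of mean aggregation over each node and its neighbors (no weights, no nonlinearity)."""
    h = dict(feats)
    for _ in range(k):
        h = {n: (h[n] + sum(h[m] for m in nbrs)) / (1 + len(nbrs)) for n, nbrs in adj.items()}
    return h


# Toy "follows" graph from the introduction: u1 -> u2 -> u5 and u1 -> u4 -> u5.
follows: Graph = {1: [2, 4], 2: [5], 4: [5], 5: []}
signal = {1: 0.0, 2: 0.0, 4: 0.0, 5: 1.0}  # only u5 carries a non-zero feature

print(k_hop_receptive_field(follows, 1, 1))  # {1, 2, 4}
print(k_hop_receptive_field(follows, 1, 2))  # {1, 2, 4, 5}
print(propagate(follows, signal, 1)[1])      # 0.0: u5 is not yet visible to u1
print(propagate(follows, signal, 2)[1])      # about 0.333: u5 reaches u1 after two rounds
```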

Several studies treat user-item interactions as a bipartite interest graph and the user-user social network as a social graph, and apply GCNs to each type of graph separately for recommendation [51], [41], [48], [43].

  • On one hand, given the user-item interest graph, NGCF was proposed to directly encode the collaborative signals of users. It explores higher-order connectivity patterns with embedding propagation [41].
  • On the other hand, in our previous work, we proposed a neural influence Diffusion Network (DiffNet) to model the recursive social diffusion process in the social network. This lets the higher-order social structure be modeled directly in the recursive user embedding process [43].
  • These graph-based approaches outperform conventional non-graph-based recommendation models because they model the graph structure. However, how to design a unified framework that models users across both graphs remains underexplored.

In this paper, we extend our preliminary DiffNet to jointly model both graph structures, the user-item graph and the user-user graph, for social recommendation.

  • Considering both a user's social network and her interest network seems intuitive, but it is nontrivial in practice, because the two networks play different roles in revealing a user's latent preferences.
  • Moreover, users balance the two networks differently. Some are more easily influenced by their social connections, while others prefer to follow their own interests.

To this end, we propose DiffNet++, an improved version of the original DiffNet algorithm that models neural influence diffusion and interest diffusion in a unified framework.

Additionally, we design a multi-level attention network that learns to attentively aggregate user embeddings from different nodes within each graph and from the different graphs.

(6) In summary, our main contributions are listed as follows:

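  • Compared with our previous DiffNet work [43], we reformulate social recommendation as predicting missing edges, given the user-item interest graph and the user-user social graph as joint inputs.
  • We propose DiffNet++, which models both the higher-order social influence diffusion in the social network and the interest diffusion in the interest network within a unified framework.
  • In addition, we carefully design a multi-level attention network that learns how each user weighs the different graph sources.

  • Extensive experimental results on four real-world datasets show that DiffNet++ significantly outperforms the best-performing baselines.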

2.1 Problem Definition

A social recommendation system contains two sets of entities: a user set U with M users and an item set V with N items. Users show two kinds of behaviors on social platforms. They build social connections with other users, and they express their preferences for items.

* These two kinds of behaviors could be defined as two matrices: a user-user social connection matrix $S \in R^{M\times M}$, and a user-item interaction matrix $R \in R^{M\times N}$.
* In the social matrix $S$, if user $a$ trusts or follows user $b$, $s_{ba} = 1$, otherwise it equals 0.
* We use $S_a$ to represent the set of users that user $a$ follows, i.e., $S_a = \{b \mid s_{ba} = 1\}$.
* The user-item matrix $R$ shows users’ rating preferences and interests in items.
* Implicit feedback (e.g., watching movies, purchasing items, listening to songs) is more common in real-world applications, so we also consider the recommendation scenario with implicit feedback [37].
* Let $R$ denote users’ implicit feedback based rating matrix, with $r_{ai} = 1$ if user $a$ is interested in item $i$, and 0 otherwise.
* We use $R_a$ to represent the set of items that user $a$ has consumed, i.e., $R_a = \{i \mid r_{ai} = 1\}$,
* and $R_i$ denotes the set of users who consumed item $i$, i.e., $R_i = \{a \mid r_{ai} = 1\}$.

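Given the two kinds of user behaviors, the user-user social network is represented as a directed user-user graph $G_S = \langle U, S \rangle$. Here U is the set of all user nodes in the social network. If the social network is undirected, a connection between user $a$ and user $b$ means that $a$ follows $b$ and $b$ also follows $a$, i.e., $s_{ab} = 1$ and $s_{ba} = 1$.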

The user interest network is represented as a bipartite graph built from the user-item rating matrix $R$, denoted by $G_I = \langle U \cup V, R \rangle$.

Additionally, each user $a$ has real-valued attributes (e.g., a user profile), represented by $x_a$ in the user attribute matrix $X \in \mathbb{R}^{d_1 \times M}$.

* Also, each item $i$ has an attribute vector $y_i$(e.g., item text representation, item visual representation) in item attribute matrix $Y \in R^{d2\times N}$. We formulate the graph based social recommendation problem as:

Definition 1 (Graph Based Social Recommendation).

Given the user social network $G_S$ and the user interest network $G_I$, the two networks can be combined into a heterogeneous graph: $G = G_S \cup G_I = \langle U \cup V, X, Y, R, S \rangle$. The goal of graph-based social recommendation is to predict users' unknown preferences for items, i.e., the unobserved links in the user-item interest network: $\hat{R} = f(G) = f(U \cup V, X, Y, R, S)$. Here $\hat{R} \in \mathbb{R}^{M\times N}$ is the predicted preference matrix of users for items.

2.2 Related Work

In this subsection, we group the related work into three categories:
- classical social recommendation models,
- graph convolutional networks and their applications in recommendation,
- attention models in recommendation.

2.2.1 Classical Social Recommendation Models.

(1) Most classical CF models formulate users’ historical behavior as a user-item interaction matrix R and embed both users and items in a low-dimensional latent space. Each user’s predicted preference for an unknown item is then the inner product of the corresponding user and item embeddings [37], [32], [36]:

$$\hat{r}_{ai} = v_i^T u_a \tag{1}$$

where $u_a$ is the embedding of user $a$, i.e., the $a$-th entry of the user embedding matrix $U$, and $v_i$ is the embedding of item $i$, i.e., the $i$-th entry of the item embedding matrix $V$.
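Below is a minimal sketch of the inner-product prediction described above. It assumes embeddings are stored as Python lists, one row per user or item. The toy values and the `rank_items` helper are hypothetical and only show how the predicted scores order candidate items.

```python
from typing import List

Vec = List[float]


def predict_mf(U: List[Vec], V: List[Vec], a: int, i: int) -> float:
    """Classical CF prediction: inner product of user a's and item i's latent vectors."""
    return sum(x * y for x, y in zip(U[a], V[i]))


def rank_items(U: List[Vec], V: List[Vec], a: int, candidates: List[int]) -> List[int]:
    """Sort candidate items for user a by predicted preference, highest first."""
    return sorted(candidates, key=lambda i: predict_mf(U, V, a, i), reverse=True)


U = [[1.0, 0.5], [0.2, 0.9]]              # toy user embeddings (rows)
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy item embeddings (rows)
print(predict_mf(U, V, 0, 2))             # 0.75
print(rank_items(U, V, 0, [0, 1, 2]))     # [0, 2, 1]
```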

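(2) Many specialized matrix factorization models were proposed for different tasks (...). Factorization machines were later proposed as a general approach that mimics most factorization models through simple feature engineering [36]. More recently, several deep learning based models have been proposed for the CF problem [17], [28]. These approaches advance previous work by modeling the complex, non-linear relationships between users and items as well as among sparse feature inputs.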

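(3) Social influence and social correlation among users' interests are the foundation of social recommender systems [29], [38], [25], [24]. Incorporating social network information into CF therefore helps alleviate the data sparsity issue and improves performance [30], [38], [15]. Because embedding models are strong at representation learning, most social recommendation methods are built on embedding models.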

These social embedding models could be summarized into the following two categories:

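  • social regularization based approaches [19], [30], [21], [26], [44] (five methods),
  • and user behavior enhancement based approaches [14], [15] (two methods).
  • Specifically, social regularization based approaches assume that connected users show similar embeddings under social influence diffusion.
  • Based on this assumption, they add a social regularization term to the overall optimization function, on top of the classical CF pairwise ranking loss in BPR [37]: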
[Equation (2): BPR pairwise ranking loss plus a social regularization term]
  * where $D$ is a diagonal matrix with $d_{aa} = \sum^{M}_{b=1}s_{ab}$.
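The exact form of the regularized objective is not reproduced above. The sketch below assumes the common graph-Laplacian social regularizer, with two further assumptions:
- U stores user embeddings as rows,
- S is symmetric.

Under these assumptions, $\sum_{a,b} s_{ab}\|u_a - u_b\|^2 = 2\,\mathrm{tr}(U^T (D - S) U)$, which explains why the degree matrix $D$ appears. This is an illustration of the idea, not the exact loss of any cited model.

```python
from typing import List

Matrix = List[List[float]]


def pairwise_social_reg(U: Matrix, S: Matrix) -> float:
    """sum_{a,b} s_ab * ||u_a - u_b||^2: pulls the embeddings of connected users together."""
    total = 0.0
    for a, u_a in enumerate(U):
        for b, u_b in enumerate(U):
            if S[a][b]:
                total += S[a][b] * sum((x - y) ** 2 for x, y in zip(u_a, u_b))
    return total


def laplacian_reg(U: Matrix, S: Matrix) -> float:
    """tr(U^T (D - S) U) with d_aa = sum_b s_ab; the rows of U are user embeddings."""
    M = len(U)
    L = [[(sum(S[a]) if a == b else 0.0) - S[a][b] for b in range(M)] for a in range(M)]
    # tr(U^T L U) = sum_{a,b} L[a][b] * <u_a, u_b>
    return sum(L[a][b] * sum(x * y for x, y in zip(U[a], U[b]))
               for a in range(M) for b in range(M))


S = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]     # symmetric toy social matrix
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy user embeddings
print(pairwise_social_reg(U, S))          # 6.0
print(laplacian_reg(U, S))                # 3.0, i.e. half of the pairwise sum
```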

(4) Instead of using a social regularization term, some researchers argued that the social network provides valuable information to enhance each user's behavior [50], [15]. TrustSVD is one such model and has shown state-of-the-art performance [14], [15]. TrustSVD assumes that the implicit feedback of a user's social neighbors on items can serve as auxiliary feedback for that user. It models the predicted preference as:

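$$\hat{r}_{ai} = v_i^T\Big(u_a + |R_a|^{-\frac{1}{2}} \sum_{j \in R_a} y_j + |S_a|^{-\frac{1}{2}} \sum_{b \in S_a} u_b\Big) \tag{3}$$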

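where $R_a = \{i \mid r_{ai} = 1\}$ is the set of items with implicit feedback from user $a$, and each $y_j$ is an implicit factor vector. The first terms follow the SVD++ model [23], which explicitly builds each user's liked items into her embedding. The third term covers the users she trusts: if user $a$ trusts user $b$, whose latent embedding is $u_b$, then $a$'s latent embedding is enhanced by the latent embeddings of her trusted users.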
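The TrustSVD prediction can be sketched as follows. The sketch assumes:
- bias terms are omitted, as in the equation above,
- `Y`, `U`, `R_a`, and `S_a` are plain Python lists.

The function name and toy numbers are illustrative only and do not come from the original TrustSVD implementation.

```python
import math
from typing import List

Vec = List[float]


def trustsvd_predict(u_a: Vec, v_i: Vec, Y: List[Vec], R_a: List[int],
                     U: List[Vec], S_a: List[int]) -> float:
    """r_ai = v_i^T (u_a + |R_a|^-1/2 * sum_{j in R_a} y_j + |S_a|^-1/2 * sum_{b in S_a} u_b).

    Bias terms are omitted, as in the equation above.
    """
    user = list(u_a)
    for ids, table in ((R_a, Y), (S_a, U)):
        if not ids:
            continue  # an empty neighborhood contributes nothing
        scale = 1.0 / math.sqrt(len(ids))
        for idx in ids:
            user = [x + scale * y for x, y in zip(user, table[idx])]
    return sum(x * y for x, y in zip(user, v_i))


Y = [[0.0, 1.0], [0.0, 1.0]]              # implicit factor vectors y_j
U = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]  # user embeddings; user 2 is trusted by a
score = trustsvd_predict([1.0, 0.0], [1.0, 1.0], Y, [0, 1], U, [2])
print(round(score, 4))  # 3.4142, i.e. 2 + sqrt(2)
```

With empty `R_a` and `S_a`, the function reduces to the plain inner product of Eq.(1).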

(5) Items are often associated with attribute information such as product descriptions and visual attributes. ContextMF was proposed to combine social context and the social network under a collective matrix factorization framework with carefully designed regularization terms [21].

Social recommendation has also been extended with the following [34], [38], [42], [39], [6]:
- social circles [34],
- temporal dynamics [38],
- multifaceted contextual information [42],
- user roles in social networks [39],
- training frameworks that avoid negative sampling [6].

All these previous works focus on exploiting the observed social connections, i.e., the explicit links in the social network.

Recently, CNSR was proposed to exploit the global social network in the recommendation process [44]. In CNSR, each user's latent embedding consists of two parts:

* **a free latent embedding (classical CF models),**
* and **a social network embedding** that captures the global social network structure.
* CNSR yields a relative improvement, but we argue that it is still suboptimal. Its global social network embedding process is modeled for network-based optimization tasks rather than for user preference learning.
* In contrast to CNSR, our work explicitly models the recursive social diffusion process in the global social network to optimize the recommendation task.
* Researchers proposed to generate social sequences from **random walks** on the user-user and user-item graphs, and then applied sequence embedding techniques to social recommendation [11]. This model better captures the higher-order social network structure. However, its performance heavily depends on the random walk strategy, including when to switch between the user-item graph and the user-user graph and how long each walk is. Tuning these choices is both time-consuming and labor-intensive.

2.2.2 Graph Convolutional Networks and Applications in Recommendation.

GCNs extend the convolution kernels from the regular Euclidean spaces to non-Euclidean graphs and have achieved remarkable empirical success in graph representation learning [5], [8], [22].

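Specifically, GCNs perform message passing by iteratively aggregating neighborhood information with convolution operations, and capture the K-th order graph structure after K iterations [22].
By treating user-item interactions as a graph, GCNs have been applied to recommendation [48], [51].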

Early studies relied on spectral GCNs, which have a high computational cost [33], [51]. As a result, more recent studies focus on spatial GCNs for recommendation [48], [4].

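  • PinSage is a GCN-based content recommendation model that propagates item features on the item correlation graph [48].
  • GC-MC applies graph neural networks to collaborative filtering (CF) and directly models the first-order neighbors [4].
  • NGCF extends GC-MC to multiple layers, so that the higher-order collaborative signals between users and items are captured during user embedding learning [41].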

(2) The social structure among users can naturally be represented as a user-user graph. We recently proposed a preliminary model, DiffNet, to model the recursive social influence process in recommendation [43]. DiffNet advances classical embedding based models with carefully designed influence diffusion layers. These layers capture how each user is influenced by the recursive influence diffusion process in the social network.

  • Specifically, let K denote the depth of the influence diffusion layers, and let $h^k_a$ denote user a's representation at the k-th layer.
  • For each user a, her updated embedding $h^{k+1}_a$ is obtained by diffusing the k-th layer embeddings in two steps. First, the embeddings of her social neighbors are aggregated (Eq.(4)). Second, this aggregation is combined with her own latent embedding $h^k_a$ (Eq.(5)).
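$$h^{k+1}_{S_a} = Pool(h^k_b \mid b \in S_a) \tag{4}$$
$$h^{k+1}_a = s^{(k+1)}\big(W^k \times [h^{k+1}_{S_a}, h^k_a]\big) \tag{5}$$
where $W^k$ is a transformation matrix and $s^{(k+1)}(\cdot)$ is a non-linear function.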
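  • The first equation is a pooling operation that transforms the influence of all trusted users into a fixed-length vector $h^{k+1}_{S_a}$.
  • With a diffusion depth of K, DiffNet automatically models how users are influenced by their K-th order social neighbors in the social network. In particular, when K = 0, the social diffusion layers disappear and DiffNet degenerates to the classical CF model.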

These prior GCN-based models either focused on high-order social networks or high-order user interest networks to improve recommendation performance.

Some recent works have also applied graph neural networks to social recommendation [10], [45].

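  • GraphRec models user representations by incorporating first-order social neighbors and first-order item neighbors, and uses non-linear neural networks for the fusion [10].
  • Researchers also proposed deep learning based techniques to model the complex interactions between the dynamic and static patterns reflected in users' social behaviors and item preferences [45].
  • These works rely on deep learning based models and consider both kinds of user behaviors, but they only model the first-order structure of the social graph and the interest graph. Unlike them, we fuse the higher-order social and interest network structures for more effective social recommendation.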

2.2.3 Attention Models and Applications.

  • (1) The attention mechanism is a powerful and common technique. It is often adopted when multiple elements in a sequence or set affect the following output, and it learns attentive weights with deep neural networks to distinguish the important elements [18], [3], [46].
    • Given a user’s rated item history, NAIS was proposed to learn neural attentive weights for item similarity in item-based collaborative filtering [16].
    • For graph-structured data, researchers proposed graph attention networks to attentively learn the weight of each neighbor node in the graph convolution process [40].
    • In social recommendation, many attention models have been proposed to learn the social influence strength [35], [38], [13], [10]. For example:
    • GraphRec uses each user’s direct item neighbors and social neighbors, and leverages attention modeling to learn attentive weights for each social neighbor and each rated item in user modeling [10].
    • In social contextual recommender systems, users’ preferences are influenced by various social contextual aspects. An attention network was proposed to learn the attention weight of each social contextual aspect in the user decision process.
    • Our work is also inspired by these applications of attention modeling, and applies attention to fuse the social network and the interest network for social recommendation.

3 THE PROPOSED MODEL

  • In this section, we first demonstrate the overall architecture of our proposed model DiffNet++, and then introduce each component in detail. Subsequently, we will delve into the learning process of DiffNet++. Finally, a comprehensive discussion of the proposed model will be provided.
[Figure 2: Overall architecture of DiffNet++]

3.1 Model Architecture

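  • (1) As introduced in the related work, our preliminary work uses a recursive influence diffusion process for iterative user embedding learning. This injects up to the K-th order social network structure into the social recommendation process [43].
  • In this section, we propose DiffNet++, an enhanced model that improves on the original DiffNet.
  • Figure 2 shows the overall neural architecture of DiffNet++. It consists of four main components:
    • Embedding layer: takes the related inputs and outputs the free embeddings of users and items.
    • Fusion layer: fuses the content features with the free embeddings.
    • Influence and interest diffusion layers: diffuse information over the higher-order social network and interest network, using a carefully designed multi-level attention structure.
    • Prediction layer: after the diffusion process stabilizes, predicts the preference score of each unobserved user-item pair.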

3.1.1 Embedding Layer.

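  • It encodes users and items with their corresponding free vectors. Let $P \in \mathbb{R}^{M\times D}$ and $Q \in \mathbb{R}^{N\times D}$ denote the D-dimensional free latent embedding matrices of users and items, respectively.
  • Given the one-hot representation of user a, the embedding layer performs an index selection and outputs the free user latent embedding $p_a$, i.e., the transpose of the a-th row of the user free embedding matrix P.
  • Similarly, for item i, its embedding $q_i$ is the transpose of the i-th row of the item free embedding matrix Q.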

3.1.2 Fusion Layer.

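For each user a, the fusion layer takes her free embedding $p_a$ and her associated feature vector $x_a$ as input. It outputs a fused user embedding $u_a^0$ that captures the user's initial interests from different kinds of input data. We model it as: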

$$u^0_a = g\left(W_1 \times [p_a, x_a]\right) \tag{6}$$
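  • where $W_1$ is a transformation matrix,
  • and $g(x)$ is a transformation function.
  • For ease of explanation, we omit the bias terms.
  • This fusion layer is flexible enough to support many typical fusion operations;
  • e.g., setting $W_1 = I$ and $g(x) = x$ yields the concatenation operation.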

Similarly, for each item i, the fusion layer models the item embedding $v^0_i$ as a function of its free latent vector $q_i$ and its feature vector $y_i$:

$$v^0_i = g\left(W_2 \times [q_i, y_i]\right) \tag{7}$$

3.1.3 Influence and Interest Diffusion Layers.

Each user a's fused embedding $u^0_a$ and each item i's fused embedding $v^0_i$ are fed into the influence and interest diffusion layers. These layers recursively model, through layer-wise convolutions, how the latent preferences of users and items diffuse in the heterogeneous graph G.

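  • At each layer k+1, these layers take user a's embedding $u^k_a$ and item i's embedding $v^k_i$ from the previous layer k as input, and recursively output the updated embeddings $v^{k+1}_i$ and $u^{k+1}_a$.
  • The iteration starts at k = 0 and stops when the recursive process reaches the predefined maximum depth K.
  • Each item appears only in the user-item interest graph. We therefore first describe how item embeddings are updated, and then describe the user embedding update with both influence diffusion and interest diffusion.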

For each item i, given its k-th layer embedding $v^k_i$, its updated embedding $v^{k+1}_i$ at the (k+1)-th layer is computed from $G_I$ as:

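$$\hat{v}^{k+1}_i = \sum_{a \in R_i} \eta^{k+1}_{ia}\, u^k_a \tag{8}$$
$$v^{k+1}_i = \hat{v}^{k+1}_i + v^k_i \tag{9}$$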
* where $R_i = \{a \mid r_{ai} = 1\}$ is the set of users who rated item $i$,
* $u^k_a$ is the $k$-th layer embedding of user $a$,
* $\hat{v}^{k+1}_i$ is item $i$’s aggregated embedding from its neighbor users in the user-item interest graph $G_I$,
* and $\eta^{k+1}_{ia}$ denotes the aggregation weight.
* After obtaining the aggregated embedding $\hat{v}^{k+1}_i$ from the $k$-th layer, each item’s updated embedding $v^{k+1}_i$ is a fusion of the aggregated neighbors’ embeddings and the item’s embedding at the previous layer $k$.
* We tried different kinds of fusion functions, including **concatenation** and **addition**, and found that addition always performs best. We therefore use addition as the fusion function in Eq.(9).

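(3) In the item's neighbor aggregation function, $\eta^{k+1}_{ia}$ in Eq.(8) is the weight of user a for item i. An intuitive idea is to aggregate the embeddings of i's neighbor users with mean pooling, i.e., $\hat{v}^{k+1}_i = \sum_{a\in R_i} \frac{1}{|R_i|} u^k_a$. However, this ignores that different users contribute differently to an item's representation. We therefore use an attention mechanism to learn the attention weight $\eta^{k+1}_{ia}$ in Eq.(8):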

$$\tilde{\eta}^{k+1}_{ia} = MLP_1([v^k_i, u^k_a]) \tag{10}$$
* where a Multi-Layer Perceptron (MLP) learns the node attention weights from the related user and item embeddings at the $k$-th layer. We then normalize the attention weights with:
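$$\eta^{k+1}_{ia} = \frac{\exp(\tilde{\eta}^{k+1}_{ia})}{\sum_{b \in R_i} \exp(\tilde{\eta}^{k+1}_{ib})} \tag{11}$$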

The exponential function specifically guarantees that each attention weight remains above zero.
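Below is a minimal sketch of the item update in Eqs.(8)-(11). A hypothetical `make_toy_mlp` (one linear unit plus ReLU) stands in for MLP_1, whose real architecture is not specified here. With equal scores, the attention weights reduce to the mean pooling baseline. A sharper scorer concentrates the weight on one rating user.

```python
import math
from typing import Callable, List

Vec = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Eq.(11)-style normalization; subtracting the max keeps exp() from overflowing."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def update_item(v_i: Vec, raters: List[Vec], mlp1: Callable[[Vec], float]) -> Vec:
    """Eqs.(8)-(11): attention over the users in R_i, weighted sum, then add v_i^k."""
    if not raters:
        return list(v_i)  # no rating users: keep the previous embedding (an assumption)
    eta = softmax([mlp1(v_i + u_a) for u_a in raters])  # v_i + u_a concatenates the lists
    v_hat = [sum(w * u_a[d] for w, u_a in zip(eta, raters)) for d in range(len(v_i))]
    return [x + y for x, y in zip(v_hat, v_i)]  # Eq.(9): addition fusion


def make_toy_mlp(weights: Vec) -> Callable[[Vec], float]:
    """Hypothetical stand-in for MLP_1: a single linear unit followed by ReLU."""
    return lambda x: max(0.0, sum(w * xi for w, xi in zip(weights, x)))


raters = [[1.0, 0.0], [0.0, 1.0]]
uniform = make_toy_mlp([0.0, 0.0, 0.0, 0.0])  # equal scores -> mean pooling
print(update_item([0.0, 0.0], raters, uniform))  # [0.5, 0.5]
picky = make_toy_mlp([0.0, 0.0, 10.0, 0.0])  # favors users with a large first component
print(update_item([0.0, 0.0], raters, picky))
```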

Each user a has a latent embedding $u^k_a$ at the k-th layer. Each user appears in both the social network $G_S$ and the interest network $G_I$, so her updated embedding $u^{k+1}_a$ at the (k+1)-th layer is shaped by two diffusion processes:

  • the influence diffusion in $G_S$,
  • and the interest diffusion in $G_I$.

Let $\hat{p}^{k+1}_a$ denote the aggregated embedding of influence diffusion from the social neighbors, and let $\hat{q}^{k+1}_a$ denote the aggregated embedding of interest diffusion from the interested item neighbors at the (k+1)-th layer. Each user's updated embedding $u^{k+1}_a$ is then modeled as follows.
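$$u^{k+1}_a = u^k_a + \left(\gamma^{k+1}_{a1}\,\hat{p}^{k+1}_a + \gamma^{k+1}_{a2}\,\hat{q}^{k+1}_a\right) \tag{12}$$
$$\hat{p}^{k+1}_a = \sum_{b \in S_a} \alpha^{k+1}_{ab}\, u^k_b \tag{13}$$
$$\hat{q}^{k+1}_a = \sum_{i \in R_a} \beta^{k+1}_{ai}\, v^k_i \tag{14}$$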
  • When Eq.(12) is executed, each user updates her latent embedding through the fusion of influence diffusion aggregation \hat{p}^{k+1}_a and interest diffusion aggregation \hat{q}^{k+1}_a, alongside her own previous layer embedding u^k_a.
  • Given that each user exists in both the social graph and interest graph, Eq.(13) and Eq.(14) respectively model the influence diffusion aggregation from the social graph and interest diffusion aggregation from the interest graph.
  • Specifically, \alpha^{k+1}_{ab} represents the extent to which user b influences user a in social network layer (k + 1), while \beta^{k+1}_{ai} signifies the degree to which item i attracts user a in interest network layer (k + 1).
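The user update in Eqs.(12)-(14) reduces to two weighted sums plus a residual addition. The sketch below takes the weights α, β, and γ as given inputs. Here they are set to the uniform baseline discussed next (1/|S_a|, 1/|R_a|, and 1/2); in DiffNet++ they would come from the attention networks. Names and toy values are hypothetical.

```python
from typing import List, Tuple

Vec = List[float]


def weighted_sum(weights: List[float], vecs: List[Vec], dim: int) -> Vec:
    """sum_n weights[n] * vecs[n]; returns zeros for an empty neighborhood."""
    return [sum(w * v[d] for w, v in zip(weights, vecs)) for d in range(dim)]


def update_user(u_a: Vec, friends: List[Vec], alpha: List[float],
                items: List[Vec], beta: List[float], gamma: Tuple[float, float]) -> Vec:
    dim = len(u_a)
    p_hat = weighted_sum(alpha, friends, dim)  # Eq.(13): influence diffusion from S_a
    q_hat = weighted_sum(beta, items, dim)     # Eq.(14): interest diffusion from R_a
    g1, g2 = gamma
    return [u + g1 * p + g2 * q for u, p, q in zip(u_a, p_hat, q_hat)]  # Eq.(12)


friends = [[0.0, 2.0], [2.0, 0.0]]  # u_b^k for b in S_a
items = [[4.0, 0.0]]                # v_i^k for i in R_a
# Uniform weights: alpha = 1/|S_a|, beta = 1/|R_a|, gamma = (1/2, 1/2).
print(update_user([1.0, 0.0], friends, [0.5, 0.5], items, [1.0], (0.5, 0.5)))  # [3.5, 0.5]
```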

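Besides the user and item embeddings, the above three equations contain three groups of weights. An intuitive idea is to give equal weights within each type, i.e., $\gamma^{k+1}_{a1} = \gamma^{k+1}_{a2} = \frac{1}{2}$, $\alpha^{k+1}_{ab} = \frac{1}{|S_a|}$, and $\beta^{k+1}_{ai} = \frac{1}{|R_a|}$.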

However, this simple idea cannot capture how the weights vary in the user decision process. These three groups of weights naturally form a two-level hierarchical structure.

Specifically, the social influence strength and the interest strength can be regarded as node-level weights. They model how each user balances different neighboring nodes within each graph.

After the node-level attentive aggregations are fed into Eq.(12), $\gamma^{k+1}_{al}$ acts as a graph-level weight that learns to fuse information from the different graphs. These graph-level weights matter because they reflect how each user balances her social influence against her own historical records when her embedding is learned. Users differ considerably in this balance. Some are strongly influenced by the social network, while others keep relatively stable interests that their social neighbors barely affect. The graph-level weights therefore need to be personalized for each user.

Since the three groups of weights form a hierarchical structure, we design a multi-level attention network to learn them.

Specifically, the graph attention network learns the contribution weight of each aspect when combining user a's embeddings from the different graphs, i.e., $\hat{p}^{k+1}_a$ and $\hat{q}^{k+1}_a$ in Eq.(12).

The node attention networks learn attentive weights in the social graph and in the interest graph separately. Specifically, the social influence score $\alpha^{k+1}_{ab}$ is computed as follows:

$$\alpha^{k+1}_{ab} = MLP_2([u^k_a, u^k_b]) \tag{15}$$

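In the above equation, the social influence strength $\alpha^{k+1}_{ab}$ takes the k-th layer embeddings of the two related users as input. It feeds them into an MLP that learns the complex relationships between these features. For ease of explanation, we omit the normalization step in all the following attention modeling, because it has the same form as Eq.(11).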

Similarly, we compute the interest influence score $\beta^{k+1}_{ai}$ by taking the related user embedding and item embedding as input:

$$\beta^{k+1}_{ai} = MLP_3([u^k_a, v^k_i]) \tag{16}$$

After the two groups of node attentive weights are obtained, the outputs of the node attention networks are fed into the graph attention network. This network models the graph attention weights $\gamma^{k+1}_{al}$ (l = 1, 2) as follows.

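$$\gamma^{k+1}_{a1} = MLP_4([u^k_a, \hat{p}^{k+1}_a])$$
$$\gamma^{k+1}_{a2} = MLP_4([u^k_a, \hat{q}^{k+1}_a])$$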

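(10) In the above equations, the graph attention network takes user a's k-th layer embedding together with the corresponding aggregated embedding from each graph as input.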

As an instance, as demonstrated in Equation (12), \gamma^{(k+1)}_{a1} represents the influence diffusion weight responsible for contributing to users' depth (k + 1) embedding, incorporating additionally the attention-based combination of influence diffusion aggregation from Equation (13).

Similarly, \gamma^{k+1}_{a2} is the interest diffusion weight: it controls how much the learned attentive combination of interest diffusion aggregation from Eq. (14) contributes to user a's layer-(k+1) embedding.
Since \gamma^{k+1}_{a1} + \gamma^{k+1}_{a2} = 1, a larger \gamma^{k+1}_{a1} means a stronger influence diffusion effect and a correspondingly weaker interest diffusion effect.
Consequently, these learned aspect importance scores are personalized: for each user, they balance the importance of influence diffusion against interest diffusion during the embedding update.
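The graph-level combination can be sketched as follows, assuming the residual form u^{k+1}_a = u^k_a + \gamma^{k+1}_{a1}\hat{p}^{k+1}_a + \gamma^{k+1}_{a2}\hat{q}^{k+1}_a, which matches the three aspects (previous embedding, influence aggregation, interest aggregation) summarized at the top of this note; the exact Eq. (12) is not reproduced here. The raw `aspect_scores` stand in for a hypothetical graph-level MLP, and a two-way softmax enforces \gamma_{a1} + \gamma_{a2} = 1. All function names are illustrative.

```python
import math


def weighted_sum(weights, vectors):
    """sum_j weights[j] * vectors[j], computed per dimension."""
    return [sum(w * v[d] for w, v in zip(weights, vectors))
            for d in range(len(vectors[0]))]


def graph_attention(score_social, score_interest):
    """Softmax over the two aspect scores, so gamma_1 + gamma_2 = 1."""
    m = max(score_social, score_interest)
    e1, e2 = math.exp(score_social - m), math.exp(score_interest - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


def user_layer_update(u_k, friend_embs, alpha, item_embs, beta, aspect_scores):
    p_hat = weighted_sum(alpha, friend_embs)  # influence diffusion aggregation
    q_hat = weighted_sum(beta, item_embs)     # interest diffusion aggregation
    g1, g2 = graph_attention(*aspect_scores)
    # Assumed residual form: previous embedding + gamma-weighted aggregations.
    u_next = [u + g1 * p + g2 * q for u, p, q in zip(u_k, p_hat, q_hat)]
    return u_next, (g1, g2)


u_next, gammas = user_layer_update(
    u_k=[1.0, 0.0],
    friend_embs=[[2.0, 0.0], [0.0, 2.0]], alpha=[0.5, 0.5],
    item_embs=[[0.0, 4.0]], beta=[1.0],
    aspect_scores=(0.0, 0.0),
)
print(u_next, gammas)  # [1.5, 2.5] (0.5, 0.5)
```

With equal aspect scores the two diffusion effects are weighted equally; raising the social score shifts weight toward \hat{p} and away from \hat{q}, exactly the trade-off described above.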

3.1.4 Prediction Layer.

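  • (1) After K rounds of diffusion, we obtain the user and item embeddings u^k_a and v^k_i at every layer k = 0, 1, 2, ..., K.
  • For each user a, her final embedding is the layer-wise concatenation \bm{u}^*_a = [\bm{u}^0_a \| \bm{u}^1_a \| \dots \| \bm{u}^{K}_a]
  • Similarly, the final embedding of each item i is \bm{v}^*_i = [\bm{v}^0_i \| \bm{v}^{1}_i \| \dots \| \bm{v}^{K}_i]
  • Finally, the predicted rating is modeled as the inner product of the final user and item embeddings [7]: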
\hat{r}_{ai} = [\bm{u}^*_a]^T \bm{v}^*_i
  • (2) Note that some previous works directly use the K-th layer embeddings in the prediction layer, i.e., \hat{r}_{ai} = [u^K_a]^T v^K_i.
    • However, researchers have recently found that when only the K-th layer embedding is used, GCN-based approaches suffer from the over-smoothing issue as K increases [27], [52].
    • To tackle the over-smoothing problem, we adopt the prediction layer of the LR-GCCF model, which achieves state-of-the-art performance on the user-item bipartite graph [7].
    • In LR-GCCF, Chen et al. showed that simply concatenating the entity embeddings of all layers is equivalent to residual preference learning, and analyzed why this simple operation alleviates the over-smoothing issue [7].
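A small sketch of the concatenation-based prediction layer, with hypothetical helper names and toy embeddings. It also shows one way to see why concatenation helps: the inner product of the concatenated embeddings equals the sum of the per-layer inner products, so the shallow layers keep contributing even when the deepest layer alone gives an uninformative score.

```python
def concat_layers(layer_embs):
    """[e^0 || e^1 || ... || e^K]: flatten the per-layer embeddings."""
    return [x for layer in layer_embs for x in layer]


def predict_concat(user_layers, item_layers):
    """r_ai = <u*_a, v*_i> with the concatenated (LR-GCCF style) embeddings."""
    u_star, v_star = concat_layers(user_layers), concat_layers(item_layers)
    return sum(u * v for u, v in zip(u_star, v_star))


def predict_last_layer(user_layers, item_layers):
    """r_ai = <u^K_a, v^K_i>: uses only the deepest layer."""
    return sum(u * v for u, v in zip(user_layers[-1], item_layers[-1]))


user_layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # k = 0, 1, 2
item_layers = [[2.0, 0.0], [0.0, 3.0], [1.0, -1.0]]
print(predict_concat(user_layers, item_layers))      # 5.0  (= 2 + 3 + 0)
print(predict_last_layer(user_layers, item_layers))  # 0.0
```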

3.2 Model Training

As is common for implicit feedback, we optimize a pairwise ranking based loss function [37]:

[Equation image not reproduced: pairwise ranking (BPR) loss]

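where R^+ is the set of observed positive samples (i.e., user-item interaction records) and R^- is the set of unobserved negative samples, randomly sampled from all possible user-item pairs.
\sigma(x) is the sigmoid function.
\Theta = [\Theta_1,\Theta_2] denotes the model parameters,
where \Theta_1=[P,Q] are the free user and item embeddings,
and \Theta_2=[W_1,W_2,[MLP_i]_{i=1,2,3,4}] are the parameters of the fusion layer and the multi-level attention network.
All parameters in the loss function are differentiable.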
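A minimal sketch of a BPR-style pairwise loss, -\sum \ln\sigma(\hat{r}_{ai} - \hat{r}_{aj}) + \lambda\|\Theta\|^2, assuming each observed pair (a, i) from R^+ is paired with one sampled pair (a, j) from R^-. Which parameters are regularized, the value of `lam`, and the toy scores are assumptions for illustration; `-ln sigmoid(x)` is written in a numerically stable way.

```python
import math


def neg_log_sigmoid(x):
    """-ln(sigmoid(x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def bpr_loss(triples, score, params, lam):
    """Pairwise ranking loss over (user a, positive i, negative j) triples.

    Each triple pairs an observed (a, i) in R+ with a sampled (a, j) in R-.
    `params` is a flat list of the regularized parameter values (assumption).
    """
    ranking = sum(neg_log_sigmoid(score(a, i) - score(a, j)) for a, i, j in triples)
    return ranking + lam * sum(p * p for p in params)


toy_scores = {("u1", "i1"): 2.0, ("u1", "i9"): 0.0}  # hypothetical predictions


def toy_score(a, i):
    return toy_scores[(a, i)]


loss = bpr_loss([("u1", "i1", "i9")], toy_score, params=[1.0, 2.0], lam=0.1)
print(round(loss, 3))  # 0.627
```

The ranking term shrinks as the positive item is scored further above the sampled negative; a real implementation would compute the same quantity with a framework's autograd over mini-batches.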

All trainable parameters are initialized using a Gaussian distribution that has a mean of zero and a standard deviation of 0.01.

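  • In addition, we do not deliberately tune the embedding size of each convolutional layer.
    • For the MLPs in the multi-level attention network, we use a two-layer structure.
    • The parameter settings are described in more detail in the experiments section.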

3.3 Matrix Formulation of DiffNet++

The core of DiffNet++ is its carefully designed influence diffusion and interest diffusion layers, which can be implemented efficiently with matrix operations. Below we show how the user and item embeddings are updated from layer k to layer k+1 in matrix form.
(1) At the item side, let H^{(k+1)} = [\eta^{k+1}_{ia}] \in R^{N\times M} denote the matrix of attentive item aggregation weights defined in Eq. (10); then:

[Equation image not reproduced: matrix form of the item embedding update]

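(2) At the user side, given Eq. (12):

  • A^{(k+1)} = [\alpha_{ab}^{k+1}] \in R^{M\times M} and B^{(k+1)} = [\beta_{ai}^{k+1}] \in R^{M\times N} denote the attention weight matrices of the social network (Eq. (13)) and of the interest network (Eq. (14)), respectively, i.e., the outputs of the node attention layer.
  • \Gamma^{(k+1)} = [\gamma_{al}^{k+1}] \in R^{M\times 2} denotes the attention weight matrix of the graph-level attention network in the multi-level attention; it is computed in the same way as the previous two parts, so we omit the details.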

Upon learning the attention matrices, we can now update user and item embeddings at the (k + 1)-th layer as follows.

[Equation image not reproduced: matrix form of the user and item embedding updates at layer k+1]

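where I_1 is an identity matrix with M rows,
and I_2 is an identity matrix with N rows.
\Gamma(:,1) and \Gamma(:,2) denote the first and second columns of \Gamma, respectively.
The symbol ∗ denotes the element-wise product,
and rm(A, r_1) denotes the matrix obtained by repeating A r_1 times along the column dimension.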
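The matrix operations can be sketched in plain Python as below. This assumes the residual update forms U^{k+1} = U^k + rm(\Gamma(:,1), D) ∗ (A U^k) + rm(\Gamma(:,2), D) ∗ (B V^k) and V^{k+1} = V^k + H U^k, which follow the per-user description above; the paper's exact arrangement of the identity matrices is not reproduced here, and all helper names are hypothetical. The test checks that the matrix form gives the same result as the per-user loop.

```python
def matmul(X, Y):
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def hadamard(X, Y):
    """Element-wise product (the * operator in the note)."""
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rm(col, r):
    """Repeat a column vector r times along the column dimension -> len(col) x r."""
    return [[c] * r for c in col]


def diffusion_layer(U, V, A, B, H, Gamma):
    """U: M x D, V: N x D, A: M x M, B: M x N, H: N x M, Gamma: M x 2."""
    D = len(U[0])
    g1 = rm([row[0] for row in Gamma], D)
    g2 = rm([row[1] for row in Gamma], D)
    # Assumed residual user update: previous embedding + gamma-weighted aggregations.
    U_next = add(U, add(hadamard(g1, matmul(A, U)), hadamard(g2, matmul(B, V))))
    # Assumed residual item update: previous embedding + attentive user aggregation.
    V_next = add(V, matmul(H, U))
    return U_next, V_next


U = [[1.0, 0.0], [0.0, 1.0]]      # M = 2 users, D = 2
V = [[1.0, 1.0]]                  # N = 1 item
A = [[0.0, 1.0], [1.0, 0.0]]      # alpha weights (each row sums to 1)
B = [[1.0], [1.0]]                # beta weights (each row sums to 1)
H = [[0.5, 0.5]]                  # eta weights
Gamma = [[0.25, 0.75], [0.5, 0.5]]
U_next, V_next = diffusion_layer(U, V, A, B, H, Gamma)
print(U_next)  # [[1.75, 1.0], [1.0, 1.5]]
print(V_next)  # [[1.5, 1.5]]
```

In a deep learning framework, `rm(col, D) * X` is usually written as broadcasting a column vector against X, which avoids materializing the repeated matrix.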

With the above matrix operations for the influence diffusion and interest diffusion layers, DiffNet++ is straightforward to implement with existing deep learning frameworks.

3.4 Discussion

3.4.1 Space complexity.

  • As shown in Eq. (20), the model parameters consist of two parts:
    • the free user and item embeddings \Theta_1=[P, Q],
    • and the parameters of the fusion layer and the attention modeling, i.e., \Theta_2 = [W_1, W_2, [MLP_i]_{i=1,2,3,4}].
    • Since most embedding based models (e.g., BPR [37], FM [36]) need to store an embedding for each user and each item,
    • the space complexity of \Theta_1 is the same as that of classical embedding based models and grows linearly with the numbers of users and items.
    • The parameters in \Theta_2 are shared among all users and items, and the dimension of each parameter is far smaller than the numbers of users and items.
    • In practice, we empirically find that a two-layer MLP achieves the best performance.
    • As such, this additional storage cost is a small constant that can be neglected, so the space complexity of DiffNet++ is the same as that of classical embedding models.
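The storage argument can be checked with a quick parameter count. The shapes below are assumptions for illustration only: W_1 and W_2 act on the concatenation of a free embedding and an attribute vector (sizes `dx`, `dy`), and each of the four MLPs is a hypothetical two-layer network 2D → D → 1. The point is only that |\Theta_1| grows with M + N while |\Theta_2| does not.

```python
def parameter_counts(M, N, D, dx, dy):
    """Return (|Theta_1|, |Theta_2|) under hypothetical layer shapes."""
    theta1 = (M + N) * D                    # free embeddings P (M x D) and Q (N x D)
    fusion = D * (D + dx) + D * (D + dy)    # hypothetical W_1, W_2 on [p_a; x_a], [q_i; y_i]
    mlp = 2 * D * D + D + D + 1             # hypothetical 2-layer MLP: 2D -> D -> 1
    theta2 = fusion + 4 * mlp               # MLP_1 ... MLP_4
    return theta1, theta2


# Hypothetical sizes: theta_1 scales with M + N, theta_2 stays fixed.
for M, N in [(1_000, 2_000), (100_000, 200_000)]:
    print(M, N, parameter_counts(M, N, D=64, dx=32, dy=32))
```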

3.4.2 Time complexity.

  • Compared with classical matrix factorization based models, the additional time cost lies in the influence diffusion and interest diffusion layers.
    • Given M users, N items, and diffusion depth K, suppose each user directly connects to L_s users and L_i items on average, and each item directly connects to L_u users.
    • At each influence and interest diffusion layer, we first calculate the two-level attention weight matrices as shown in Eq. (21), and then update the user and item embeddings.
    • Since the MLP layers are very small in practice (e.g., two layers), the time cost of attention modeling is about O(M(L_s + L_i)D + N L_u D), where D is the embedding dimension. After that, as shown in Eq. (24), the user and item update step also costs O(M(L_s + L_i)D + N L_u D).
    • Since there are K diffusion layers, the total additional time complexity of the influence and interest diffusion layers is O(K(M(L_s + L_i) + N L_u)D).
    • In practice, as L_s, L_i, L_u \ll min\{M, N\}, the additional time grows linearly with the numbers of users and items and with the diffusion depth K. Therefore, the total time complexity is acceptable in practice.
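A back-of-the-envelope version of the bound O(K(M(L_s + L_i) + N L_u)D), with constant factors dropped (attention modeling and the update step each cost this order). The sizes are hypothetical, not the statistics of the paper's datasets; the prints illustrate the linear growth in users, items, and K.

```python
def extra_diffusion_cost(M, N, K, D, L_s, L_i, L_u):
    """Order-of-magnitude extra cost O(K (M (L_s + L_i) + N L_u) D), constants dropped."""
    per_layer = (M * (L_s + L_i) + N * L_u) * D  # attention + update, each of this order
    return K * per_layer


# Hypothetical sizes, not taken from the paper's datasets.
base = dict(M=10_000, N=20_000, K=2, D=64, L_s=10, L_i=20, L_u=5)
print(extra_diffusion_cost(**base))                                # 51200000
print(extra_diffusion_cost(**{**base, "M": 20_000, "N": 40_000}))  # 102400000
print(extra_diffusion_cost(**{**base, "K": 4}))                    # 102400000
```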

3.4.3 Model Generalization.

The proposed DiffNet++ model is designed for the problem setting whose inputs are the user feature matrix X, the item feature matrix Y, and the social network S.

  • Specifically, the fusion layer uses the user (item) feature vectors to assist user (item) representation learning.
  • The layer-wise diffusion layers use the social network structure S and the interest network structure R to model how users' latent preferences evolve through the recursive social influence and interest diffusion processes.
  • Next, we show that the proposed model remains applicable when some of these inputs are unavailable.

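(2) When the user (item) attributes are unavailable, the fusion layer disappears. In other words, as shown in Eq. (7), each item's latent embedding v^0_i reduces to q_i.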

  • Similarly, each user's layer-0 latent embedding reduces to u^0_a = p_a (Eq. (6)).
  • Likewise, when only the user attributes or only the item attributes are missing, only the corresponding user-side or item-side fusion layer degrades.
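The degenerate case can be sketched with one helper used for both users and items. The fusion form sigmoid(W [free embedding; attributes]) is a hypothetical stand-in, since Eqs. (6) and (7) are not reproduced in this note; what the sketch does reflect from the text is that without attributes the layer-0 embedding is simply the free embedding (u^0_a = p_a, v^0_i = q_i).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer0_embedding(free_emb, attrs=None, W=None):
    """Layer-0 embedding with an optional (hypothetical) fusion layer.

    With attributes: sigmoid(W [free_emb ; attrs]).
    Without attributes: the fusion layer disappears and free_emb is used as is.
    """
    if attrs is None:
        return list(free_emb)
    z = list(free_emb) + list(attrs)
    return [sigmoid(sum(w * zi for w, zi in zip(row, z))) for row in W]


p_a = [0.3, -0.1]
print(layer0_embedding(p_a))  # [0.3, -0.1]  (no attributes: u^0_a = p_a)
print(layer0_embedding(p_a, attrs=[1.0], W=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]))  # [0.5, 0.5]
```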

4 EXPERIMENTS

4.0.1 Datasets.


We conduct experiments on four real-world datasets from widely used platforms: Yelp, Dianping, Flickr, and Epinions.

(2) Yelp is a well-known online location-based social network, where users can make friends with others and review restaurants. We use the publicly available Yelp dataset.

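Flickr is an online image-based social sharing platform. We use the social image recommendation dataset crawled and released by the authors of [42], which contains the social network structure and users' ratings of images. Epinions is a social product review platform; its dataset was introduced in [31] and is publicly available. Dianping is the largest location-based social network in China; we use the dataset collected by the authors of [26], which is also publicly available.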

Among the four datasets, Yelp and Flickr contain both user and item attributes; they are the two datasets used in our previous DiffNet work [43].

  • The remaining two datasets, Epinions and Dianping, do not contain user and item attributes. We apply the same preprocessing steps to all four datasets.
  • Specifically, since the original ratings are explicit scores, we convert them to binary values.
  • If the rating value is larger than 3, we set it to 1; otherwise it is set to 0.
  • For both datasets, we filter out users that have fewer than 2 rating records and 2 social links, and remove items that have been rated fewer than 2 times.
  • We randomly select 10% of the data as the test set. From the remaining 90%, we hold out 10% as the validation set for parameter tuning.
  • Table 1 gives an overview of the characteristics of the four datasets. Its last row shows whether additional user and item attributes are available on each dataset.
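The preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions: the filtering sentence is read as "keep users with at least 2 ratings and at least 2 social links, and items with at least 2 ratings", it is applied in a single pass, a social link counts toward both of its endpoints, and the split is a uniform random shuffle. The function name and toy data are hypothetical.

```python
import random
from collections import Counter


def preprocess(ratings, social_links, min_count=2, test_frac=0.1, val_frac=0.1, seed=0):
    """ratings: (user, item, score) triples; social_links: (user, user) pairs.

    Returns (train, val, test) lists of (user, item, label) triples.
    """
    # 1) Binarize explicit scores: > 3 -> 1, otherwise 0.
    data = [(u, i, 1 if r > 3 else 0) for u, i, r in ratings]

    # 2) One-pass filtering (assumed interpretation of the text).
    user_ratings = Counter(u for u, _, _ in data)
    item_ratings = Counter(i for _, i, _ in data)
    user_links = Counter(x for link in social_links for x in link)  # both endpoints
    data = [(u, i, y) for u, i, y in data
            if user_ratings[u] >= min_count
            and user_links[u] >= min_count
            and item_ratings[i] >= min_count]

    # 3) Random split: 10% test, then 10% of the remaining 90% as validation.
    rng = random.Random(seed)
    rng.shuffle(data)
    n_test = int(len(data) * test_frac)
    test, rest = data[:n_test], data[n_test:]
    n_val = int(len(rest) * val_frac)
    return rest[n_val:], rest[:n_val], test


toy_ratings = [("u1", "i1", 5), ("u1", "i2", 2), ("u2", "i1", 4),
               ("u2", "i2", 3), ("u3", "i1", 5), ("u1", "i3", 5)]
toy_links = [("u1", "u2"), ("u2", "u3"), ("u1", "u3")]
train, val, test = preprocess(toy_ratings, toy_links)
print(sorted(train), val, test)  # u3 and i3 are dropped (only one rating each)
```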

4.0.2 Baselines and Evaluation Metrics.

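  • To demonstrate the effectiveness of our method, we compare it with competitive baselines, including the classical collaborative filtering models BPR [37] and FM [36], among others.
  • We take the user-item graph together with the user and item features as input for the recommendation task.
  • Our DiffNet [43] and DiffNet++ models can be simplified by removing the user and item features; we denote these feature-free versions as DiffNet-nf and DiffNet++-nf, respectively.
  • Among all compared models for social recommendation, only our proposed DiffNet++-nf and DiffNet++ consider both the higher-order social influence and the higher-order interest network.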
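  • (2) For top-N ranking evaluation, we adopt two widely used metrics: Hit Ratio (HR) [9] and Normalized Discounted Cumulative Gain (NDCG) [9], [43]. Specifically:
    • HR measures the proportion of hit items in the top-N list;
    • NDCG gives more weight to items ranked at the top positions.
    • Since we focus on top-N ranking performance over a large item set, we follow many other works [17], [43]: for each user, we randomly sample 1000 items she has not interacted with as negative samples, combine them with her positive items in the test set, and select the top-N candidates from this pool.
    • To reduce the uncertainty introduced by sampling, we repeat this procedure 5 times and report the average results.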
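The sampled evaluation protocol can be sketched as below. Assumptions: HR@N is normalized by the user's number of test positives (the text only says "proportion of hit items"), NDCG@N uses the standard 1/log2(rank + 2) discount with a binary ideal ranking, and the oracle scorer is a hypothetical stand-in for a trained model.

```python
import math
import random


def hr_ndcg_at_n(scores, positives, n):
    """HR@N and NDCG@N for one user over a candidate set.

    scores: dict candidate_item -> predicted score; positives: set of test items.
    HR here = fraction of test positives found in the top-N (assumed normalization).
    """
    top = sorted(scores, key=scores.get, reverse=True)[:n]
    hits = [1 if item in positives else 0 for item in top]
    hr = sum(hits) / len(positives)
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    idcg = sum(1 / math.log2(rank + 2) for rank in range(min(len(positives), n)))
    return hr, dcg / idcg


def evaluate_user(score_fn, user, test_pos, all_items, interacted,
                  n=10, num_neg=1000, repeats=5, seed=0):
    """Average HR/NDCG over `repeats` draws of `num_neg` sampled negatives."""
    rng = random.Random(seed)
    pool = [i for i in all_items if i not in interacted]
    hr_total = ndcg_total = 0.0
    for _ in range(repeats):
        negatives = rng.sample(pool, min(num_neg, len(pool)))
        candidates = set(negatives) | set(test_pos)
        scores = {i: score_fn(user, i) for i in candidates}
        hr, ndcg = hr_ndcg_at_n(scores, set(test_pos), n)
        hr_total += hr
        ndcg_total += ndcg
    return hr_total / repeats, ndcg_total / repeats


test_items = {5}


def oracle(user, item):  # hypothetical perfect scorer
    return 1.0 if item in test_items else 0.0


print(evaluate_user(oracle, "u1", test_items, range(2000), interacted={5, 7}))  # (1.0, 1.0)
```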

4.0.3 Parameter Setting.

4.1 Overall Performance Comparison

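  • Tables 3 to 6 report the overall top-10 recommendation performance with different embedding dimensions D.
  • Table 3 shows the results on Yelp and Flickr, where node attributes are available.
  • Table 4 reports the comparison on Epinions and Dianping, where no attributes are available and the attribute-free model variants are used.
  • We observe that, except for the pairwise ranking model BPR, almost all models achieve better recommendation performance as D increases. BPR only uses the observed user-item rating matrix for recommendation and therefore suffers from the data sparsity issue.
  • TrustSVD and SocialMF alleviate this problem by leveraging social neighbors.
  • GraphRec further improves over these classical social recommendation models by considering both first-order social neighbors and interest neighbors in user embedding learning.
  • However, GraphRec only models the first-order structure of the two graphs and neglects the higher-order graph structure.
  • GCN-based models such as PinSage and NGCF achieve significant improvements by modeling the higher-order user-item graph structure, while DiffNet focuses on modeling the higher-order social structure.
  • These graph neural network based methods largely outperform the classical matrix-based baselines, showing the effectiveness of higher-order graph structures for recommendation.
  • Our DiffNet++ achieves the best performance under every dimension D, demonstrating the effectiveness of the recursive diffusion process for modeling the social and interest networks.
  • Moreover, DiffNet++ and DiffNet both outperform the methods that do not model user and item features, which shows the effectiveness of injecting both feature embeddings and latent representations in the fusion layer.
  • Finally, experiments with different top-N values show the same trend, further confirming the advantage of the proposed method. In the following analyses, we mainly report results with D = 64.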

4.2 Performance Under Different Sparsity

  • In this part, we investigate how different models perform under different rating sparsity.
  • Specifically, we first group users into different interest groups based on the number of observed ratings of each user. E.g., [8, 16) means each user has at least 8 and fewer than 16 rating records.
  • Then, we calculate the average performance of each group. The sparsity analysis on the Yelp and Flickr datasets is shown in Fig. 3(a) and Fig. 3(b), respectively.
  • On both datasets, the overall performance of all models increases as users have more ratings. This is reasonable, since all models then have more user behavior data for user embedding modeling.
  • Our proposed models consistently outperform all baselines, with larger improvements on sparser data. E.g., when users have fewer than 8 rating records, DiffNet++ improves over the best baseline by 22.4% on Yelp and 45.0% on Flickr.
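A small sketch of the grouping and of how a figure such as "22.4%" is computed, assuming it denotes relative improvement over the best baseline. Only the [0, 8) and [8, 16) boundaries come from the text; the remaining bin edges and the function names are hypothetical.

```python
from collections import defaultdict


def sparsity_group(num_ratings, edges=(8, 16, 32, 64)):
    """Label of the half-open bin [lower, upper) that contains num_ratings."""
    lower = 0
    for upper in edges:
        if num_ratings < upper:
            return f"[{lower},{upper})"
        lower = upper
    return f"[{lower},inf)"


def group_average(metric_by_user, count_by_user, edges=(8, 16, 32, 64)):
    """Average a per-user metric within each rating-count group."""
    groups = defaultdict(list)
    for user, value in metric_by_user.items():
        groups[sparsity_group(count_by_user[user], edges)].append(value)
    return {g: sum(v) / len(v) for g, v in groups.items()}


def relative_improvement(ours, best_baseline):
    """(ours - best) / best, e.g. 0.224 for a 22.4% improvement."""
    return (ours - best_baseline) / best_baseline


print(group_average({"a": 0.2, "b": 0.4, "c": 0.9}, {"a": 3, "b": 5, "c": 20}))
```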

4.3 Detailed Model Analysis

4.3.1 Diffusion Depth K.

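  • The number of layers K is crucial, as it determines the diffusion depth over the different graphs.
    Table 7 shows the results with different values of K on the two datasets with attributes. The "Improvement" column shows the performance change compared with the best setting.
  • As K increases from 0 (where DiffNet++ degenerates to BPR) to 1, the performance improves quickly, and the best performance is reached at K = 2.
    However, when K increases to 3, the performance starts to decrease. We therefore empirically conclude that the 2-hop higher-order social and interest graph structure is sufficient for social recommendation.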

4.3.2 The Effects of Multi-level Attention.

  • A central feature of our proposed model is the multi-level attention mechanism that fuses the social network and the interest network for recommendation. In this subsection, we examine the effects of different attention modeling choices; the results are shown in Table 8.

In Table 8, "AVG" means that we directly set uniform attention weights without any attention learning. As this table shows, both node-level and graph-level attention improve recommendation performance. Graph-level attention brings a larger improvement over AVG, and adding node-level attention further amplifies the gains; for instance, adding node-level attention increases performance by approximately 15-20% on average. The improvements vary across datasets: they are smaller on Yelp than on Flickr. This suggests that modeling the importance of each element improves performance to different degrees on different data, and that the proposed multi-level attention adapts to these data characteristics.


4.3.3 Attention Value Analysis.

  • For each user a at layer k, the graph-level attention weights \gamma^k_{a1} and \gamma^k_{a2} denote the social influence diffusion weight and the interest diffusion weight, respectively.
  • A larger \gamma^k_{a1} indicates that the social influence diffusion process contributes more to the user's embedding learning, with less influence from the interest network.
  • Table 9 shows the learned mean and variance of all users' graph-level attention weights at each layer k. Since both datasets achieve the best performance at K = 2, we show the attention weights at the first diffusion layer (k = 1) and the second diffusion layer (k = 2).
  • There are several interesting findings.
    • First, on both datasets, at the first diffusion layer (k = 1), the average social influence strength \gamma^1_{a1} is very high, indicating that first-order social neighbors play a very important role in each user's first-layer representation. This is reasonable: users' rating behavior is very sparse, so leveraging first-order social neighbors can largely improve recommendation performance.
    • At k = 2, the average social influence strength \gamma^2_{a1} differs between the two datasets: Yelp shows a larger average social influence weight, while Flickr shows a larger interest influence weight and a quite small average social influence weight.
    • A possible reason is that, as shown in Table 1, Flickr has denser social links than Yelp; since a considerable number of direct social links are already aggregated at the first diffusion layer, the average weight of the second-layer social neighbors decreases.

4.3.4 Runtime.

  • Table 10 shows the runtime of each model on the four datasets. Among the four datasets, Epinions and Dianping do not have any attribute information.
  • For a fair comparison, we run all experiments on the same server, which has an Intel i9 CPU, two Titan RTX GPUs (24GB each), and 64GB of memory. The classical BPR model costs the least time, followed by the shallow latent factor based social recommendation models SocialMF and TrustSVD.
  • CNSR has a longer runtime because it needs to update both the user embedding learned from user-item behavior and the social embedding learned from user-user behavior. The graph based models cost more time than the classical models.
  • Specifically, NGCF and DiffNet have similar time costs, as each captures only one of the two processes (interest diffusion or influence diffusion). By injecting both the interest diffusion and influence diffusion processes, DiffNet++ costs more time than these two neural graph models.
  • GraphRec costs the most time on the two datasets without attributes. Although GraphRec only considers the one-hop graph structure, it adopts a deep neural architecture to model the complex interactions between users and items. Since this deep architecture must be applied to every user-item rating record, GraphRec costs more time than DiffNet++, whose prediction function is a simple inner product.
  • On Yelp and Flickr, which have attribute information as input, DiffNet++ needs the fusion layer to fuse the attribute and free embeddings, while GraphRec does not perform any attribute fusion.
  • Therefore, DiffNet++ costs more time than GraphRec on the two datasets with attributes. On the largest dataset, Dianping, the average training time of DiffNet++ is about 25 seconds per epoch, and it usually takes fewer than 100 epochs to converge (at most about 25 s × 100 ≈ 42 minutes). Therefore, the total runtime of DiffNet++ on the largest dataset is less than one hour, which is quite time-efficient.

5 CONCLUSIONS AND FUTURE WORK

In this paper, we proposed DiffNet++, a neural influence and interest diffusion based model for social recommendation.
We argued that, since users play a central role in both the social network and the interest network, jointly modeling the higher-order structures of the two networks lets them benefit each other.
By formulating social recommendation as a heterogeneous graph, we recursively learned user embeddings through convolutions over both users' social neighbors and interest neighbors, thereby directly injecting the higher-order social and interest network structures into user modeling.
Furthermore, we designed a multi-level attention network to attentively aggregate the graph-level and node-level representations for better user modeling.
Experimental results on four real-world datasets clearly demonstrated the effectiveness of the proposed model. In the future, we plan to explore graph reasoning models to better explain users' behaviors.

ACKNOWLEDGEMENTS

REFERENCES
