2018_WWW_Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification
[Paper Reading Notes] 2018_WWW_Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification (The World Wide Web Conference, 2018.04.23) -- Chenyi Zhuang, Qiang Ma
Paper download link
Venue
Publication date
Author affiliations
Dataset information:
- 3 citation network datasets
  - Citeseer
  - Cora
  - Pubmed
- 1 knowledge graph dataset
  - NELL
- Simplified NELL (a variant of NELL derived by the authors)
Code:
- Deep Graph Convolutional Networks: https://github.com/ZhuangCY/Coding-NN
- Graph Convolutional Network: https://github.com/tkipf/gcn
- Planetoid: https://github.com/kimiyoung/planetoid
- Deep Walk: https://github.com/phanein/deepwalk
Related articles by others:
- An interpretation of "Dual Graph Convolutional Networks for Graph-Based Semi-Supervised Classification"
- A survey of research on Dual Graph Convolutional Networks
- An explanation of Pointwise Mutual Information (PMI) and its positive variant, Positive PMI (PPMI)
- An analysis of dual graph convolutional networks for graph-based semi-supervised classification
- Architecture design and performance evaluation of dual graph convolutional networks
- The basic principles and computation of PMI and PPMI
- PMI is a measure of the semantic association between words
- PPMI is a refinement of PMI obtained by filtering out negative values
- Both measures are commonly used for feature extraction in text mining and machine learning
A brief summary of the innovations (the paper contains many implementation details worth reading):

- (1) The paper proposes a Dual Graph Convolutional Network method for graph-based semi-supervised learning.
- (2) Beyond the commonly considered local consistency, the method uses PPMI to encode global consistency.
- (3) The work provides a solution for integrating prior knowledge learned from different views of the data.
- (4) Conv_A represents prior work, while Conv_P is the innovation of this paper.
- (5) To bridge the gap between the two components, the authors design a dedicated loss function.
- (6) PPMI is borrowed from NLP and carefully integrated; many details are worked out, such as using random walks to obtain the frequency matrix F and then deriving the PPMI matrix P from F.
- (7) The adjacency matrix A and the matrix P both essentially describe diffusion processes on the graph.
- (8) The notions of context and semantic information are defined carefully so that the pieces form a coherent framework.
Abstract
We introduce a simple and efficient semi-supervised learning approach for graph-based datasets, where only a small fraction of the training examples are labeled.
Specifically, a dual graph convolutional neural network approach is formulated to jointly address the two fundamental assumptions of semi-supervised learning:
- (1) local consistency (already established in prior work),
- (2) and global consistency (the focus of this paper).
Accordingly, two convolutional neural networks have been developed to represent both local and global consistency-based knowledge.
Because the data transformations of the two networks differ considerably, an unsupervised temporal loss function (designed to stitch the two sub-networks together) is then introduced for the ensemble.
KeyWords
- Graph convolutional networks,
- semi-supervised learning,
- graph diffusion,
- adjacency matrix,
- pointwise mutual information
1 Introduction
(Part 1: prior work)
In this paper, we introduce a new general semi-supervised learning algorithm that is applicable to various types of graphs.
Typically, a graph-based semi-supervised learning framework can be framed through the following loss function.
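A reconstruction of Eq. (1) based on the surrounding description (the exact notation may differ from the paper; \lambda is the regularization weight):

$$\mathcal{L} = \mathcal{L}_0 + \lambda \mathcal{L}_{reg}$$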

where \mathcal{L}_0 denotes the supervised loss with respect to the labeled data and \mathcal{L}_{reg} denotes the regularizer with respect to the graph structure.
By employing an explicit graph-based regularizer \mathcal{L}_{reg}, Eq. (1) aggregates the label information from \mathcal{L}_0 across the entire graph, effectively smoothing it over the network structure.
- For example, a graph Laplacian regularization term [3][5][33] is commonly used in \mathcal{L}_{reg} (as in [4]); it rests on the assumption that connected nodes in the graph are likely to share the same label.
- However, this assumption can restrict modeling capacity [4], because graph edges may encode more than node similarity [6] and may carry other important information.
In Eq. (2), rather than incorporating an explicit graph-based regularizer into the loss function, a convolutional function Conv is developed to directly encode the graph structure information.

Roughly speaking, the structural encoding in Conv can be carried out in two distinct domains:
* (i) the graph vertex domain [2] [32]; and
* (ii) the graph spectral domain
- (6) Shortcoming of prior work: global consistency is not considered.
However, by using Eqs. (1) and (2), most related work has only considered the local consistency of a graph for knowledge embedding. To sufficiently embed the graph knowledge, we find that the global consistency of a graph has not been well investigated yet.
(Part 2: the work and innovations of this paper, built on prior work)
- (7) Hence, in this paper, we propose a dual graph convolutional neural network method to jointly take both of them into consideration.
The form of the loss function in the proposed strategy is
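A hedged sketch of Eq. (3), consistent with the description in Section 3.4.3, where a time-varying weight \lambda(t) balances the two terms (the exact typesetting may differ from the paper):

$$\mathcal{L} = \mathcal{L}_0(Conv_A) + \lambda(t)\,\mathcal{L}_{reg}(Conv_A, Conv_P)$$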

- (8) By leveraging both adjacency matrices and PPMI matrices, these convolutional networks represent both local and global structural information.
- (9) Corresponding to the two essential assumptions in semi-supervised learning [32],
- ConvA encodes local consistency knowledge (for instance, nearby data points are likely to share identical labels).
- ConvP encodes global consistency knowledge (for instance, data points sharing similar contexts typically receive identical labels).
Two assumptions in semi-supervised learning:
ConvA embeds the local-consistency-based knowledge
i.e., nearby data points are likely to have the same label
ConvP embeds the global-consistency-based knowledge
i.e., data points that occur in similar contexts tend to have the same label
- (10) The convolved output is then used for supervised learning through either Conv_A or Conv_P, as exemplified by \mathcal{L}_0(Conv_A).
- (11) Nevertheless, to enhance predictive performance, an ensemble-oriented regularizer \mathcal{L}_{reg}(Conv_A, Conv_P) for these transformations has been derived. By minimizing discrepancies in predictions from different transformations of an input sample, this regularizer integrates perspectives from both Conv_A and Conv_P. Consequently, \mathcal{L}_{reg}(Conv_A, Conv_P) constitutes an unsupervised loss function.
(Part 3: a summary of the contributions)
In addition to the convolution Conv_A based on the graph adjacency matrix, a new convolutional network Conv_P is proposed, which relies on a positive pointwise mutual information (PPMI) matrix.
Unlike prior work, random walks are used to construct the PPMI matrix, thereby further embedding semantic information (i.e., global-consistency knowledge).
- (13) In addition to supervised learning on the small portion of labeled training data (i.e., \mathcal{L}_0 in Eq. (3)),
- an unsupervised loss function (i.e., \mathcal{L}_{reg} in Eq. (3)) is proposed as a regularizer to combine the outputs of the different convolved data transformations.
2 Background: Graph-Based Semi-Supervised Learning
- (1) Semi-supervised learning deals with the setting in which only part of the data is labeled.
- Given a dataset X = \{x_1, ..., x_l, x_{l+1}, ..., x_n\}
- and a set of classes C = \{1, ..., c\},
- the first l samples have labels \{y_1, ..., y_l\} \subset C and the remaining samples are unlabeled.
- The goal is to infer the classes of the unlabeled samples.
- (2) In addition to the labeled and unlabeled points, graph-based semi-supervised learning also involves a given graph, denoted as an n \times n matrix A. Each entry a_{i,j} \in A indicates the similarity between data points x_i and x_j.
- The similarity can be derived by calculating distances among data points [33], or may be given explicitly by structured data, such as knowledge graphs [29], citation graphs [14], hyperlinks between documents [20], and so on.
- (3) The key to this problem is how to embed the additional information in the graph into the model to improve label prediction.
- (4) Roughly, the different ways of embedding graph knowledge fall into two categories: explicit and implicit graph-based semi-supervised learning.
knowledge embeddings
explicit graph-based semi-supervised learning
implicit graph-based semi-supervised learning
2.1 Explicit Graph-based Semi-Supervised Learning
Explicit graph-based semi-supervised approaches use a graph-based regularizer, namely \mathcal{L}_{reg} as defined in Eq. (1), to incorporate information from the graph structure.
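A hedged sketch of the graph Laplacian regularizer in Eq. (4), using the symbols defined in the bullets below (written in the usual form; the paper's exact notation may differ):

$$\mathcal{L}_{reg} = \sum_{i,j} a_{i,j}\,\big\|f(x_i) - f(x_j)\big\|^2 = f(X)^{\top} \Delta\, f(X)$$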

- Eq. (4) is an example of the graph Laplacian regularizer, where
- f(·) denotes the label-predicting function, e.g., a neural network.
- The unnormalized graph Laplacian is \Delta = D - A, where
- each diagonal entry of D is D_{i,i} = \sum_j a_{i,j}.
(2) Similar prior work:
- A label spreading algorithm modeled upon Gaussian Random Fields.
- A PageRank-like algorithm was introduced to account for both local and global consistency of the graph.
- A sampling-based approach was developed that, instead of employing the graph Laplacian \Delta, constructs a random-walk-based sampling mechanism to acquire positive and negative contexts for each data point.
- A feed-forward neural network framework was subsequently employed for knowledge embedding.
2.2 Implicit Graph-Based Semi-Supervised Learning
2.2.1 Convolution in the vertex domain
Typically, a data point x_i is transformed through a diffusion process. An illustrative example of such a k-hop localized linear transform is given by the equation below.
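A hedged reconstruction of this k-hop transform from the symbols explained right after it (the paper's exact form may differ):

$$z_i = \sum_{j \in \mathcal{N}(i,k)} b_{i,j}\, x_j$$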

where b_{i,j} serve as weight coefficients for filtering purposes, and \mathcal{N}(i,k) represents the collection of nodes connected to x_i through paths involving up to k edges.
- (2) This convolution is built on the notion of a diffusion kernel.
- A diffusion-based convolution aggregates information across the nodes of a graph by exploiting the properties of diffusion processes.
2.2.2 Convolution in the spectral domain
We first consider the simplest case, where each x_i is a scalar.
In this case, the input X \in R^{n\times 1} defined on the graph can be modeled as a signal over the n nodes.
As shown in Eq. (6), the spectral convolution on a graph can be formulated as an operation within the framework of graph Fourier analysis, which involves multiplying the signal X with a filter g_{\theta} = \text{diag}(\theta) parameterized by \theta \in \mathbb{R}^n.
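A sketch of this spectral convolution in the standard graph Fourier form, using U and \Delta as defined in the bullet below:

$$g_{\theta} \star X = U\, g_{\theta}\, U^{\top} X$$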

- where U is the matrix of eigenvectors of the normalized Laplacian \bigtriangleup = I_n - D^{-\frac{1}{2}} AD^{-\frac{1}{2}}, and U^{\top}X is the graph Fourier transform of X.
- When a data point x_i possesses multiple features, it can be considered as a signal composed of multiple input channels.
- Eq.(6) is computationally intensive for large-scale graph structures.
2.2.3 Relation between the two domains
- (3) When the filter g_{\theta} is approximated by a k-th order polynomial, convolution in the spectral domain is equivalent to k-hop diffusion.
- (4) Therefore, after approximating g_{\theta} with a first-order Chebyshev polynomial, the resulting derivation is equivalent to one-hop diffusion only.
- (5) In addition, beyond the adjacency-matrix-based convolution, a PPMI matrix is computed during the convolution process to extract semantic information.
3 Dual Graph Convolutional Networks (the model proposed in this paper)
3.1 Problem Definition and an Example
- (1) The input of our model comprises
  - a set of data points \mathcal{X} = \{x_1, \dots, x_l, x_{l+1}, \dots, x_n\},
  - the corresponding labels \{y_1,\dots,y_l\} for the first l points,
  - and the graph structure.
- (2) Assuming each data point has at most k features, the dataset can be represented as a matrix X \in \mathbb{R}^{n \times k}.
- (3) The graph topology is characterized by an adjacency matrix A \in \mathbb{R}^{n \times n}.


3.2 Local Consistency Convolution: Conv_A
Section 3.2 covers groundwork from prior research; here the paper identifies the key problem and sets up the approach whose solution is the core innovation of the paper (Section 3.3).
We define a graph-based convolution, referred to as Conv_A, to model relationships between nodes in a graph structure. Specifically, given an input feature matrix \bm{X} and an adjacency matrix \bm{A}, we compute a transformation that captures interactions across edges in \bm{A}. The output from the i-th hidden layer, denoted as \bm{Z}^{(i)}, is computed as follows:
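A reconstruction of the layer-wise propagation rule in Eq. (7), matching the definitions in the bullets below (this is the familiar normalized-adjacency GCN update; the paper's exact typesetting may differ):

$$Z^{(i)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}} \tilde{A}\, \tilde{D}^{-\frac{1}{2}} Z^{(i-1)} W^{(i)}\right), \qquad Z^{(0)} = X$$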

- where \tilde{A} = A + I_n (with I_n \in R^{n\times n} the identity matrix) is the adjacency matrix with self-loops, and each diagonal entry \tilde{D}_{i,i} equals the sum of the entries in row i of \tilde{A};
- the product \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} is then the normalized adjacency matrix;
- Z^{(i-1)} is the output of the (i-1)-th layer, with Z^{(0)} = X;
- W^{(i)} are the trainable parameters of the i-th layer;
- \sigma(\cdot) is an activation function such as ReLU or sigmoid.
Its role in **Eq. (7)** is to perform exactly one hop of diffusion per layer: each node's feature vector is enriched by linearly adding the feature vectors of all its neighbors.
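A minimal numpy sketch of this one-layer diffusion, assuming a dense adjacency matrix; the function and variable names are illustrative, not from the paper's released code:

```python
import numpy as np

def conv_a_layer(Z_prev, A, W, activation=lambda x: np.maximum(x, 0)):
    """One Conv_A layer: add self-loops, normalize A, diffuse, then apply W and ReLU."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                      # adjacency matrix with self-loops
    d = A_tilde.sum(axis=1)                      # \tilde{D}_{i,i}
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt    # normalized adjacency matrix
    return activation(A_hat @ Z_prev @ W)        # Eq. (7)-style update
```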
(Here the paper begins a motivating story.)
- (3) This observation motivates the proposed idea. Specifically, the method can be improved by reducing violations of the semi-supervised assumption that nearby data points are likely to share the same label.
- (4) For instance, in Figure 1a, the directly connected data points x_8 and x_{30} have different labels. Consequently, their convolved feature vectors should be dissimilar. However, Eq. (7) cannot handle such exceptions effectively.
- (5) From the visualized results in Figure 1d, as expected, x_8 and x_{30} are close together, even though they belong to different groups. To verify the idea, the edge between x_8 and x_{30} is deleted manually, i.e., A[8, 30] = A[30, 8] = 0. Figure 1e then shows the new t-SNE distribution of all 34 data points, where x_8 and x_{30} are far apart. Hence, the remaining problem is how to reduce the number of such exceptions automatically. (The authors conjectured that removing the edge between x_8 and x_{30} would improve the result; deleting it manually confirmed this, so the next step is to design a model or mechanism that handles such cases automatically.)
- (6) In the next subsection, we introduce a PPMI-based convolution method. (This naturally leads into the proposed method and its innovations.)
- By encoding semantic information, this method allows a different latent representation to be learned for each data point.
3.3 Global Consistency Convolution: Conv_P
(the main innovation of this paper)
- (1) Besides defining graph structural information through the adjacency matrix A, we further utilize PPMI to encode semantic information, represented as matrix P\in R^{n\times n}.
- (2) Using a random walk approach (designed by the authors), we first compute a frequency matrix F.
- Based on F, we then compute P, which exploits the frequency information to capture semantics. (Workflow: first use random walks to obtain F, then derive P from F.)

3.3.1 Calculating the frequency matrix F
(1) Computing the frequency matrix F is the first step of the proposed model; the random walk is the starting point of the algorithm workflow.
A random walk is a Markov chain that describes the sequence of nodes visited by a random walker.
- When the walker is at node x_i at time step t, the state is denoted s(t) = x_i.
- The probability that the walker moves from node x_i to a neighboring node x_j is denoted p(s(t + 1) = x_j | s(t) = x_i).
(2) In our problem setting, given the adjacency matrix A, we assign:
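A hedged sketch of this transition probability, reconstructed from the description above:

$$p\big(s(t+1)=x_j \mid s(t)=x_i\big) = \frac{a_{i,j}}{\sum_{j'} a_{i,j'}}$$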

(3) Algorithm 1 (some properties; a rough numpy sketch follows below)
- The time complexity of the algorithm is O(n\gamma q^2);
- when \gamma and q are small integers, F is very fast to compute.
- Moreover, multiple random walks on different parts of the same graph can be executed simultaneously, so the computation parallelizes well.
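A rough numpy sketch of how such a random-walk frequency matrix could be computed; the walk length, the number of walks per node, and the windowed counting scheme are illustrative assumptions, not the paper's exact Algorithm 1:

```python
import numpy as np

def frequency_matrix(A, num_walks=10, walk_len=3, window=2, seed=0):
    """Count node/context co-occurrences within a window along short random walks."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    F = np.zeros((n, n))
    for start in range(n):
        for _ in range(num_walks):                        # gamma walks per starting node
            path, cur = [start], start
            for _ in range(walk_len - 1):                 # walk of length q
                deg = A[cur].sum()
                if deg == 0:                              # isolated node: stop this walk
                    break
                cur = rng.choice(n, p=A[cur] / deg)       # Eq. (8)-style transition
                path.append(cur)
            for i, xi in enumerate(path):                 # co-occurrences within window w
                for xj in path[max(0, i - window): i + window + 1]:
                    F[xi, xj] += 1
    return F
```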
(4) Random walks have been used (Random Walk 前人的一些应用)
- 作为一种相似性衡量方法,在推荐问题中的多种应用领域中引用了[11]。
- 图分类问题[1]。
- 以及半监督学习[30]。
- 在我们的方法中,我们利用随机游走算法来计算节点之间的语义相似性(如前所述)。
3.3.2 Calculating PPMI
(1) After calculating the frequency matrix F,
we denote the i-th row of F by the row vector F_{i,:} and the j-th column by the column vector F_{:,j}. Each F_{i,:} corresponds to a node x_i, and each F_{:,j} corresponds to a context c_j (as defined in the paper).
(2) Based on Algorithm 1,
- the contexts are defined as all nodes in \mathcal{X};
- each entry F_{i,j} of matrix F equals the number of times x_i occurs in context c_j.
(3) Based on F, we calculate the PPMI matrix P\in R^{n\times n} as:
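A hedged reconstruction of Eq. (9) from the description below (standard PPMI computed from the normalized frequency matrix):

$$p_{i,j} = \frac{F_{i,j}}{\sum_{i,j} F_{i,j}}, \quad p_{i,*} = \frac{\sum_j F_{i,j}}{\sum_{i,j} F_{i,j}}, \quad p_{*,j} = \frac{\sum_i F_{i,j}}{\sum_{i,j} F_{i,j}}, \quad P_{i,j} = \max\!\left\{\log\frac{p_{i,j}}{p_{i,*}\, p_{*,j}},\; 0\right\}$$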

Through the application of Eq. (9), the semantic information is encoded in P.
- p_{i,j} is the probability that node x_i occurs in context c_j;
- p_{i,*} is the estimated probability of node x_i;
- p_{*,j} is the estimated probability of context c_j;
- by the definition of statistical independence, if node x_i and context c_j are independent (i.e., co-occur purely by chance), then p_{i,j} = p_{i,*} \times p_{*,j} and hence pmi_{i,j} = 0;
- accordingly, if there is a semantic relation between x_i and c_j, p_{i,j} is expected to be larger than under independence, i.e., p_{i,j} > p_{i,*} \times p_{*,j}, so pmi_{i,j} is positive;
- if node x_i is unrelated to context c_j, pmi_{i,j} may be negative.
(5) Given that our focus is on pairs (x_i, c_j) with semantic relations, our approach employs nonnegative PMI values.
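A compact numpy sketch of this PPMI computation from F (assumptions: natural logarithm and element-wise clipping at zero):

```python
import numpy as np

def ppmi(F, eps=1e-12):
    """Positive pointwise mutual information from a co-occurrence/frequency matrix F."""
    total = F.sum()
    p_ij = F / total                       # joint probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)  # row (node) marginals
    p_j = p_ij.sum(axis=0, keepdims=True)  # column (context) marginals
    pmi = np.log((p_ij + eps) / (p_i @ p_j + eps))
    return np.maximum(pmi, 0.0)            # keep only nonnegative PMI values
```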
PPMI has been studied extensively in natural language processing (NLP) [4][16][28].
The PPMI measure is widely recognized for its effectiveness in quantifying semantic similarity between entities.
However, to the best of our knowledge, this is the first work to introduce PPMI into graph-based semi-supervised learning.
Additionally, the approach uses a novel PPMI-based convolution to implement the principle of global consistency: graph nodes that occur in similar contexts are likely to share the same label.
* (Comparing the PPMI matrix with the adjacency matrix)
* **Figure 1c** visualizes the normalized PPMI matrix $P$ of the Karate club network.
* Compared with the adjacency matrix of this network (shown in **Figure 1b**),
* there are at least two obvious differences:
* (1) $P$ has reduced the effect of the hub nodes, e.g., $x_0$ and $x_{33}$; and
* (2) $P$ has initiated more latent relations among different data points, which cannot be characterized by the adjacency matrix $A$.
3.3.3 PPMI-based convolution
Besides Conv_A, which is parameterized by the similarity defined via an adjacency matrix A, another feed-forward neural network Conv_P has been developed using a PPMI-based similarity measure as its foundation.
Model of this paper:
- Conv_A: based on the similarity defined by the adjacency matrix A
- Conv_P: based on the similarity defined by the PPMI matrix P
* This convolutional neural network is given by:
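A hedged reconstruction of Eq. (10), mirroring Eq. (7) but with the PPMI matrix P and the normalization described in the next bullet:

$$Z^{(i)} = \sigma\!\left(D^{-\frac{1}{2}} P\, D^{-\frac{1}{2}} Z^{(i-1)} W^{(i)}\right)$$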

* where $P$ is the PPMI matrix and $D_{i,i}=\sum_j P_{i,j}$ for normalization
Clearly, diffusing over such a node-context matrix P is what realizes global consistency.
Additionally, since Conv_P uses the same neural network architecture as Conv_A, the two models can be integrated into a unified framework that leverages both A and P.
(4) Compare Figure 1f with Figure 1d and Figure 1e.
3.4 Ensemble of Local and Global Consistencies (a way is needed to stitch the two parts together)
- (1) To jointly consider the local consistency and global consistency for semi-supervised learning, we must overcome the challenge of
- having very few labeled training data. That is, because the labeled training data are limited, a general ensemble method (e.g., concatenating the outputs of Conv_A and Conv_P) cannot be used.
(2) After supervised training on the labeled data, we further derive an unsupervised regularizer for the ensemble.

- (3) Figure 2 depicts the architecture of our dual graph convolutional networks approach.
- In addition to training Conv_A on the labeled data (i.e., \mathcal{L}_0(Conv_A) in Eq. (3)), an unsupervised regularizer is employed to optimize Conv_P against the posterior distributions produced by the already-trained Conv_A branch.
3.4.1 Calculating \mathcal{L}_0(Conv_A)
- (1) The task is treated as predicting c distinct labels.
- The softmax activation function is applied to each row of the matrix Z^A produced by the convolutional layer Conv_A.
- The outputs of the softmax layer are denoted \hat{Z}^A \in \mathbb{R}^{n \times c}. The \mathcal{L}_0 loss for Conv_A measures the cross-entropy error over all labeled data points and can be computed using:
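A hedged reconstruction of this cross-entropy loss, using the symbols defined in the two bullets below (the normalization by |\mathcal{Y}_L| is an assumption):

$$\mathcal{L}_0(Conv_A) = -\frac{1}{|\mathcal{Y}_L|} \sum_{l \in \mathcal{Y}_L} \sum_{c'=1}^{c} Y_{l,c'}\, \ln \hat{Z}^A_{l,c'}$$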

- \mathcal{Y}_L denotes the set of data indices where labels are observed during training.
- where Y \in \mathbb{R}^{n\times c} is the ground-truth label matrix.
3.4.2 Calculating \mathcal{L}_{reg}(Conv_A, Conv_P)
(1) The calculation of \mathcal{L}_{reg} is given by
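A hedged reconstruction of Eq. (12) from the description below (the mean squared difference over all n nodes):

$$\mathcal{L}_{reg}(Conv_A, Conv_P) = \frac{1}{n} \sum_{i=1}^{n} \left\|\hat{Z}^P_{i,:} - \hat{Z}^A_{i,:}\right\|^2$$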

- When using the softmax activation function, the output from Conv_P is referred to as \hat{Z}^P\in R^{n\times c}.
- For all n data points, we propose an unsupervised loss function that reduces the discrepancies in terms of mean squared differences between \hat{Z}^P and \hat{Z}^A.
From the form of Eq. (12), the unsupervised loss can be viewed as training Conv_P to predict the output of Conv_A.
This implies that, after the \mathcal{L}_0-based training (Eq. (11)), the softmax-normalized scores \hat{Z}^A \in \mathbb{R}^{n \times c} can be interpreted as posterior distributions over the c labels.
- By minimizing the loss defined in Eq. (12), the final predictions of the two models become consistent even under the different transformations performed by Conv_A and Conv_P and under random layer-wise dropout.
As shown in Figure 2, the core of the model is that the two convolutional branches share parameters, specifically the neural network weights W in Eqs. (7) and (10). The shared parameters are used by both Conv_A and Conv_P, which keeps the computation efficient and integrates both views into a unified framework.
Despite sharing the same parameters W, the different diffusions A and P, together with random dropout, may lead to different predictions Z^A and Z^P from Conv_A and Conv_P.
- Nevertheless, each data point belongs to exactly one class.
- Consequently, training W with Eq. (12) pushes Conv_A and Conv_P toward the same predictions.
- As a result, W incorporates knowledge from both Conv_A and Conv_P.
Prior knowledge (i.e., the diffusion matrices A and P) is thus explicitly integrated during the data transformation phase.
More generally, by using multiple neural networks, diverse forms of prior knowledge can be embedded in the data transformation phase.
3.4.3 The final model

Algorithm 2 details the training procedure of our dual graph convolutional network architecture
- The loss function is a weighted combination of \mathcal{L}_0(Conv_A) and \mathcal{L}_{reg}(Conv_A, Conv_P).
- A time-varying weight function is created to embody the concept outlined above.
- This fundamentally implies that, during the initial stages of training (i.e., when t assumes low values), the loss function's primary influence stems from the supervised component L_0.
- Once a posterior distribution over the labels has been established via Conv_A, increasing \lambda(t) compels the model to also take into account the knowledge embedded in Conv_P.
(2) The implementation uses Batch Gradient Descent (BGD):
- each training iteration uses the full training dataset;
- however, BGD converges relatively slowly;
- for a convex error surface it is guaranteed to converge to the global minimum,
- and for a non-convex error surface it is guaranteed to converge to a local minimum.
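A minimal numpy sketch of how the final loss could be assembled; the linear ramp-up schedule for λ(t) and all names (lambda_t, dual_loss, etc.) are illustrative assumptions, not the paper's released code. Here Z_hat_A and Z_hat_P are the softmax outputs of the two branches, Y is the one-hot label matrix, and labeled_idx indexes the labeled rows.

```python
import numpy as np

def lambda_t(step, ramp_up=100.0, max_weight=1.0):
    """Time-varying regularization weight: near 0 early in training, then ramps up."""
    return max_weight * min(1.0, step / ramp_up)

def dual_loss(Z_hat_A, Z_hat_P, Y, labeled_idx, step):
    """Weighted combination of L_0(Conv_A) and L_reg(Conv_A, Conv_P), Eq. (3)-style."""
    eps = 1e-12
    # Supervised cross-entropy on the labeled rows of the Conv_A softmax output.
    sup = -np.mean(np.sum(Y[labeled_idx] * np.log(Z_hat_A[labeled_idx] + eps), axis=1))
    # Unsupervised mean-squared difference between the two branches (Eq. (12)-style).
    reg = np.mean(np.sum((Z_hat_P - Z_hat_A) ** 2, axis=1))
    return sup + lambda_t(step) * reg
```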
4 Experiments
4.1 Dataset
- 3 citation network datasets
  - Citeseer
  - Cora
  - Pubmed
- 1 knowledge graph dataset
  - NELL
- Simplified NELL (a variant of NELL derived by the authors)

4.1.1 Citeseer
- Only 3.6% of the nodes are labeled for training
4.1.2 Cora
- Only 5.2% of the nodes are labeled for training
4.1.3 Pubmed
- Only 0.3% of the nodes are labeled for training
4.1.4 NELL (the Never-Ending Language Learning knowledge graph)
- In NELL, each relation is represented as a triplet (e_h, r, e_t).
- By splitting each (e_h, r, e_t) into two edges (e_h, r_1) and (r_2, e_t), a new graph is obtained (the authors performed this splitting themselves).
- Only one labeled example per class is used for training.
4.1.5 Simplified NELL
- In the simplified version of NELL, the relational information (i.e., r) was removed, and edges between entities have been directly added.
- By calculating the co-occurrences of each (eh, et) pair across all triplets, a weighted adjacency matrix A is constructed.
4.2 Methods for Comparison
4.2.1 DGCN
- (1) This is the proposed method, as described in Algorithm 2. In the Dual Graph Convolutional Networks (DGCN) implementation, both Conv_A and Conv_P have two hidden layers.
- Namely, there are two separate weight matrices, W^{(1)} and W^{(2)}, to be trained in Algorithm 2.
Table 2 provides comprehensive details on the realization of our method across various datasets, encompassing:
(1) number of neurons in the hidden layer;
(2) layer-wise dropout proportion;
(3) window dimension w as defined in Algorithm 1; and
(4) learning rate η.
4.2.2 GCN(Graph Convolutional Networks)
4.2.3 PLANETOID
- Drawing inspiration from the Skip-gram model [18] in NLP, PLANETOID [30] utilizes graph information through positive and negative sampling techniques.
- In the sampling process, both label information and graph structure are considered.
4.2.4 DeepWalk
- By performing random walks on a graph, different walks are generated.
- By viewing the paths as sentences, DeepWalk [21] extends language modeling techniques from word sequences to path sequences in graphs because both sequences and paths are generated according to some probability distribution.
4.3 Results


4.4 Effect of Regularization Weight \lambda (t)


4.5 Effect of a Shifted PPMI Matrix P


- (1) Eq. (13) presents the calculation of a shifted PPMI matrix, first introduced in [16] for word embedding.
- (2) Following the derivation in [16], the value of k indicates the number of negative samples used when computing each entry of P.
- (3) To the authors' knowledge, this is the first work in semi-supervised learning to verify whether such a shift also helps in understanding a graph.
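A hedged reconstruction of the shifted PPMI in Eq. (13), following the standard form from [16], where k is the number of negative samples:

$$P_{i,j} = \max\!\left\{\log\frac{p_{i,j}}{p_{i,*}\, p_{*,j}} - \log k,\; 0\right\}$$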
5 Conclusions (high-level summary)
- (1) In this work, a graph-based semi-supervised learning framework with a dual graph convolutional network architecture is proposed.
- (2) Beyond the commonly considered local consistency, the method uses PPMI to encode global consistency.
- (3) The work provides an effective way to integrate prior knowledge from different views of the raw data.
A Variants of our method (essentially an ablation study)
- DGCN: the original approach used in the experiments.
- DGCN-1: the neural network parameters are not shared between Conv_A and Conv_P.
- DGCN-2: instead of using Eq. (12), the ensemble is formed by concatenation; the final predictions are derived from the concatenated latent representations.
  - Namely, before applying softmax, a dense layer is added so that the output dimension matches the label matrix Y \in \mathbb{R}^{n\times c}.
- DGCN-3: no parameter sharing, and the concatenation-based ensemble is used.
- DGCN-4: no ensemble at all; only Conv_P is used.

