
[Paper Close Reading] Multi-view graph convolutional networks with attention mechanism


Multi-view graph convolutional networks with attention mechanism - ScienceDirect

Multi-view Graph Convolutional Networks with Attention Mechanism (MAGCN)

The English text was typed by hand! This article summarizes and restates the original paper, so some spelling and grammar mistakes are hard to avoid; please feel free to point them out in the comments. Note that the content is for reference only; please read it with your own needs in mind.

Contents

1. TL;DR

1.1. Takeaways

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction

2.3. Related work

2.4. Preliminaries

2.5. Multi-view graph convolutional network with attention mechanism

2.6. Theoretical analysis

2.7. Experiments

2.8. Conclusion

3. Supplementary knowledge

3.1. Discrete convolution

3.2. Information theory

4. Reference List


1. TL;DR

1.1. Takeaways

(1) This is a node-classification task, unlike the graph classification usually applied to brain data... Compared with the usual brain graph-classification pipeline it saves a lot of time and resources, and I suspect that step may not even be necessary.

(2) 2022 really was a year of rapid GNN development...

(3) In practice this is hard to apply to brain networks... If you want multiple views of a brain network, the functional connectivity matrix alone is probably not enough; you would also need the structural connectivity matrix to get more complete data. However, medical research seems to suggest that conditions such as ASD and AD have no direct relation to brain structure, so presumably these diseases do not change brain structure.

(4) Multiple views sounds like an interesting idea. But is it really novel in the biological domain? I have not actually come across it in brain-network work. I just feel that this kind of method lacks the large amount of key data it would need; is there really that much raw information available?

1.2. Paper summary figure

2. Section-by-section close reading

2.1. Abstract

① Most GCN-based methods typically depend on a fixed adjacency matrix, which represents only a single topological structure of the underlying graph. (The single-view topology of the underlying graph is characterized by a fixed adjacency matrix... the term sounds a bit complicated, and honestly I am not entirely sure what it means.)

②⭐ However, this is limiting and becomes error-prone when there are data-collection issues.

③ They present MAGCN, a model that integrates the topological structures of multi-view graphs with an attention-based feature aggregation approach.

④ MAGCN handles the node-level classification task... (Did finishing it leave me speechless for a moment? Anyway, I have already read quite a lot, so let's leave it at that.)

error-prone adj. liable to error; easily goes wrong; having a tendency to make mistakes

2.2. Introduction

① For node classification tasks, the graph structure may differ between the training domain and the target domain.

② They seek to construct multi-view graphs, i.e., multiple adjacency matrices, in order to approximate the true graph topology.

③Briefly introduce their model

2.3. Related work

① Spatial-based methods, also known as diffusion-based neural networks, include DCNN, GraphSAGE, MoNet, MPNN, and the graph isomorphism network (GIN).

② Spectral-based approaches include GCN and ChebNet.

③ Existing models that combine topology with attention mechanisms still ignore the multi-view structure of graphs.

④ They process the data systematically with specialized techniques.

2.4. Preliminaries

① An undirected graph $G=\left\{V,E,A\right\}$, where $V$ denotes the node set (with $N$ nodes), $E$ denotes the set of edges, and $A$ represents the adjacency matrix

② Then a multi-view graph can be written as $G=\left\{V,A_{1},A_{2},\dots,A_{n}\right\}$, where $n$ denotes the number of views

③ The graph Fourier transform (GFT) is $\hat{x}=U^{\mathrm{T}}x$, where $x\in\mathbb{R}^{N}$, $U$ denotes the eigenvector matrix of the normalized graph Laplacian $L=I-D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$, and $D$ denotes the degree matrix
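To make the GFT concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper): it forms $L=I-D^{-\frac{1}{2}}AD^{-\frac{1}{2}}$, eigendecomposes it, and projects a signal onto the eigenbasis.

```python
import numpy as np

def graph_fourier_transform(A, x):
    """Project a graph signal x onto the eigenbasis of the normalized Laplacian."""
    N = A.shape[0]
    d = A.sum(axis=1)                             # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(N) - D_inv_sqrt @ A @ D_inv_sqrt   # L = I - D^{-1/2} A D^{-1/2}
    _, U = np.linalg.eigh(L)                      # columns of U are eigenvectors
    return U.T @ x                                # hat{x} = U^T x

# toy example: a 3-node path graph
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
x = np.array([1.0, 2.0, 3.0])
print(graph_fourier_transform(A, x))
```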

④ The graph convolution operator $\star_{G}$ in the Fourier domain is defined by $x\star_{G}y=U\left(\left(U^{\mathrm{T}}x\right)\odot\left(U^{\mathrm{T}}y\right)\right)$

⑤ The graph convolution with the convolution operator $\star$: $g_{\theta}\star x=g_{\theta}(L)x=g_{\theta}(U\Lambda U^{\mathrm{T}})x=Ug_{\theta}(\Lambda)U^{\mathrm{T}}x$, and it can be approximated by $g_{\theta}\star x=\sum_{k=0}^{K-1}\theta_{k}T_{k}(\tilde{L})x$ to reduce the computational complexity, where $\tilde{L}=2L/\lambda_{max}-I$ represents the scaled Laplacian matrix, $\lambda_{max}$ represents the largest eigenvalue, $T_{k}(\tilde{L})\in\mathbb{R}^{N\times N}$ denotes the Chebyshev polynomial of order $k$, and $\theta\in\mathbb{R}^{K}$ denotes the Chebyshev coefficients
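A small sketch (mine, with placeholder coefficients `theta`) of the $K$-term Chebyshev approximation, using the standard recurrence $T_{0}=I$, $T_{1}=\tilde{L}$, $T_{k}=2\tilde{L}T_{k-1}-T_{k-2}$; $\lambda_{max}\approx 2$ is the usual shortcut.

```python
import numpy as np

def chebyshev_filter(L, x, theta, lam_max=2.0):
    """Approximate g_theta * x with a K-term Chebyshev expansion of the scaled Laplacian."""
    N = L.shape[0]
    L_tilde = 2.0 * L / lam_max - np.eye(N)       # scaled Laplacian
    T_prev, T_curr = np.eye(N), L_tilde           # T_0, T_1
    out = theta[0] * (T_prev @ x)
    if len(theta) > 1:
        out += theta[1] * (T_curr @ x)
    for k in range(2, len(theta)):
        T_next = 2.0 * L_tilde @ T_curr - T_prev  # Chebyshev recurrence
        out += theta[k] * (T_next @ x)
        T_prev, T_curr = T_curr, T_next
    return out
```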

⑥ The filter $F$: $\tilde{X}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW=\hat{A}XW$, where $\hat{A}=\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$; $\tilde{A}=A+I$, i.e. it contains the self-loops; $\tilde{D}_{ii}=\sum_{j}\tilde{A}_{ij}$; and $W\in\mathbb{R}^{M\times F}$ is the trainable weight matrix of $F$
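The renormalized filter in ⑥ is simple to write down directly; a minimal NumPy sketch (my own illustration; the toy graph and weights are random placeholders):

```python
import numpy as np

def gcn_filter(A, X, W):
    """One GCN filter pass: D̃^{-1/2} (A + I) D̃^{-1/2} X W."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                       # add self-loops
    d_tilde = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return A_hat @ X @ W

# toy usage: N=4 nodes, M=3 input features, F=2 output features
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(4, 4)); A = np.triu(A, 1); A = A + A.T
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
print(gcn_filter(A, X, W).shape)   # (4, 2)
```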

2.5. Multi-view graph convolutional network with attention mechanism

① Define the graph $G=\left\{V,X,A\right\}$, where $V$ denotes the $N$ nodes, each with a feature vector $x\in\mathbb{R}^{M}$. Stacking all the features gives the feature matrix $X\in\mathbb{R}^{N\times M}$

② The traditional GCN can be written as $Y=f\left(X,A\right)$, $Y\in\mathbb{R}^{N\times F}$, where $f\left(\cdot\right)$ denotes the designed activation function

③ They extend the single-view graph $G$ to $G^{*}=\left\{V,X,A_{1},A_{2},\dots,A_{n}\right\}$ using information theory, subject to a constraint of the form $I\geq\varepsilon_{\mathrm{info}}$

④ The overall architecture consists of two multi-GCN modules and one multi-view attention module.

(Why does the multi-view graph in the figure have 5 nodes? There are $n$ topologies, plus a feature matrix. What the authors mean is that the feature dimension starts as $M$ and becomes $F$ after the unfold step, i.e. $\mathcal{X}=\left\{\hat{X}_{1},\hat{X}_{2},\dots,\hat{X}_{n}\right\}\in\mathbb{R}^{n\times5\times F}$; after GAP it becomes $\bar{X}\in\mathbb{R}^{5\times F}$; and after the merge with softmax it becomes $X^{*}\in\mathbb{R}^{5\times C}$.)

conundrum n. a difficult problem; a riddle; a complicated, hard-to-solve problem; a puzzling question

(1) Multi-GCN (unfold) block

① Input: $G^{*}=\left\{V,X,A_{1},A_{2},\dots,A_{n}\right\}$

② Output: $\mathcal{X}=\left\{\hat{X}_{1},\hat{X}_{2},\dots,\hat{X}_{n}\right\}\in\mathbb{R}^{n\times N\times F}$, computed as $\hat{X}_{i}=f_{\mathrm{GCN}}\left(X,A_{i}\right)=\mathrm{ReLU}\left(\hat{A}_{i}XW_{i}\right)$ with $\hat{X}_{i}\in\mathbb{R}^{N\times F}$, where $n$ denotes the number of views
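As a sketch of how I read the unfold step: one GCN pass per view, each view with its own weight matrix (the per-view weights `W_i` are my assumption; the text above does not say whether they are shared):

```python
import numpy as np

def normalize_adj(A):
    """Â = D̃^{-1/2} (A + I) D̃^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def multi_gcn_unfold(X, adjs, weights):
    """Return the list [X̂_1, ..., X̂_n], one ReLU(Â_i X W_i) per view."""
    return [np.maximum(normalize_adj(A_i) @ X @ W_i, 0.0)
            for A_i, W_i in zip(adjs, weights)]
```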

(2) Attention block

① This attention mechanism performs feature extraction by combining an identity mapping with an attention-allocation learning stage; the identity mapping passes the input features $\mathcal{X}$ through to the scaling step $\bar{X}=F_{scale}\left(\mathcal{X},C\right)=\sum_{i=1}^{n}c_{i}\hat{X}_{i}$. The attention-allocation learning stage consists of global average pooling (GAP) and a multi-layer perceptron (MLP).

②The schematic of GAP:

③ The traditional GAP is: $\mathrm{f}_{i}=f_{\mathrm{GAP}}\left(\mathrm{F}_{i}\right)=\frac{1}{h\times w}\sum_{j=1}^{h}\sum_{k=1}^{w}\mathrm{F}_{i,jk}$, where $i$ denotes the layer (feature map) and $\mathbf{F}_{i}\in\mathbb{R}^{h\times w}$

④ In order to adjust the weight of each feature map $F$, the authors propose a graph GAP: $\hat{x}_{i}=\frac{1}{N}\sum_{j=1}^{N}\frac{1}{|\mathcal{N}_{i,j}|}\sum_{k=1}^{|\mathcal{N}_{i,j}|}\left(\hat{A}_{i}\right)_{jk}\hat{X}_{i,j,k}$, where $\mathcal{N}_{i,j}$ represents the neighbors of the $j$-th node in the $i$-th view; the inner sum $\sum_{k=1}^{|\mathcal{N}_{i,j}|}\left(\hat{A}_{i}\right)_{jk}\hat{X}_{i,j,k}$ denotes the graph aggregation process and reflects the enhancement effect of the model

⑤ Then, learn the weights $C=\left\{c_{1},c_{2},\dots,c_{n}\right\}\in\mathbb{R}^{n}$ through the MLP

⑥ With $C$, map $\mathcal{X}$ to $\bar{X}=F_{scale}\left(\mathcal{X},C\right)=\sum_{i=1}^{n}c_{i}\hat{X}_{i}$
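A rough sketch of the whole attention block as I read it: a graph-aware GAP per view, a small MLP (6, 3, and $n$ neurons, matching the MLP sizes quoted later in the experiments section) producing the weights $c_{1},\dots,c_{n}$ via a softmax, and the scaling sum. The softmax over views and the exact MLP input layout are my assumptions, not guaranteed to match the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_gap(X_hat, A_hat):
    """Graph-aware GAP for one view: neighbor-weighted aggregation per node,
    normalized by neighborhood size, then averaged over the N nodes."""
    deg = np.maximum((A_hat > 0).sum(axis=1), 1)           # |N_{i,j}| per node
    agg = (A_hat @ X_hat) / deg[:, None]                    # Σ_k (Â_i)_{jk} X̂_{i,j,k} / |N_{i,j}|
    return agg.mean(axis=0)                                 # pooled vector in R^F

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tiny_mlp(v, n_views):
    """Toy 3-layer MLP (6 -> 3 -> n_views neurons); weights are random placeholders."""
    h = v
    for d_out in (6, 3, n_views):
        W = rng.normal(size=(h.size, d_out))
        h = np.tanh(h @ W)
    return h

def attention_scale(X_hats, A_hats):
    """Pool each view, turn the pooled vectors into n weights, and mix: X̄ = Σ c_i X̂_i."""
    pooled = np.concatenate([graph_gap(Xh, Ah) for Xh, Ah in zip(X_hats, A_hats)])
    c = softmax(tiny_mlp(pooled, n_views=len(X_hats)))      # attention weights c_1..c_n
    return sum(ci * Xh for ci, Xh in zip(c, X_hats))
```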

(3) Multi-GCN (merge)

① Classify $\bar{X}$ with $X^{*}=\sum\limits_{i=1}^{n}f_{\text{GCN}}\left(\bar{X},A_{i}\right)=\text{softmax}\left(\sum_{i=1}^{n}\hat{A}_{i}\bar{X}W_{i}\right)$, $X^{*}\in\mathbb{R}^{N\times C}$, where $C$ is the number of classes
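A minimal sketch of the merge step under that reading: one GCN per view applied to $\bar{X}$, summed, then a row-wise softmax; the weight shapes $(F\times C)$ are placeholders of mine.

```python
import numpy as np

def row_softmax(Z):
    """Row-wise softmax: one probability distribution per node."""
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def multi_gcn_merge(X_bar, A_hats, weights):
    """X* = softmax( Σ_i Â_i X̄ W_i ), with each W_i of shape (F, C)."""
    logits = sum(A_hat @ X_bar @ W for A_hat, W in zip(A_hats, weights))
    return row_softmax(logits)                              # shape (N, C)
```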

② Following the semi-supervised approach, they use the cross-entropy error as the loss function: $L=-\sum_{k\in V_{L}}\sum_{j=1}^{C}Y_{kj}\ln X_{kj}^{*}$, where $V_{L}$ is the set of labeled nodes and $Y\in\mathbb{R}^{|V_{L}|\times C}$ is the label indicator matrix
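The loss only runs over the labeled set $V_{L}$; a small sketch, assuming `X_star` holds the softmax outputs and `Y` is the one-hot indicator matrix for the nodes in `labeled_idx` (both names are mine):

```python
import numpy as np

def masked_cross_entropy(X_star, Y, labeled_idx, eps=1e-12):
    """L = - Σ_{k in V_L} Σ_j Y_{kj} ln X*_{kj}; only labeled nodes contribute."""
    probs = X_star[labeled_idx]             # predictions for the labeled nodes V_L
    return -np.sum(Y * np.log(probs + eps))
```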

2.6. Theoretical analysis

They analyze the reasons through rigorous mathematical arguments and conclude that their method is indeed advantageous and well founded. However, my own math is not good enough to fully follow the derivations, so I have decided not to dig into them for now.

2.7. Experiments

① The experiments use attack scenarios with varying degrees of topology perturbation to demonstrate the robustness of MAGCN.

②The datasets:

③ The output dimension $F$ of multi-GCN (unfold): 16

④Layers of MLP in attention: 3

⑤ The numbers of neurons in the first, second, and last layers are 6, 3, and the number of views, respectively.

⑥Optimizer: Adam

⑦Learning rate: 0.01

⑧Weight decay: 0.0005

⑨Weight initialization: Glorot uniform initializer

⑩Dropout rate: 0.5

⑪⭐ All three datasets use 3 views, built from the given topology, the feature similarity between nodes, and the similarity between texts (an edge is created when the value exceeds a certain threshold).
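As a sketch of how such a similarity-based view might be built (the threshold value is a placeholder, and the exact similarity measure per dataset is not spelled out here):

```python
import numpy as np

def similarity_view(X, threshold=0.9):
    """Extra adjacency view: connect node pairs whose cosine feature similarity
    exceeds a threshold (the threshold value here is a placeholder)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    S = (X @ X.T) / np.maximum(norms @ norms.T, 1e-12)   # pairwise cosine similarity s_ij
    A = (S > threshold).astype(float)
    np.fill_diagonal(A, 0.0)                              # drop self-similarity
    return A
```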

⑫Comparisons with 10 runs:

⑬Choices in ablation study:

GCN+View 1: GCN with view 1 (the given adjacency matrix)
GCN+View 2: GCN with view 2 (the similarity-based graph)
GCN+View 3: GCN with view 3 (the b-matching graph)
MLP+GCN+View 1,2,3: GCN with three views via a standard MLP
MAGCN+View 1,2,3: Our MAGCN with three views

and the comparison:

⑭ Results visualized by t-SNE (left: GCN; right: MAGCN):

⑮ Robustness analysis under random topology attack (RTA): edges are randomly removed at a ratio ranging from 0.1 to 1:
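A small sketch of what such an attack amounts to, assuming an unweighted symmetric adjacency matrix:

```python
import numpy as np

def random_topology_attack(A, ratio, rng=np.random.default_rng(0)):
    """Randomly drop a given fraction of the existing (undirected) edges."""
    A_att = A.copy()
    i, j = np.triu_indices_from(A, k=1)
    edges = np.flatnonzero(A[i, j])                   # positions of existing edges
    drop = rng.choice(edges, size=int(ratio * len(edges)), replace=False)
    A_att[i[drop], j[drop]] = 0.0
    A_att[j[drop], i[drop]] = 0.0                     # keep the matrix symmetric
    return A_att
```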

⑯Robustness analysis with low label rates (LLR): label rate sets are {0.025, 0.02, 0.015, 0.01, 0.005}:

⑰ The other MAGCN variant uses cosine similarity $s_{ij}=\cos\left(x_{i},x_{j}\right)$. There are ablation choices as well:

GCN+View 1: GCN with view 1, i.e., the given adjacency matrix
GCN+View 2: GCN with view 2, i.e., the similarity-based graph
GCN+View 2*: GCN with view 2*, i.e., the weighted trainable similarity-based graph built from cosine similarity
MAGCN+View 1,2: MAGCN with view 1 and view 2
MAGCN+View 1,2*: MAGCN with view 1 and view 2*

and the comparison:

2.8. Conclusion

MAGCN can effectively extract node features across different neighborhood layers.

3. Supplementary knowledge

3.1. Discrete convolution

For reference: a blog post on convolution in the continuous domain versus the discrete-time convolution operation and its integral computation.

3.2. Information theory

For the basics of information theory, Ruan Yifeng's blog is a good starting point.

4. Reference List

Yao, K. et al. (2022) 'Multi-view graph convolutional networks with attention mechanism', Artificial Intelligence, 307.
