Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
- 1. Main contributions
- 2. Algorithm
  - 2.1 Learning localized spectral filters
    - K-th order approximation and ChebNet
  - 2.2 Graph pooling
    - Graph coarsening
    - Fast pooling
    - The full GCN pipeline
1. Main contributions
1. A spectral convolution formulation: a formal statement of CNNs on graphs, built on the spectral approach and graph signal processing (GSP).
2. Strictly localized filters: localization is controlled by a parameter K, the number of hops from a center vertex (i.e. vertices within distance K).
3. Low computational complexity: the complexity is linear in K and in the number of edges. Most real-world graphs are highly sparse, so the number of edges is far below the square of the number of vertices, roughly |E| ≈ k·|V|, where k can be read as the number of nearest neighbors per vertex. The complexity is therefore linear in the input size. The method also avoids computing and storing the Fourier basis; it only needs the Laplacian, a sparse matrix with |E| non-zero entries.
4. Efficient pooling: pooling is performed by arranging the vertices as a binary tree.
2. Algorithm
Implementing a CNN on graphs requires the following ingredients:
- (1) the design of localized convolutional filters on graphs
- (2) a grouping of similar vertices
- (3) a graph pooling operation
2.1 Learning localized spectral filters
K-th order approximation and ChebNet
Chebyshev polynomials and their applications (background reading):
http://www.docin.com/p-524643989.html
http://bookshelf.docin.com/p-1702466525.html
https://www.ixueshu.com/document/2c40f2ae055f2240318947a18e7f9386.html
https://wenku.baidu.com/view/c9566591f56527d3240c844769eae009591ba2ce.html
https://wenku.baidu.com/view/544dab7502768e9951e73842.html
The ChebNet authors point out two major drawbacks of the original spectral graph convolution:
① The filters are global and heavily parameterized (a filter is as large as the input signal, with one parameter per graph vertex).
② The convolution is expensive to evaluate (it relies on an eigendecomposition of the Laplacian, which has a high computational cost).
First, to overcome the "large" filters, ChebNet approximates the graph convolution with a polynomial expansion, i.e. the parameterized frequency response is approximated by a polynomial:
$$g_\theta(\Lambda) \approx \sum_{k=0}^{K-1} \theta_k \Lambda^k$$
Here K is the highest polynomial order. This approximation reduces the number of filter parameters from n down to K, making the filter "small" and localized. The full convolution, however, still costs O(n^2), because the input signal must be multiplied by the Fourier basis U. To cut this cost further, the authors use the recursively defined Chebyshev polynomials as the approximation and show that the complexity drops to O(K|E|), where K is the polynomial order and |E| the number of edges. The filter defined with Chebyshev polynomials is called the Chebyshev filter, and the corresponding operation the Chebyshev convolution; they are defined as follows:
$$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K-1} \theta'_k\, T_k(\tilde{\Lambda}), \qquad \tilde{\Lambda} = \frac{2\Lambda}{\lambda_{max}} - I_n$$

$$y = g_{\theta'}(L)\, x \approx \sum_{k=0}^{K-1} \theta'_k\, T_k(\tilde{L})\, x, \qquad \tilde{L} = \frac{2L}{\lambda_{max}} - I_n$$
Here \lambda_{max} is the largest eigenvalue of the Laplacian, and the prime on \theta' marks these as the approximate (Chebyshev) parameters. From the substituted and simplified expression one can see that the Chebyshev polynomial terms are fixed, so they can be handled in a preprocessing stage; and because the Laplacian is usually sparse, the products can be accelerated with sparse matrix multiplication, which greatly reduces the cost of the whole graph convolution. Besides the Chebyshev graph convolution, the ChebNet authors also propose a graph pooling layer.
Derivation:
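The derivation figures from the original post are missing; the following is a compact reconstruction of the standard argument, using the paper's notation (L = UΛU^T is the normalized Laplacian with Fourier basis U):

$$y = g_\theta(L)\,x = U\, g_\theta(\Lambda)\, U^T x = U\Big(\sum_{k=0}^{K-1}\theta_k \Lambda^k\Big)U^T x = \sum_{k=0}^{K-1}\theta_k\, L^k x$$

so the Fourier basis U cancels and no eigendecomposition is needed. Replacing the monomials L^k by Chebyshev polynomials of the rescaled Laplacian \tilde{L} = 2L/\lambda_{max} - I_n (whose spectrum lies in [-1, 1]) gives

$$y \approx \sum_{k=0}^{K-1}\theta'_k\, T_k(\tilde{L})\, x, \qquad \bar{x}_k = T_k(\tilde{L})\,x = 2\tilde{L}\,\bar{x}_{k-1} - \bar{x}_{k-2}, \quad \bar{x}_0 = x,\ \bar{x}_1 = \tilde{L}x$$

Each recurrence step is one sparse matrix-vector product, which is where the O(K|E|) complexity comes from.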
Power iteration for eigenvalues and eigenvectors
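The original illustration is missing; below is a minimal NumPy sketch (not taken from the paper's code) of power iteration, which estimates the dominant eigenpair of a symmetric matrix such as the Laplacian and can be used, for example, to estimate \lambda_{max} for the rescaling above. The function name power_iteration is introduced here for illustration only.

import numpy as np

def power_iteration(A, num_iters=200, tol=1e-10):
    """Estimate the dominant eigenvalue/eigenvector of a symmetric matrix A."""
    v = np.random.rand(A.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(num_iters):
        w = A @ v                      # one matrix-vector product per iteration
        lam_new = v @ w                # Rayleigh quotient estimate of the eigenvalue
        v = w / np.linalg.norm(w)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam, v

# Tiny check: the unnormalized Laplacian of the 3-vertex path graph has eigenvalues 0, 1, 3.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
lam_max, _ = power_iteration(L)
print(lam_max)   # approaches 3.0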
Learning the filter: the filter parameters are simply learned by gradient descent (backpropagation):
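In the paper the coefficients θ are trained by backpropagating the network loss; as a stand-alone illustration (this is not the paper's loss or code), the sketch below fits Chebyshev coefficients to a target filtered signal by plain gradient descent. The helper cheb_basis and all other names here are introduced for this example only.

import numpy as np

def cheb_basis(L_tilde, x, K):
    """Stack the vectors T_k(L_tilde) x, k = 0..K-1, computed with the Chebyshev recurrence."""
    Xt = [x, L_tilde @ x]
    for _ in range(2, K):
        Xt.append(2 * L_tilde @ Xt[-1] - Xt[-2])
    return np.stack(Xt[:K], axis=1)                 # shape (n, K)

rng = np.random.default_rng(0)
n, K = 8, 4
A = rng.random((n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0)
L = np.diag(A.sum(axis=1)) - A                      # combinatorial Laplacian of a random weighted graph
lmax = np.linalg.eigvalsh(L).max()
L_tilde = 2 * L / lmax - np.eye(n)                  # rescale the spectrum into [-1, 1]

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                              # input signal
theta_true = np.array([1.0, -0.5, 0.25, 0.1])
B = cheb_basis(L_tilde, x, K)                       # fixed matrix of filtered signals T_k(L_tilde) x
y = B @ theta_true                                  # target output of the unknown filter

theta = np.zeros(K)
for _ in range(2000):                               # gradient descent on 0.5 * ||B @ theta - y||^2
    grad = B.T @ (B @ theta - y)
    theta -= 0.1 * grad
print(np.round(theta, 3))                           # converges toward theta_true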
Application example

Solving with the Chebyshev recurrence:
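The worked figures from the original post are missing; here is a small hand-computed example on the 3-vertex path graph (assuming the unnormalized, combinatorial Laplacian, whose largest eigenvalue is 3):

$$L = \begin{pmatrix} 1 & -1 & 0\\ -1 & 2 & -1\\ 0 & -1 & 1 \end{pmatrix}, \qquad \lambda_{max} = 3, \qquad \tilde{L} = \tfrac{2}{3}L - I = \begin{pmatrix} -\tfrac{1}{3} & -\tfrac{2}{3} & 0\\ -\tfrac{2}{3} & \tfrac{1}{3} & -\tfrac{2}{3}\\ 0 & -\tfrac{2}{3} & -\tfrac{1}{3} \end{pmatrix}$$

For an impulse at vertex 1, x = (1, 0, 0)^T, the recurrence gives

$$\bar{x}_0 = x = (1, 0, 0)^T, \qquad \bar{x}_1 = \tilde{L}x = \left(-\tfrac{1}{3}, -\tfrac{2}{3}, 0\right)^T, \qquad \bar{x}_2 = 2\tilde{L}\bar{x}_1 - \bar{x}_0 = \left(\tfrac{1}{9}, 0, \tfrac{8}{9}\right)^T$$

and the filtered signal is y = θ'_0 \bar{x}_0 + θ'_1 \bar{x}_1 + θ'_2 \bar{x}_2. Note that vertex 3, which is two hops away from vertex 1, receives a non-zero contribution only from the k = 2 term: a filter of order K reaches exactly K hops.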
2.2 Graph pooling
Graph coarsening
Traditional approach to graph pooling: spectral clustering.
Method used in this paper: the Graclus multilevel clustering algorithm.
Weighted graph cuts without eigenvectors: a multilevel approach
Also noted by the authors:
Algebraic multigrid techniques on graphs and the Kron reduction are two methods worth exploring in future works.
Fast pooling
The coarsening above produces a sequence of coarser graphs; the vertices are then reordered and arranged as a balanced binary tree.
[Figure from the paper: the sequence of coarsened graphs and the corresponding balanced binary tree, flattened into a 1D signal.]
Each coarsening level yields a coarsened graph (G_0, G_1, G_2, G_3, G_4). Fake auxiliary vertices (the blue circles in the upper-right part of the figure) are added so that every vertex has exactly two children, turning the hierarchy into a binary tree; flattening the tree gives a 1D signal (lower-right part of the figure), and sampling that signal implements one graph pooling step.
- Consider a max pooling of size 4 (or two poolings of size 2) of a signal x ∈ R^8 living on G_0, the finest graph, given as input. It initially has n_0 = |V_0| = 8 vertices, arbitrarily ordered.
- A pooling of size 4 requires two coarsenings of size 2: Graclus gives G_1 of size n_1 = |V_1| = 5, and then G_2 of size n_2 = |V_2| = 3, the coarsest graph.
- The sizes are therefore set to n_2 = 3, n_1 = 6, n_0 = 12, and fake nodes (blue) are added to V_1 (1 node) and V_0 (4 nodes) to pair with the singletons (orange), so that every node has exactly two children.
- Nodes in V_2 are ordered arbitrarily, and nodes in V_1 and V_0 are then ordered accordingly. At that point the arrangement of V_0 permits a regular 1D pooling on x ∈ R^{12} such that z = [max(x_0, x_1), max(x_4, x_5, x_6), max(x_8, x_9, x_{10})] ∈ R^3, where the signal components x_2, x_3, x_7, x_{11} are set to a neutral value (a minimal NumPy sketch of this last step follows below).
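A minimal NumPy sketch of that last 1D pooling step (this is not the library code): the four fake vertices are set to -inf here so they never win the max, whereas the library uses 0 as the neutral value for the non-negative MNIST data.

import numpy as np

# Reordered signal on G_0 after adding the 4 fake vertices (12 values in total).
x = np.array([0.3, 0.9,            # real x_0, x_1
              -np.inf, -np.inf,    # fake x_2, x_3
              0.2, 0.7, 0.5,       # real x_4, x_5, x_6
              -np.inf,             # fake x_7
              0.1, 0.8, 0.4,       # real x_8, x_9, x_10
              -np.inf])            # fake x_11

# One graph pooling of size 4 is just a regular 1D max pooling with stride 4.
z = x.reshape(3, 4).max(axis=1)
print(z)   # [0.9 0.7 0.8] == [max(x_0, x_1), max(x_4, x_5, x_6), max(x_8, x_9, x_10)]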
The full GCN pipeline
# -*- coding: utf-8 -*-
import sys, os
sys.path.insert(0, '..')
from lib import models, graph, coarsening, utils
import tensorflow.compat.v1 as tf
import numpy as np
import time
# %matplotlib inline

flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_integer('number_edges', 8, 'Graph: minimum number of edges per vertex.')
flags.DEFINE_string('metric', 'euclidean', 'Graph: similarity measure (between features).')  # similarity metric
flags.DEFINE_bool('normalized_laplacian', True, 'Graph Laplacian: normalized.')
flags.DEFINE_integer('coarsening_levels', 4, 'Number of coarsened graphs.')
flags.DEFINE_string('dir_data', os.path.join('..', 'data', 'mnist'), 'Directory to store data.')

"""# Feature graph: describe the graph structure, i.e. prepare the Laplacian L of the adjacency matrix A"""

def grid_graph(m, corners=False):
    z = graph.grid(m)  # simply generates the coordinates of the points of an m x m (28 x 28) grid
    dist, idx = graph.distance_sklearn_metrics(z, k=FLAGS.number_edges, metric=FLAGS.metric)  # k nearest neighbors of every vertex
    A = graph.adjacency(dist, idx)  # build the adjacency matrix A that represents the graph
    # Connections are only vertical or horizontal on the grid.
    # Corner vertices are connected to 2 neighbors only.
    if corners:
        import scipy.sparse
        A = A.toarray()
        A[A < A.max() / 1.5] = 0
        A = scipy.sparse.csr_matrix(A)
        print('{} edges'.format(A.nnz))
    # A.nnz is the number of non-zero entries of the matrix.
    print("{} > {} edges".format(A.nnz // 2, FLAGS.number_edges * m ** 2 // 2))
    return A

t_start = time.process_time()
A = grid_graph(28, corners=False)     # adjacency matrix
A = graph.replace_random_edges(A, 0)  # adjacency matrix with randomly replaced (noise) edges; noise level 0 here
# Graph coarsening.
graphs, perm = coarsening.coarsen(A, levels=FLAGS.coarsening_levels, self_connections=False)  # coarsened graphs
# graphs is a list: one (sparse) adjacency matrix per coarsening level, stored as vertex index pairs plus distance weights.
# perm is the list of vertex indices of the finest level; the image pixels must be reordered
# according to it so that they match the multilevel coarsening.
# print('type of graphs:', type(graphs), '\nlen(graphs):', len(graphs), '\ngraphs:\n', graphs, '\ngraphs[0]:', graphs[0])
# print('type of perm:', type(perm), '\nlen(perm):', len(perm), '\nperm:\n', perm)
L = [graph.laplacian(A, normalized=True) for A in graphs]  # Laplacian of the adjacency matrix of every coarsening level
print('Execution time: {:.2f}s'.format(time.process_time() - t_start))
# graph.plot_spectrum(L)
del A

"""# Data preparation"""
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(FLAGS.dir_data, one_hot=False)

train_data = mnist.train.images.astype(np.float32)
# print('train_data shape:', train_data.shape)  # (55000, 784)
val_data = mnist.validation.images.astype(np.float32)
# print('val_data shape:', val_data.shape)  # (5000, 784)
test_data = mnist.test.images.astype(np.float32)
# print('test_data shape:', test_data.shape)  # (10000, 784)
train_labels = mnist.train.labels
# print('train_labels shape:', train_labels.shape)  # (55000,)
val_labels = mnist.validation.labels
# print('val_labels shape:', val_labels.shape)  # (5000,)
test_labels = mnist.test.labels
# print('test_labels shape:', test_labels.shape)  # (10000,)

t_start = time.process_time()
train_data = coarsening.perm_data(train_data, perm)  # reorder the pixels to match the first (finest) coarsening level
val_data = coarsening.perm_data(val_data, perm)
test_data = coarsening.perm_data(test_data, perm)
print('Execution time: {:.2f}s'.format(time.process_time() - t_start))
del perm

"""# Neural networks"""
common = {}
common['dir_name'] = 'mnist/'
common['num_epochs'] = 20
common['batch_size'] = 100
common['decay_steps'] = mnist.train.num_examples / common['batch_size']
common['eval_frequency'] = 30 * common['num_epochs']
common['brelu'] = 'b1relu'
common['pool'] = 'mpool1'
C = max(mnist.train.labels) + 1  # number of classes

model_perf = utils.model_perf()  # helper that collects and compares the results of several models

# Parameters for LeNet5-like networks.
common['regularization'] = 5e-4
common['dropout'] = 0.5
common['learning_rate'] = 0.02  # 0.03 in the paper but sgconv_sgconv_fc_softmax has difficulty to converge
common['decay_rate'] = 0.95
common['momentum'] = 0.9
common['F'] = [32, 64]   # number of output features of each convolution
common['K'] = [25, 25]   # number of polynomial terms (support) of each filter
common['p'] = [4, 4]     # pooling size after each convolution (one pooling per convolution)
common['M'] = [512, C]   # output sizes of the fully connected layers

# Architecture of TF MNIST conv model (LeNet-5-like).
if True:
    name = 'Chebyshev_test'  # 'Chebyshev'
    params = common.copy()
    params['dir_name'] += name
    params['filter'] = 'chebyshev5'  # use the filter built by chebyshev5
    model_perf.test(models.cgcnn(L, **params), name, params,
                    train_data, train_labels, val_data, val_labels, test_data, test_labels)
model_perf.show()
print('end')

model = models.cgcnn(L, **params)
accuracy, loss, t_step = model.fit(train_data, train_labels, val_data, val_labels)
'''
# Alternatively, use the code below to train directly without printing the comparison of evaluation results.
model = models.cgcnn(L, **params)
accuracy, loss, t_step = model.fit(train_data, train_labels, val_data, val_labels)
'''
Annotated excerpts from graph.py
import sklearn.metrics
import sklearn.neighbors
import matplotlib.pyplot as plt
import scipy.sparse
import scipy.sparse.linalg
import scipy.spatial.distance
import numpy as np
def grid(m, dtype=np.float32):
"""Return the embedding of a grid graph."""
# print('m passed in to build the grid:', m) # 28
M = m**2
x = np.linspace(0, 1, m, dtype=dtype)
# print('x shape:', x.shape) # (28,), values spread over [0, 1]
y = np.linspace(0, 1, m, dtype=dtype)
# print('y shape:', y.shape) # (28,), values spread over [0, 1]
xx, yy = np.meshgrid(x, y)
# print('xx shape:', xx.shape) # the 28 values repeated 28 times: a grid of 784 points
# print('yy shape:', yy.shape) # the 28 values repeated 28 times: a grid of 784 points
z = np.empty((M, 2), dtype)
z[:, 0] = xx.reshape(M)
z[:, 1] = yy.reshape(M)
# print('z shape:', z.shape) # (784, 2): the flattened grid, one (x, y) coordinate pair per point
return z
def distance_scipy_spatial(z, k=4, metric='euclidean'):
"""Compute exact pairwise distances."""
d = scipy.spatial.distance.pdist(z, metric)
d = scipy.spatial.distance.squareform(d)
# k-NN graph.
# argsort() sorts the elements in ascending order and returns their indices.
idx = np.argsort(d)[:, 1:k+1]
d.sort()
d = d[:, 1:k+1]
return d, idx
# Compute the k nearest neighbors of every vertex.
def distance_sklearn_metrics(z, k=4, metric='euclidean'):
"""Compute exact pairwise distances."""
# print('z shape:', z.shape) # (784, 2): the coordinates of the 784 points passed in
d = sklearn.metrics.pairwise.pairwise_distances(
z, metric=metric, n_jobs=-2)
# print('d shape:', d.shape) # (784, 784): distance of every point to every other point; symmetric, zero diagonal
# k-NN graph.
idx = np.argsort(d)[:, 1:k+1] # sort each row ascending and keep the indices of the k nearest nodes; start at 1 to exclude the node itself
# print('idx type:', type(idx), 'len(idx):', len(idx)) # numpy.ndarray, 784 rows of 8 neighbor indices
d.sort()
d = d[:, 1:k+1]
# print('d shape:', d.shape) # (784, 8): distances to the k nearest neighbors of each node
# print('len(idx):', len(idx)) # 784 rows of 8 neighbor indices
return d, idx
def distance_lshforest(z, k=4, metric='cosine'):
"""Return an approximation of the k-nearest cosine distances."""
assert metric == 'cosine'
lshf = sklearn.neighbors.LSHForest()
lshf.fit(z)
dist, idx = lshf.kneighbors(z, n_neighbors=k+1)
assert dist.min() < 1e-10
dist[dist < 0] = 0
return dist, idx
# TODO: other ANNs s.a. NMSLIB, EFANNA, FLANN, Annoy, sklearn neighbors, PANN
# Build the adjacency matrix from dist (distance matrix of the k nearest neighbors of every vertex)
# and idx (index list of the k nearest neighbors of every vertex).
# dist: 784 x 8 array
# idx: 784 x 8 array
def adjacency(dist, idx):
"""Return the adjacency matrix of a kNN graph."""
M, k = dist.shape
# print('M:', M) # 784
# print('k:', k) # 8
assert (M, k) == idx.shape
assert dist.min() >= 0
# Weights.
sigma2 = np.mean(dist[:, -1])**2
# print('sigma2:', sigma2) # square of the mean distance to the farthest kept neighbor, used as the Gaussian kernel width
dist = np.exp(- dist**2 / sigma2)
# print('dist shape:', dist.shape) # distances mapped to Gaussian edge weights
# Weight matrix.
I = np.arange(0, M).repeat(k) # row indices: each of the 784 vertices repeated k times; 6272 = 784*8
# print('I shape:', I.shape)
J = idx.reshape(M*k) # 6272 = 784*8
# print('J shape:', J.shape)
V = dist.reshape(M*k) # 6272 = 784*8
# print('V shape:', V.shape)
W = scipy.sparse.coo_matrix((V, (I, J)), shape=(M, M)) # build the adjacency matrix from the vertex index pairs (I, J) and the weights V
# print('W shape:', W.shape)
# No self-connections.
W.setdiag(0) # ignore self-connections: set the diagonal to 0
# Non-directed graph.
bigger = W.T > W
W = W - W.multiply(bigger) + W.T.multiply(bigger)
assert W.nnz % 2 == 0
assert np.abs(W - W.T).mean() < 1e-10
assert type(W) is scipy.sparse.csr.csr_matrix
# print('W shape:', W.shape) # (784, 784)
return W
# Adjacency matrix with randomly replaced (noise) edges.
def replace_random_edges(A, noise_level):
"""Replace randomly chosen edges by random edges."""
M, M = A.shape # 784,784
n = int(noise_level * A.nnz // 2)
# print(n) # noise_level is passed as 0 here, so n = 0
indices = np.random.permutation(A.nnz//2)[:n] # np.random.permutation(): random shuffle; with noise_level 0, indices = []
rows = np.random.randint(0, M, n) # empty when noise_level is 0
cols = np.random.randint(0, M, n) # empty when noise_level is 0
vals = np.random.uniform(0, 1, n) # empty when noise_level is 0
assert len(indices) == len(rows) == len(cols) == len(vals)
A_coo = scipy.sparse.triu(A, format='coo') # scipy.sparse.triu(A, k=0, format=None): upper-triangular part of the matrix in COO sparse format
# print('A_coo shape:', A_coo.shape) # (784, 784) upper-triangular
# print(A_coo) # stored as (row, col) vertex index pairs plus the distance weight
assert A_coo.nnz == A.nnz // 2
assert A_coo.nnz >= n
A = A.tolil() # convert to LIL format before element-wise edits of the sparse matrix, for efficiency
for idx, row, col, val in zip(indices, rows, cols, vals):
old_row = A_coo.row[idx]
old_col = A_coo.col[idx]
A[old_row, old_col] = 0
A[old_col, old_row] = 0
A[row, col] = 1
A[col, row] = 1
A.setdiag(0) # set the diagonal to 0: self-connections are not considered
A = A.tocsr() # convert back to CSR format
A.eliminate_zeros()
return A
def laplacian(W, normalized=True):
"""Return the Laplacian of the weigth matrix."""
# Degree matrix.
d = W.sum(axis=0) # degree of every vertex: sum of the edge weights of its row
# print('d shape:', d.shape) # one degree value per vertex
# Laplacian matrix.
if not normalized:
D = scipy.sparse.diags(d.A.squeeze(), 0)
L = D - W
else:
d += np.spacing(np.array(0, W.dtype))
d = 1 / np.sqrt(d)
# print(d)
D = scipy.sparse.diags(d.A.squeeze(), 0) # diagonal matrix built from 1/sqrt(degree), i.e. D^{-1/2}
# print('D shape:', D.shape)
# print('D:', D)
I = scipy.sparse.identity(d.size, dtype=W.dtype)
L = I - D * W * D
# assert np.abs(L - L.T).mean() < 1e-9
assert type(L) is scipy.sparse.csr.csr_matrix
return L
def lmax(L, normalized=True):
"""Upper-bound on the spectrum."""
if normalized:
return 2
else:
return scipy.sparse.linalg.eigsh(
L, k=1, which='LM', return_eigenvectors=False)[0]
def fourier(L, algo='eigh', k=1):
"""Return the Fourier basis, i.e. the EVD of the Laplacian."""
def sort(lamb, U):
idx = lamb.argsort()
return lamb[idx], U[:, idx]
if algo == 'eig':
lamb, U = np.linalg.eig(L.toarray())
lamb, U = sort(lamb, U)
elif algo == 'eigh':
lamb, U = np.linalg.eigh(L.toarray())
elif algo == 'eigs':
lamb, U = scipy.sparse.linalg.eigs(L, k=k, which='SM')
lamb, U = sort(lamb, U)
elif algo == 'eigsh':
lamb, U = scipy.sparse.linalg.eigsh(L, k=k, which='SM')
return lamb, U
def plot_spectrum(L, algo='eig'):
"""Plot the spectrum of a list of multi-scale Laplacians L."""
# Algo is eig to be sure to get all eigenvalues.
plt.figure(figsize=(17, 5))
for i, lap in enumerate(L):
lamb, U = fourier(lap, algo)
step = 2**i
x = range(step//2, L[0].shape[0], step)
lb = 'L_{} spectrum in [{:1.2e}, {:1.2e}]'.format(i, lamb[0], lamb[-1])
plt.plot(x, lamb, '.', label=lb)
plt.legend(loc='best')
plt.xlim(0, L[0].shape[0])
plt.ylim(ymin=0)
def lanczos(L, X, K):
"""
Given the graph Laplacian and a data matrix, return a data matrix which can
be multiplied by the filter coefficients to filter X using the Lanczos
polynomial approximation.
"""
M, N = X.shape
assert L.dtype == X.dtype
def basis(L, X, K):
"""
Lanczos algorithm which computes the orthogonal matrix V and the
tri-diagonal matrix H.
"""
a = np.empty((K, N), L.dtype)
b = np.zeros((K, N), L.dtype)
V = np.empty((K, M, N), L.dtype)
V[0, ...] = X / np.linalg.norm(X, axis=0)
for k in range(K-1):
W = L.dot(V[k, ...])
a[k, :] = np.sum(W * V[k, ...], axis=0)
W = W - a[k, :] * V[k, ...] - (
b[k, :] * V[k-1, ...] if k > 0 else 0)
b[k+1, :] = np.linalg.norm(W, axis=0)
V[k+1, ...] = W / b[k+1, :]
a[K-1, :] = np.sum(L.dot(V[K-1, ...]) * V[K-1, ...], axis=0)
return V, a, b
def diag_H(a, b, K):
"""Diagonalize the tri-diagonal H matrix."""
H = np.zeros((K*K, N), a.dtype)
H[:K**2:K+1, :] = a
H[1:(K-1)*K:K+1, :] = b[1:, :]
H.shape = (K, K, N)
Q = np.linalg.eigh(H.T, UPLO='L')[1]
Q = np.swapaxes(Q, 1, 2).T
return Q
V, a, b = basis(L, X, K)
Q = diag_H(a, b, K)
Xt = np.empty((K, M, N), L.dtype)
for n in range(N):
Xt[..., n] = Q[..., n].T.dot(V[..., n])
Xt *= Q[0, :, np.newaxis, :]
Xt *= np.linalg.norm(X, axis=0)
return Xt # Q[0, ...]
def rescale_L(L, lmax=2):
"""Rescale the Laplacian eigenvalues in [-1,1]."""
M, M = L.shape
I = scipy.sparse.identity(M, format='csr', dtype=L.dtype)
L /= lmax / 2
L -= I
return L
def chebyshev(L, X, K):
"""Return T_k X where T_k are the Chebyshev polynomials of order up to K.
Complexity is O(KMN)."""
M, N = X.shape
assert L.dtype == X.dtype
# L = rescale_L(L, lmax)
# Xt = T @ X: MxM @ MxN.
Xt = np.empty((K, M, N), L.dtype)
# Xt_0 = T_0 X = I X = X.
Xt[0, ...] = X
# Xt_1 = T_1 X = L X.
if K > 1:
Xt[1, ...] = L.dot(X)
# Xt_k = 2 L Xt_k-1 - Xt_k-2.
for k in range(2, K):
Xt[k, ...] = 2 * L.dot(Xt[k-1, ...]) - Xt[k-2, ...]
return Xt
import numpy as np
import scipy.sparse
def coarsen(A, levels, self_connections=False):
"""
Coarsen a graph, represented by its adjacency matrix A, at multiple
levels.
"""
# Coarsening: compute the coarsened graphs and the parent of every vertex.
graphs, parents = metis(A, levels)
# Compute the vertex permutations that turn the hierarchy into a binary tree.
perms = compute_perm(parents)
for i, A in enumerate(graphs):
M, M = A.shape
if not self_connections:
A = A.tocoo()
A.setdiag(0)
if i < levels:
A = perm_adjacency(A, perms[i])
A = A.tocsr()
A.eliminate_zeros()
graphs[i] = A
Mnew, Mnew = A.shape
print('Layer {0}: M_{0} = |V| = {1} nodes ({2} added),'
'|E| = {3} edges'.format(i, Mnew, Mnew-M, A.nnz//2))
return graphs, perms[0] if levels > 0 else None
def metis(W, levels, rid=None):
"""
Coarsen a graph multiple times using the METIS algorithm.
INPUT
W: symmetric sparse weight (adjacency) matrix
levels: the number of coarsened graphs
OUTPUT
graph[0]: original graph of size N_1
graph[2]: coarser graph of size N_2 < N_1
graph[levels]: coarsest graph of Size N_levels < ... < N_2 < N_1
parents[i] is a vector of size N_i with entries ranging from 1 to N_{i+1}
which indicate the parents in the coarser graph[i+1]
nd_sz{i} is a vector of size N_i that contains the size of the supernode in the graph{i}
NOTE
if "graph" is a list of length k, then "parents" will be a list of length k-1
"""
N, N = W.shape
if rid is None:
rid = np.random.permutation(range(N))
parents = []
#Degree matrix
degree = W.sum(axis=0) - W.diagonal()
graphs = []
graphs.append(W)
#supernode_size = np.ones(N)
#nd_sz = [supernode_size]
#count = 0
#while N > maxsize:
for _ in range(levels):
#count += 1
# CHOOSE THE WEIGHTS FOR THE PAIRING
# weights = ones(N,1) # metis weights
weights = degree # graclus weights
# weights = supernode_size # other possibility
# squeeze(): remove single-dimensional entries from the array shape, i.e. drop axes whose size is 1.
weights = np.array(weights).squeeze()
# PAIR THE VERTICES AND CONSTRUCT THE ROOT VECTOR
idx_row, idx_col, val = scipy.sparse.find(W)
perm = np.argsort(idx_row)
rr = idx_row[perm]
cc = idx_col[perm]
vv = val[perm]
cluster_id = metis_one_level(rr,cc,vv,rid,weights) # rr is ordered
parents.append(cluster_id)
# TO DO
# COMPUTE THE SIZE OF THE SUPERNODES AND THEIR DEGREE
#supernode_size = full( sparse(cluster_id, ones(N,1) , supernode_size ) )
#print(cluster_id)
#print(supernode_size)
#nd_sz{count+1}=supernode_size;
# COMPUTE THE EDGES WEIGHTS FOR THE NEW GRAPH
nrr = cluster_id[rr]
ncc = cluster_id[cc]
nvv = vv
Nnew = cluster_id.max() + 1
# CSR is more appropriate: row,val pairs appear multiple times
W = scipy.sparse.csr_matrix((nvv,(nrr,ncc)), shape=(Nnew,Nnew))
W.eliminate_zeros()
# Add new graph to the list of all coarsened graphs
graphs.append(W)
N, N = W.shape
# COMPUTE THE DEGREE (OMIT OR NOT SELF LOOPS)
degree = W.sum(axis=0)
#degree = W.sum(axis=0) - W.diagonal()
# CHOOSE THE ORDER IN WHICH VERTICES WILL BE VISTED AT THE NEXT PASS
#[~, rid]=sort(ss); # arthur strategy
#[~, rid]=sort(supernode_size); # thomas strategy
#rid=randperm(N); # metis/graclus strategy
ss = np.array(W.sum(axis=0)).squeeze()
rid = np.argsort(ss)
return graphs, parents
# Coarsen a graph given by rr,cc,vv. rr is assumed to be ordered
def metis_one_level(rr,cc,vv,rid,weights):
nnz = rr.shape[0]
N = rr[nnz-1] + 1
marked = np.zeros(N, dtype=bool)
rowstart = np.zeros(N, np.int32)
rowlength = np.zeros(N, np.int32)
cluster_id = np.zeros(N, np.int32)
oldval = rr[0]
count = 0
clustercount = 0
# Count, for every vertex, the number of entries (neighbors) in its row.
for ii in range(nnz):
rowlength[count] = rowlength[count] + 1
if rr[ii] > oldval:
oldval = rr[ii]
rowstart[count+1] = ii
count = count + 1
# rid is a random permutation that shuffles the order in which the vertices are visited.
for ii in range(N):
tid = rid[ii]
# Following the Graclus algorithm, marked records which vertices have already been matched;
# for an unmarked vertex x we look for an unmarked neighbor y to pair it with.
if not marked[tid]:
wmax = 0.0
rs = rowstart[tid]
marked[tid] = True
bestneighbor = -1
# rowlength[tid] is the number of neighbors of this vertex;
# the neighbor maximizing e(x,y) * (1.0/weight(x) + 1.0/weight(y)) is chosen.
for jj in range(rowlength[tid]):
nid = cc[rs+jj]
if marked[nid]:
tval = 0.0
else:
tval = vv[rs+jj] * (1.0/weights[tid] + 1.0/weights[nid])
if tval > wmax:
wmax = tval
bestneighbor = nid
cluster_id[tid] = clustercount
if bestneighbor > -1:
cluster_id[bestneighbor] = clustercount
marked[bestneighbor] = True
clustercount += 1
return cluster_id
def compute_perm(parents):
"""
Return a list of indices to reorder the adjacency and data matrices so
that the union of two neighbors from layer to layer forms a binary tree.
"""
# Order of last layer is random (chosen by the clustering algorithm).
indices = []
if len(parents) > 0:
M_last = max(parents[-1]) + 1
indices.append(list(range(M_last)))
# Iterate over the parent vectors in reverse order (from coarsest to finest).
for parent in parents[::-1]:
#print('parent: {}'.format(parent))
# Fake nodes go after real ones.
pool_singeltons = len(parent)
indices_layer = []
for i in indices[-1]:
indices_node = list(np.where(parent == i)[0])
assert 0 <= len(indices_node) <= 2
#print('indices_node: {}'.format(indices_node))
# Add a node to go with a singelton.
if len(indices_node) == 1:
indices_node.append(pool_singeltons)
pool_singeltons += 1
#print('new singelton: {}'.format(indices_node))
# Add two nodes as children of a singelton in the parent.
elif len(indices_node) == 0:
indices_node.append(pool_singeltons+0)
indices_node.append(pool_singeltons+1)
pool_singeltons += 2
#print('singelton childrens: {}'.format(indices_node))
indices_layer.extend(indices_node)
indices.append(indices_layer)
# Sanity checks.
for i,indices_layer in enumerate(indices):
M = M_last*2**i
# Reduction by 2 at each layer (binary tree).
assert len(indices_layer) == M
# The new ordering does not omit an indice.
assert sorted(indices_layer) == list(range(M))
return indices[::-1]
assert (compute_perm([np.array([4,1,1,2,2,3,0,0,3]),np.array([2,1,0,1,0])])
== [[3,4,0,9,1,2,5,8,6,7,10,11],[2,4,1,3,0,5],[0,1,2]])
def perm_data(x, indices):
"""
Permute data matrix, i.e. exchange node ids,
so that binary unions form the clustering tree.
"""
if indices is None:
return x
N, M = x.shape
Mnew = len(indices)
assert Mnew >= M
xnew = np.empty((N, Mnew))
for i,j in enumerate(indices):
# Existing vertex, i.e. real data.
if j < M:
xnew[:,i] = x[:,j]
# Fake vertex because of singeltons.
# They will stay 0 so that max pooling chooses the singelton.
# Or -infty ?
else:
xnew[:,i] = np.zeros(N)
return xnew
def perm_adjacency(A, indices):
"""
Permute adjacency matrix, i.e. exchange node ids,
so that binary unions form the clustering tree.
"""
if indices is None:
return A
M, M = A.shape
Mnew = len(indices)
assert Mnew >= M
A = A.tocoo()
# Add Mnew - M isolated vertices.
if Mnew > M:
rows = scipy.sparse.coo_matrix((Mnew-M, M), dtype=np.float32)
cols = scipy.sparse.coo_matrix((Mnew, Mnew-M), dtype=np.float32)
A = scipy.sparse.vstack([A, rows])
A = scipy.sparse.hstack([A, cols])
# Permute the rows and the columns.
perm = np.argsort(indices)
A.row = np.array(perm)[A.row]
A.col = np.array(perm)[A.col]
# assert np.abs(A - A.T).mean() < 1e-9
assert type(A) is scipy.sparse.coo.coo_matrix
return A

class cgcnn(base_model) in models.py
    def build_graph(self, M_0):
        """Build the computational graph of the model."""
        self.graph = tf.Graph()
        with self.graph.as_default():
            # Inputs.
            with tf.name_scope('inputs'):
                self.ph_data = tf.placeholder(tf.float32, (self.batch_size, M_0), 'data')
                self.ph_labels = tf.placeholder(tf.int32, (self.batch_size), 'labels')
                self.ph_dropout = tf.placeholder(tf.float32, (), 'dropout')
            # Model.
            op_logits = self.inference(self.ph_data, self.ph_dropout)  #!!!!!!#
            self.op_loss, self.op_loss_average = self.loss(op_logits, self.ph_labels, self.regularization)
            self.op_train = self.training(self.op_loss, self.learning_rate,
                                          self.decay_steps, self.decay_rate, self.momentum)
            self.op_prediction = self.prediction(op_logits)
            # Initialize variables, i.e. weights and biases.
            self.op_init = tf.global_variables_initializer()
            # Summaries for TensorBoard and Save for model parameters.
            self.op_summary = tf.summary.merge_all()
            self.op_saver = tf.train.Saver(max_to_keep=5)
        self.graph.finalize()
We take _inference of cgcnn as the example for analysis (the other classes in models.py implement the other models and configurations mentioned in the paper); cgcnn is the reference model proposed by the authors.
    def _inference(self, x, dropout):
        # Graph convolutional layers.
        x = tf.expand_dims(x, 2)  # N x M x F=1
        for i in range(len(self.p)):  # one convolution per pooling step (pooling and convolution come in pairs)
            with tf.variable_scope('conv{}'.format(i+1)):  # this scope is one graph convolution
                with tf.name_scope('filter'):  # the filtering step of the convolution
                    x = self.filter(x, self.L[i], self.F[i], self.K[i])
                with tf.name_scope('bias_relu'):  # bias and activation
                    x = self.brelu(x)
                with tf.name_scope('pooling'):  # pooling after the convolution
                    x = self.pool(x, self.p[i])
        # Fully connected hidden layers.
        N, M, F = x.get_shape()
        x = tf.reshape(x, [int(N), int(M*F)])  # N x M
        for i, M in enumerate(self.M[:-1]):
            with tf.variable_scope('fc{}'.format(i+1)):
                x = self.fc(x, M)
                x = tf.nn.dropout(x, dropout)
        # Logits linear layer, i.e. softmax without normalization.
        with tf.variable_scope('logits'):
            x = self.fc(x, self.M[-1], relu=False)
        return x
Unpacking the scope('filter') above: self.filter is bound to a concrete function via getattr(), for example to chebyshev2(). Here we take chebyshev2() as the example to explain how the filter is written (note: this function is the filter; it is the classic variant proposed by the authors and its behavior should match chebyshev5(), except that it is implemented with NumPy while chebyshev5 is implemented with TF; the NumPy version is easier to read).
    def chebyshev2(self, x, L, Fout, K):
        """
        Filtering with Chebyshev interpolation
        Implementation: numpy.
        Data: x of size N x M x F
            N: number of signals
            M: number of vertices
            F: number of features per signal per vertex
        """
        N, M, Fin = x.get_shape()
        N, M, Fin = int(N), int(M), int(Fin)
        # Rescale Laplacian. Copy to not modify the shared L.
        L = scipy.sparse.csr_matrix(L)
        L = graph.rescale_L(L, lmax=2)
        # Transform to Chebyshev basis.
        x = tf.transpose(x, perm=[1, 2, 0])  # M x Fin x N
        x = tf.reshape(x, [M, Fin*N])  # M x Fin*N: all features of all signals along one axis
        def chebyshev(x):  # apply the Chebyshev recurrence to X and return T_k X (py_func runs it on a numpy array)
            return graph.chebyshev(L, x, K)
        x = tf.py_func(chebyshev, [x], [tf.float32])[0]  # K x M x Fin*N
        x = tf.reshape(x, [K, M, Fin, N])  # K x M x Fin x N
        x = tf.transpose(x, perm=[3, 1, 2, 0])  # N x M x Fin x K
        x = tf.reshape(x, [N*M, Fin*K])  # N*M x Fin*K
        # Filter: Fin*Fout filters of order K, i.e. one filterbank per feature.
        W = self._weight_variable([Fin*K, Fout], regularization=False)  # multiply by the learned weights
        x = tf.matmul(x, W)  # N*M x Fout
        return tf.reshape(x, [N, M, Fout])  # N x M x Fout
As can be seen above, the Laplacian L is first rescaled (as a sparse matrix) and x is reshaped; graph.chebyshev() is then applied to run the iterative Chebyshev computation and obtain the T_k terms, i.e. the Chebyshev polynomials are used to compute T_k in order to approximate diag(λ^k) (the figures from the paper referenced here are not reproduced). Note also that the formula contains a sum over k while the code does not: the sum is folded into the weight multiplication by enlarging the weight matrix W from [Fin, Fout] to [Fin*K, Fout], so the whole operation becomes x @ W with shape(W) = [Fin*K, Fout].

def chebyshev(L, X, K):
    """Return T_k X where T_k are the Chebyshev polynomials of order up to K.
    Complexity is O(KMN).
    (From graph.py: returns T_k X, where T_k is the Chebyshev polynomial of order k, for k up to K.)"""
    M, N = X.shape
    assert L.dtype == X.dtype
    # L = rescale_L(L, lmax)
    # Xt = T @ X: MxM @ MxN.
    Xt = np.empty((K, M, N), L.dtype)
    # Xt_0 = T_0 X = I X = X.
    Xt[0, ...] = X
    # Xt_1 = T_1 X = L X.
    if K > 1:
        Xt[1, ...] = L.dot(X)
    # Xt_k = 2 L Xt_k-1 - Xt_k-2.
    for k in range(2, K):
        Xt[k, ...] = 2 * L.dot(Xt[k-1, ...]) - Xt[k-2, ...]
    return Xt
The chebyshev() above is written directly from the recurrence property of the Chebyshev polynomials. Once the T_k X terms are obtained and suitably reshaped, they are multiplied by the parameters θ to be learned and the result is returned; this completes one call of filter().
The pooling operation itself is not the authors' contribution; what it requires is the coarsening and reordering above, which reorganize the graph signal into a 1D signal.
In addition, the coarsened and reordered graphs are converted into Laplacian matrices and used in the convolution step.
    def mpool1(self, x, p):
        """Max pooling of size p. Should be a power of 2."""
        if p > 1:
            x = tf.expand_dims(x, 3)  # N x M x F x 1
            x = tf.nn.max_pool(x, ksize=[1, p, 1, 1], strides=[1, p, 1, 1], padding='SAME')
            # tf.maximum
            return tf.squeeze(x, [3])  # N x M/p x F
        else:
            return x

    def apool1(self, x, p):
        """Average pooling of size p. Should be a power of 2."""
        if p > 1:
            x = tf.expand_dims(x, 3)  # N x M x F x 1
            x = tf.nn.avg_pool(x, ksize=[1, p, 1, 1], strides=[1, p, 1, 1], padding='SAME')
            return tf.squeeze(x, [3])  # N x M/p x F
        else:
            return x


