CS224W: Machine Learning with Graphs - 03 Node Embeddings
Node Embeddings
1. Graph Representation Learning
Graph representation learning alleviates the need to do feature engineering every single time (the features are learned automatically)
Goal : efficient task-independent feature learning for machine learning with graphs
Why embedding?
- Similarity of embeddings between nodes indicates their similarity in the network
- Encode network information
- Potentially used for many downstream predictions (node classification, link prediction, graph classification, anomalous node detection, clustering…)
2. Node Embeddings: Encoder and Decoder
Goal : encode nodes so that similarity in the embedding space approximates similarity in the graph
a) Encoder ENC maps from nodes to embeddings (a low-dimensional vector)
b) Define a node similarity function (i.e., a measure of similarity in the original network)
c) Decoder DEC maps from embeddings to the similarity score
d) Optimize the parameters of the encoder so that similarity(u, v)\approx z_v^Tz_u
1). “Shallow” Encoding
Simplest encoding approach: encoder is just an embedding-lookup so each node is assigned a unique embedding vector
\text{ENC}(v)=z_v=Z \cdot v
where Z \in \mathbb{R}^{d\times |V|} is a matrix whose columns are the node embeddings and v is an indicator vector with all zeroes except a one in the column indicating node v
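A minimal NumPy sketch of this lookup (the node count, dimensionality, and random Z below are made-up illustration values, not values from the lecture):
```python
import numpy as np

# Hypothetical toy setup: 4 nodes embedded in d = 2 dimensions
num_nodes, d = 4, 2
Z = np.random.rand(d, num_nodes)    # embedding matrix: column i is the embedding of node i

def enc(v_idx: int) -> np.ndarray:
    """Shallow encoder ENC(v) = Z . v, i.e. a simple column lookup."""
    v = np.zeros(num_nodes)
    v[v_idx] = 1.0                  # indicator (one-hot) vector for node v
    return Z @ v                    # equivalent to Z[:, v_idx]

z_u, z_v = enc(0), enc(1)
similarity = z_u @ z_v              # decoder: dot product z_u^T z_v
```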
Methods : DeepWalk, node2vec
3. Random Walk Approaches for Node Embeddings
- Vector z_u is the embedding of node u
- Probability P(v|z_u) is the (predicted) probability of visiting node v on random walks starting from node u
- Random walk : given a graph and a starting point, we select one of its neighbors at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points visited this way is a random walk on the graph.
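A short sketch of such an unbiased walk on a toy adjacency list (the graph below is made up purely for illustration):
```python
import random

def random_walk(adj: dict, start, length: int) -> list:
    """Unbiased random walk: repeatedly hop to a uniformly chosen neighbor."""
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:            # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk

# Toy graph as an adjacency list
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk(adj, start=0, length=5))   # e.g. [0, 2, 3, 2, 1, 0]
```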
1). Random-walk Embeddings
z_u^Tz_v \approx \text{probability that } u \text{ and } v \text{ co-occur on a random walk over the graph}
- Estimate probability P_R(v|u) of visiting node v on a random walk starting from node u using the random walk strategy R
- Optimize embeddings to encode these random walk statistics
Why random walks?
- Expressivity : flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information (If a random walk starting from node u visits v with high probability, u and v are similar)
- Efficiency : do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks
2). Unsupervised Feature Learning
Intuition : find embedding of nodes in d-dimensional space that preserves similarity
Idea : learn node embedding such that nearby nodes are close together in the network
N_R(u): neighborhood of u obtained by the strategy R
Goal: learn a mapping f: u \to \mathbb{R}^d: f(u) = z_u
Log-likelihood objective:
\underset{f}{\max}\sum_{u\in V} \log P(N_R(u)|z_u)
Given node u, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood N_R(u)
3). Random-walk Optimization
a) Run short fixed-length random walks starting from each node u in the graph using the random walk strategy R
b) For each node u, collect N_R(u), the multiset of nodes visited on random walks starting from u
c) Optimize embeddings
\underset{f}{\max}\sum_{u\in V} \log P(N_R(u)|z_u)
Parameterize P(v|z_u) using softmax
P(v|z_u)=\frac{\exp(z_u^Tz_v)}{\sum_{n\in V}\exp(z_u^Tz_n)}
Let
L=\sum_{u\in V}\sum_{v\in N_R(u)}-\log P(v|z_u)=\sum_{u\in V}\sum_{v\in N_R(u)}-\log(\frac{\exp(z_u^Tz_v)}{\sum_{n\in V}\exp(z_u^Tz_n)})
Optimizing random walk embeddings = finding embeddings z_u that minimize L
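A sketch of evaluating this loss naively (here Z is assumed to be a |V| × d array of embeddings indexed by node id, and neighborhoods maps each node u to the multiset N_R(u) collected in step b); the inner normalization over all nodes is what makes this expensive:
```python
import numpy as np

def naive_loss(Z: np.ndarray, neighborhoods: dict) -> float:
    """Negative log-likelihood L with a full softmax over all nodes."""
    loss = 0.0
    for u, N_u in neighborhoods.items():
        scores = Z @ Z[u]                          # z_u^T z_n for every node n
        log_denom = np.log(np.exp(scores).sum())   # log sum_n exp(z_u^T z_n)
        for v in N_u:
            loss -= Z[u] @ Z[v] - log_denom        # -log P(v | z_u)
    return loss
```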
Time complexity : O(|V|^2), because the softmax normalization in the denominator sums over all nodes for every node u
Solution: Negative sampling
\log\left(\frac{\exp(z_u^Tz_v)}{\sum_{n\in V}\exp(z_u^Tz_n)}\right) \approx \log (\sigma(z_u^Tz_v)) - \sum_{i=1}^k \log(\sigma(z_u^Tz_{n_i})), n_i\sim P_V
where \sigma(\cdot) is the sigmoid function and the negative samples n_i are drawn from a random distribution P_V over nodes
Instead of normalizing w.r.t. all nodes, just normalize against k random “negative samples” n_i
Sample k negative nodes each with probability proportional to its degree (k=5\sim 20 in practice)
To minimize L, we can use gradient descent (GD) or stochastic gradient descent (SGD)
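A NumPy sketch of this negative-sampling approximation for a single (u, v) pair (here degrees is assumed to be an array of node degrees used to build the sampling distribution P_V, Z a |V| × d embedding array, and k an illustrative choice):
```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def approx_log_prob(z_u, z_v, Z, degrees, k=5):
    """Approximate log P(v | z_u) as log sigma(z_u^T z_v) - sum_i log sigma(z_u^T z_{n_i}),
    with k negatives sampled proportionally to node degree."""
    probs = degrees / degrees.sum()
    neg = rng.choice(len(Z), size=k, p=probs)          # k negative samples n_i ~ P_V
    return np.log(sigmoid(z_u @ z_v)) - np.log(sigmoid(Z[neg] @ z_u)).sum()
```
In a full implementation, the negative of this quantity summed over all (u, v) walk pairs would be minimized with GD/SGD, as noted above.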
How to randomly walk
Simplest idea: just run fixed-length, unbiased random walks starting from each node
4). Overview of Node2vec
- Goal: embed nodes with similar network neighborhoods close in the feature space
- Key observation: flexible notion of network neighborhood N_R(u) of node u leads to rich node embeddings
- Develop biased 2^{nd} order random walk R to trade off between local (BFS) and global (DFS) views of the network
Interpolating BFS and DFS
Two parameters:
- Return parameter p: return back to the previous node
- In-out parameter q (ratio of BFS vs DFS): moving outwards (DFS) or inwards (BFS)
Biased Random Walks
Idea: remember where the walk came from
- BFS-like walk: low value of p
- DFS-like walk: low value of q
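A minimal sketch of one step of such a 2nd-order biased walk (the weighting below follows the usual node2vec scheme: 1/p for returning to the previous node, 1 for neighbors at distance 1 from it, 1/q for moving further out; `adj` is assumed to be a dict of neighbor lists and `prev` the node visited just before `curr`):
```python
import random

def biased_step(adj: dict, prev, curr, p: float, q: float):
    """One step of a node2vec-style 2nd-order random walk."""
    weights = []
    for x in adj[curr]:
        if x == prev:
            weights.append(1.0 / p)      # return parameter: go back to prev
        elif x in adj[prev]:
            weights.append(1.0)          # stays at distance 1 from prev (BFS-like)
        else:
            weights.append(1.0 / q)      # moves outward to distance 2 from prev (DFS-like)
    return random.choices(adj[curr], weights=weights, k=1)[0]
```
The very first step of each walk has no previous node and is taken uniformly at random.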
Node2vec Algorithm
a) compute random walk probabilities
b) simulate r random walks of length l starting from each node u
c) optimize the node2vec objective using SGD
Time Complexity: linear
Steps are individually parallelizable
5). Other Random Walk Ideas
- Different kinds of biased random walks: based on node attributes/learned weights
- Alternative optimization schemes: directly optimize based on 1-hop and 2-hop random walk probabilities
- Network preprocessing techniques: run random walks on modified versions of the original network
4. Embedding Entire Graphs
Goal: embed a subgraph or an entire graph G
1). Approach 1
Run a standard graph embedding technique on the (sub)graph G and then just sum the node embeddings in the (sub)graph G
z_G=\sum_{v\in G}z_v
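A trivial sketch of this aggregation (Z is assumed to be a |V| × d array of node embeddings trained with any of the methods above):
```python
import numpy as np

def graph_embedding(Z: np.ndarray, nodes) -> np.ndarray:
    """Embed a (sub)graph by summing the embeddings of its nodes."""
    return Z[list(nodes)].sum(axis=0)
```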
2). Approach 2
Introduce a “virtual node” to represent the (sub)graph and run a standard graph embedding technique
3). Approach 3: Anonymous Walk Embeddings
- States in anonymous walks correspond to the index of the first time we visited the node in a random walk (agnostic to the identity of the nodes visited)
- Simulate anonymous walks w_i of l steps, record their counts, and represent the graph as a probability distribution over these walks
- Number of anonymous walks grows exponentially
- Sampling anonymous walks
Generate independently a set of m random walks
Represent the graph as a probability distribution over these walks; for the estimate to have error of more than \epsilon with probability less than \delta, the number of walks needed is
m=\left\lceil\frac{2}{\epsilon^2}(\log(2^{\eta}-2)-\log(\delta))\right\rceil
where \eta is the total number of anonymous walks of length l (see the sketch below)
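A small sketch of anonymizing walks and building the distribution (the 0-based indexing and helper names here are illustrative choices, not from the lecture):
```python
from collections import Counter

def anonymize(walk: list) -> tuple:
    """Replace each node by the index of its first visit, e.g. [A, B, A, C] -> (0, 1, 0, 2)."""
    first_seen = {}
    return tuple(first_seen.setdefault(node, len(first_seen)) for node in walk)

def anonymous_walk_distribution(walks: list) -> dict:
    """Represent a graph as the empirical distribution over sampled anonymous walks."""
    counts = Counter(anonymize(w) for w in walks)
    total = sum(counts.values())
    return {aw: c / total for aw, c in counts.items()}
```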
5. How to Use Embeddings
- Clustering/community detection: cluster points z_i
- Node classification: predict label of node i based on z_i
- Link prediction: predict edge (i, j) based on (z_i, z_j) (where we can concatenate, average, take the product of, or take the difference between the embeddings; see the sketch after this list)
- Graph classification: graph embedding z_G via aggregating node embeddings or anonymous random walks. Predict label based on graph embedding z_G
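For the link-prediction bullet above, a small sketch of turning a node-embedding pair into an edge feature (the function name and mode labels are illustrative; the element-wise Hadamard product is one common choice for "product"):
```python
import numpy as np

def edge_feature(z_i: np.ndarray, z_j: np.ndarray, mode: str = "hadamard") -> np.ndarray:
    """Combine two node embeddings into a feature vector for edge (i, j)."""
    if mode == "concat":
        return np.concatenate([z_i, z_j])
    if mode == "average":
        return (z_i + z_j) / 2
    if mode == "hadamard":
        return z_i * z_j              # element-wise product
    if mode == "difference":
        return np.abs(z_i - z_j)
    raise ValueError(f"unknown mode: {mode}")
```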
