CS224W: Machine Learning with Graphs - 03 Node Embeddings


Node Embeddings

1. Graph Representation Learning

Graph representation learning alleviates the need to do feature engineering every single time (the features are learned automatically)
Goal : efficient task-independent feature learning for machine learning with graphs
Why embedding?

  • Similarity of embeddings between nodes indicates their similarity in the network
  • Encode network information
  • Potentially used for many downstream predictions (node classification, link prediction, graph prediction, anomalous node detection, clustering…)

2. Node Embeddings: Encoder and Decoder

Goal : encode nodes so that similarity in the embedding space approximates similarity in the graph
a) Encoder ENC maps from nodes to embeddings (a low-dimensional vector)
b) Define a node similarity function (i.e., a measure of similarity in the original network)
c) Decoder DEC maps from embeddings to the similarity score
d) Optimize the parameters of the encoder so that similarity(u, v)\approx z_v^Tz_u

1). “Shallow” Encoding

Simplest encoding approach: encoder is just an embedding-lookup so each node is assigned a unique embedding vector
\text{ENC}(v)=z_v=Z \cdot v
where Z is a matrix whose columns are the node embeddings (one column per node) and v is an indicator vector of all zeros except for a one in the position indicating node v
Methods : DeepWalk, node2vec
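As a concrete illustration of the lookup, here is a minimal NumPy sketch (the matrix size and node index are arbitrary toy values, not from the lecture):

```python
import numpy as np

num_nodes, d = 5, 3                      # toy sizes, for illustration only
Z = np.random.randn(d, num_nodes)        # embedding matrix: one column per node

def enc(node_idx):
    """Shallow encoder: ENC(v) = Z · v, i.e. a plain column lookup."""
    v = np.zeros(num_nodes)
    v[node_idx] = 1.0                    # indicator vector for node v
    return Z @ v                         # identical to Z[:, node_idx]

z_v = enc(2)                             # embedding of node 2
```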

3. Random Walk Approaches for Node Embeddings

  • Vector z_u is the embedding of node u
  • Probability P(v|z_u) is the (predicted) probability of visiting node v on random walks starting from node u
  • Random walk : given a graph and a starting point, we select one of its neighbors at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points visited this way is a random walk on the graph.
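A minimal sketch of such an unbiased walk on a NetworkX graph (the example graph, start node, and walk length are arbitrary choices):

```python
import random
import networkx as nx

def random_walk(G, start, length):
    """Uniform random walk: repeatedly move to a uniformly chosen neighbor."""
    walk = [start]
    for _ in range(length):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:                # dead end (possible in directed graphs)
            break
        walk.append(random.choice(neighbors))
    return walk

G = nx.karate_club_graph()
print(random_walk(G, start=0, length=10))
```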
1). Random-walk Embeddings

z_u^T z_v \approx \text{probability that } u \text{ and } v \text{ co-occur on a random walk over the graph}

  • Estimate probability P_R(v|u) of visiting node v on a random walk starting from node u using the random walk strategy R
  • Optimize embeddings to encode these random walk statistics

Why random walks?

  • Expressivity : flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information (If a random walk starting from node u visits v with high probability, u and v are similar)
  • Efficiency : do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks
2). Unsupervised Feature Learning

Intuition : find embedding of nodes in d-dimensional space that preserves similarity
Idea : learn node embeddings such that nodes that are nearby in the network are close together in the embedding space
N_R(u): neighborhood of u obtained by the strategy R
Goal: learn a mapping f: u \to \mathbb{R}^d, \; f(u) = z_u
Log-likelihood objective:
\max_f \sum_{u\in V} \log P(N_R(u)\mid z_u)
Given node u, we want to learn feature representations that are predictive of the nodes in its random walk neighborhood N_R(u)

3). Random-walk Optimization

a) Run short fixed-length random walks starting from each node u in the graph using the random walk strategy R
b) For each node u, collect N_R(u), the multiset of nodes visited on random walks starting from u
c) Optimize embeddings
\max_f \sum_{u\in V} \log P(N_R(u)\mid z_u)
Parameterize P(v|z_u) using softmax
P(v|z_u)=\frac{\exp(z_u^Tz_v)}{\sum_{n\in V}\exp(z_u^Tz_n)}
Let
L=\sum_{u\in V}\sum_{v\in N_R(u)}-\log P(v|z_u)=\sum_{u\in V}\sum_{v\in N_R(u)}-\log(\frac{\exp(z_u^Tz_v)}{\sum_{n\in V}\exp(z_u^Tz_n)})
Optimizing random walk embeddings = finding embeddings z_u that minimize L
Time complexity : O(|V|^2), because the softmax normalization in the denominator sums over all |V| nodes for every term of the nested sum
Solution: Negative sampling
\log\left(\frac{\exp(z_u^Tz_v)}{\sum_{n\in V}\exp(z_u^Tz_n)}\right) \approx \log (\sigma(z_u^Tz_v)) - \sum_{i=1}^k \log(\sigma(z_u^Tz_{n_i})),\quad n_i\sim P_V
where \sigma(\cdot) is the sigmoid function and the n_i are sampled from a random distribution P_V over nodes
Instead of normalizing w.r.t. all nodes, just normalize against k random “negative samples” n_i
Sample k negative nodes each with probability proportional to its degree (k=5\sim 20 in practice)
To minimize L, we can use gradient descent (GD) or stochastic gradient descent (SGD)
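As an illustration, the sketch below performs one SGD step for a single co-occurring pair (u, v) with k negative samples. It uses the standard word2vec-style form of the negative-sampling loss, -\log\sigma(z_u^Tz_v) - \sum_i \log\sigma(-z_u^Tz_{n_i}); the variable names and the degree-proportional sampling lines are illustrative assumptions, not the lecture's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(Z, u, v, neg_nodes, lr=0.025):
    """One SGD update on -log σ(z_u·z_v) - Σ_i log σ(-z_u·z_{n_i}) for a pair (u, v)."""
    z_u = Z[u].copy()
    # positive pair: pull z_u and z_v together
    g_pos = sigmoid(z_u @ Z[v]) - 1.0         # d(loss)/d(z_u·z_v)
    grad_u = g_pos * Z[v]
    Z[v] -= lr * g_pos * z_u
    # negative samples: push z_u away from each z_{n_i}
    for n in neg_nodes:
        g_neg = sigmoid(z_u @ Z[n])           # d(loss)/d(z_u·z_{n_i})
        grad_u += g_neg * Z[n]
        Z[n] -= lr * g_neg * z_u
    Z[u] -= lr * grad_u

# degree-proportional negative sampling (k = 5), assuming `degrees` holds node degrees:
# probs = degrees / degrees.sum()
# neg_nodes = rng.choice(len(degrees), size=5, p=probs)
```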
How should we randomly walk?
Simplest idea: just run fixed-length, unbiased random walks starting from each node (this is the DeepWalk approach)

4). Overview of Node2vec
  • Goal: embed nodes with similar network neighborhoods close in the feature space
  • Key observation: flexible notion of network neighborhood N_R(u) of node u leads to rich node embeddings
  • Develop biased 2^{nd} order random walk R to trade off between local (BFS) and global (DFS) views of the network

Interpolating BFS and DFS
Two parameters:

  • Return parameter p: return back to the previous node
  • In-out parameter q (ratio of BFS vs DFS): moving outwards (DFS) or inwards (BFS)

Biased Random Walks
Idea: remember where the walk came from

  • BFS-like walk: low value of p
  • DFS-like walk: low value of q

Node2vec Algorithm
a) Compute random walk probabilities
b) Simulate r random walks of length l starting from each node u
c) Optimize the node2vec objective using SGD
Time Complexity: linear
Steps are individually parallelizable
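A minimal sketch of the biased second-order walk used in step b): the unnormalized transition weights are 1/p for returning to the previous node, 1 for neighbors of the previous node (BFS-like), and 1/q for nodes farther away (DFS-like). Edge weights are ignored here for simplicity, and the function names are illustrative:

```python
import random
import networkx as nx

def node2vec_step(G, prev, curr, p, q):
    """Choose the next node with node2vec's 2nd-order weights:
    1/p (return), 1 (neighbor of prev), 1/q (move outwards)."""
    neighbors = list(G.neighbors(curr))
    weights = []
    for x in neighbors:
        if x == prev:
            weights.append(1.0 / p)          # return to where we came from
        elif G.has_edge(x, prev):
            weights.append(1.0)              # stays close to prev (BFS-like)
        else:
            weights.append(1.0 / q)          # moves away from prev (DFS-like)
    return random.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(G, start, length, p=1.0, q=1.0):
    walk = [start, random.choice(list(G.neighbors(start)))]
    while len(walk) < length:
        walk.append(node2vec_step(G, walk[-2], walk[-1], p, q))
    return walk
```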

5). Other Random Walk Ideas
  • Different kinds of biased random walks: based on node attributes/learned weights
  • Alternative optimization schemes: directly optimize based on 1-hop and 2-hop random walk probabilities
  • Network preprocessing techniques: run random walks on modified versions of the original network

4. Embedding Entire Graphs

Goal: embed a subgraph or an entire graph G

1). Approach 1

Run a standard graph embedding technique on the (sub)graph G and then just sum the node embeddings in the (sub)graph G
z_G=\sum_{v\in G}z_v
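A one-line sketch of this approach, assuming Z is a dict mapping each node to its learned embedding vector:

```python
import numpy as np

def graph_embedding(Z, nodes):
    """Approach 1: sum the node embeddings of the (sub)graph (use np.mean to average instead)."""
    return np.sum([Z[v] for v in nodes], axis=0)
```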

2). Approach 2

Introduce a “virtual node” to represent the (sub)graph and run a standard graph embedding technique
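A sketch of the graph-augmentation step (the virtual node's label is an arbitrary choice); after augmenting, any node embedding method can be run and the virtual node's embedding is taken as z_G:

```python
import networkx as nx

def add_virtual_node(G, subgraph_nodes, virtual="virtual_G"):
    """Approach 2: connect a virtual node to every node of the (sub)graph."""
    H = G.copy()
    H.add_node(virtual)
    H.add_edges_from((virtual, v) for v in subgraph_nodes)
    return H
```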

3). Approach 3: Anonymous Walk Embeddings
  • States in anonymous walks correspond to the index of the first time we visited a node in the random walk (i.e. agnostic to the identity of the nodes visited); simulate anonymous walks w_i of l steps, record their counts, and then represent the graph as a probability distribution over these walks
  • Number of anonymous walks grows exponentially
  • Sampling anonymous walks
    Generate independently a set of m random walks
    Represent the graph as a probability distribution over these walks, such that the estimated distribution has error of more than \epsilon with probability less than \delta; it suffices to sample
    m=\left\lceil\frac{2}{\epsilon^2}\left(\log(2^{\eta}-2)-\log(\delta)\right)\right\rceil
    where \eta is the total number of anonymous walks of length l
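A sketch of the anonymization step, the empirical walk distribution, and the sampling bound above (the sampled walks themselves would come from a random-walk routine like the one sketched earlier):

```python
import math
from collections import Counter
import numpy as np

def anonymize(walk):
    """Relabel each node by the index of its first visit, e.g. [A, B, A, C] -> (0, 1, 0, 2)."""
    first_seen = {}
    return tuple(first_seen.setdefault(v, len(first_seen)) for v in walk)

def anonymous_walk_embedding(walks):
    """Represent the graph as the empirical distribution over anonymous walk types."""
    counts = Counter(anonymize(w) for w in walks)
    types = sorted(counts)
    total = sum(counts.values())
    return types, np.array([counts[t] / total for t in types])

def num_samples(eta, eps, delta):
    """m = ceil( (2 / eps^2) * (log(2^eta - 2) - log(delta)) )"""
    return math.ceil(2.0 / eps**2 * (math.log(2**eta - 2) - math.log(delta)))
```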

5. How to Use Embeddings

  • Clustering/community detection: cluster points z_i
  • Node classification: predict label of node i based on z_i
  • Link prediction: predict edge (i, j) based on (z_i, z_j) (where we can concatenate, average, take the element-wise (Hadamard) product of, or take the difference between the two embeddings)
  • Graph classification: graph embedding z_G via aggregating node embeddings or anonymous random walks. Predict label based on graph embedding z_G
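For link prediction, the pairwise embedding combinations mentioned above can be implemented as follows (function and mode names are illustrative):

```python
import numpy as np

def edge_features(z_i, z_j, mode="hadamard"):
    """Combine two node embeddings into a single edge feature vector."""
    if mode == "concat":
        return np.concatenate([z_i, z_j])
    if mode == "average":
        return (z_i + z_j) / 2.0
    if mode == "hadamard":
        return z_i * z_j                 # element-wise product
    if mode == "difference":
        return np.abs(z_i - z_j)
    raise ValueError(f"unknown mode: {mode}")
```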
