week 06
Representation Learning on Graphs
Social Network Analysis
Network Embeddings
Graph representation

In many fields the data have a graph structure:
- social: friendship graphs in social networks, scientific citation graphs
- man-made: the Internet, the Web, road networks, air transportation networks
- biological: protein interactions, complex molecules
Graph representation
- supervised, semi-supervised
- node classification
  - is the account a bot?
  - predicting user age, gender, or profession in a social network
  - predicting the function of a new protein from its interactions with others
  - predicting an article's topic on the basis of its citations
- link prediction
  - content recommendation on an online platform
  - forecasting drug side effects
- community detection
  - searching for users with similar interests
  - revealing functional groups of proteins
Objective: extract features from the graph in a form suitable for machine learning algorithms
Machine learning tasks on graphs and their applications
Approaches to learning graph representations

Task: find a representation of graph vertices as vectors in a (low-dimensional) space that preserves useful information. Typically, vectors should be close in the embedding space if the corresponding vertices are close in the graph.
graph embedding ~ representation learning
Approaches:
- Naive methods;
- Methods based on matrix decompositions;
- Methods based on random walks;
- Graph neural networks;
- Other (e.g., modeling edge probabilities).
Naive Approaches
- Simple graph representations
- Graphlets
- Centralities
- Layout-based
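
For illustration, a minimal sketch of the naive feature-based approach, assuming networkx and its built-in karate-club graph as the example (the particular choice of centralities is illustrative):

import networkx as nx
import numpy as np

G = nx.karate_club_graph()  # small example graph

# Describe each vertex by a handful of classic centralities.
deg = nx.degree_centrality(G)
btw = nx.betweenness_centrality(G)
pr = nx.pagerank(G)

X = np.array([[deg[v], btw[v], pr[v]] for v in G.nodes()])
print(X.shape)  # (|V|, 3): one short feature vector per vertex

Such features are cheap and interpretable, but they capture only the properties we explicitly chose to measure.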
Matrix Decomposition
Node representation as a dimensionality reduction problem with information preservation.
General idea: represent the graph as a matrix and decompose it.
Notation:
- \( G(V,E) \) - graph with vertices \( V \) and edges \( E \)
- \( W \) - weighted adjacency matrix
- \( D \) - diagonal degree matrix
- \( L = D - W \) - graph Laplacian
- \( Y_i \) - vector representation of vertex \( i \), of dimension \( d \ll |V| \)
- \( I \) - identity matrix
- \( \phi(Y) \) - loss function
[Figure: factorizing the \( |V| \times |V| \) graph matrix into embeddings of dimension \( d \ll |V| \)]
Locally Linear Embedding
Idea: represent each vertex as a linear combination of its neighbors, \( \phi(Y) = \sum_i \big| Y_i - \sum_j W_{ij} Y_j \big|^2 \). Minimization reduces to finding the eigenvectors with the smallest eigenvalues of the sparse matrix \( (I-W)^T(I-W) \).
L. Saul, S. Roweis: An Introduction to Locally Linear Embedding (2000) [pdf]
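
A minimal dense sketch of this computation (at scale one would use sparse eigensolvers; networkx's karate-club graph is the example, and row-normalized adjacency plays the role of the reconstruction weights \( W \)):

import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
W = A / A.sum(axis=1, keepdims=True)              # reconstruction weights
M = (np.eye(len(G)) - W).T @ (np.eye(len(G)) - W)

# bottom eigenvectors of M; the first (constant) one is discarded
vals, vecs = np.linalg.eigh(M)
Y = vecs[:, 1:3]                                  # |V| x d embedding, d = 2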

Laplacian Eigenmaps
Idea: vertex representations should be close if the vertices are connected: \( \phi(Y) = \sum_{i,j} W_{ij} |Y_i - Y_j|^2 \). Minimization reduces to finding the eigenvectors with the smallest eigenvalues of the normalized Laplacian.
M. Belkin, P. Niyogi: Laplacian Eigenmaps for Dimensionality Reduction and Data Representation (NIPS, 2002) [pdf]
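
A corresponding sketch for Laplacian Eigenmaps, assuming networkx and scipy (d = 2 is illustrative):

import networkx as nx
from scipy.sparse.linalg import eigsh

G = nx.karate_club_graph()
L = nx.normalized_laplacian_matrix(G)          # sparse normalized Laplacian

d = 2
vals, vecs = eigsh(L, k=d + 1, which="SM")     # smallest eigenvalues
Y = vecs[:, 1:]                                # drop the trivial eigenvector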

Cauchy Graph Embeddings
Uses a different distance function, \( d(Y_i, Y_j) = \frac{|Y_i - Y_j|^2}{|Y_i - Y_j|^2+\sigma^2} \), which saturates for distant pairs and therefore emphasizes preserving local structure.
D. Luo, C. Ding, F. Nie, H. Huang: Cauchy Graph Embedding (ICML 2011) [pdf]

Matrix Decomposition
Problems of the naive methods
The main problem: only first-order proximity is preserved.
Definitions:
First-order proximity between vertices \( i \) and \( j \) = edge weight \( W_{ij} \)
Let \( s_i \) be the vector of \( k \)-order proximities between vertex \( i \) and all other vertices. Then the \( (k+1) \)-order proximity between vertices \( i \) and \( j \) is a similarity measure of the vectors \( s_i \) and \( s_j \).
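
For instance, taking cosine similarity as the similarity measure, second-order proximity can be computed directly from the rows of the adjacency matrix (a sketch assuming networkx and scikit-learn):

import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity

G = nx.karate_club_graph()
W = nx.to_numpy_array(G)        # row W[i] = first-order proximities s_i
S2 = cosine_similarity(W)       # S2[i, j] = second-order proximity of i and j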
GraRep (CIKM, 2015)
Log-normalized \( k \)-step transition matrix: \( X^k_{ij} = \log\frac{A^k_{ij}}{\sum_t A^k_{tj}} - \log\beta \)
The representations for all \( k \) are concatenated. The disadvantage is the \( O(|V|^3) \) complexity of the algorithm.
S. Cao, W. Lu, Q. Xu: GraRep: Learning Graph Representations with Global Structural Information (CIKM, 2015) [link]
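
A dense numpy sketch of the GraRep recipe for small graphs (K, d, and the shift beta = 1/|V| are illustrative choices, corresponding to \( \lambda = 1 \) in the paper's notation):

import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
P = A / A.sum(axis=1, keepdims=True)       # one-step transition matrix

n, d, K = len(G), 16, 3
beta = 1.0 / n
blocks, Pk = [], np.eye(n)
for k in range(1, K + 1):
    Pk = Pk @ P                            # k-step transition probabilities
    X = np.log(np.maximum(Pk / Pk.sum(axis=0, keepdims=True), 1e-12)) - np.log(beta)
    X[X < 0] = 0                           # keep positive entries only
    U, s, _ = np.linalg.svd(X)
    blocks.append(U[:, :d] * np.sqrt(s[:d]))
Y = np.concatenate(blocks, axis=1)         # |V| x (K*d), one block per k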
HOPE
Idea: factorize a proximity matrix \( S \) instead of the adjacency matrix (Katz index, rooted PageRank, common neighbors, Adamic-Adar score).
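
A dense sketch with the Katz index as \( S \) (the actual algorithm uses a generalized SVD to avoid forming \( S \) explicitly; beta and d below are illustrative):

import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
beta = 0.05                                  # needs beta < 1/spectral_radius(A)
S = np.linalg.inv(np.eye(len(G)) - beta * A) @ (beta * A)   # Katz proximity

d = 8
U, s, Vt = np.linalg.svd(S)
Ys = U[:, :d] * np.sqrt(s[:d])               # source embeddings
Yt = Vt[:d].T * np.sqrt(s[:d])               # target embeddings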

M. Ou, P. Cui, J. Pei, Z. Zhang, W. Zhu: Asymmetric Transitivity Preserving Graph Embedding (KDD, 2016) [pdf]
authors’ code (MATLAB) [link]
AROPE
Z. Zhang, P. Cui, X. Wang, J. Pei, X. Yao, W. Zhu: Arbitrary-Order Proximity Preserved Network Embedding (KDD, 2018) [pdf]
authors’ code (MATLAB + Python) [link]
See also: D. Zhu, et al.: High-order Proximity Preserved Embedding for Dynamic Networks (KDD, 2016)
The main disadvantages of matrix-decomposition algorithms: they preserve only low-order proximity and/or have high computational complexity.
Random Walk
Word2vec

word2vec learns vector representations of words that are useful in downstream tasks. The vectors exhibit interesting semantic properties, for example:
- king : man = queen : woman ⇒
- king − man + woman ≈ queen
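
This can be checked directly with pretrained vectors, e.g. via gensim's downloader (the model name below is one of the gensim-data bundles; the download is large):

import gensim.downloader as api

wv = api.load("word2vec-google-news-300")    # pretrained KeyedVectors
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected: [('queen', ...)]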
Random walks
Key idea: nodes in random walks \( \approx \) words in sentences ⇒ apply word2vec to the walk sequences.

DeepWalk
- Parameters: window size \( w \), number of walks per vertex \( \gamma \), walk length \( t \)
- In practical tasks \( w = 10 \), \( \gamma = 80 \), \( t = 80 \)
- never change \( w \)
- if you do lower \( w \), increase \( \gamma \) and \( t \)
B. Perozzi, R. Al-Rfou, S. Skiena: DeepWalk: Online Learning of Social Representations (KDD, 2014) [pdf]
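
A minimal DeepWalk sketch along these lines, assuming networkx and gensim (parameters follow the slide above; the walk generator assumes no isolated vertices):

import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(G, start, t):
    # truncated random walk of length t
    walk = [start]
    for _ in range(t - 1):
        walk.append(random.choice(list(G.neighbors(walk[-1]))))
    return [str(v) for v in walk]

G = nx.karate_club_graph()
w, gamma, t, d = 10, 80, 80, 64                 # parameters from the slide
walks = [random_walk(G, v, t) for _ in range(gamma) for v in G.nodes()]

# DeepWalk = skip-gram (sg=1) with hierarchical softmax (hs=1) over walks
model = Word2Vec(walks, vector_size=d, window=w, sg=1, hs=1, min_count=0)
Y = {v: model.wv[str(v)] for v in G.nodes()}    # node -> embedding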

Node2vec
Biases the random walk with a return parameter \( p \) and an in-out parameter \( q \):
- Low \( q \) - explore intra-cluster information
- High \( q \) - explore inter-cluster information
A. Grover, J. Leskovec: node2vec: Scalable Feature Learning for Networks (KDD, 2016)
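
One step of the biased walk can be sketched as follows (a simplified version without the paper's alias-table precomputation; the p and q values in the usage line are illustrative):

import random
import networkx as nx

def node2vec_step(G, prev, cur, p, q):
    # Unnormalized bias alpha(prev, x): 1/p to return to prev,
    # 1 to move to a common neighbor of prev, 1/q to move farther away.
    nbrs = list(G.neighbors(cur))
    weights = []
    for x in nbrs:
        if x == prev:
            weights.append(1.0 / p)      # distance(prev, x) = 0
        elif G.has_edge(prev, x):
            weights.append(1.0)          # distance(prev, x) = 1
        else:
            weights.append(1.0 / q)      # distance(prev, x) = 2
    return random.choices(nbrs, weights=weights)[0]

G = nx.karate_club_graph()
nxt = node2vec_step(G, prev=0, cur=1, p=1.0, q=0.5)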

Large-scale Information Network Embedding (LINE)

Key idea: do not generate random walks; optimize first- and second-order proximity directly over the edges.
J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei: LINE: Large-scale Information Network Embedding, WWW, 2015 [pdf]
code (C++) [link]
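
A numpy-only sketch of LINE's first-order objective, trained edge by edge with SGD and negative sampling (hyperparameters and the uniform negative sampler are illustrative; the paper samples negatives from a degree-based distribution):

import numpy as np
import networkx as nx

G = nx.karate_club_graph()
n, d, lr, neg = len(G), 32, 0.025, 5
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n, d))
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

edges = list(G.edges())
for step in range(10_000):
    i, j = edges[rng.integers(len(edges))]   # sample a positive edge
    ui, uj = U[i].copy(), U[j].copy()
    g = 1.0 - sigmoid(ui @ uj)               # pull endpoints together
    U[i] += lr * g * uj
    U[j] += lr * g * ui
    for _ in range(neg):                     # negative sampling
        k = int(rng.integers(n))
        ui, uk = U[i].copy(), U[k].copy()
        g = -sigmoid(ui @ uk)                # push a random pair apart
        U[i] += lr * g * uk
        U[k] += lr * g * ui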
VERSE
A. Tsitsulin, D. Mottin, P. Karras, E. Müller: VERSE: Versatile Graph Embeddings from Similarity Measures (WWW, 2018)

Useful Links
I. Makarov, D. Kiselev, N. Nikitinsky, L. Subelj: Survey on graph embeddings and their applications to machine learning problems on graphs [link]
J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, J. Tang: Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec (WSDM, 2018) [pdf]
authors’ code (Python) [link]
- NetMF
Z. Zhang, P. Cui, H. Li, X. Wang, W. Zhu: Billion-scale Network Embedding with Iterative Random Projection [pdf]
authors’ code (Python) [link]
- RandNE
H. Chen, S. Fahad Sultan, Y. Tian, M. Chen, S. Skiena: Fast and Accurate Network Embeddings via Very Sparse Random Projection [pdf]
authors’ code (Python) [link]
- FastRP
D. Yang, P. Rosso, B. Li, P. Cudre-Mauroux: NodeSketch: Highly Efficient Graph Embeddings via Recursive Sketching (KDD, 2019) [pdf]
authors’ code (C++ & Python) [link]
- NodeSketch