In many fields the data have a graph structure
Objective: extract features from the graph in a form suitable for machine learning algorithms
Machine learning tasks on graphs and their applications
Task: find a representation of graph vertices as vectors in a (low-dimensional) space that preserves useful information. Typically, vectors are close in the embedding space if the corresponding vertices are close in the graph
graph embedding ~ representation learning
Approaches:
Graphlets
Centralities
Layout Based
Node representation as a dimensionality reduction problem with information preservation.
General idea: represent the graph as a matrix and decompose it.
Notation: the \( |V| \times |V| \) graph matrix is approximated (\( \approx \)) by the product of a \( |V| \times d \) factor and a \( d \times |V| \) factor, where \( d \ll |V| \)
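A minimal sketch of this general idea, assuming the graph matrix is the adjacency matrix and using a truncated SVD as the decomposition (scipy/numpy; the toy graph and the choice of d are illustrative):

```python
# "Represent the graph as a matrix and decompose it": factorize the |V| x |V|
# adjacency matrix into |V| x d factors via truncated SVD.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# toy 4-vertex graph (illustrative), adjacency matrix A of size |V| x |V|
A = csr_matrix(np.array([[0, 1, 1, 0],
                         [1, 0, 1, 0],
                         [1, 1, 0, 1],
                         [0, 0, 1, 0]], dtype=float))

d = 2                                   # d << |V|
U, S, Vt = svds(A, k=d)                 # A ~= U @ diag(S) @ Vt
embeddings = U * np.sqrt(S)             # |V| x d vertex representations
print(embeddings.shape)                 # (4, 2)
```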
In Locally Linear Embedding (LLE), the optimization problem is reduced to finding the eigenvectors corresponding to the smallest eigenvalues of the sparse matrix \( (I-W)^T(I-W) \)
L. Saul, S. Roweis: An Introduction to Locally Linear Embedding (2000) [pdf]
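A minimal sketch of this eigenproblem, assuming the reconstruction-weight matrix \( W \) has already been computed (scipy; the function name and the small spectral shift are illustrative choices):

```python
# LLE eigenproblem: smallest eigenvectors of the sparse matrix (I - W)^T (I - W).
import numpy as np
from scipy.sparse import identity
from scipy.sparse.linalg import eigsh

def lle_embedding(W, d):
    """W: sparse |V| x |V| matrix of local reconstruction weights."""
    n = W.shape[0]
    IW = identity(n, format='csr') - W
    M = (IW.T @ IW).tocsc()                       # (I - W)^T (I - W), sparse and PSD
    # d+1 smallest eigenpairs via shift-invert around ~0;
    # the trivial eigenvector (eigenvalue 0) is discarded
    vals, vecs = eigsh(M, k=d + 1, sigma=-1e-6, which='LM')
    order = np.argsort(vals)
    return vecs[:, order[1:d + 1]]                # |V| x d embedding
```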
M. Belkin, P. Niyogi: Laplacian Eigenmaps for Dimensionality Reduction and Data Representation (NIPS, 2002) [pdf]
In Laplacian Eigenmaps, the optimization problem is reduced to finding the eigenvectors corresponding to the smallest eigenvalues of the normalized Laplacian
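A similar sketch for the Laplacian Eigenmaps eigenproblem, assuming the adjacency matrix \( A \) is given with float weights (scipy):

```python
# Laplacian Eigenmaps: smallest eigenvectors of the normalized graph Laplacian.
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def laplacian_eigenmaps(A, d):
    """A: sparse symmetric |V| x |V| adjacency (weight) matrix with float entries."""
    L = laplacian(A, normed=True).tocsc()         # normalized graph Laplacian
    # d+1 smallest eigenpairs; drop the trivial eigenvector (eigenvalue 0)
    vals, vecs = eigsh(L, k=d + 1, sigma=-1e-6, which='LM')
    order = np.argsort(vals)
    return vecs[:, order[1:d + 1]]
```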
Idea: vertex representations should be close if the vertices are connected
Another distance function: \( \mathrm{distance} = \frac{\|Y_i - Y_j\|^2}{\|Y_i - Y_j\|^2+\sigma^2} \)
D. Luo, C. Ding, F. Nie, H. Huang: Cauchy Graph Embedding (ICML 2011) [pdf]
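A tiny numeric illustration of this distance: unlike the squared distance, it saturates at 1 for far-apart points, so distant pairs do not dominate the objective (the value of sigma is illustrative):

```python
# Cauchy distance from the formula above vs. the plain squared distance.
import numpy as np

def cauchy_distance(yi, yj, sigma=1.0):
    sq = np.sum((yi - yj) ** 2)
    return sq / (sq + sigma ** 2)

yi = np.zeros(2)
for yj in (np.array([0.1, 0.0]), np.array([1.0, 0.0]), np.array([10.0, 0.0])):
    print(np.sum((yi - yj) ** 2), cauchy_distance(yi, yj))
# squared distances 0.01, 1.0, 100.0  ->  Cauchy distances ~0.0099, 0.5, ~0.99
```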
The main problem: only first-order proximity is preserved
Definitions:
First-order proximity between vertices \( i \) and \( j \) = edge weight \( W_{ij} \)
Let \( s_i \) be the vector of \( k \)-order proximities of vertex \( i \). Then the \( (k+1) \)-order proximity between vertices \( i \) and \( j \) is the similarity of the vectors \( s_i \) and \( s_j \).
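A toy illustration of the recursive definition: second-order proximity computed as the cosine similarity of first-order proximity vectors (cosine is one possible choice of similarity measure):

```python
import numpy as np

# first-order proximities W (edge weights) for a 4-vertex toy graph:
# vertices 0 and 1 are NOT connected, but they share the same neighbors {2, 3}
W = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(W[0, 1])             # 0.0 -> first-order proximity is zero
print(cosine(W[0], W[1]))  # 1.0 -> second-order proximity is maximal
```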
The \( |V| \times |V| \) proximity matrix is again approximated (\( \approx \)) by the product of \( |V| \times d \) and \( d \times |V| \) factors, where \( d \ll |V| \)
The representations for all k are concatenated. The disadvantage is the complexity of the algorithm \( O(|V|^3) \).
Normalized transition matrix: \( X^k_{i,j}=\log\frac{A^k_{i,j}}{\sum\limits_i A^k_{i,j}}-\log\beta \)
S. Cao, W. Lu, Q. Xu: GraRep: Learning Graph Representations with Global Structural Information (CIKM, 2015) [link]
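A compact sketch of the GraRep pipeline described above, assuming a dense adjacency matrix of a connected graph (numpy; the value of beta and the use of a full SVD are illustrative simplifications):

```python
import numpy as np

def grarep(S, d, K, beta):
    """S: dense |V| x |V| adjacency matrix; returns |V| x (d*K) embedding."""
    A = S / S.sum(axis=1, keepdims=True)          # row-normalized transition matrix
    Ak = np.eye(S.shape[0])
    reps = []
    for k in range(1, K + 1):
        Ak = Ak @ A                               # k-step transition probabilities A^k
        # X^k_{ij} = log(A^k_{ij} / sum_i A^k_{ij}) - log(beta), negatives clipped
        with np.errstate(divide='ignore'):
            Xk = np.log(Ak / Ak.sum(axis=0, keepdims=True)) - np.log(beta)
        Xk[np.isneginf(Xk)] = 0.0
        Xk = np.maximum(Xk, 0.0)
        U, sigma, Vt = np.linalg.svd(Xk)
        reps.append(U[:, :d] * np.sqrt(sigma[:d]))   # d-dimensional block for this k
    return np.concatenate(reps, axis=1)           # concatenate representations over all k
```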
Take a proximity matrix \( S \) instead of the adjacency matrix (Katz Index, Rooted PageRank, Common Neighbors, Adamic-Adar score)
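A simplified sketch of this idea with the Katz index (numpy; the value of beta, the dense inverse, and the plain SVD are illustrative; the HOPE method referenced below avoids forming \( S \) explicitly via a generalized SVD):

```python
import numpy as np

def katz_proximity_embedding(A, d, beta=0.01):
    """A: dense |V| x |V| adjacency matrix; returns source and target embeddings."""
    n = A.shape[0]
    S = np.linalg.inv(np.eye(n) - beta * A) @ (beta * A)   # Katz proximity matrix
    U, sigma, Vt = np.linalg.svd(S)
    Us = U[:, :d] * np.sqrt(sigma[:d])      # source embeddings
    Ut = Vt[:d].T * np.sqrt(sigma[:d])      # target embeddings (asymmetric case)
    return Us, Ut
```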
The main disadvantages of matrix factorization algorithms: only first-order proximity is preserved and/or high computational complexity
D. Zhu, et al.: High-order Proximity Preserved Embedding For Dynamic Networks, KDD 2016
M. Ou, P. Cui, J. Pei, Z. Zhang, W. Zhu: Asymmetric Transitivity Preserving Graph Embedding [pdf]
authors’ code (MATLAB) [link]
Z. Zhang, P. Cui, X. Wang, J. Pei, X. Yao, W. Zhu: Arbitrary-Order Proximity Preserved Network Embedding [pdf]
authors’ code (MATLAB + Python) [link]
HOPE
AROPE
word2vec learns vector representations of words that are useful in downstream tasks. The vectors show interesting semantic properties, for example: vector("king") - vector("man") + vector("woman") ≈ vector("queen")
Key Idea: Nodes in random walks \( \approx \) words in sentences -> use word2vec.
B. Perozzi, R. Al-Rfou, S. Skiena: DeepWalk: Online Learning of Social Representations (KDD, 2014) [pdf]
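A minimal DeepWalk-style sketch of this idea: uniform random walks over the graph are fed to word2vec as sentences (networkx + gensim 4.x API; all hyperparameter values are illustrative):

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, num_walks=10, walk_length=40):
    """Generate uniform random walks starting from every vertex."""
    walks = []
    nodes = list(G.nodes())
    for _ in range(num_walks):
        random.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(v) for v in walk])   # word2vec expects "sentences" of tokens
    return walks

G = nx.karate_club_graph()                         # toy graph for illustration
walks = random_walks(G)
# skip-gram model trained on the walks
model = Word2Vec(walks, vector_size=64, window=5, sg=1, hs=1, min_count=1, workers=4)
print(model.wv[str(0)].shape)                      # (64,) embedding of vertex 0
```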
Low q - explore intra-cluster information
High q - explore inter-cluster information
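These bullets refer to an in-out parameter \( q \) of biased (node2vec-style) second-order random walks; a mechanical sketch of the biased step, under that assumption (networkx; the parameters p, q and the helper name are illustrative):

```python
import random
import networkx as nx

def next_step(G, prev, curr, p=1.0, q=1.0):
    """Choose the next vertex given the previous and current vertices of the walk."""
    neighbors = list(G.neighbors(curr))
    weights = []
    for x in neighbors:
        if x == prev:                    # returning to the previous vertex: weight 1/p
            weights.append(1.0 / p)
        elif G.has_edge(x, prev):        # staying close to prev: weight 1
            weights.append(1.0)
        else:                            # moving away from prev: weight 1/q
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
# low q  -> "moving away" gets a large weight -> the walk goes deeper
# high q -> the walk stays near the previous vertex -> more local exploration
```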
Key Idea: don't generate random walks; model first- and second-order proximities directly
J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei: LINE: Large-scale Information Network Embedding, WWW, 2015 [pdf]
code (C++) [link]
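A compact sketch of this idea for first-order proximity: embeddings are optimized directly over sampled edges with negative sampling, without random walks (numpy; the unweighted graph, plain SGD, and all hyperparameters are simplifying assumptions, not the authors' implementation):

```python
import numpy as np

def line_first_order(edges, n_vertices, d=32, epochs=50, lr=0.025, neg=5, seed=0):
    """edges: list of (i, j) pairs of an undirected, unweighted graph."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(n_vertices, d))
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(epochs):
        for i, j in edges:                          # edge sampling (uniform here)
            # positive pair: increase sigmoid(u_i . u_j)
            g = lr * (1.0 - sigmoid(emb[i] @ emb[j]))
            ui, uj = emb[i].copy(), emb[j].copy()
            emb[i] += g * uj
            emb[j] += g * ui
            # negative samples: decrease sigmoid(u_i . u_k) for random vertices k
            for k in rng.integers(0, n_vertices, size=neg):
                if k == i or k == j:
                    continue
                g = lr * sigmoid(emb[i] @ emb[k])
                ui, uk = emb[i].copy(), emb[k].copy()
                emb[i] -= g * uk
                emb[k] -= g * ui
    return emb
```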
I. Makarov, D. Kiselev, N. Nikitinsky, L. Subelj: Survey on graph embeddings and their applications to machine learning problems on graphs [link]
J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, J. Tang: Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec [pdf]
authors’ code (Python) [link]
Z. Zhang, P. Cui, H. Li, X. Wang, W. Zhu: Billion-scale Network Embedding with Iterative Random Projection [pdf]
authors’ code (Python) [link]
H. Chen, S. Fahad Sultan, Y. Tian, M. Chen, S. Skiena: Fast and Accurate Network Embeddings via Very Sparse Random Projection [pdf]
authors’ code (Python) [link]
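A rough sketch of the random-projection family of methods above (RandNE / FastRP style): one very sparse random matrix is propagated through powers of the normalized adjacency matrix and the results are combined (numpy/scipy; the combination weights and projection density below are illustrative assumptions, not either paper's exact scheme):

```python
import numpy as np
from scipy import sparse

def random_projection_embedding(A, d, weights=(0.0, 1.0, 1.0, 1.0), seed=0):
    """A: sparse |V| x |V| adjacency matrix; returns a dense |V| x d embedding."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    deg = np.asarray(A.sum(axis=1)).ravel()
    Dinv = sparse.diags(1.0 / np.maximum(deg, 1.0))
    P = Dinv @ A                                   # normalized transition matrix
    # very sparse (Achlioptas-style) random projection matrix R of size |V| x d
    R = rng.choice([-1.0, 0.0, 1.0], size=(n, d), p=[1/6, 2/3, 1/6]) * np.sqrt(3)
    emb = weights[0] * R
    Xk = R
    for alpha in weights[1:]:
        Xk = P @ Xk                                # projection of the next power of P
        emb = emb + alpha * Xk
    return emb
```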
D. Yang, P. Rosso, B. Li, P. Cudre-Mauroux: Highly Efficient Graph Embeddings via Recursive Sketching [pdf]
authors’ code (C++ & Python) [link]