Joint graph-feature embeddings using GCAEs
Sébastien Lerique, Jacobo Levy-Abitbol, Márton Karsai, Éric Fleury
IXXI, École Normale Supérieure de Lyon
... be tightly connected
... relate through similar interests
... write in similar styles
graph node2vec: \(d_n(u_i, u_j)\)
average user word2vec: \(d_w(u_i, u_j)\)
Create a task-independent representation of network + features
What is the dependency between network structure and feature structure
Plot the cost of compressing network + features down to a given dimension \(n\)
network—feature dependencies
network—feature independence
Use deep learning to create embeddings
Graph convolutional neural networks + Auto-encoders
Speculative questions we want to ask
With great datasets come great computing headaches
\(H^{(l+1)} = \sigma(H^{(l)}W^{(l)})\)
\(H^{(0)} = X\)
\(H^{(L)} = Z\)
\(H^{(l+1)} = \sigma(\color{DarkRed}{\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}}H^{(l)}W^{(l)})\)
\(H^{(0)} = X\)
\(H^{(L)} = Z\)
\(\color{DarkGreen}{\tilde{A} = A + I}\)
\(\color{DarkGreen}{\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}}\)
Kipf & Welling (2016)
x
y
green
red
\(H^{(l+1)} = \sigma(H^{(l)}W^{(l)})\)
\(H^{(0)} = X\)
\(H^{(L)} = Z\)
Inspired by colah's blog
Four well-marked communities of size 10, small noise
Overlapping communities of size 12, small noise
Two feature communities in a near-clique, small noise
Five well-marked communities of size 20, moderate noise
From blog.keras.io
high dimension
high dimension
low dimension
MNIST Examples by Josef Steppan (CC-BY-SA 4.0)
60,000 training images
28x28 pixels
784 dims
784 dims
2D
From blog.keras.io
node features
embedding
GCN
node features
adjacency matrix
Socio-economic status
Language style
Topics
Socio-economic status
Language style
Topics
Compressed & combined representation of nodes + network
Kipf & Welling (2016)
Five well-marked communities of size 10, moderate label noise
a.k.a., questions we can (will be able to) ask
Explore the dependency between network structure and feature structure
Cost of compressing network + features down to a given dimension \(n\)
Task-independent representation of network + features with uncertainty
Continuous change from feature communities to network communities
Speculation
Link prediction
Community detection
Graph reconstruction
Node classification
10,312 nodes
333,983 edges
39 groups
Full dataset
200 nodes
162 edges
36 groups
Toy model
node2vec, Grover & Leskovec (2016)
Walk on triangles
Walk outwards
Dataset | # nodes | # edges |
---|---|---|
BlogCatalog | 10K | 333K |
Flickr | 80K |
5.9M |
YouTube | 1.1M | 3M |
178K | 44K |
✔
👷
👷
👷
Mutual mention network on 25% of the GMT+1/GMT+2 twittosphere in French
node2vec, Grover & Leskovec (2016)
walk back \(\propto \frac{1}{p}\)
walk out \(\propto \frac{1}{q}\)
walk in triangle \(\propto 1\)
Walk on triangles — p=100, q=100
Walk out — p=1, q=.01
Sébastien Lerique, Jacobo Levy-Abitbol, Márton Karsai, Éric Fleury
IXXI, École Normale Supérieure de Lyon