Joint graph-feature embeddings using GCAEs

Sébastien Lerique, Jacobo Levy-Abitbol, Márton Karsai, Éric Fleury

IXXI, École Normale Supérieure de Lyon

### Twitter users can...

... be tightly connected

... relate through similar interests

... write in similar styles

graph node2vec: $$d_n(u_i, u_j)$$

average user word2vec: $$d_w(u_i, u_j)$$
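The slide does not say how $$d_n$$ and $$d_w$$ are computed. As a minimal sketch, assume both are cosine distances between embedding vectors, and that a user's word2vec embedding is the mean of the vectors of the words they wrote. The function names and the toy vectors below are hypothetical.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors (assumed metric)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def average_user_vector(word_vectors):
    """Average the word vectors of a user's text (one row per word)."""
    return np.mean(word_vectors, axis=0)

# Toy data, made up for illustration: 3-dimensional word vectors for two users.
words_u1 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
words_u2 = np.array([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0]])

# d_w(u_1, u_2): distance between the two averaged word2vec vectors.
d_w = cosine_distance(average_user_vector(words_u1),
                      average_user_vector(words_u2))

# d_n(u_i, u_j) would apply the same function to the users' node2vec vectors.
```

Here the two averaged vectors point in the same direction, so their cosine distance is zero up to rounding.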

### Questions

• Create a task-independent representation of network + features

• What is the dependency between network structure and feature structure?

• Plot the cost of compressing network + features down to a given dimension $$n$$

network—feature dependencies

network—feature independence

Use deep learning to create embeddings

### A framework

Graph convolutional neural networks + Auto-encoders

### How is this framework useful?

Speculative questions we want to ask

### Application and scaling

With great datasets come great computing headaches

### Graph-convolutional neural networks

Plain feed-forward layer:

$$H^{(l+1)} = \sigma(H^{(l)}W^{(l)})$$

Graph-convolutional layer:

$$H^{(l+1)} = \sigma(\color{DarkRed}{\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}}H^{(l)}W^{(l)})$$

$$H^{(0)} = X$$

$$H^{(L)} = Z$$

$$\color{DarkGreen}{\tilde{A} = A + I}$$

$$\color{DarkGreen}{\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}}$$

Kipf & Welling (2016)
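The propagation rule can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the 3-node path graph, the random weights, and the choice of ReLU for $$\sigma$$ are assumptions.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D~^{-1/2} A~ D~^{-1/2} with A~ = A + I and D~_ii = sum_j A~_ij."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}.
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(A_hat, H, W, activation=relu):
    """One layer: H^{(l+1)} = sigma(A_hat H^{(l)} W^{(l)})."""
    return activation(A_hat @ H @ W)

# Toy path graph 0 - 1 - 2 (assumed for illustration).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_hat = normalized_adjacency(A)

rng = np.random.default_rng(0)
X = np.eye(3)                        # H^{(0)} = X, here one-hot node features
W = rng.standard_normal((3, 2))      # untrained weights
H1 = gcn_layer(A_hat, X, W)          # one row of hidden features per node
```

Dropping `A_hat` from `gcn_layer` recovers the plain feed-forward layer, which is the only difference between the two formulas.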

### Neural networks

Figure: points in the $$(x, y)$$ plane, labelled green and red.

$$H^{(l+1)} = \sigma(H^{(l)}W^{(l)})$$

$$H^{(0)} = X$$

$$H^{(L)} = Z$$

Inspired by colah's blog

### Semi-supervised graph-convolution learning

Four well-marked communities of size 10, small noise
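In the semi-supervised setup of Kipf & Welling (2016), the classification loss is computed only on the nodes whose labels are known, while the graph convolution still propagates information through all nodes. A minimal sketch of such a masked cross-entropy, with made-up logits and a hypothetical function name:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def masked_cross_entropy(logits, labels, labelled_mask):
    """Mean cross-entropy over labelled nodes only."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(-np.log(picked[labelled_mask]).mean())

# Toy output of a GCN for 3 nodes and 2 communities (values made up).
logits = np.array([[2.0, 0.0],
                   [0.0, 2.0],
                   [0.0, 0.0]])
labels = np.array([0, 1, 0])
labelled_mask = np.array([True, True, False])  # the third node is unlabelled

loss = masked_cross_entropy(logits, labels, labelled_mask)
```

Because the third node is masked out, its label entry is a placeholder and has no effect on the loss.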

### More semi-supervised GCN netflix

Overlapping communities of size 12, small noise

Two feature communities in a near-clique, small noise

Five well-marked communities of size 20, moderate noise

### (Variational) Auto-encoders

• Bottleneck compression → creates embeddings
• Flexible training objectives
• Free encoder/decoder architectures
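For the variational case, the two standard ingredients are the reparameterisation trick for sampling the embedding and the KL divergence to a standard normal prior. The sketch below assumes a diagonal Gaussian posterior parameterised by `mu` and `log_var`; these names and the toy shapes are illustrative.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per row."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)

rng = np.random.default_rng(0)
mu = np.zeros((4, 2))        # 4 items embedded in 2 dimensions
log_var = np.zeros((4, 2))   # unit variance

z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)  # zero here: posterior equals the prior
```

A training objective would add this KL term to a reconstruction loss computed from the decoder output.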

high dimension → low dimension → high dimension

### Example — auto-encoding MNIST digits

MNIST Examples by Josef Steppan (CC-BY-SA 4.0)

60,000 training images, 28×28 pixels each

784 dims → 2D → 784 dims

### GCN + Variational auto-encoders = 🎉💖🎉

Figure: node features → GCN → embedding → node features + adjacency matrix

Node features (input and reconstruction): socio-economic status, language style, topics

Compressed & combined representation of nodes + network

Kipf & Welling (2016)
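A forward pass through such a model might look like the sketch below. The inner-product decoder for the adjacency matrix follows the variational graph auto-encoder of Kipf & Welling (2016). The linear feature decoder, the layer sizes, the ring graph, and the untrained random weights are assumptions made for illustration, not the architecture used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_adjacency(A):
    """D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_nodes, n_features, n_hidden, n_embed = 5, 4, 8, 2

# Toy ring graph and random node features (made up).
A = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    j = (i + 1) % n_nodes
    A[i, j] = A[j, i] = 1.0
X = rng.standard_normal((n_nodes, n_features))

# Untrained weights; training would fit them to the reconstruction + KL loss.
W_hidden = 0.1 * rng.standard_normal((n_features, n_hidden))
W_mu = 0.1 * rng.standard_normal((n_hidden, n_embed))
W_log_var = 0.1 * rng.standard_normal((n_hidden, n_embed))
W_features = 0.1 * rng.standard_normal((n_embed, n_features))

# Encoder: a shared GCN layer, then two GCN heads for the Gaussian parameters.
A_hat = normalized_adjacency(A)
H = relu(A_hat @ X @ W_hidden)
mu = A_hat @ H @ W_mu
log_var = A_hat @ H @ W_log_var

# Sample the embedding with the reparameterisation trick.
Z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

# Decoders: inner product for edges, linear map for features (assumption).
A_reconstructed = sigmoid(Z @ Z.T)   # edge probabilities
X_reconstructed = Z @ W_features     # reconstructed node features
```

The embedding `Z` is the compressed representation: both the network and the node features are decoded from it alone.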

### GCN+VAE learning

Five well-marked communities of size 10, moderate label noise

### Applications

a.k.a., questions we can (will be able to) ask

Explore the dependency between network structure and feature structure

Cost of compressing network + features down to a given dimension $$n$$

Task-independent representation of network + features with uncertainty

Continuous change from feature communities to network communities

Speculation

Link prediction

Community detection

Graph reconstruction

Node classification

| | Nodes | Edges | Groups |
|---|---|---|---|
| Full dataset | 10,312 | 333,983 | 39 |
| Toy model | 200 | 162 | 36 |

### Scaling GCN

node2vec, Grover & Leskovec (2016)

Walk on triangles

Walk outwards

| Dataset | # nodes | # edges |
|---|---|---|
| BlogCatalog | 10K | 333K |
| Flickr | 80K | 5.9M |
| YouTube | 1.1M | 3M |
| Twitter | 178K | 44K |

Mutual mention network on 25% of the GMT+1/GMT+2 twittosphere in French

### Mini-batch sampling

node2vec, Grover & Leskovec (2016)

walk back $$\propto \frac{1}{p}$$

walk out $$\propto \frac{1}{q}$$

walk in triangle $$\propto 1$$

Walk on triangles — p=100, q=100

Walk out — p=1, q=.01
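The three transition weights above define the second-order biased walk of node2vec. A minimal pure-Python sketch on an unweighted graph is given below. The toy graph and the function name are hypothetical, and how the walks are turned into mini-batches is not covered here.

```python
import random

def node2vec_walk(adj, start, length, p, q, rng):
    """Biased walk: back ∝ 1/p, within a triangle ∝ 1, outwards ∝ 1/q."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(adj[current])
        if not neighbours:
            break  # dead end
        if len(walk) == 1:
            # First step: no previous node yet, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)   # walk back
            elif candidate in adj[previous]:
                weights.append(1.0)       # stay in a triangle with `previous`
            else:
                weights.append(1.0 / q)   # walk out
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk

# Toy graph: a triangle (0, 1, 2) with a tail 2 - 3 - 4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

# Small q favours walking outwards, as in the "walk out" setting above.
walk = node2vec_walk(adj, start=0, length=10, p=1.0, q=0.01,
                     rng=random.Random(0))
```

With large `p` and `q`, the triangle weight dominates and the walk tends to stay inside tightly connected groups. With small `q`, it moves away from its starting neighbourhood.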

### Thank you!

Sébastien Lerique, Jacobo Levy-Abitbol, Márton Karsai, Éric Fleury

IXXI, École Normale Supérieure de Lyon