Graph Convolutional Networks (GCN) for NLP
GCN Model Overview
Graph \( G = (V , E) \)
There are \(n\) nodes and \(m\) edges.
Nodes are represented by a feature matrix \( X\) of size \(n \times d\)
Each node \(x_v \in \mathbb{R}^{d}\)
The structure of the graph is represented by an adjacency matrix \(A \)
Input to a GCN:
\(X\)
\(A\)
Output of the GCN: Vectorial Representation for each node of the graph
Matrix \(Z\) of size \( n \times f \)
Each output representation \(z_v \in \mathbb{R}^{f}\) incorporates information about its neighbourhood structure
Multiple layers of graph convolutions:
\( H^{l+1} = f( H^l, A) \)
\( H^0 = X \) and \( H^L = Z \)
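The recursion above can be sketched numerically. A minimal numpy example, assuming a small random graph and the simplest propagation rule \( \sigma(AH^lW^l) \) from the next slide (sizes \(n=5\), \(d=8\), a hidden width of 6, and \(f=3\) are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, f = 5, 8, 3                      # n nodes, d input dims, f output dims

X = rng.random((n, d))                 # feature matrix, H^0 = X
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.maximum(A, A.T)                 # make the adjacency symmetric

# Two stacked graph convolutions: H^{l+1} = sigma(A H^l W^l).
hidden = 6                             # hypothetical hidden width
W0 = rng.random((d, hidden))
W1 = rng.random((hidden, f))
relu = lambda M: np.maximum(M, 0)

H1 = relu(A @ X @ W0)                  # H^1
Z = relu(A @ H1 @ W1)                  # H^2 = Z, shape (n, f)
```

After \(L=2\) layers, each row of \(Z\) is an \(f\)-dimensional representation of one node.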
GCN Model Overview
\( f( H^l, A) = \sigma(AH^lW^l) \)
\( \mathbf{h}_v^{l+1} = ReLU \bigg( \sum_{u \in \mathcal{N}(v)} (W^l\mathbf{h}_u^{l} + \mathbf{b}^l) \bigg) \quad , \quad \forall v \in \mathcal{V} \)
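The matrix form and the per-node form are the same computation: row \(v\) of \(AH^lW^l\) is the sum of the transformed neighbour features. A small numpy check (the bias \(\mathbf{b}^l\) is omitted for brevity; the graph is a hypothetical 4-node example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, f = 4, 5, 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.random((n, d))
W = rng.random((d, f))

# Matrix form: one layer computes ReLU(A H W) for all nodes at once.
relu = lambda M: np.maximum(M, 0)
out_matrix = relu(A @ H @ W)

# Per-node form: row v is ReLU of the sum of W^T h_u over neighbours u of v.
v = 0
neighbours = np.nonzero(A[v])[0]
out_node = relu(sum(W.T @ H[u] for u in neighbours))

assert np.allclose(out_matrix[v], out_node)
```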
Two problems with this:
A node's own features are not included in the sum. Fix: add self-loops, \( \hat{A} = A + I \)
Rows of \(A\) don't sum to 1, so multiplying by \(A\) changes the scale of the features:
Normalize: \( D^{-1}A \): corresponds to taking average of neighbours
Symmetric normalization with the self-loop adjacency gives: \( f( H^l, A) = \sigma(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^lW^l) \), where \( \hat{D} \) is the degree matrix of \( \hat{A} \)
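Both fixes can be sketched together: add self-loops, then normalize symmetrically with the degrees of \( \hat{A} \). A minimal numpy sketch on a hypothetical 4-node graph:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, f = 4, 5, 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Fix 1: add self-loops so each node's own features enter the update.
A_hat = A + np.eye(n)

# Fix 2: symmetric normalization with the degree matrix of A_hat,
# so the output scale no longer grows with node degree.
deg = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

H = rng.random((n, d))
W = rng.random((d, f))
Z = np.maximum(A_norm @ H @ W, 0)      # one normalized GCN layer
```

Note that \(A_\text{norm}\) stays symmetric, and its rows now have bounded scale regardless of degree.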