Graph Convolutional Networks (GCN) for NLP
GCN Model Overview
Graph \( G = (V , E) \)
There are \(n\) nodes and \(m\) edges.
Nodes are represented by a feature matrix \( X\) of size \(n \times d\)
Each node \(x_v \in \mathbb{R}^{d}\)
The structure of the graph is represented by an adjacency matrix \(A \)
Input to a GCN:
\(X\)
\(A\)
Output of the GCN: Vectorial Representation for each node of the graph
Matrix \(Z\) of size \( n \times f \)
Each output representation \(z_v \in \mathbb{R}^{f}\) incorporates information about its neighbourhood structure
Multiple layers of graph convolutions:
\( H^{l+1} = f( H^l, A) \)
\( H^0 = X \) and \( H^L = Z \)
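The recursion above can be sketched numerically. A minimal numpy example, assuming a small random graph and the simplest propagation rule \( \sigma(AH^lW^l) \) from the next slide (sizes \(n=5\), \(d=8\), a hidden width of 6, and \(f=3\) are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, f = 5, 8, 3                      # n nodes, d input dims, f output dims

X = rng.random((n, d))                 # feature matrix, H^0 = X
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.maximum(A, A.T)                 # make the adjacency symmetric

# Two stacked graph convolutions: H^{l+1} = sigma(A H^l W^l).
hidden = 6                             # hypothetical hidden width
W0 = rng.random((d, hidden))
W1 = rng.random((hidden, f))
relu = lambda M: np.maximum(M, 0)

H1 = relu(A @ X @ W0)                  # H^1
Z = relu(A @ H1 @ W1)                  # H^2 = Z, shape (n, f)
```

After \(L=2\) layers, each row of \(Z\) is an \(f\)-dimensional representation of one node.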
GCN Model Overview
\( f( H^l, A) = \sigma(AH^lW^l) \)
\( \mathbf{h}_v^{l+1} = ReLU \bigg( \sum_{u \in \mathcal{N}(v)} (W^l\mathbf{h}_u^{l} + \mathbf{b}^l) \bigg) \quad , \quad \forall v \in \mathcal{V} \)
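The matrix form and the per-node form are the same computation: row \(v\) of \(AH^lW^l\) is the sum of the transformed neighbour features. A small numpy check (the bias \(\mathbf{b}^l\) is omitted for brevity; the graph is a hypothetical 4-node example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, f = 4, 5, 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.random((n, d))
W = rng.random((d, f))

# Matrix form: one layer computes ReLU(A H W) for all nodes at once.
relu = lambda M: np.maximum(M, 0)
out_matrix = relu(A @ H @ W)

# Per-node form: row v is ReLU of the sum of W^T h_u over neighbours u of v.
v = 0
neighbours = np.nonzero(A[v])[0]
out_node = relu(sum(W.T @ H[u] for u in neighbours))

assert np.allclose(out_matrix[v], out_node)
```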
Two problems with this:
A node's own features are not included in the sum. Fix: add self-loops, \( \hat{A} = A + I \)
Rows of \(A\) don't sum to 1, so multiplying by \(A\) changes the scale of the features:
Normalize: \( D^{-1}A \): corresponds to taking average of neighbours
Symmetric normalization with the self-loop adjacency gives: \( f( H^l, A) = \sigma(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^lW^l) \), where \( \hat{D} \) is the degree matrix of \( \hat{A} \)
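Both fixes can be sketched together: add self-loops, then normalize symmetrically with the degrees of \( \hat{A} \). A minimal numpy sketch on a hypothetical 4-node graph:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, f = 4, 5, 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Fix 1: add self-loops so each node's own features enter the update.
A_hat = A + np.eye(n)

# Fix 2: symmetric normalization with the degree matrix of A_hat,
# so the output scale no longer grows with node degree.
deg = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

H = rng.random((n, d))
W = rng.random((d, f))
Z = np.maximum(A_norm @ H @ W, 0)      # one normalized GCN layer
```

Note that \(A_\text{norm}\) stays symmetric, and its rows now have bounded scale regardless of degree.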