Graph Convolutional Networks (GCN) for NLP
GCN Model Overview
- Graph \( G = (V, E) \) with \(n\) nodes and \(m\) edges.
- Nodes are represented by a feature matrix \(X\) of size \(n \times d\).
  - Each node \(v\) has a feature vector \(x_v \in \mathbb{R}^{d}\).
- The structure of the graph is represented by an adjacency matrix \(A\) of size \(n \times n\).
- Input to a GCN:
  - \(X\)
  - \(A\)
- Output of the GCN: a vector representation for each node of the graph.
  - Matrix \(Z\) of size \(n \times f\)
  - Each output representation \(z_v \in \mathbb{R}^{f}\) incorporates information about the neighbourhood structure of node \(v\).
- Multiple layers of graph convolutions:
  - \( H^{l+1} = f(H^l, A) \)
  - \( H^0 = X \) and \( H^L = Z \), where \(L\) is the number of layers.
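
The inputs, output shape, and layer stacking listed above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the helper names (`build_adjacency`, `forward`, `placeholder_layer`), the toy path graph, the layer sizes, and the random weights are all assumptions made for the example, and `placeholder_layer` is only a stand-in for \(f\).

```python
import numpy as np

def build_adjacency(n, edges):
    """Return the n x n adjacency matrix A of an undirected graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = 1.0
        A[v, u] = 1.0  # undirected graph: mirror each edge
    return A

def forward(X, A, weights, layer_fn):
    """Apply H^{l+1} = f(H^l, A) once per weight matrix, starting from H^0 = X."""
    H = X
    for W in weights:
        H = layer_fn(H, A, W)
    return H  # H^L = Z

def placeholder_layer(H, A, W):
    # Stand-in for f (an assumption): sum neighbour features, project, apply ReLU.
    return np.maximum(A @ H @ W, 0.0)

rng = np.random.default_rng(0)
n, d, hidden, f_out = 4, 3, 5, 2      # hypothetical sizes
edges = [(0, 1), (1, 2), (2, 3)]      # m = 3 edges forming a path graph
A = build_adjacency(n, edges)
X = rng.normal(size=(n, d))           # one d-dimensional feature row per node
weights = [rng.normal(size=(d, hidden)), rng.normal(size=(hidden, f_out))]

Z = forward(X, A, weights, placeholder_layer)
print(Z.shape)  # (4, 2): one f-dimensional vector per node
```

With two layers, each row of `Z` depends on nodes up to two hops away, which is how stacking layers widens the neighbourhood that an output representation sees.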
GCN Model Overview
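- Simplest propagation rule: \( f(H^l, A) = \sigma(AH^lW^l) \), where \(W^l\) is the weight matrix of layer \(l\) and \(\sigma\) is a non-linearity.
- Per node, with ReLU as \(\sigma\) and a bias \(\mathbf{b}^l\): \( \mathbf{h}_v^{l+1} = \mathrm{ReLU} \bigg( \sum_{u \in \mathcal{N}(v)} (W^l\mathbf{h}_u^{l} + \mathbf{b}^l) \bigg) , \quad \forall v \in V \)
- Two problems with this:
  - The node itself is not included in the computation. Fix: add self-loops, \( \hat{A} = A + I \).
  - Rows of \(A\) do not sum to 1, so multiplying \(AH\) changes the scale of the feature vectors.
    - Normalize with \( D^{-1}A \), where \(D\) is the diagonal degree matrix; this corresponds to taking the average of the neighbours.
- With both fixes and symmetric normalization: \( f(H^l, A) = \sigma(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^lW^l) \), where \(\hat{D}\) is the diagonal degree matrix of \(\hat{A}\).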
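
The two problems and their fixes can be checked numerically on a tiny graph. The sketch below is illustrative only: the function names, the 3-node path graph, and the feature values are assumptions. It stores node vectors as rows, so the per-node product \(W^l\mathbf{h}_u^l\) becomes `H[u] @ W`, and it omits the bias. The symmetric normalization is applied to the self-loop matrix \(\hat{A}\) so that both fixes are combined.

```python
import numpy as np

def naive_layer(H, A, W):
    """Matrix form of the unnormalized rule: ReLU(A H W)."""
    return np.maximum(A @ H @ W, 0.0)

def naive_layer_per_node(H, A, W):
    """Same rule as a per-node sum over neighbours (bias omitted)."""
    out = np.zeros((H.shape[0], W.shape[1]))
    for v in range(H.shape[0]):
        for u in np.nonzero(A[v])[0]:   # u ranges over N(v)
            out[v] += H[u] @ W
    return np.maximum(out, 0.0)

def add_self_loops(A):
    """A_hat = A + I, so each node also sees its own features."""
    return A + np.eye(A.shape[0])

def row_normalize(A_hat):
    """D^{-1} A_hat: every row sums to 1 (average over the neighbourhood)."""
    deg = A_hat.sum(axis=1)
    return A_hat / deg[:, None]

def sym_normalize(A_hat):
    """D^{-1/2} A_hat D^{-1/2}, with D the degree matrix of A_hat."""
    deg = A_hat.sum(axis=1)             # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_layer(H, A_norm, W):
    """One layer using a pre-normalized adjacency matrix."""
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy 3-node path graph 0 - 1 - 2 with 2-dimensional features.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)                           # identity weights keep the arithmetic visible

A_hat = add_self_loops(A)
P = row_normalize(A_hat)
S = sym_normalize(A_hat)

print(A @ H)            # row 0 equals H[1]: node 0's own features are ignored
print(P.sum(axis=1))    # row sums after D^{-1} normalization
print(gcn_layer(H, S, W))
```

Row normalization averages over the neighbourhood but yields a matrix that is generally not symmetric, whereas the symmetric form keeps `S` equal to its transpose for an undirected graph.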
By Suman Banerjee