Graph Convolutional Networks (GCN) for NLP

GCN Model Overview


  • Graph \( G = (V , E) \)
  • There are \(n\) nodes and \(m\) edges.
  • Nodes are represented by a feature matrix \( X\) of size \(n \times d\)
  • Each node \(x_v \in \mathbb{R}^{d}\) 
  • The structure of the graph is represented by an adjacency matrix \(A \)
  • Input to a GCN:
    • \(X\)
    • \(A\)
  • Output of the GCN: a vector representation for each node of the graph
    • Matrix \(Z\) of size \( n \times f \)
    • Each output representation \(z_v \in \mathbb{R}^{f}\) incorporates information about its neighbourhood structure
  • Multiple layers of graph convolutions:
    • \( H^{l+1} = f( H^l, A) \)
    • \( H^0 = X \) and \( H^L = Z \) 
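The setup above can be sketched in a few lines of NumPy; the toy graph, edge list, and feature values below are made up for illustration:

```python
import numpy as np

# Toy graph with n = 4 nodes and m = 4 undirected edges.
n, d = 4, 3
X = np.random.randn(n, d)            # feature matrix X, one row x_v per node

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = np.zeros((n, n))
for u, v in edges:                   # symmetric adjacency for an undirected graph
    A[u, v] = A[v, u] = 1.0

# A stack of L graph convolutions maps X (n x d) to Z (n x f):
#   H^0 = X,   H^{l+1} = f(H^l, A),   H^L = Z
```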

GCN Model Overview

  • \( f( H^l, A)  = \sigma(AH^lW^l) \)
  • \( \mathbf{h}_v^{l+1} = ReLU \bigg( \sum_{u \in \mathcal{N}(v)} (W^l\mathbf{h}_u^{l} + \mathbf{b}^l) \bigg) \quad , \quad  \forall v \in \mathcal{V} \)
  • Two problems with this formulation:
    • The node's own features are excluded from the sum; fix by adding self-loops: \( \hat{A} = A + I \)
    • Rows of \(A\) don't sum to 1, so multiplying by \(A\) changes the scale of the features:
      • Normalize: \( D^{-1}A \) corresponds to taking the average of the neighbours
      • Symmetric normalization: \( f( H^l, A)  = \sigma(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^lW^l) \), where \( \hat{D} \) is the degree matrix of \( \hat{A} \)
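A minimal NumPy sketch of one layer combining both fixes (self-loops via \( \hat{A} = A + I \) and symmetric degree normalization); the graph, features, and weights below are made up for illustration, and the weights would normally be learned:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph convolution: ReLU(D^-1/2 A_hat D^-1/2 H W),
    with A_hat = A + I so each node also sees its own features."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)                   # degrees of A_hat
    D_inv_sqrt = np.diag(deg ** -0.5)         # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU nonlinearity

# Toy usage (a 4-cycle graph; shapes only, weights are random):
n, d, f = 4, 3, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
W = rng.standard_normal((d, f))
Z = gcn_layer(X, A, W)
print(Z.shape)                                # (4, 2)
```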
 