Using Link Prediction Algorithms to Analyze Dynamic Network Evolution

Dima Kagan, Rami Puzis, and Michael Fire

Data Science | Network Science

 

Big Data 3Vs

CREATING MASSIVE NETWORK DATASETS

Fire, Michael, and Carlos Guestrin. "The Rise and Fall of Network Stars: Analyzing 2.5 million graphs to reveal how high-degree vertices emerge over time," Information Processing & Management (2019)

Creating Network Data Is Hard

Kagan, Dima, Thomas Chesney, and Michael Fire. "Using Data Science to Understand the Film Industry's Gender Gap." arXiv preprint arXiv:1903.06469 (2019).

Dynamic networks

RESEARCH QUESTIONS

  • Does the process by which new links form in a network change over time?
  • Does it become harder over time for a classifier to learn how edges form in a network?
  • Are edges becoming more random over time?

The dataset

After selecting all networks with enough data, we had:

  • Co-author: 8,348 networks
  • Reddit comments: 4,760 networks
 

Link Prediction

[Figure: example graph on vertices 1 to 11, with link-prediction scores between 0.12 and 0.94 on vertex pairs]

Kagan, Dima, Yuval Elovichi, and Michael Fire. "Generic anomalous vertices detection utilizing a link prediction algorithm." Social Network Analysis and Mining 8.1 (2018): 27.
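The slide's idea, scoring vertex pairs that are not yet connected and ranking them by how likely they are to link, can be sketched in a few lines of Python. This is a minimal illustration only, assuming an undirected graph stored as a dict of neighbor sets; the common-neighbor count is a stand-in scorer, not the trained classifier, and the names `candidate_pairs`, `common_neighbors`, and `rank_candidates` are hypothetical.

```python
from itertools import combinations

def candidate_pairs(adj):
    """Yield every unordered vertex pair (u, v), u < v, that is not already an edge."""
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            yield u, v

def common_neighbors(adj, u, v):
    """Stand-in score (an assumption, not the trained model): number of shared neighbors."""
    return len(adj[u] & adj[v])

def rank_candidates(adj, score):
    """Score every non-edge and sort from highest to lowest score (ties keep pair order)."""
    scored = [((u, v), score(adj, u, v)) for u, v in candidate_pairs(adj)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical undirected toy graph as a dict of neighbor sets.
    adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
    for pair, s in rank_candidates(adj, common_neighbors):
        print(pair, s)
```

On the toy graph, the pair (1, 4) ranks first with two shared neighbors and (1, 5) ranks last with none. In a learned setup, the stand-in score would be replaced by the probability a classifier assigns to each pair from its topological features.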

Topology-Based Feature Extraction

◦ For undirected graphs:

  • Common Friends: |\Gamma(v) \cap \Gamma(u)|
  • Total Friends: |\Gamma(v) \cup \Gamma(u)|
  • Jaccard’s Coefficient: \frac{|\Gamma(v) \cap \Gamma(u)|}{|\Gamma(v) \cup \Gamma(u)|}

◦ For directed graphs:

  • Transitive Friends: |\Gamma_{in}(v) \cap \Gamma_{out}(u)|
  • Opposite Direction Friends: \begin{cases} 1, & \text{if}\ (u,v)\in E \\ 0, & \text{otherwise} \end{cases}
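The five features can be computed directly from neighbor sets. The sketch below is a minimal Python rendering of the formulas as written, assuming undirected graphs are dicts of neighbor sets Γ(·) and directed graphs are given as (source, target) pairs turned into out-neighbor and in-neighbor maps. All function names are hypothetical, and returning 0.0 for the Jaccard coefficient of two isolated vertices is an assumed convention the slides do not specify.

```python
from collections import defaultdict

# Undirected features: adj maps each vertex to its neighbor set, Γ(·).

def common_friends(adj, u, v):
    """|Γ(v) ∩ Γ(u)|"""
    return len(adj[v] & adj[u])

def total_friends(adj, u, v):
    """|Γ(v) ∪ Γ(u)|"""
    return len(adj[v] | adj[u])

def jaccard_coefficient(adj, u, v):
    """|Γ(v) ∩ Γ(u)| / |Γ(v) ∪ Γ(u)|; 0.0 when both sets are empty (assumed convention)."""
    union = total_friends(adj, u, v)
    return common_friends(adj, u, v) / union if union else 0.0

# Directed features: out_adj[x] = Γ_out(x), in_adj[x] = Γ_in(x).

def directed_adjacency(edges):
    """Build out- and in-neighbor maps from (source, target) pairs."""
    out_adj, in_adj = defaultdict(set), defaultdict(set)
    for source, target in edges:
        out_adj[source].add(target)
        in_adj[target].add(source)
    return out_adj, in_adj

def transitive_friends(out_adj, in_adj, u, v):
    """|Γ_in(v) ∩ Γ_out(u)|: vertices w with edges u -> w and w -> v."""
    return len(in_adj[v] & out_adj[u])

def opposite_direction_friends(out_adj, u, v):
    """1 if (u, v) ∈ E, else 0."""
    return 1 if v in out_adj[u] else 0

if __name__ == "__main__":
    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(common_friends(adj, 1, 4), total_friends(adj, 1, 4), jaccard_coefficient(adj, 1, 4))
    out_adj, in_adj = directed_adjacency([(1, 2), (2, 3), (1, 4), (4, 3), (3, 1)])
    print(transitive_friends(out_adj, in_adj, 1, 3), opposite_direction_friends(out_adj, 3, 1))
```

The example prints `1 2 0.5` for the undirected pair (1, 4) and `2 1` for the directed features: vertices 2 and 4 both lie on a path 1 -> w -> 3, and the edge (3, 1) exists. Using `defaultdict(set)` lets vertices with no in- or out-edges return an empty set instead of raising `KeyError`.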

Training the models

G_t = (V_t, E_t)
E_f = E_{t+1} \setminus E_t
\text{train-positive} = \{ (v,u) \mid v, u \in V_t,\ (v,u) \in E_f \}
\text{train-negative} = \text{sample from } \{ (v,u) \mid v, u \in V_t,\ (v,u) \notin E_{t+1},\ |\Gamma_{out}(v)| > 2,\ |\Gamma_{out}(u)| > 2 \}
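The training-set definitions translate into a short procedure over two consecutive snapshots. The Python sketch below is one reading of them under stated assumptions: edge sets are Python sets of directed `(v, u)` tuples, V_t is taken to be the endpoints of E_t, self-pairs are excluded from the negatives, and negatives are sampled uniformly to match the number of positives. The slides do not give a sample size, so that balance is an assumption, and `build_training_pairs` is a hypothetical name.

```python
import random
from collections import Counter

def build_training_pairs(edges_t, edges_t1, out_degree_threshold=2, seed=0):
    """Positive and negative vertex pairs from directed snapshots G_t and G_{t+1}.

    edges_t, edges_t1: sets of (v, u) tuples, i.e. E_t and E_{t+1}.
    Assumes V_t is the set of endpoints of E_t.
    """
    vertices_t = {x for edge in edges_t for x in edge}
    # E_f = E_{t+1} \ E_t: edges that first appear in the later snapshot.
    new_edges = edges_t1 - edges_t
    # train-positive: new edges whose endpoints were both already in V_t.
    positives = {(v, u) for v, u in new_edges if v in vertices_t and u in vertices_t}
    # |Γ_out(x)| in G_t; only vertices with out-degree > threshold may form negatives.
    out_degree = Counter(v for v, _ in edges_t)
    eligible = sorted(x for x in vertices_t if out_degree[x] > out_degree_threshold)
    # Candidate negatives: ordered pairs of eligible vertices absent from E_{t+1}
    # (self-pairs excluded, an assumption).
    pool = [(v, u) for v in eligible for u in eligible
            if v != u and (v, u) not in edges_t1]
    # Sample as many negatives as positives (balanced classes, an assumption).
    rng = random.Random(seed)
    negatives = rng.sample(pool, min(len(positives), len(pool)))
    return positives, negatives

if __name__ == "__main__":
    e_t = {(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6),
           (3, 4), (3, 5), (3, 6), (4, 1)}
    e_t1 = e_t | {(4, 5), (7, 1)}  # (7, 1) is dropped: vertex 7 is not in V_t
    print(build_training_pairs(e_t, e_t1))
```

Enumerating the full candidate pool is quadratic in the number of eligible vertices, which is fine for small snapshots; for large networks, repeatedly drawing random pairs and rejecting those that fail the conditions would avoid materializing the pool.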

If the networks are truly scale-free, how will the classifier behave?

Changes in the DC Cinematic subreddit graph over time

If the networks are not scale-free, what changes?

Changes in the DC Cinematic subreddit graph over time
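One way to probe the scale-free question on synthetic data is to grow a graph by preferential attachment and take snapshots as it grows. The sketch below assumes the Barabási–Albert model as the scale-free generator (the slides do not name one), produces undirected edge-set snapshots, and computes a degree histogram per snapshot. The function names are hypothetical, and a histogram is only a rough visual check: it does not by itself establish that a network is scale-free, which would require a proper power-law fit.

```python
import random
from collections import Counter

def preferential_attachment_snapshots(n, m, snapshot_every, seed=0):
    """Grow an undirected graph in which each new vertex links to m distinct existing
    vertices chosen with probability proportional to degree; return edge-set snapshots."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 vertices, so every vertex starts with degree m.
    edges = {(a, b) for a in range(m + 1) for b in range(a + 1, m + 1)}
    # Each vertex appears once per incident edge, so uniform draws are degree-proportional.
    targets = [x for edge in edges for x in edge]
    snapshots = []
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:
            edges.add((old, new))
            targets.extend((old, new))
        if (new + 1) % snapshot_every == 0:
            snapshots.append(frozenset(edges))
    return snapshots

def degree_histogram(edges):
    """Map degree -> number of vertices with that degree."""
    degree = Counter(x for edge in edges for x in edge)
    return Counter(degree.values())

if __name__ == "__main__":
    snaps = preferential_attachment_snapshots(n=1000, m=2, snapshot_every=250)
    for i, snap in enumerate(snaps):
        hist = degree_histogram(snap)
        print(i, len(snap), "edges, max degree", max(hist))
```

Consecutive snapshots form (G_t, G_{t+1}) pairs, so the training procedure above could be run on graphs whose growth rule is known, and the classifier's behavior compared with that on the real networks.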

Future work

• Evaluate on synthetic data.

• Investigate when networks change and what causes the change.

• Determine when the classifier should be retrained.

NetSci2019 - NetEval

By Dima Kagan
