Using Link Prediction Algorithms
to Analyze Dynamic Network Evolution
Dima Kagan, Rami Puzis, and Michael Fire
Data Science, Network Science
Big Data 3Vs
CREATING MASSIVE NETWORK DATASETS
Fire, Michael, and Carlos Guestrin. "The Rise and Fall of Network Stars: Analyzing 2.5 million graphs to reveal how high-degree vertices emerge over time," Information Processing & Management (2019)
Creating network data is hard
Kagan, Dima, Thomas Chesney, and Michael Fire. "Using Data Science to Understand the Film Industry's Gender Gap." arXiv preprint arXiv:1903.06469 (2019).
Dynamic networks
RESEARCH QUESTIONS
- Does the process by which new links form in a network change over time?
- Does it become harder over time for a classifier to learn how edges are formed in a network?
- Are edges becoming more random over time?
The dataset
After selecting all networks that had enough data, we had:
- Co-author: 8,348 networks
- Reddit comments: 4,760 networks
Link Prediction
Kagan, Dima, Yuval Elovichi, and Michael Fire. "Generic anomalous vertices detection utilizing a link prediction algorithm." Social Network Analysis and Mining 8.1 (2018): 27.
Topology Based
Feature Extraction
◦ For undirected graphs:
- Common Friends: |\Gamma(v) \cap \Gamma(u)|
- Total Friends: |\Gamma(v) \cup \Gamma(u)|
- Jaccard's Coefficient: \frac{|\Gamma(v) \cap \Gamma(u)|}{|\Gamma(v) \cup \Gamma(u)|}
◦ For directed graphs:
- Transitive Friends: |\Gamma_{in}(v) \cap \Gamma_{out}(u)|
- Opposite Direction Friends: \begin{cases} 1, & \text{if}\ (u,v)\in E \\ 0, & \text{otherwise} \end{cases}
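The topology features listed above can be computed directly from an edge list. The following is a minimal sketch in plain Python, not the authors' implementation. The function names (`undirected_adjacency`, `directed_adjacency`, `undirected_features`, `directed_features`) are hypothetical, the Jaccard value is assumed to be 0 when both neighborhoods are empty, and the in/out convention for Transitive Friends follows the formula as written on the slide.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Adjacency = Dict[Node, Set[Node]]


def undirected_adjacency(edges: Iterable[Tuple[Node, Node]]) -> Adjacency:
    """Neighbor sets Gamma(v), ignoring edge direction."""
    nb: Adjacency = {}
    for a, b in edges:
        nb.setdefault(a, set()).add(b)
        nb.setdefault(b, set()).add(a)
    return nb


def directed_adjacency(edges: Iterable[Tuple[Node, Node]]) -> Tuple[Adjacency, Adjacency]:
    """Return (Gamma_out, Gamma_in) for a directed edge list."""
    out_nb: Adjacency = {}
    in_nb: Adjacency = {}
    for a, b in edges:
        out_nb.setdefault(a, set()).add(b)
        in_nb.setdefault(b, set()).add(a)
        # Ensure every node has an entry in both maps.
        out_nb.setdefault(b, set())
        in_nb.setdefault(a, set())
    return out_nb, in_nb


def undirected_features(nb: Adjacency, v: Node, u: Node) -> Dict[str, float]:
    """Common Friends, Total Friends, and Jaccard's Coefficient for (v, u)."""
    gv, gu = nb.get(v, set()), nb.get(u, set())
    common = len(gv & gu)
    total = len(gv | gu)
    # Assumption: define the coefficient as 0 when the union is empty.
    jaccard = common / total if total else 0.0
    return {"common_friends": common, "total_friends": total, "jaccard": jaccard}


def directed_features(out_nb: Adjacency, in_nb: Adjacency, v: Node, u: Node) -> Dict[str, int]:
    """Transitive Friends and Opposite Direction Friends for the pair (v, u)."""
    # |Gamma_in(v) ∩ Gamma_out(u)|, as written on the slide.
    transitive = len(in_nb.get(v, set()) & out_nb.get(u, set()))
    # 1 if the reverse edge (u, v) exists, otherwise 0.
    opposite = 1 if v in out_nb.get(u, set()) else 0
    return {"transitive_friends": transitive, "opposite_direction_friends": opposite}
```

Each candidate pair (v, u) then yields one feature vector by merging the two dictionaries, which can be passed to any standard classifier.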
Training the models
G_t = (V_t, E_t)
E_f = E_{t+1} \setminus E_t
train\text{-}positive = \{ (v,u) \mid v, u \in V_t,\ (v,u) \in E_f \}
train\text{-}negative = \text{sample from } \{ (v,u) \mid v, u \in V_t,\ (v,u) \notin E_{t+1},\ |\Gamma_{out}(v)| > 2,\ |\Gamma_{out}(u)| > 2 \}
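The construction of the positive and negative training sets can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the function name `build_training_pairs` is hypothetical, V_t is taken to be the set of nodes that appear in E_t, out-degree is measured in G_t, snapshots are assumed cumulative (E_t is contained in E_{t+1}), self-pairs are skipped, and negatives are sampled to match the number of positives. Enumerating all candidate pairs is quadratic, so this form suits small graphs only.

```python
import random
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def build_training_pairs(
    edges_t: Iterable[Edge],
    edges_t1: Iterable[Edge],
    min_out_degree: int = 2,
    seed: int = 0,
) -> Tuple[List[Edge], List[Edge]]:
    """Return (train_positive, train_negative) for snapshots G_t and G_{t+1}."""
    e_t, e_t1 = set(edges_t), set(edges_t1)

    # Assumption: V_t is the set of nodes that appear in some edge of E_t.
    nodes_t = {n for edge in e_t for n in edge}

    # Out-degree |Gamma_out(v)| measured in G_t (an assumption).
    out_degree = {}
    for a, _ in e_t:
        out_degree[a] = out_degree.get(a, 0) + 1

    # E_f: edges that appear in G_{t+1} but not in G_t.
    e_f = e_t1 - e_t

    # Positive examples: future edges between nodes already present in G_t.
    positives = sorted((v, u) for v, u in e_f if v in nodes_t and u in nodes_t)

    # Negative candidates: pairs of sufficiently active nodes that are
    # still unconnected in G_{t+1}.
    eligible = sorted(n for n in nodes_t if out_degree.get(n, 0) > min_out_degree)
    candidates = [
        (v, u)
        for v in eligible
        for u in eligible
        if v != u and (v, u) not in e_t1
    ]

    # Assumption: draw as many negatives as positives to balance the classes.
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(len(positives), len(candidates)))
    return positives, negatives
```

Fixing the seed keeps the sampled negatives reproducible across runs, which matters when comparing classifiers trained on different snapshots of the same network.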
If the networks are truly scale-free, how will the classifier behave?
Changes in the DC Cinematic subreddit graph over time
If a network is not scale-free, what changes?
Changes in the DC Cinematic subreddit graph over time
Future work
- Evaluate on synthetic data.
- Investigate when networks change and what caused it.
- Discover the point when the classifier should be re-trained.
NetSci2019 -NetEval
By Dima Kagan