Malicious Profile Identification in Online Social Networks
Dima Kagan
Supervisors: Michael Fire, Yuval Elovici
Complex Networks
Related Work
- Reputation based filtering [Golbeck and Hendler].
- Topology based identification [Fire et al.].
- Graph centrality measure based spammer identification [DeBarr and Wechsler].
- Spammer detection in social networks using “honey-profiles” [Stringhini et al.].
- Clustering groups of accounts that act similarly at around the same time for a sustained period of time [Cao et al.].
Link Prediction + Crowd Wisdom
Supervised Fake Profile Identification in Online Social Networks
- Fake profiles dataset - Recommended restricted links set + All unrestricted links set.
- Friends restriction dataset - Alphabetically restricted links set + All unrestricted links set.
- All links dataset - contains all the links.
Collected Datasets
Collected Data
Dataset | Users | Restricted Links | Unrestricted Links |
---|---|---|---|
Fake-Profiles | 434 | 2,860 | 138,286 |
Friends Restrictions | 355 | 6,145 | 138,286 |
All Links | 527 | 9,005 | 138,286 |
Features
Labeling Data is Hard
Unsupervised Anomaly Detection in Graphs Utilizing a Link Prediction Algorithm
Malicious Users Tend to Connect to Other Profiles Randomly
Topology Based Feature Extraction
- 16 features for directed graphs
- 8 features for undirected graphs
◦ For undirected graphs:
- Common Friends: $|\Gamma(v) \cap \Gamma(u)|$
- Total Friends: $|\Gamma(v) \cup \Gamma(u)|$
- Jaccard's Coefficient: $\frac{|\Gamma(v) \cap \Gamma(u)|}{|\Gamma(v) \cup \Gamma(u)|}$
◦ For directed graphs:
- Transitive Friends: $|\Gamma_{in}(v) \cap \Gamma_{out}(u)|$
- Opposite Direction Friends: $\begin{cases} 1, & \text{if}\ (u,v)\in E \\ 0, & \text{otherwise} \end{cases}$
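The link features listed above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the adjacency-dictionary representation, the function names, and the toy graph are all assumptions, and the neighborhood of a vertex is assumed to be the union of its in- and out-neighbors when computing the undirected features.

```python
from typing import Dict, Set

# Hypothetical representation: vertex -> set of out-neighbors.
Graph = Dict[str, Set[str]]


def out_neighbors(g: Graph, v: str) -> Set[str]:
    """Vertices that v points to."""
    return g.get(v, set())


def in_neighbors(g: Graph, v: str) -> Set[str]:
    """Vertices that point to v."""
    return {w for w, outs in g.items() if v in outs}


def neighbors(g: Graph, v: str) -> Set[str]:
    """Assumed undirected neighborhood: in-neighbors plus out-neighbors."""
    return out_neighbors(g, v) | in_neighbors(g, v)


def link_features(g: Graph, v: str, u: str) -> Dict[str, float]:
    """Topological features of the link (v, u)."""
    nv, nu = neighbors(g, v), neighbors(g, u)
    common = len(nv & nu)
    total = len(nv | nu)
    return {
        "common_friends": common,
        "total_friends": total,
        # Guard against two isolated vertices (empty union).
        "jaccard": common / total if total else 0.0,
        # |Gamma_in(v) ∩ Gamma_out(u)|
        "transitive_friends": len(in_neighbors(g, v) & out_neighbors(g, u)),
        # 1 if the reverse edge (u, v) exists, else 0.
        "opposite_direction_friends": 1 if v in out_neighbors(g, u) else 0,
    }


# Toy directed graph (made up for illustration).
example = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}
features_ab = link_features(example, "a", "b")
print(features_ab)
```

Sets keep each feature a one-line expression; for large graphs a precomputed in-neighbor index would avoid rescanning the whole dictionary on every call.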
Link Classification
Aggregation of The Results
Meta Feature Extraction
$AbnormalityVertexProbability(v) := \frac{1}{|\Gamma(v)|}\sum\nolimits_{u \in \Gamma(v)}p(v,u)$
- $p(v,u)$: the confidence that an edge is fake.
We extracted 9 features
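As a rough sketch of this aggregation step, the abnormality vertex probability is the mean of p(v,u) over the neighbors of v. The function name, the dictionary of edge scores, and the example values are assumptions made for illustration; the edge scores are taken as given, for instance from a link classifier.

```python
from collections import defaultdict
from typing import Dict, Tuple


def abnormality_vertex_probability(
    edge_scores: Dict[Tuple[str, str], float]
) -> Dict[str, float]:
    """Average p(v, u) over all scored edges (v, u) leaving each vertex v.

    Assumption: Gamma(v) is the set of u for which a score p(v, u) exists.
    For an undirected graph, list each edge in both orientations.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for (v, _u), p in edge_scores.items():
        sums[v] += p
        counts[v] += 1
    return {v: sums[v] / counts[v] for v in sums}


# Made-up edge scores: the confidence that each edge is fake.
scores = {("a", "b"): 0.75, ("a", "c"): 0.25, ("b", "c"): 0.125}
vertex_scores = abnormality_vertex_probability(scores)
print(vertex_scores)
```

Vertices whose edges mostly look fake end up with an average close to 1, which is what makes this value usable as a per-vertex meta feature.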
Datasets
Fully Simulated Networks
Semi Simulated Networks
Real World Networks
Kids Friendship Network
- AUC: 0.93
- TPR: 0.91
- FPR: 0.15
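For readers unfamiliar with the reported rates, the sketch below shows how TPR and FPR are computed from binary labels and predictions. The labels and predictions are made up for illustration and are unrelated to the results above; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (TPR, FPR), where 1 marks a malicious profile and 0 a benign one."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of malicious profiles caught
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of benign profiles flagged
    return tpr, fpr


# Made-up labels and predictions.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)
```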
https://github.com/Kagandi/anomalous-vertices-detection
Questions?