Malicious Profile Identification in Online Social Networks
Dima Kagan
Supervisors: MICHAEL FIRE, YUVAL ELOVICI
Complex Networks
Related Work
- Reputation based filtering [Golbeck and Hendler].
- Topology based identification [Fire et al.].
- Graph centrality measure based spammer identification [DeBarr and Wechsler].
- Spammer detection in social networks using “honey-profiles” [Stringhini et al.].
- Clustering groups of accounts that act similarly at around the same time for a sustained period of time [Cao et al.].
Link Prediction
Crowd Wisdom
Supervised Fake Profile Identification in Online Social Networks
Facebook App
connection strength heuristic
Browser addon
Architecture
Collected Data - Facebook App
- Are-Family(u,v)
- Common-Friends(u,v)
- Common-Groups-Number(u,v)
- Common-Posts-Number(u,v)
- Common-Chat-Messages(u,v)
- Tagged-Photos-Number(u,v)
- Tagged-Videos-Number(u,v)
- Friends-Number(u)
- Friends-Number(v)
Collected Data - Addon
- Installed-Application-Number
- Default-Privacy-Settings
- Lookup
- Share-Address
- Send-Messages
- Receive-Friend-Requests
- Tag-Suggestions
- View-Birthday
ML Datasets
- Fake profiles dataset: recommended restricted links set + all unrestricted links set.
- Friends restriction dataset: alphabetically restricted links set + all unrestricted links set.
- All links dataset: contains all the links.
Dataset | Users | Restricted | Unrestricted |
---|---|---|---|
Fake-Profiles | 434 | 2,860 | 138,286 |
Friends Restrictions | 355 | 6,145 | 138,286 |
All Links | 527 | 9,005 | 138,286 |
Collected Data
Additional Features
- Common-Groups-Ratio(u,v)
- Common-Posts-Ratio(u,v)
- Common-Chat-Messages-Ratio(u,v)
- Common-Photos-Ratio(u,v)
- Common-Videos-Ratio(u,v)
- Is-Friend-Profile-Private(v)
- Jaccard's-Coefficient(u,v)
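Jaccard's coefficient for a pair of users can be sketched as follows. This is a minimal illustration that assumes each user's friends are available as a Python set; the function name and the sample friend sets are hypothetical and not taken from the collected data.

```python
def jaccard_coefficient(friends_u, friends_v):
    """Return |A ∩ B| / |A ∪ B| for two friend sets (0.0 if both are empty)."""
    union = friends_u | friends_v
    if not union:
        return 0.0
    return len(friends_u & friends_v) / len(union)


# Hypothetical friend sets for users u and v.
friends_u = {"a", "b", "c"}
friends_v = {"b", "c", "d", "e"}
score = jaccard_coefficient(friends_u, friends_v)  # 2 shared out of 5 distinct
```

Returning 0.0 for two empty sets is a design choice made here to avoid a division by zero.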
Classifier | Measure | Fake Profiles | Friends Restriction | All Links |
---|---|---|---|---|
OneR | AUC | 0.861 | 0.511 | 0.608 |
OneR | F-Measure | 0.867 | 0.531 | 0.616 |
OneR | False-Positive | 0.179 | 0.532 | 0.414 |
OneR | True-Positive | 0.902 | 0.554 | 0.623 |
J48 | AUC | 0.925 | 0.684 | 0.72 |
J48 | F-Measure | 0.885 | 0.668 | 0.659 |
J48 | False-Positive | 0.179 | 0.498 | 0.321 |
J48 | True-Positive | 0.937 | 0.754 | 0.654 |
IBK (K=10) | AUC | 0.833 | 0.587 | 0.545 |
IBK (K=10) | F-Measure | 0.744 | 0.49 | 0.637 |
IBK (K=10) | False-Positive | 0.174 | 0.289 | 0.749 |
IBK (K=10) | True-Positive | 0.696 | 0.419 | 0.817 |
Naive-Bayes | AUC | 0.902 | 0.73 | 0.75 |
Naive-Bayes | F-Measure | 0.833 | 0.677 | 0.675 |
Naive-Bayes | False-Positive | 0.373 | 0.403 | 0.3 |
Naive-Bayes | True-Positive | 0.979 | 0.717 | 0.662 |
Bagging | AUC | 0.946 | 0.698 | 0.728 |
Bagging | F-Measure | 0.89 | 0.645 | 0.657 |
Bagging | False-Positive | 0.171 | 0.403 | 0.312 |
Bagging | True-Positive | 0.941 | 0.671 | 0.643 |
AdaBoostM1 | AUC | 0.937 | 0.698 | 0.728 |
AdaBoostM1 | F-Measure | 0.882 | 0.645 | 0.657 |
AdaBoostM1 | False-Positive | 0.163 | 0.403 | 0.312 |
AdaBoostM1 | True-Positive | 0.941 | 0.671 | 0.643 |
Rotation-Forest | AUC | 0.948 | 0.79 | 0.778 |
Rotation-Forest | F-Measure | 0.897 | 0.719 | 0.696 |
Rotation-Forest | False-Positive | 0.158 | 0.336 | 0.275 |
Rotation-Forest | True-Positive | 0.941 | 0.75 | 0.681 |
Random-Forest | AUC | 0.933 | 0.706 | 0.716 |
Random-Forest | F-Measure | 0.858 | 0.613 | 0.663 |
Random-Forest | False-Positive | 0.14 | 0.278 | 0.369 |
Random-Forest | True-Positive | 0.857 | 0.565 | 0.679 |
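The True-Positive, False-Positive and F-Measure values in the table are rates derived from a binary confusion matrix. The sketch below shows the standard definitions, assuming the restricted (fake) link is the positive class; the function name and counts are illustrative, and the slides do not state whether the reported F-Measure is per class or weighted.

```python
def binary_rates(tp, fp, tn, fn):
    """Compute TPR, FPR, precision and F-measure from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on the positive class
    fpr = fp / (fp + tn) if (fp + tn) else 0.0        # negatives flagged as positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + tpr
    f_measure = 2 * precision * tpr / denom if denom else 0.0
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f_measure": f_measure}


# Hypothetical counts, not taken from the experiments above.
example = binary_rates(tp=90, fp=10, tn=90, fn=10)
```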
P@K
average users’ precision@k
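One way to read "average users' precision@k" is: for each user, rank that user's links by classifier score, take the top k, measure the fraction that are truly restricted, then average over users. The sketch below follows that reading as an assumption; the function names and sample data are hypothetical.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of positives among the first k entries of a ranked 0/1 label list."""
    top = ranked_labels[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


def average_users_precision_at_k(labels_by_user, k):
    """Average precision@k over users; users with no links are skipped."""
    values = [precision_at_k(labels, k) for labels in labels_by_user.values() if labels]
    return sum(values) / len(values) if values else 0.0


# Hypothetical data: 1 = restricted link, already sorted by descending score.
labels_by_user = {"alice": [1, 1, 0, 0], "bob": [1, 0, 0, 1]}
avg_p_at_2 = average_users_precision_at_k(labels_by_user, k=2)
```

When a user has fewer than k links, this sketch divides by the number of links available rather than by k; the opposite convention is equally common.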
Information gain
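Information gain ranks a feature by how much knowing its value reduces the entropy of the class label. The following is a minimal sketch for a discrete feature, using the textbook entropy definition; continuous features would first need to be discretized, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) minus the weighted entropy of labels within each feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that separates the classes perfectly reaches the full label entropy, while a feature independent of the label scores zero.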
Applications Installation and Removal Analysis
Application Data
- Hashed User Id
- Installed Application Number: the number of Facebook applications installed on the user's Facebook account.
- Date: the date when the information was collected.
T-Test
Null hypothesis:
Two Sample t-test:
- Add-on Users (µ = 0.236; stdev= 0.12)
- Regular Users (µ = -0.19; stdev= 0.05)
T-test Results:
- (t = 25.936; p-value < 2.2e-16)
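The two-sample t statistic can be sketched as below. The slides do not say whether equal variances were assumed, so Welch's unequal-variance form is used here as an assumption; the sample values in the usage line are hypothetical and do not reproduce the reported result, which would need the original per-user measurements.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)   # unbiased sample variances
    se_sq = va / na + vb / nb                          # squared standard error
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_sq)
    df = se_sq ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical samples for illustration only.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```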
Regular Users
Addon Users
Labeling Data is Hard
Unsupervised Anomaly Detection in Graphs Utilizing a Link Prediction Algorithm
Malicious Users Tend to Connect to Other Profiles Randomly
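The random-connection assumption suggests a simple way to simulate anomalous vertices: add new vertices to a graph and link each one to uniformly chosen existing vertices. The sketch below illustrates that idea only; the function name, the `sim_` naming scheme and the parameters are hypothetical, and the actual simulation procedure is not described on the slides.

```python
import random


def add_random_vertices(neighbors, new_count, degree, seed=0):
    """Add simulated vertices to an undirected adjacency dict, in place.

    Each new vertex is linked to `degree` original vertices chosen uniformly
    at random. Returns the list of injected vertex names.
    """
    rng = random.Random(seed)
    existing = sorted(neighbors)          # snapshot: only original vertices are targets
    injected = []
    for i in range(new_count):
        name = f"sim_{i}"
        targets = rng.sample(existing, min(degree, len(existing)))
        neighbors[name] = set(targets)
        for target in targets:
            neighbors[target].add(name)   # keep the graph symmetric
        injected.append(name)
    return injected


# Hypothetical three-vertex graph.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
simulated = add_random_vertices(graph, new_count=2, degree=2)
```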
Topology Based
Feature Extraction
16 features for directed graphs
8 features for undirected graphs
◦ For undirected graphs:
- Common Friends
- Total Friends
- Jaccard’s-Coefficient
◦ For directed graphs:
- Transitive Friends
- Opposite Direction Friends
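The listed link features can be sketched with plain adjacency sets. The definitions below are assumptions based on common usage in link prediction work, since the slides give only the feature names: common friends as the intersection of neighborhoods, total friends as their union, transitive friends as vertices w with u→w and w→v, and opposite direction friends as an indicator for the reverse edge v→u. All function names are illustrative.

```python
from collections import defaultdict


def undirected_features(neighbors, u, v):
    """Common friends, total friends and Jaccard's coefficient for the pair (u, v)."""
    nu, nv = neighbors.get(u, set()), neighbors.get(v, set())
    common, total = len(nu & nv), len(nu | nv)
    return {"common_friends": common,
            "total_friends": total,
            "jaccard": common / total if total else 0.0}


def build_directed(edges):
    """Build out-neighbor and in-neighbor maps from (source, target) pairs."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for source, target in edges:
        out_nb[source].add(target)
        in_nb[target].add(source)
    return out_nb, in_nb


def directed_features(out_nb, in_nb, u, v):
    """Transitive friends and opposite-direction indicator for the link u -> v."""
    transitive = len(out_nb.get(u, set()) & in_nb.get(v, set()))
    opposite = 1 if u in out_nb.get(v, set()) else 0
    return {"transitive_friends": transitive, "opposite_direction_friends": opposite}
```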
Link Classification
Aggregation of The Results
Meta Feature Extraction
We extracted 7 features
- - the confidence that an edge is fake.
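Aggregating per-edge confidences into per-vertex values can be sketched as below. The slides do not list the seven extracted features, so the three aggregates computed here (mean confidence, fraction of edges above a threshold, edge count) are purely illustrative stand-ins, and the function name and threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def vertex_meta_features(edge_confidence, threshold=0.5):
    """Aggregate edge-level fake confidences into per-vertex summaries.

    `edge_confidence` maps (u, v) pairs to the classifier's confidence that
    the edge is fake; each edge contributes to both of its endpoints.
    """
    per_vertex = defaultdict(list)
    for (u, v), confidence in edge_confidence.items():
        per_vertex[u].append(confidence)
        per_vertex[v].append(confidence)
    return {
        vertex: {
            "mean_fake_confidence": mean(values),
            "fake_edge_fraction": sum(c >= threshold for c in values) / len(values),
            "edge_count": len(values),
        }
        for vertex, values in per_vertex.items()
    }


# Hypothetical confidences: vertex "x" has mostly suspicious edges.
features = vertex_meta_features({("x", "a"): 0.9, ("x", "b"): 0.7, ("a", "b"): 0.1})
```

Vertices whose edges are mostly judged fake end up with high aggregate values, which is what lets a second-stage model or a simple ranking flag them as anomalous.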
Datasets
Network | Is Directed | Vertices Number | Links Number | Date | Labeled |
---|---|---|---|---|---|
Academia | Yes | 200,169 | 1,389,063 | 2011 | No |
Anybeat | Yes | 12,645 | 67,053 | 2011 | No |
ArXiv HEP-PH | No | 34,546 | 421,578 | 2003 | No |
Class of 1880/81 | Yes | 53 | 179 | 1881 | Yes |
DBLP | No | 1,665,850 | 13,504,952 | 2016 | No |
Google+ | Yes | 107,614 | 13,673,453 | 2012 | No |
Orkut | No | 3,072,441 | 117,185,083 | 2012 | No |
| | Yes | 5,384,160 | 16,011,443 | 2012 | Yes |
| | No | 1,053,754 | 2,161,968 | 2012 | No |
Yelp | No | 249,443 | 3,563,818 | 2016 | No |
Fully Simulated Networks
Simulation | AUC | TPR | FPR | Precision |
---|---|---|---|---|
Simulation 1 (Arxiv HEP-PH) | 0.991 | 0.889 | 0.011 | 0.904 |
Simulation 2 (DBLP) | 0.997 | 0.994 | 0.064 | 0.993 |
Simulation 3 (Yelp) | 0.993 | 0.917 | 0.007 | 0.937 |
Semi Simulated Networks
Network | AUC | TPR | FPR | Precision |
---|---|---|---|---|
Academia | 0.999 | 0.998 | 0.000 | 0.997 |
Anybeat | 1.000 | 0.996 | 0.001 | 0.996 |
Arxiv HEP-PH | 0.997 | 0.953 | 0.004 | 0.965 |
DBLP | 0.997 | 0.940 | 0.005 | 0.995 |
Flixster | 0.992 | 0.990 | 0.092 | 0.990 |
Google+ | 1.000 | 0.999 | 0.000 | 0.999 |
| | 0.999 | 0.955 | 0.005 | 0.951 |
Yelp | 0.996 | 0.941 | 0.005 | 0.958 |
Real World Networks
Kids Friendship Network
Information gain
https://github.com/Kagandi/anomalous-vertices-detection
Publications
- Michael Fire, Dima Kagan, Aviad Elishar, and Yuval Elovici, “Social Privacy Protector - Protecting Users’ Privacy in Social Networks”, The Second International Conference on Social Eco-Informatics (SOTICS), Venice, Italy, October, 2012 (Acceptance Rate: 28%).
- Dima Kagan, Michael Fire, Aviad Elishar, and Yuval Elovici, “Facebook Applications’ Installation and Removal: A Temporal Analysis”, The Third International Conference on Social Eco-Informatics (SOTICS), Lisbon, Portugal, October, 2013 (Acceptance Rate: 29%).
- Michael Fire, Dima Kagan, Aviad Elishar, and Yuval Elovici, “Friends or Foe? Fake Profile Identification in Online Social Networks”, Journal of Social Network Analysis and Mining (SNAM), Volume 4, 2014.
- Michael Fire, Dima Kagan, Aviad Elishar, and Yuval Elovici, “Fake profile identification: Making social networks safer (Poster)”, WRF Perfect Pitch Session, 2016 (winner of the 2016 Best Commercialization/Translation Potential prize).
- Dima Kagan, Michael Fire, and Yuval Elovici, "Finding a needle in a haystack: detecting outliers in complex networks", NetSci-X, January 2017.
- Based on the presented study, we submitted the following patent application: Michael Fire, Dima Kagan, Aviad Elishar, and Yuval Elovici, “Method for Protecting User Privacy in Social Networks” (pending patent registration no. 13/688,276).
Questions?