Malicious Profile Identification in Online Social Networks

Dima Kagan

Supervisors: Michael Fire, Yuval Elovici

Complex Networks

Related Work

  • Reputation-based filtering [Golbeck and Hendler].
  • Topology-based identification [Fire et al.].
  • Graph centrality measure-based spammer identification [DeBarr and Wechsler].
  • Spammer detection in social networks using “honey-profiles” [Stringhini et al.].
  • Clustering groups of accounts that act similarly at around the same time for a sustained period [Cao et al.].

Link Prediction

 

 

 

Crowd Wisdom

Supervised Fake Profile
Identification in Online Social Networks

Facebook App

\begin{aligned} CS(u,v)= & \text{Common-Friends}(u,v) \\ & +\text{Common-Chat-Messages}(u,v) \\ & +2\cdot \text{Common-Groups-Number}(u,v) \\ & +2\cdot \text{Common-Posts-Number}(u,v) \\ & +2\cdot \text{Tagged-Photos-Number}(u,v) \\ & +2\cdot \text{Tagged-Videos-Number}(u,v) \\ & +1000\cdot \text{Are-Family}(u,v) \end{aligned}

Connection strength heuristic
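
The heuristic is a plain weighted sum, so it can be sketched in a few lines. The `LinkFeatures` container and its field names are hypothetical and only mirror the terms of CS(u,v); the weights are the ones in the formula.

```python
from dataclasses import dataclass


@dataclass
class LinkFeatures:
    """Hypothetical container for the per-link counts used by CS(u,v)."""
    common_friends: int
    common_chat_messages: int
    common_groups: int
    common_posts: int
    tagged_photos: int
    tagged_videos: int
    are_family: bool


def connection_strength(f: LinkFeatures) -> int:
    """Weighted sum with the same weights as the CS(u,v) formula."""
    return (
        f.common_friends
        + f.common_chat_messages
        + 2 * f.common_groups
        + 2 * f.common_posts
        + 2 * f.tagged_photos
        + 2 * f.tagged_videos
        + 1000 * int(f.are_family)  # family ties dominate every other term
    )


# Made-up link: 12 common friends, 3 chat messages, 1 group, 2 tagged photos.
example = LinkFeatures(12, 3, 1, 0, 2, 0, False)
score = connection_strength(example)
```

With these made-up counts the score is 12 + 3 + 2 + 4 = 21; setting `are_family=True` would add 1000.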

Browser addon

Architecture

Collected Data - Facebook App

  • Are-Family(u,v)
  • Common-Friends(u,v)
  • Common-Groups-Number(u,v)
  • Common-Posts-Number(u,v)
  • Common-Chat-Messages(u,v)
  • Tagged-Photos-Number(u,v)
  • Tagged-Videos-Number(u,v)
  • Friends-Number(u)
  • Friends-Number(v)

Collected Data - Addon

  • Installed-Application-Number
  • Default-Privacy-Settings
  • Lookup
  • Share-Address
  • Send-Messages
  • Receive-Friend-Requests
  • Tag-Suggestions
  • View-Birthday

ML Datasets

  • Fake profiles dataset: Recommended restricted links set + All unrestricted links set.
  • Friends restriction dataset: Alphabetically restricted links set + All unrestricted links set.
  • All links dataset: Contains all the links.

Dataset               Users  Restricted  Unrestricted
Fake-Profiles         434    2,860       138,286
Friends Restrictions  355    6,145       138,286
All Links             527    9,005       138,286

COLLECTED DATA

Additional Features

  • Common-Groups-Ratio(u,v)
  • Common-Posts-Ratio(u,v)
  • Common-Chat-Messages-Ratio(u,v)
  • Common-Photos-Ratio(u,v)
  • Common-Videos-Ratio(u,v)
  • Is-Friend-Profile-Private(v)
  • Jaccard's-Coefficient(u,v)

Classifier       Measure         Fake Profiles  Friends Restriction  All Links
OneR             AUC             0.861          0.511                0.608
OneR             F-Measure       0.867          0.531                0.616
OneR             False-Positive  0.179          0.532                0.414
OneR             True-Positive   0.902          0.554                0.623
J48              AUC             0.925          0.684                0.72
J48              F-Measure       0.885          0.668                0.659
J48              False-Positive  0.179          0.498                0.321
J48              True-Positive   0.937          0.754                0.654
IBK (K=10)       AUC             0.833          0.587                0.545
IBK (K=10)       F-Measure       0.744          0.49                 0.637
IBK (K=10)       False-Positive  0.174          0.289                0.749
IBK (K=10)       True-Positive   0.696          0.419                0.817
Naive-Bayes      AUC             0.902          0.73                 0.75
Naive-Bayes      F-Measure       0.833          0.677                0.675
Naive-Bayes      False-Positive  0.373          0.403                0.3
Naive-Bayes      True-Positive   0.979          0.717                0.662
Bagging          AUC             0.946          0.698                0.728
Bagging          F-Measure       0.89           0.645                0.657
Bagging          False-Positive  0.171          0.403                0.312
Bagging          True-Positive   0.941          0.671                0.643
AdaBoostM1       AUC             0.937          0.698                0.728
AdaBoostM1       F-Measure       0.882          0.645                0.657
AdaBoostM1       False-Positive  0.163          0.403                0.312
AdaBoostM1       True-Positive   0.941          0.671                0.643
Rotation-Forest  AUC             0.948          0.79                 0.778
Rotation-Forest  F-Measure       0.897          0.719                0.696
Rotation-Forest  False-Positive  0.158          0.336                0.275
Rotation-Forest  True-Positive   0.941          0.75                 0.681
Random-Forest    AUC             0.933          0.706                0.716
Random-Forest    F-Measure       0.858          0.613                0.663
Random-Forest    False-Positive  0.14           0.278                0.369
Random-Forest    True-Positive   0.857          0.565                0.679

P@K

average users’ precision@k
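
A minimal sketch of how an average per-user precision@k could be computed. The slides do not spell out the averaging, so taking the mean of each user's own precision@k is an assumption, and the user ids and labels below are made up.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked links whose true label is 1 (restricted)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels[:k]) / k


def average_users_precision_at_k(per_user_ranked_labels, k):
    """Mean of precision@k over users (assumed reading of the slide title).

    per_user_ranked_labels maps a user id to that user's true link labels,
    ordered by the classifier's score from most to least suspicious.
    """
    if not per_user_ranked_labels:
        raise ValueError("at least one user is required")
    scores = [precision_at_k(labels, k) for labels in per_user_ranked_labels.values()]
    return sum(scores) / len(scores)


# Hypothetical rankings for two users.
rankings = {"user_a": [1, 1, 0, 0], "user_b": [1, 0, 0, 1]}
avg_p_at_2 = average_users_precision_at_k(rankings, k=2)
```

Here user_a scores 2/2 and user_b scores 1/2 at k = 2, so the average is 0.75.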

Information gain

Applications Installation and Removal Analysis

Application Data

  • Hashed User Id

  • Installed Application Number - the number of Facebook applications installed on the user's Facebook account.

  • Date - the date when the information was collected.

     

T-Test

AppChangeRatio(u,d):=\frac{AppNum(u,0) - AppNum(u,d)}{AppNum(u,0)}

Null hypothesis: \overline{\text{Regular Users}} = \overline{\text{Add-on Users}}

Alternative hypothesis: \overline{\text{Regular Users}} \neq \overline{\text{Add-on Users}}

Two-sample t-test:

  • Add-on Users (µ = 0.236; stdev = 0.12)
  • Regular Users (µ = -0.19; stdev = 0.05)

T-test results:

  • (t = 25.936; p-value < 2.2e-16)
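
The ratio and the comparison of the two groups can be sketched as follows. The slides only say "two sample t-test", so the Welch (unequal-variance) statistic used here is an assumption, and the application counts are made up rather than taken from the study.

```python
import math
from statistics import mean, variance


def app_change_ratio(app_num_day0: int, app_num_day_d: int) -> float:
    """AppChangeRatio(u,d): positive when the user removed applications."""
    if app_num_day0 <= 0:
        raise ValueError("AppNum(u,0) must be positive")
    return (app_num_day0 - app_num_day_d) / app_num_day0


def welch_t_statistic(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances (assumption)."""
    standard_error = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical (day 0, day d) application counts for three users per group.
addon_ratios = [app_change_ratio(20, 15), app_change_ratio(10, 8), app_change_ratio(30, 21)]
regular_ratios = [app_change_ratio(20, 24), app_change_ratio(10, 11), app_change_ratio(30, 36)]
t_stat = welch_t_statistic(addon_ratios, regular_ratios)
```

A positive `t_stat` means the first group removed a larger share of its applications than the second, which is the direction of the means reported for add-on and regular users.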

Regular Users

Addon Users

ApplicationChangePercent = 0.006 \cdot Days + 0.05
R^2 = 0.736; p-value = 2.2e-16
ApplicationChangePercent = -0.002 \cdot Days - 0.125
R^2 = 0.57; p-value = 1.351e-12

Labeling Data is Hard

Unsupervised Anomaly Detection in Graphs Utilizing a Link Prediction Algorithm

Malicious Users Tend to Connect to Other Profiles Randomly

Topology Based

Feature Extraction

16 features for directed graphs

8 features for undirected graphs

◦ For undirected graphs:

  • Common Friends: |\Gamma(v) \cap \Gamma(u)|
  • Total Friends: |\Gamma(v) \cup \Gamma(u)|
  • Jaccard's-Coefficient: \frac{|\Gamma(v) \cap \Gamma(u)|}{|\Gamma(v) \cup \Gamma(u)|}

◦ For directed graphs:

  • Transitive Friends: |\Gamma_{in}(v) \cap \Gamma_{out}(u)|
  • Opposite Direction Friends: \begin{cases} 1, & \text{if}\ (u,v)\in E \\ 0, & \text{otherwise} \end{cases}
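
A minimal sketch of these topology features on plain adjacency sets. It assumes the link under consideration is written (v, u), so the opposite-direction feature checks for the reverse edge (u, v); the helper names and the toy graphs are hypothetical.

```python
from collections import defaultdict


def undirected_features(neighbors, v, u):
    """Common Friends, Total Friends and Jaccard's coefficient for v and u."""
    gamma_v = neighbors.get(v, set())
    gamma_u = neighbors.get(u, set())
    common = len(gamma_v & gamma_u)
    total = len(gamma_v | gamma_u)
    jaccard = common / total if total else 0.0
    return {"common_friends": common, "total_friends": total, "jaccard": jaccard}


def build_directed(edges):
    """Build out-neighbor and in-neighbor sets from (source, target) pairs."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for source, target in edges:
        out_nb[source].add(target)
        in_nb[target].add(source)
    return out_nb, in_nb


def directed_features(out_nb, in_nb, v, u):
    """Transitive Friends and Opposite Direction Friends for the link (v, u)."""
    transitive = len(in_nb[v] & out_nb[u])
    opposite = 1 if v in out_nb[u] else 0  # 1 when the reverse edge (u, v) exists
    return {"transitive_friends": transitive, "opposite_direction_friends": opposite}


# Toy undirected graph: vertex -> set of neighbors.
friends = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
undirected = undirected_features(friends, 1, 2)

# Toy directed graph.
out_nb, in_nb = build_directed([(1, 2), (2, 1), (2, 3), (3, 1)])
directed = directed_features(out_nb, in_nb, 1, 2)
```

For the toy data, vertices 1 and 2 share one friend out of four distinct neighbors (Jaccard 0.25), and the directed link (1, 2) has one transitive friend and a reverse edge.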

    Link Classification

    Aggregation of The Results


    Meta Feature Extraction

    We extracted 7 features, where p(v,u) is the confidence that an edge is fake:

    • AbnormalityVertexProbability(v) := \frac{1}{|\Gamma(v)|}\sum\nolimits_{u \in \Gamma(v)}p(v,u)
    • EdgesProbabilitySTDV(v) := \sigma(EP(v))
    • EdgesProbabilityMedian(v) := median(EP(v))
    • SumEdgeLabel(v) := \sum\nolimits_{u \in \Gamma(v)} EdgeLabel(v,u)
    • MeanPredictedLinkLabel(v) := \frac{1}{|\Gamma(v)|}\sum\nolimits_{u \in \Gamma(v)} EdgeLabel(v,u)
    • PredictedLabelSTDV(v) := \sigma(\lbrace EdgeLabel(v,u) | u \in \Gamma(v), u,v \in V \rbrace)
    • EdgeCount(v) := |\Gamma(v)|
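
The per-vertex aggregation can be sketched as below. Several details are assumptions rather than facts from the slides: EP(v) is read as the collection of p(v,u) over the neighbors of v, EdgeLabel is derived by thresholding p(v,u) at a hypothetical 0.5, and the population standard deviation is used for both STDV features.

```python
from statistics import mean, median, pstdev


def vertex_meta_features(edge_probs, threshold=0.5):
    """Aggregate the fake-edge confidences p(v,u) of one vertex v.

    edge_probs holds one p(v,u) per neighbor u of v. The threshold used to
    turn a probability into an EdgeLabel is a hypothetical choice.
    """
    if not edge_probs:
        raise ValueError("vertex must have at least one edge")
    labels = [1 if p >= threshold else 0 for p in edge_probs]
    return {
        "AbnormalityVertexProbability": mean(edge_probs),
        "EdgesProbabilitySTDV": pstdev(edge_probs),
        "EdgesProbabilityMedian": median(edge_probs),
        "SumEdgeLabel": sum(labels),
        "MeanPredictedLinkLabel": mean(labels),
        "PredictedLabelSTDV": pstdev(labels),
        "EdgeCount": len(edge_probs),
    }


# Made-up confidences for a vertex with four edges.
features = vertex_meta_features([0.9, 0.8, 0.1, 0.2])
```

A vertex whose edges are mostly predicted as fake gets a high AbnormalityVertexProbability and MeanPredictedLinkLabel, which is what a downstream ranking or classifier of vertices would rely on.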

    outline


    Datasets

    Network           Is Directed  Vertices Number  Links Number  Date  Labeled
    Academia          Yes          200,169          1,389,063     2011  No
    Anybeat           Yes          12,645           67,053        2011  No
    ArXiv HEP-PH      No           34,546           421,578       2003  No
    Class of 1880/81  Yes          53               179           1881  Yes
    DBLP              No           1,665,850        13,504,952    2016  No
    Google+           Yes          107,614          13,673,453    2012  No
    Orkut             No           3,072,441        117,185,083   2012  No
    Twitter           Yes          5,384,160        16,011,443    2012  Yes
    Xing              No           1,053,754        2,161,968     2012  No
    Yelp              No           249,443          3,563,818     2016  No

    Fully Simulated Networks

    Network                      AUC    TPR    FPR    Precision
    Simulation 1 (ArXiv HEP-PH)  0.991  0.889  0.011  0.904
    Simulation 2 (DBLP)          0.997  0.994  0.064  0.993
    Simulation 3 (Yelp)          0.993  0.917  0.007  0.937

    Semi-Simulated Networks

    Network       AUC    TPR    FPR    Precision
    Academia      0.999  0.998  0.000  0.997
    Anybeat       1.000  0.996  0.001  0.996
    ArXiv HEP-PH  0.997  0.953  0.004  0.965
    DBLP          0.997  0.940  0.005  0.995
    Flixster      0.992  0.990  0.092  0.990
    Google+       1.000  0.999  0.000  0.999
    Xing          0.999  0.955  0.005  0.951
    Yelp          0.996  0.941  0.005  0.958

    Real World Networks

    Kids Friendship Network

    AUC - 0.93
    TPR - 0.91
    FPR - 0.15

    Twitter

    Information gain

    https://github.com/Kagandi/anomalous-vertices-detection

    Publications

    • Michael Fire, Dima Kagan, Aviad Elishar, and Yuval Elovici, “Social Privacy Protector - Protecting Users’ Privacy in Social Networks”, The Second International Conference on Social Eco-Informatics (SOTICS), Venice, Italy, October 2012 (Acceptance Rate: 28%).
    • Dima Kagan, Michael Fire, Aviad Elishar, and Yuval Elovici, “Facebook Applications’ Installation and Removal: A Temporal Analysis”, The Third International Conference on Social Eco-Informatics (SOTICS), Lisbon, Portugal, October 2013 (Acceptance Rate: 29%).
    • Michael Fire, Dima Kagan, Aviad Elishar, and Yuval Elovici, “Friends or Foe? Fake Profile Identification in Online Social Networks”, Journal of Social Network Analysis and Mining (SNAM), Volume 4, 2014.
    • Michael Fire, Dima Kagan, Aviad Elishar, and Yuval Elovici, “Fake profile identification: Making social networks safer (Poster)”, WRF Perfect Pitch Session, 2016 (winner of the 2016 Best Commercialization/Translation Potential prize).
    • Dima Kagan, Michael Fire, and Yuval Elovici, “Finding a needle in a haystack: detecting outliers in complex networks”, NetSci-X, January 2017.
    • Based on the presented study, we submitted the following patent request: Michael Fire, Dima Kagan, Aviad Elishar, and Yuval Elovici, “Method for Protecting User Privacy in Social Networks” (pending patent registration no. 13/688,276).

    Questions?
