Introduction to Social Network Analysis


Benjamin Lind

Social Network Analysis: Internet Research
St. Petersburg, 15 August, 2013


Brief Overview and Historical Background

(Freeman 2004)

Perspective

Chief Assumption
Relationships between interacting social units matter

Additional Assumptions
  • Interdependence among actors and their actions
  • Relationships between actors allow resource flows
  • Network structure offers individuals opportunities & constraints
  • Structure emerges from patterned relationships between actors

(Wasserman and Faust 1994:4)

Features of Contemporary Social Network Analysis

  1. Intuition of social structure as ties bonding social actors
  2. Informed by systematic empirical data
  3. Visualization plays a substantial role
  4. Requires mathematical and/or computational models


Fields that develop and apply social network analysis

Anthropology, business fields, communications, computer science, ecology, economics, epidemiology, ethology, history, informatics, mathematics, physics, political science, psychology, sociology, statistics

(Freeman 2004:3, 5)

Historical Overview

(Freeman 2004)
  1. Prehistory
  2. Birth
    1. Moreno & Sociometry (1930s)
    2. Harvard
  3. Dark Ages (1940s-1960s)
  4. Harvard Renaissance
  5. Organizational integration

Further Developments

  • Network science
  • Social media
  • Big data

What is a social network?

"A finite set or sets of actors and the relation or relations defined on them" (Wasserman and Faust 1994:20)

What are actors?

Actors are social entities
Actors do not necessarily have the ability to act
Actors (typically) are all of the same type


Formal terms for actors

  • Vertex
  • Node


Examples?

Actors may also have attributes (e.g., age, sex, ethnicity)

What are relations?

Social ties link pairs of actors
A relation is the collection of ties of a specific kind among group members


Related formal terms
  • Edges
  • Arcs

What are relations?

Conceptual considerations
  • Directed or undirected?
  • Weighted or unweighted?
    • Nominal, ordinal, interval, or ratio scale?
  • Signed or unsigned?
  • Loops?
  • Time sensitivity?
    • Static
    • Moving window
    • Real-time
    • Accumulation and decay

    Relations may also have attributes

    Two Basic Measurements

    Degree
    Number of edges incident upon a node
    • Undirected
    • Directed
      • Indegree
      • Outdegree
      • Total (Freeman) Degree


    Density
    Proportion of observed edges, e, in a graph of n actors
    • Undirected
      • Without loops: e / ((n * (n - 1)) / 2)
      • With loops: e / ((n * (n + 1)) / 2)
    • Directed
      • Without loops: e / (n * (n - 1))
      • With loops: e / (n^2)
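
As a rough illustration of the two measurements above, the following Python sketch computes degree and density from an edgelist. The function names, the integer node labels 0..n-1, and the restriction to graphs without loops are assumptions made for this example.

```python
def degrees(n, edges, directed=False):
    """Count ties per node for nodes labeled 0..n-1 (no loops assumed)."""
    indegree = [0] * n
    outdegree = [0] * n
    for sender, receiver in edges:
        outdegree[sender] += 1
        indegree[receiver] += 1
    total = [a + b for a, b in zip(indegree, outdegree)]
    if directed:
        return {"in": indegree, "out": outdegree, "total": total}
    # Undirected: every edge is incident upon both of its endpoints
    return {"degree": total}


def density(n, edges, directed=False):
    """Proportion of possible ties that are observed (no loops assumed)."""
    possible = n * (n - 1) if directed else n * (n - 1) / 2
    return len(edges) / possible


path = [(0, 1), (1, 2), (2, 3)]
print(degrees(4, path))                  # undirected degree per node
print(density(4, path))                  # undirected density
print(density(4, path, directed=True))   # same ties read as arcs
```

For the path 0-1-2-3, the undirected degrees are 1, 2, 2, 1 and the undirected density is 3/6 = 0.5.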

    What are some different types of networks?

    • Simple graph
    • Multigraph
    • Hypergraph
    • Directed acyclic graph
    • Two-mode network
    • Ego network

    How can we express a social network?

    • Matrix
    • Edgelist
    • Set notation
    ℕ = {n1, n2, n3, n4, n5}
    𝕃 = {l1, l2, l3, l4}
    l1 = (n1, n3)
    l2 = (n1, n5)
    l3 = (n2, n4)
    l4 = (n3, n5)
    𝔾 = (ℕ, 𝕃)

    • Sociogram
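
To make the link between these representations concrete, here is a minimal Python sketch that turns the five-node, four-line example from the set notation into an adjacency matrix and back into an edgelist. Treating the ties as undirected, and the helper names `to_matrix` and `to_edgelist`, are assumptions for illustration.

```python
nodes = ["n1", "n2", "n3", "n4", "n5"]
edgelist = [("n1", "n3"), ("n1", "n5"), ("n2", "n4"), ("n3", "n5")]


def to_matrix(nodes, edgelist):
    """Build a symmetric 0/1 adjacency matrix from an edgelist."""
    index = {name: position for position, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edgelist:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # symmetric because ties are treated as undirected
    return matrix


def to_edgelist(nodes, matrix):
    """Read the upper triangle of the matrix back into an edgelist."""
    return [(nodes[i], nodes[j])
            for i in range(len(nodes))
            for j in range(i + 1, len(nodes))
            if matrix[i][j]]


adjacency = to_matrix(nodes, edgelist)
for name, row in zip(nodes, adjacency):
    print(name, row)
```

The matrix grows with the square of the number of actors, while the edgelist grows with the number of ties, which is one reason sparse networks are usually stored as edgelists.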


    Subgraphs

    A set of nodes and edges within a graph

    • Node-generated subgraphs
    • Edge-generated subgraphs

    Network Motif

    "recurring, significant patterns of interaction" - Milo et al. (2002:824)

    "Significance" must be inferred through random graph comparisons
    (e.g., conditional uniform graph, or CUG, tests)

    Best known motifs

    • Dyads
    • Triads

    Dyad Census


    Dyad Census & Graph Properties

    Undirected
    • Density (i.e., tie probability)
    Directed
    • Density (i.e., tie probability)
    • Reciprocity
    "You should attend funerals, because if you don't go to people's funerals, they won't go to yours."

      • Conceptual questions
        • Are null ties reciprocal?
        • Defined by edges or dyads?
      • Common measurements (M, A, and N count the mutual, asymmetric, and null dyads)
        • Edgewise
          • 2*M / (2*M + A)
        • Dyadic
          • (M + N) / (M + A + N)
        • Dyadic, non-null ("ratio")
          • M / (M + A)
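
A small Python sketch of the dyad census and the three reciprocity measurements may help here. It assumes M, A, and N stand for the counts of mutual, asymmetric, and null dyads, that nodes are labeled 0..n-1, and that the graph has at least one tie. The function names are hypothetical.

```python
from itertools import combinations


def dyad_census(n, arcs):
    """Return (mutual, asymmetric, null) dyad counts for a directed graph."""
    arcset = set(arcs)
    mutual = asymmetric = null = 0
    for i, j in combinations(range(n), 2):
        forward = (i, j) in arcset
        backward = (j, i) in arcset
        if forward and backward:
            mutual += 1
        elif forward or backward:
            asymmetric += 1
        else:
            null += 1
    return mutual, asymmetric, null


def reciprocity(mutual, asymmetric, null):
    """Three common reciprocity measurements (assumes at least one tie)."""
    return {
        "edgewise": 2 * mutual / (2 * mutual + asymmetric),
        "dyadic": (mutual + null) / (mutual + asymmetric + null),
        "dyadic_nonnull": mutual / (mutual + asymmetric),
    }


arcs = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2)]
census = dyad_census(4, arcs)
print(census)
print(reciprocity(*census))
```

In this example the dyadic measurement counts the three null dyads as reciprocal, which is why it differs from the non-null ratio.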

    Triad Census, Undirected

    • Brokerage
      • Characterized by only two ties among three actors
    • Transitivity, "clustering," triadic closure
      • Your friends are often friends with each other
      • Typically measured by weak criterion
        • (3*Triangles) / (Connected Triples)
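
The ratio above can be sketched in Python as follows. The sketch counts connected triples by their center node, so each triangle is counted three times, which matches the 3*Triangles numerator. Node labels 0..n-1 and the function name are assumptions for this example.

```python
from itertools import combinations


def transitivity(n, edges):
    """Share of connected triples that are closed, for an undirected graph."""
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    triples = 0  # connected triples, one per pair of neighbors of a center
    closed = 0   # triples whose two end nodes are also tied
    for center in range(n):
        for a, b in combinations(sorted(neighbors[center]), 2):
            triples += 1
            if b in neighbors[a]:
                closed += 1
    return closed / triples if triples else 0.0


# A triangle (0, 1, 2) with one pendant node (3) attached to node 2
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(transitivity(4, edges))
```

Here there are five connected triples, three of which belong to the single triangle, giving 3/5 = 0.6. The two open triples centered on node 2 are the brokerage configurations.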

    Triad Census, Directed

    • Brokerage
      • i → j → k, i ↛ k, k ↛ i
    • Transitivity
      • Weak (most common)
        • If i → j → k, then i → k
      • Strong
        • i → j → k if and only if i → k
    • Cycles
      • i → j → k → i

    Walks

    "A walk is a sequence of nodes and lines, starting and ending with nodes, in which each node is incident with the lines following and preceding it in the sequence." - Wasserman and Faust (1994:105)


    Trail

    A walk such that every edge traversed is unique
    (yet not necessarily every node)


    Path

    A trail such that every vertex traversed is distinct




    There could be zero, one, or multiple walks, trails, and paths between any two vertices!

    Seven Bridges of Königsberg


    Problem: Find a walk that crosses every bridge exactly once
    Euler (1735) proved that no such walk exists
    • Land masses are nodes, bridges are edges
    • A solution would need zero or two nodes of odd degree, but all four land masses have odd degree
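
Euler's degree argument can be checked in a few lines of Python. The land-mass labels ("north", "south", "island", "east") are names chosen for this sketch, and the check assumes the graph is connected, which holds for the seven bridges.

```python
from collections import Counter

# Seven bridges as a multigraph edgelist (labels are illustrative)
bridges = [
    ("north", "island"), ("north", "island"),
    ("south", "island"), ("south", "island"),
    ("east", "island"),
    ("north", "east"), ("south", "east"),
]

degree = Counter()
for a, b in bridges:
    degree[a] += 1
    degree[b] += 1

odd = [land for land, count in degree.items() if count % 2 == 1]
# A connected graph has a walk using every edge exactly once
# only if it has zero or two nodes of odd degree
has_solution = len(odd) in (0, 2)

print(dict(degree))
print(has_solution)
```

The island has degree 5 and the other three land masses have degree 3, so the condition fails.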

    Measurements of Distance

    Pairwise
    • Path length: Number of edges traversed between two nodes
    • Geodesic: Shortest path between two nodes
    • Geodesic distance: Length of the shortest path between two nodes


    Graph and Subgraph
    • Average path length: Mean geodesic distance
    • Diameter: Longest geodesic distance
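
For an unweighted, undirected graph, these distance measurements can be sketched with a breadth-first search. The function names and the 0..n-1 node labels are assumptions, and unreachable pairs are simply skipped, which is only one of several possible conventions.

```python
from collections import deque


def geodesic_distances(n, edges, source):
    """Breadth-first search: geodesic distance from source to each reachable node."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for alter in neighbors[node]:
            if alter not in dist:
                dist[alter] = dist[node] + 1
                queue.append(alter)
    return dist


def path_summary(n, edges):
    """Return (average path length, diameter) over all connected pairs."""
    lengths = []
    for source in range(n):
        dist = geodesic_distances(n, edges, source)
        # Keep each unordered pair once
        lengths.extend(d for target, d in dist.items() if target > source)
    return sum(lengths) / len(lengths), max(lengths)


path = [(0, 1), (1, 2), (2, 3)]
print(geodesic_distances(4, path, 0))
print(path_summary(4, path))
```

On the path 0-1-2-3 the six pairwise geodesic distances sum to 10, so the average path length is 10/6 and the diameter is 3.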

    Application: Erdős Numbers

    A measurement of collaborative distance

    Application: 6 Degrees of Bacon

    Measurement of geodesic distance (van der Hofstad 13 May 2013:8)

    Bacon Number | # of Actors
    0            | 1
    1            | 1902
    2            | 160463
    3            | 457231
    4            | 111310
    5            | 8168
    6            | 810
    7            | 81
    8            | 14

    Cycles

    A walk "that begins and ends at the same node" and has "at least three nodes in which all lines are distinct, and all nodes except the beginning and ending node are distinct." (Wasserman and Faust 1994:107-8)

    Cycles have a length


    Connectivity and Components

    If a path exists between each pair of vertices in a graph, then the graph is connected

    • Strong connectivity: preserves path directionality
    • Weak connectivity: ignores path directionality


    A component is a maximal connected subgraph

    An isolate is the smallest possible component: a single vertex without any ties to other vertices in the graph

    Connectivity and Components

    How many components?

    Connectivity and Components

    A bridge is an edge that, if removed, creates more components
    A cutpoint is a node that, if removed, creates more components
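
The definitions of components, bridges, and cutpoints translate almost directly into a brute-force Python sketch: count components, then remove each edge or node in turn and count again. This is meant for small undirected example graphs only, and the function names are hypothetical.

```python
def count_components(nodes, edges):
    """Count the components of an undirected graph."""
    neighbors = {v: set() for v in nodes}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen = set()
    count = 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for alter in neighbors[node] - seen:
                seen.add(alter)
                stack.append(alter)
    return count


def bridges(nodes, edges):
    """Edges whose removal creates more components."""
    base = count_components(nodes, edges)
    return [edge for k, edge in enumerate(edges)
            if count_components(nodes, edges[:k] + edges[k + 1:]) > base]


def cutpoints(nodes, edges):
    """Nodes whose removal creates more components."""
    base = count_components(nodes, edges)
    found = []
    for v in nodes:
        rest = [u for u in nodes if u != v]
        kept = [(i, j) for i, j in edges if v not in (i, j)]
        if count_components(rest, kept) > base:
            found.append(v)
    return found


# A triangle (0, 1, 2), a tail 2-3-4, and an isolate (5)
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
print(count_components(nodes, edges))
print(bridges(nodes, edges))
print(cutpoints(nodes, edges))
```

The example has two components (the isolate is one of them). The two tail edges are bridges, and nodes 2 and 3 are cutpoints.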

    Centrality and Centralization

    Centrality: Nodal measurement
    Who are the most important actors in a network?

    Centralization: Graph measurement
    How much difference in "importance" is there between actors within a network?
    Generally, compares the observed network's centralization against the theoretical maximum

    Centrality and Centralization

    The Big Lebowski
    Character co-appearances

    Centrality and Centralization


    1. Degree
    2. Betweenness
    3. Closeness
    4. Eigenvector

      (Freeman 1979; Bonacich 1987)

      Cumulative Degree Distribution

      Preferential Attachment
      • Cumulative Advantage
      • Matthew Effect (Merton)
      "For everyone who has will be given more, and he will have an abundance. Whoever does not have, even what he has will be taken from him." (Matthew 25:29)
      • Friendship Paradox (Feld 1991)

      P(X ≥ x) ~ x^(-alpha)
      • P(X ≥ x) is the probability of observing a node with degree x or greater
      • alpha is the scaling exponent
      (Barabási and Albert 1999)
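
As a minimal sketch of the cumulative quantity described above, the following Python function returns, for each observed degree x, the share of nodes with degree x or greater. The function name and the toy degree sequence are assumptions for illustration. Estimating alpha itself is not attempted here.

```python
from collections import Counter


def cumulative_degree_distribution(degrees):
    """Map each observed degree x to the share of nodes with degree >= x."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n
    shares = {}
    for x in sorted(counts):
        shares[x] = remaining / n
        remaining -= counts[x]
    return shares


# A toy, right-skewed degree sequence
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 5]
print(cumulative_degree_distribution(toy_degrees))
```

Plotting these shares against x on log-log axes is a common first check for a heavy-tailed degree distribution: an approximately straight line is consistent with a power law, though it does not establish one.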

      Betweenness

      How many geodesics go through a node (or edge)?







      Variations
      Edge weighted
      Edge betweenness
      Proximity, Scale Long Paths, and Cutoff
      Endpoints
      Random walk 
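
The basic question, how many geodesics pass through a node, can be sketched with Brandes' accumulation algorithm. This version assumes an unweighted, undirected simple graph with nodes labeled 0..n-1, covers none of the variations listed above, and reports unnormalized scores. When several geodesics join a pair, each intermediate node receives the corresponding fraction.

```python
from collections import deque


def betweenness(n, edges):
    """Unnormalized node betweenness for an undirected simple graph."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    score = [0.0] * n
    for source in range(n):
        # Breadth-first search that counts geodesics from the source
        geodesics = [0] * n
        geodesics[source] = 1
        dist = [-1] * n
        dist[source] = 0
        predecessors = [[] for _ in range(n)]
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    geodesics[w] += geodesics[v]
                    predecessors[w].append(v)
        # Pass dependencies back from the farthest nodes toward the source
        dependency = [0.0] * n
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += geodesics[v] / geodesics[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every unordered pair was visited from both of its endpoints
    return [value / 2 for value in score]


path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
print(betweenness(4, path))
print(betweenness(4, star))
```

On the path 0-1-2-3 the two interior nodes each lie on two geodesics. In the star, the center lies on all three geodesics between pairs of leaves.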

      Closeness

      Q: What is closeness?
      A: The inverse of farness!
      Q: What is farness?
      A: If connected, the sum of a node's geodesic distances to all other nodes
      Variations:
      • Unconnected graphs
      • Edge weighted
      • Random walk
      Ex. Kevin Bacon
      1049th closest actor (of ~800k)
      Sean Connery is closer!
      (van der Hofstad 13 May 2013:8)
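
A minimal Python sketch of farness and closeness for a connected, unweighted, undirected graph follows. The function names and 0..n-1 node labels are assumptions. The sketch raises an error on unconnected graphs rather than picking one of the variations, and it leaves the score unnormalized (some software multiplies it by n - 1).

```python
from collections import deque


def farness(n, edges, source):
    """Sum of geodesic distances from source to all other nodes."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for alter in neighbors[node]:
            if alter not in dist:
                dist[alter] = dist[node] + 1
                queue.append(alter)
    if len(dist) < n:
        raise ValueError("farness is undefined here: the graph is not connected")
    return sum(dist.values())


def closeness(n, edges, source):
    """Closeness as the inverse of farness."""
    return 1 / farness(n, edges, source)


path = [(0, 1), (1, 2), (2, 3)]
print(farness(4, path, 0), closeness(4, path, 0))
print(farness(4, path, 1), closeness(4, path, 1))
```

On the path 0-1-2-3, an end node has farness 1 + 2 + 3 = 6 and an interior node has farness 1 + 1 + 2 = 4, so the interior node is closer.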

      Eigenvector Centrality

      Power comes from associating with the powerful
      • Centrality accumulates from the centralities of associated alters
      • Favors large, dense subgraphs (cliques)
      • Equal to the leading eigenvector (the one with the largest eigenvalue) of the network's adjacency matrix
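
The idea that centrality accumulates from the centralities of one's alters can be sketched with power iteration. This example assumes a connected, undirected graph with nodes labeled 0..n-1 and scales the scores so the largest equals 1. It iterates with the adjacency matrix plus the identity, an assumption made here so that the iteration does not oscillate on bipartite graphs. The shift leaves the eigenvectors unchanged.

```python
def eigenvector_centrality(n, edges, iterations=1000, tolerance=1e-10):
    """Leading eigenvector of the adjacency matrix, scaled to a maximum of 1."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    scores = [1.0] * n
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its alters
        updated = [scores[v] + sum(scores[w] for w in neighbors[v])
                   for v in range(n)]
        top = max(updated)
        updated = [value / top for value in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tolerance:
            return updated
        scores = updated
    return scores


star = [(0, 1), (0, 2), (0, 3)]
print(eigenvector_centrality(4, star))
```

In the star, the center scores 1 and each leaf scores 1/sqrt(3), roughly 0.577: the leaves are not peripheral in isolation, since they borrow centrality from the center.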


        Aren't all these usually getting at the same thing?

        Often, but not necessarily (Krackhardt 1990)
        Degree: (2 = 3 = 4), (1 = 5 = 6), 7

        Betweenness: 4, 5, 6, (2 = 3), (1 = 7)


        Closeness: 4, 5, (2 = 3), 6, 1, 7


        Eigenvector Centrality: (2 = 3), 4, 1, 5, 6, 7 

        Cohesive Subgroups

        “the forces holding the individuals within the groupings in which they are” - Moreno and Jennings (1937:137)
        Cohesive groups tend to
        • Interact relatively frequently
        • Have strong, direct ties within themselves
        • Display high internal density
        • Share attitudes and behaviors within themselves
        • Exert pressure and social norms internally

        Cliques

        A maximal complete subgroup - Luce and Perry (1949)

        ~In other words~

        Everyone has a tie to everyone else in the subgroup (complete)
        No other actor can be added without breaking completeness (maximal)
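
One way to enumerate cliques in a small graph is the basic Bron-Kerbosch recursion, sketched below in Python. It assumes an undirected graph without loops and nodes labeled 0..n-1, and it reports maximal complete subgroups of every size, including pairs and isolates. Many definitions keep only those with three or more members.

```python
def maximal_cliques(n, edges):
    """List all maximal complete subgroups of an undirected graph."""
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    found = []

    def expand(clique, candidates, excluded):
        # No candidates and no excluded nodes left: nobody can be added
        if not candidates and not excluded:
            found.append(sorted(clique))
            return
        for v in list(candidates):
            expand(clique | {v},
                   candidates & neighbors[v],
                   excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return sorted(found)


# A triangle (0, 1, 2) with one pendant node (3) attached to node 2
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(maximal_cliques(4, edges))
```

The pair (0, 1) is complete but not maximal, because node 2 can be added, so only the triangle and the pendant pair are returned.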






        Alternatives to Cliques

        • Geodesic-based approaches
          • n-cliques, n-clans, n-clubs
          • Not robust to edge deletion
          • No in-group/out-group distinction
        • Degree-based approaches
          • k-plexes, k-cores
          • No in-group/out-group distinction
        • Connectivity-based approaches
          • Lambda sets, Moody & White's (2003) cohesive blocks
          • Nodes not necessarily directly or closely connected
        • In-group/out-group distinctions
          • LS sets
          • Modularity-based methods

        k-cores

        Cohesive "seedbeds" nested within a network

        Minimum number of ties (k) each member of a subgroup has to other subgroup members

        "Coreness" (c)
        A node has coreness c if it belongs to a c-core but not a (c+1)-core

        Directed graphs may measure k-cores through
        • Ties going inward
        • Ties going outward
        • Total ties


        Alvarez-Hamelin et al. (2006); Seidman (1983)
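
Coreness can be computed by repeatedly peeling away the nodes with the fewest remaining ties. The Python sketch below assumes an undirected simple graph with nodes labeled 0..n-1. For directed graphs the same idea would be applied to inward, outward, or total ties. The function name is hypothetical.

```python
def coreness(n, edges):
    """Return each node's coreness: the largest k whose k-core contains it."""
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    degree = [len(alters) for alters in neighbors]
    remaining = set(range(n))
    core = [0] * n
    k = 0
    while remaining:
        # Nodes with at most k remaining ties cannot belong to a (k+1)-core
        weak = [v for v in remaining if degree[v] <= k]
        if not weak:
            k += 1
            continue
        for v in weak:
            core[v] = k
            remaining.discard(v)
            for w in neighbors[v]:
                if w in remaining:
                    degree[w] -= 1
    return core


# A triangle (0, 1, 2), a pendant node (3) tied to node 2, and an isolate (4)
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(coreness(5, edges))
```

Node 2 has three ties but coreness 2: once the pendant node is peeled away, only the triangle supports it.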

        1-core

        1 and 2-cores

        1, 2, and 3-cores

        1 through 4-cores

        Community Detection

        Goal: Find groups with more ties among members and fewer ties between groups than expected (conditional on degree)
        Key Measurement: Modularity, Q, ranging from -0.5 to 1 (Newman 2006)
        • Hierarchical Algorithms
          • Top-Down
            • Girvan-Newman (Newman & Girvan 2004)
            • Leading Eigenvector* (Newman 2006)
          • Bottom-Up
            • Fast-Greedy* (Clauset et al. 2004)
            • Walktrap (Pons & Latapy 2005)
            • Louvain method*, ** (Blondel et al. 2008)
        • Spin-Glass (Reichardt & Bornholdt 2006; Traag & Bruggeman 2008)
        *Modularity optimized, **Semi-hierarchical
        Choose an algorithm based upon theory, functionality, or highest modularity
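
Whichever algorithm is chosen, the modularity of its result can be checked directly. The Python sketch below assumes an undirected, unweighted graph without loops and uses the common form Q = sum over communities of (e_c / m) - (d_c / 2m)^2, where m is the number of edges, e_c the number of edges inside community c, and d_c the summed degree of its members. The function name and the toy graph are illustrative.

```python
def modularity(edges, membership):
    """Modularity Q of a partition; membership maps node -> community label."""
    m = len(edges)
    within = {}      # edges with both endpoints in the same community
    degree_sum = {}  # summed degree of each community's members
    for i, j in edges:
        ci, cj = membership[i], membership[j]
        degree_sum[ci] = degree_sum.get(ci, 0) + 1
        degree_sum[cj] = degree_sum.get(cj, 0) + 1
        if ci == cj:
            within[ci] = within.get(ci, 0) + 1
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles joined by a single edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = {node: "A" for node in range(6)}
print(modularity(edges, split))
print(modularity(edges, lumped))
```

Splitting the graph into its two triangles gives Q = 5/14, about 0.36, while placing every node in one community gives Q = 0.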

        Louvain Method, First Pass

        Louvain Method, Second Pass

        Louvain Method, Both Passes

        Density Comparisons

        Modularity: 0.36, 0.44
        Graph Density: 0.14
        Community Density
             A     B     C     D
        A   0.60  0.28  0.24  0.20
        B   0.28  0.42  0.24  0.20
        C   0.24  0.24  0.47  0.23
        D   0.20  0.20  0.23  0.32

        Major Research Subjects in Brief

        1. Homophily
        2. Diffusion
        3. Modeling Tie Formation

        Homophily

        ("Assortativity")

        Birds of a feather flock together

        Homophily

        Examples?
        Categorical vs. Continuous variables
        Sources?
        Which relationships?
        Feld's foci
        Forms of homophily
        1. Generalized
        2. Differential (some groups more homophilous than others)
        3. Matching (some groups prefer other groups in addition to themselves)
        Intervening considerations
        1. Population effects
        2. Degree correlated attributes
        3. Triadic closure

        Diffusion

        The spread of a behavior or attribute

        Diffusion

        Requirements

        1. An artifact
        2. A sender
        3. A receiver
        4. A channel

        Diffusion


        A relationship to a previous adopter increases a receiving node's propensity to adopt

        Diffusion

        Considerations
        • Account for homophily
        • Theorizing channels and artifacts
        • Conceptualizing time
          • Adoption rate
          • Decay
        • Inhibitors

        Modeling

        How do ties form?

        • Preferential attachment
        • Homophily / assortativity
        • Blockmodels
        • Small world
        • Network evolution models
        • p* / ERGM family

        Blockmodels

        Focus upon positions, not actors

        Composed of
        1. A partition of actors into discrete subsets called "positions"
        2. Relationships within and between positions


        Potential hypotheses
        1. Relationship between positions and attributes
        2. Structure of relationships




        The following examples are from Wasserman and Faust (1994:423)

        • Cohesive subgroups
        • Center-periphery
        • Centralized
        • Hierarchy
        • Transitivity

        Small World

        (Watts and Strogatz 1998)

        Properties
        1. High clustering
        2. Short path lengths

        Network Evolution Models

        (Toivonen et al. 2009)

        Tie formation follows (usually local) structures

        Two families
        • Growing models
          • Nodes & links added until N nodes reached
        • Dynamical models
          • Adding & removing links among a fixed set of nodes until equilibrium reached

        p* / ERGM

        Discussed in greater detail later in the course

        Presentation serves as an introduction to basic concepts within social network analysis. Presentation will be held on August 15, 2013 in St. Petersburg.
