Introduction to Social Network Analysis
Benjamin Lind
Social Network Analysis: Internet Research
St. Petersburg, 15 August, 2013
Brief Overview and Historical Background
(Freeman 2004)
Perspective
Chief Assumption
Relationships between interacting social units matter
Additional Assumptions
- Interdependence among actors and their actions
- Relationships between actors allow resource flows
- Network structure offers individuals opportunities & constraints
- Structure emerges from patterned relationships between actors
Features of Contemporary Social Network Analysis
- Intuition of social structure as ties bonding social actors
- Informed by systematic empirical data
- Visualization plays a substantial role
- Requires mathematical and/or computational models
Fields that develop and apply social network analysis
Anthropology, business fields, communications, computer science, ecology, economics, epidemiology, ethology, history, informatics, mathematics, physics, political science, psychology, sociology, statistics
(Freeman 2004:3, 5)
Historical Overview
(Freeman 2004)
- Prehistory
- Birth
- Moreno & Sociometry (1930s)
- Harvard
- Dark Ages (1940s-1960s)
- Harvard Renaissance
- Organizational integration
Further Developments
- Network science
- Social media
- Big data
What is a social network?
"A finite set or sets of actors and the relation or relations defined on them" (W&F 1994:20)
What are actors?
Actors are social entities
Actors do not necessarily have the ability to act
Actors (typically) are all of the same type
Formal terms for actors
- Vertex
- Node
Examples?
Actors may also have attributes (e.g., age, sex, ethnicity)
What are relations?
Social ties link pairs of actors
Relations collect a specific set of ties among group members
Related formal terms
- Edges
- Arcs
What are relations?
Conceptual considerations
- Directed or undirected?
- Weighted or unweighted?
- Nominal, ordinal, interval, or ratio scale?
- Signed or unsigned?
- Loops?
- Time sensitivity?
- Static
- Moving window
- Real-time
- Accumulation and decay
Relations may also have attributes
Two Basic Measurements
Degree
Number of edges incident upon a node
- Undirected
- Directed
- Indegree
- Outdegree
- Total (Freeman) Degree
Density
Proportion of observed edges, e, in a graph of n actors
- Undirected
- Without loops: e / ((n * (n - 1)) / 2)
- With loops: e / ((n * (n + 1)) / 2)
- Directed
- Without loops: e / (n * (n - 1))
- With loops: e / (n^2)
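A minimal sketch of both measurements in Python, assuming a hypothetical toy edge list of (sender, receiver) pairs. The function names are illustrative, and the undirected case with loops assumes n(n + 1)/2 possible edges (every unordered pair plus one loop per node).

```python
# Hypothetical toy directed network: an edge list of (sender, receiver) pairs.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
nodes = sorted({v for edge in edges for v in edge})

def degrees(nodes, edges):
    """Return indegree, outdegree, and total (Freeman) degree per node."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for sender, receiver in edges:
        outdeg[sender] += 1
        indeg[receiver] += 1
    total = {v: indeg[v] + outdeg[v] for v in nodes}
    return indeg, outdeg, total

def density(n, e, directed=True, loops=False):
    """Observed edges divided by the number of possible edges."""
    if directed:
        possible = n * n if loops else n * (n - 1)
    else:
        # Assumption: with loops, an undirected graph has n(n + 1)/2 possible edges.
        possible = n * (n + 1) / 2 if loops else n * (n - 1) / 2
    return e / possible

indeg, outdeg, total = degrees(nodes, edges)
directed_density = density(len(nodes), len(edges))
```

Here actor "a" sends two ties and receives one, and the four observed arcs fill one third of the twelve possible directed ties.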
What are some different types of networks?
- Simple graph
- Multigraph
- Hypergraph
- Directed acyclic graph
- Two-mode network
- Ego network
How can we express a social network?
- Matrix
- Edgelist
- Set notation
ℕ = {n1, n2, n3, n4, n5}
𝕃 = {l1, l2, l3, l4}
l1 = (n1, n3)
l2 = (n1, n5)
l3 = (n2, n4)
l4 = (n3, n5)
𝔾 = (ℕ, 𝕃)
- Sociogram
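The same five-node, four-line network from the set-notation example can be stored as an edgelist or as an adjacency matrix. The sketch below assumes the ties are undirected; the variable names are illustrative.

```python
# Nodes and lines from the set-notation example, treated as undirected.
nodes = ["n1", "n2", "n3", "n4", "n5"]
edgelist = [("n1", "n3"), ("n1", "n5"), ("n2", "n4"), ("n3", "n5")]

index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: cell [i][j] is 1 when a tie links node i and node j.
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edgelist:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # undirected, so the matrix is symmetric

# Converting back: read the upper triangle to recover the edgelist.
recovered = [
    (nodes[i], nodes[j])
    for i in range(len(nodes))
    for j in range(i + 1, len(nodes))
    if matrix[i][j]
]
```

The matrix makes absent ties explicit (every zero cell), while the edgelist stores only observed ties, which is usually more compact for sparse networks.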
Subgraphs
A set of nodes and edges within a graph
- Node-generated subgraphs
- Edge-generated subgraphs
Network Motif
"recurring, significant patterns of interaction" - Milo et al. (2002:824)
"Significance" must be inferred through random graph comparisons
(i.e., CUG tests)
Best known motifs
- Dyads
- Triads
Dyad Census
Dyad Census & Graph Properties
Undirected
- Density (i.e., tie probability)
Directed
- Density (i.e., tie probability)
- Reciprocity
"You should attend funerals, because if you don't go to people's funerals, they won't go to yours."
Reciprocity
- Conceptual questions
- Are null ties reciprocal?
- Defined by edges or dyads?
- Common measurements (M = mutual, A = asymmetric, N = null dyads)
- Edgewise
- 2*M / (2*M + A)
- Dyadic
- (M + N) / (M + A + N)
- Dyadic, non-null ("ratio")
- M / (M + A)
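A short sketch of the dyad census and the three reciprocity measures, assuming a hypothetical four-actor directed network. M, A, and N count mutual, asymmetric, and null dyads. The ratios are undefined when their denominators are zero (e.g., an empty graph), which this sketch does not handle.

```python
from itertools import combinations

# Hypothetical directed network on four actors.
nodes = ["a", "b", "c", "d"]
arcs = {("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")}

def dyad_census(nodes, arcs):
    """Count mutual (M), asymmetric (A), and null (N) dyads."""
    m = a = n = 0
    for i, j in combinations(nodes, 2):
        ties = ((i, j) in arcs) + ((j, i) in arcs)
        if ties == 2:
            m += 1
        elif ties == 1:
            a += 1
        else:
            n += 1
    return m, a, n

M, A, N = dyad_census(nodes, arcs)
edgewise = 2 * M / (2 * M + A)      # share of arcs that are reciprocated
dyadic = (M + N) / (M + A + N)      # share of dyads that are symmetric
dyadic_nonnull = M / (M + A)        # share of non-null dyads that are mutual
```

On this network the three measures come out to 0.5, 2/3, and 1/3, which shows how much the answer depends on whether null dyads count as reciprocal.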
Triad Census, Undirected
- Brokerage
- Characterized by only two ties among three actors
- Transitivity, "clustering," triadic closure
- Your friends are often friends with each other
- Typically measured by weak criterion
- (3*Triangles) / (Connected Triples)
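The transitivity ratio can be sketched by counting connected triples around each node and checking which are closed. The network below is a hypothetical example (one triangle plus a pendant node), not data from the slides.

```python
from itertools import combinations

# Hypothetical undirected network: a triangle (a, b, c) plus a pendant node d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

connected_triples = 0  # paths of length two, counted once per center node
closed_triples = 0     # triples whose two end nodes are also tied

for center, alters in neighbors.items():
    for i, j in combinations(sorted(alters), 2):
        connected_triples += 1
        if j in neighbors[i]:
            closed_triples += 1

# Each triangle closes three triples, so closed_triples == 3 * triangles.
triangles = closed_triples // 3
transitivity = closed_triples / connected_triples
```

The two open triples centered on c (a-c-d and b-c-d) are the brokerage configurations; the remaining three are closed by the single triangle.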
Triad Census, Directed
- Brokerage
- i → j → k, i ↛ k, k ↛ i
- Transitivity
- Weak (most common)
- If i → j → k, then i → k
- Strong
- i → j → k iff i → k
- Cycles
- i → j → k → i
Walks
"A walk is a sequence of nodes and lines, starting and ending with nodes, in which each node is incident with the lines following and preceding it in the sequence." - Wasserman and Faust (1994:105)
Trail
A walk such that every edge traversed is unique
(yet not necessarily every node)
Path
A trail such that every vertex traversed is distinct
There could be zero, one, or multiple walks, trails, and paths between any two vertices!
Seven Bridges of Königsberg
Problem: Find a walk that crosses every bridge exactly once
Euler (1735) proved there is no solution for the walk
- Land masses are nodes, bridges are edges
- A solution would need zero or two nodes of odd degree, but all four land masses have odd degree
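The degree argument can be checked directly. The sketch below encodes the seven bridges as a multigraph edge list; the land-mass labels (N, S, K, E) are illustrative names chosen here, not labels from the slides.

```python
from collections import Counter

# The seven bridges as a multigraph edge list. Labels are illustrative:
# N and S are the river banks, K is the Kneiphof island, E is the land
# to the east.
bridges = [
    ("N", "K"), ("N", "K"),
    ("S", "K"), ("S", "K"),
    ("N", "E"), ("S", "E"), ("K", "E"),
]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd_nodes = sorted(v for v, d in degree.items() if d % 2 == 1)

# In a connected graph, a walk using every edge exactly once exists only
# when zero or two nodes have odd degree.
has_euler_walk = len(odd_nodes) in (0, 2)
```

All four land masses end up with odd degree (5, 3, 3, 3), so the condition fails.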
Measurements of Distance
Pairwise
Path length
Number of edges traversed between two nodes
Geodesic
Shortest path between two nodes
Geodesic distance
Length of the shortest path between two nodes
Graph and Subgraph
Average path length
Mean geodesic distance
Diameter: Longest geodesic distance
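For unweighted networks, geodesic distances can be found with a breadth-first search. The sketch below assumes a hypothetical connected, undirected network stored as an adjacency list; in an unconnected graph some pairs would have no path, which this code does not handle.

```python
from collections import deque

# Hypothetical connected, undirected network stored as an adjacency list.
graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def geodesic_distances(graph, source):
    """Breadth-first search: geodesic distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for alter in graph[node]:
            if alter not in dist:
                dist[alter] = dist[node] + 1
                queue.append(alter)
    return dist

all_dist = {v: geodesic_distances(graph, v) for v in graph}

# Collect each unordered pair once; assumes the graph is connected.
nodes = sorted(graph)
pair_dist = [all_dist[u][v] for i, u in enumerate(nodes) for v in nodes[i + 1:]]
average_path_length = sum(pair_dist) / len(pair_dist)
diameter = max(pair_dist)
```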
Application: Erdős Numbers
A measurement of collaborative distance
Application: 6 Degrees of Bacon
Measurement of geodesic distance
Bacon Number | # of Actors
0 | 1
1 | 1902
2 | 160463
3 | 457231
4 | 111310
5 | 8168
6 | 810
7 | 81
8 | 14
(van der Hofstad, 13 May 2013:8)
Cycles
A walk "that begins and ends at the same node" and has "at least three nodes in which all lines are distinct, and all nodes except the beginning and ending node are distinct." (Wasserman and Faust 1994:107-8)
Cycles have a length
Connectivity and Components
If a path exists between each pair of vertices in a graph, then the graph is connected
- Strong connectivity: preserves path directionality
- Weak connectivity: ignores path directionality
A component is a maximally connected subgraph
An isolate is the smallest possible component: a single vertex without any ties to other vertices in the graph
How many components?
A bridge is an edge that, if removed, creates more components
A cutpoint is a node that, if removed, creates more components
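A brute-force sketch of these definitions: count components, then delete each edge or node in turn and recount. The network is a hypothetical example (a triangle, a tail, and one isolate). Faster dedicated algorithms exist; this version simply mirrors the definitions.

```python
# Hypothetical undirected network: a triangle, a tail, and one isolate (f).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]

def count_components(nodes, edges):
    """Count components with a depth-first search over an adjacency list."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)
    return components

baseline = count_components(nodes, edges)

# Delete each edge (or node) in turn and check whether components increase.
bridges = [e for e in edges
           if count_components(nodes, [x for x in edges if x != e]) > baseline]
cutpoints = [v for v in nodes
             if count_components([x for x in nodes if x != v],
                                 [e for e in edges if v not in e]) > baseline]
```

The triangle's edges are not bridges because each has an alternate route, whereas both tail edges are.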
Centrality and Centralization
Centrality: Nodal measurement
Who are the most important actors in a network?
Centralization: Graph measurement
How much difference in "importance" is there between actors within a network?
Generally, compares the observed network's centralization against the theoretical maximum
Example network: The Big Lebowski, character co-appearances
Centrality and Centralization
- Degree
- Betweenness
- Closeness
- Eigenvector
(Freeman 1979; Bonacich 1987)
Cumulative Degree Distribution
Preferential Attachment
- Cumulative Advantage
- Matthew Effect (Merton)
"For everyone who has will be given more, and he will have an abundance. Whoever does not have, even what he has will be taken from him." (Matthew 25:29)
- Friendship Paradox (Feld 1991)
P(X ≥ x) ~ x^(-alpha)
P(X ≥ x) is the probability of observing a node with degree x or greater
alpha is the scaling exponent
(Barabási and Albert 1999)
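A minimal sketch of the empirical cumulative degree distribution, i.e., the share of nodes with degree x or greater. The degree sequence is a hypothetical example, and the function name is illustrative.

```python
from collections import Counter

# Hypothetical degree sequence for a ten-node network.
degree_sequence = [1, 1, 1, 1, 2, 2, 2, 3, 4, 7]

def cumulative_degree_distribution(degree_sequence):
    """Return {x: share of nodes with degree >= x} for each observed degree x."""
    n = len(degree_sequence)
    counts = Counter(degree_sequence)
    distribution = {}
    remaining = n  # nodes with degree >= the current x
    for x in sorted(counts):
        distribution[x] = remaining / n
        remaining -= counts[x]
    return distribution

ccdf = cumulative_degree_distribution(degree_sequence)
```

Plotting these values on log-log axes is a common first check: a power-law tail appears roughly as a straight line whose slope reflects the exponent.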
Betweenness
How many geodesics go through a node (or edge)?
Variations
Edge weighted
Edge betweenness
Proximity, Scale Long Paths, and Cutoff
Endpoints
Random walk
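A sketch of basic node betweenness for an unweighted, undirected network. It uses a breadth-first search per source and the back-propagation scheme commonly credited to Brandes; that choice is an assumption of this example, not a method named in the slides. Endpoints are excluded and scores are unnormalized.

```python
from collections import deque

# Hypothetical connected, undirected network stored as an adjacency list.
graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def betweenness(graph):
    """Unnormalized node betweenness, endpoints excluded."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Forward pass: count geodesics (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        preds = {v: [] for v in graph}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate each node's share of geodesics from s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends in an undirected graph.
    return {v: b / 2 for v, b in score.items()}

scores = betweenness(graph)
```

Nodes b and d each sit on three geodesics between other pairs, while a, c, and e sit on none.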
Closeness
Q: What is closeness?
A: The inverse of farness!
Q: What is farness?
If connected, the sum of a node's geodesic distances to all other nodes
Variations:
Unconnected graphs
Edge weighted
Random walk
Ex. Kevin Bacon
1049th closest actor (of ~800k)
Sean Connery is closer!
(van der Hofstad 13 May 2013:8)
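Farness and closeness follow directly from geodesic distances. The sketch below assumes a hypothetical connected, undirected network; in an unconnected graph farness would need one of the variations listed above.

```python
from collections import deque

# Hypothetical connected, undirected network stored as an adjacency list.
graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def farness(graph, source):
    """Sum of geodesic distances from source to all other nodes (connected graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for alter in graph[node]:
            if alter not in dist:
                dist[alter] = dist[node] + 1
                queue.append(alter)
    return sum(dist.values())

far = {v: farness(graph, v) for v in graph}
closeness = {v: 1 / far[v] for v in graph}
```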
Eigenvector Centrality
Power comes from associating with the powerful
- Centrality accumulates from the centralities of associated alters
- Favors large, dense subgraphs (cliques)
- Scores are given by the first (leading) eigenvector of the network's adjacency matrix
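One way to approximate the leading eigenvector is power iteration: repeatedly replace each node's score with the sum of its alters' scores, then rescale. The sketch below assumes a hypothetical connected, non-bipartite, undirected network with at least one tie (plain power iteration can oscillate on bipartite graphs). Scores are scaled so the maximum is 1.

```python
# Hypothetical connected, non-bipartite, undirected network.
graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def eigenvector_centrality(graph, iterations=1000, tol=1e-10):
    """Power iteration on the adjacency structure; scores scaled to a maximum of 1."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Each node's new score is the sum of its alters' current scores.
        new = {v: sum(score[w] for w in graph[v]) for v in graph}
        top = max(new.values())
        new = {v: x / top for v, x in new.items()}
        change = max(abs(new[v] - score[v]) for v in graph)
        score = new
        if change < tol:
            break
    return score

scores = eigenvector_centrality(graph)
```

Nodes b and d tie for the top score; c outranks a and e because both of its alters are themselves central.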
Aren't all these usually getting at the same thing?
Often, but not necessarily (Krackhardt 1990)
Degree: (2 = 3 = 4), (1 = 5 = 6), 7
Betweenness: 4, 5, 6, (2 = 3), (1 = 7)
Closeness: 4, 5, (2 = 3), 6, 1, 7
Eigenvector Centrality: (2 = 3), 4, 1, 5, 6, 7
Cohesive Subgroups
“the forces holding the individuals within the groupings in which they are” - Moreno and Jennings (1937:137)
Cohesive groups tend to
- Interact relatively frequently
- Have strong, direct ties within themselves
- Display high internal density
- Share attitudes and behaviors within themselves
- Exert pressure and social norms internally
Cliques
A maximally complete subgroup - Luce and Perry (1949)
~In other words~
Everyone has a tie to everyone else in the subgroup (complete)
No other actor can be added without breaking completeness, so the subgroup is not contained within a larger complete subgroup (maximal)
Alternatives to Cliques
- Geodesic-based approaches
- n-cliques, n-clans, n-clubs
- Not robust to edge deletion
- No in-group/out-group distinction
- Degree-based approaches
- k-plexes, k-cores
- No ingroup/outgroup distinction
- Connectivity-based approaches
- Lambda sets, Moody & White's (2003) cohesive blocks
- Nodes not necessarily directly or closely connected
- Ingroup/outgroup distinctions
- LS Sets
- Modularity-based methods
k-cores
Cohesive "seedbeds" nested within a network
Minimum number of ties (k) each member of a subgroup has to other subgroup members
"Coreness" (c)
A node has coreness c if it belongs to a c-core, but not a (c+1)-core
Directed graphs may measure k-cores through
- Ties going inward
- Ties going outward
- Total ties
Alvarez-Hamelin et al. (2006); Seidman (1983)
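Coreness can be sketched with a simple peeling procedure: repeatedly remove the node with the fewest remaining ties and record the highest minimum degree seen so far. This is one straightforward approach for undirected graphs, not necessarily the procedure used in the cited papers, and the network is a hypothetical example.

```python
# Hypothetical undirected network: a four-node clique (a-d), a node e tied to
# two clique members, a pendant f tied to e, and an isolate g.
graph = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c", "d", "e"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c"],
    "e": ["a", "b", "f"],
    "f": ["e"],
    "g": [],
}

def coreness(graph):
    """Peel nodes in order of remaining degree; return each node's coreness."""
    degree = {v: len(alters) for v, alters in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        # Next node to peel: the one with the smallest remaining degree.
        v = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in graph[v]:
            if w in remaining:
                degree[w] -= 1
    return core

core = coreness(graph)
```

The k-core is then the set of nodes with coreness of at least k, so the cores are nested: here the 3-core (the clique) sits inside the 2-core, which sits inside the 1-core.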
1-core
1 and 2-cores
1, 2, and 3-cores
1 through 4-cores
Community Detection
Goal: Find groups with more ties among members and fewer ties between groups than expected (conditional on degree)
Key Measurement: Modularity, Q, ranging from -0.5 to 1 (Newman 2006)
- Hierarchical Algorithms
- Top-Down
- Girvan-Newman (Newman & Girvan 2004)
- Leading Eigenvector* (Newman 2006)
- Bottom-Up
- Fast-Greedy* (Clauset et al. 2004)
- Walktrap (Pons & Latapy 2005)
- Louvain method*, ** (Blondel et al. 2008)
- Spin-Glass (Reichardt & Bornholdt 2006; Traag & Bruggeman 2008)
*Modularity optimized, **Semi-hierarchical
Choose an algorithm based upon theory, functionality, or highest modularity
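Whatever algorithm produces the partition, its modularity can be computed the same way. The sketch below assumes an undirected, unweighted graph without loops and uses the common form Q = Σ_c [ e_c / m − (d_c / 2m)² ], where m is the number of edges, e_c the edges inside community c, and d_c the summed degree of its members. The network and community labels are hypothetical.

```python
# Hypothetical network: two triangles joined by a single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

def modularity(edges, membership):
    """Modularity Q for an undirected, unweighted graph without loops."""
    m = len(edges)
    internal = {}      # edges inside each community
    total_degree = {}  # summed degree of each community's members
    for u, v in edges:
        for node in (u, v):
            c = membership[node]
            total_degree[c] = total_degree.get(c, 0) + 1
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {v: 1 for v in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the two triangles yields Q = 5/14 (about 0.36), while placing every node in one community yields Q = 0, since a single group has exactly as many internal ties as expected.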
Louvain Method, First Pass
Louvain Method, Second Pass
Louvain Method, Both Passes
Density Comparisons
Modularity: 0.36, 0.44
Graph Density: 0.14
Community Density
Community | A | B | C | D |
A | 0.60 | 0.28 | 0.24 | 0.20 |
B | 0.28 | 0.42 | 0.24 | 0.20 |
C | 0.24 | 0.24 | 0.47 | 0.23 |
D | 0.20 | 0.20 | 0.23 | 0.32 |
Major Research Subjects in Brief
- Homophily
- Diffusion
- Modeling Tie Formation
Homophily
("Assortativity")
Birds of a feather flock together
Homophily
Examples?
Categorical vs. Continuous variables
Sources?
Which relationships?
Feld's Foci
Forms of homophily
- Generalized
- Differential (some groups more homophilous than others)
- Matching (some groups prefer other groups in addition to themselves)
Intervening considerations
- Population effects
- Degree correlated attributes
- Triadic closure
Diffusion
The spread of a behavior or attribute
Diffusion
Requirements
- An artifact
- A sender
- A receiver
- A channel
Diffusion
Relationship to previous adopter increases a receiving node's propensity to adopt
Diffusion
Considerations
- Account for homophily
- Theorizing channels and artifacts
- Conceptualizing time
- Adoption rate
- Decay
- Inhibitors
Modeling
How do ties form?
- Preferential attachment
- Homophily / assortativity
- Blockmodels
- Small world
- Network evolution models
- p* / ERGM family
Blockmodels
Focus upon positions, not actors
Composed of
- A partition of actors into discrete subsets ("positions")
- Relationships within and between positions
Potential hypotheses
- Relationship between positions and attributes
- Structure of relationships
The following examples are from Wasserman and Faust (1994:423)
Cohesive Subgroups
Center-periphery
Centralized
Hierarchy
Transitivity
Small World
Watts and Strogatz (1998)
Properties
- High clustering
- Short path lengths
Network Evolution Models
(Toivonen et al. 2009)
Tie formation follows (usually local) structures
Two families
- Growing models
- Nodes & links added until N nodes reached
- Dynamical models
- Links added & removed on a fixed set of nodes until equilibrium reached
p* / ERGM
Discussed in greater detail later in the course
Presentation serves as an introduction to basic concepts within social network analysis. Presentation will be held on August 15, 2013 in St. Petersburg.