Introduction to Social Network Analysis
Benjamin Lind
Social Network Analysis: Internet Research
St. Petersburg, 15 August, 2013
Brief Overview and Historical Background
(Freeman 2004)
Perspective
Chief Assumption
Relationships between interacting social units matter
Additional Assumptions
 Interdependence among actors and their actions
 Relationships between actors allow resource flows
 Network structure offers individuals opportunities & constraints
 Structure emerges from patterned relationships between actors
Features of Contemporary Social Network Analysis
 Intuition of social structure as ties bonding social actors
 Informed by systematic empirical data
 Visualization plays a substantial role
 Requires mathematical and/or computational models
Fields that develop and apply social network analysis
Anthropology, business fields, communications, computer science, ecology, economics, epidemiology, ethology, history, informatics, mathematics, physics, political science, psychology, sociology, statistics
(Freeman 2004:3, 5)
Historical Overview
(Freeman 2004)
 Prehistory
 Birth
 Moreno & Sociometry (1930s)
 Harvard
 Dark Ages (1940s-1960s)
 Harvard Renaissance
 Organizational integration
Further Developments
 Network science
 Social media
 Big data
What is a social network?
"A finite set or sets of actors and the relation or relations defined on them" (Wasserman and Faust 1994:20)
What are actors?
Actors are social entities
Actors do not necessarily have the ability to act
Actors (typically) are all of the same type
Formal terms for actors
 Vertex
 Node
Examples?
Actors may also have attributes (e.g., age, sex, ethnicity)
What are relations?
Social ties link pairs of actors
Relations collect a specific set of ties among group members
Related formal terms
 Edges
 Arcs
What are relations?
Conceptual considerations
 Directed or undirected?
 Weighted or unweighted?
 Nominal, ordinal, interval, or ratio scale?
 Signed or unsigned?
 Loops?
 Time sensitivity?
 Static
 Moving window
 Real-time
 Accumulation and decay
Relations may also have attributes
Two Basic Measurements
Degree
Number of edges incident upon a node

Undirected

Directed
 Indegree
 Outdegree
 Total (Freeman) Degree
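The degree definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the edge list `arcs` and the function names are hypothetical, and loops (if present) add two to an undirected node's degree.

```python
from collections import Counter

# Hypothetical directed edge list: (sender, receiver)
arcs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]

def degree_directed(arcs):
    """Return indegree, outdegree and total degree counters."""
    indeg, outdeg = Counter(), Counter()
    for sender, receiver in arcs:
        outdeg[sender] += 1
        indeg[receiver] += 1
    total = indeg + outdeg  # total (Freeman) degree = indegree + outdegree
    return indeg, outdeg, total

def degree_undirected(edges):
    """Count the edges incident upon each node of an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

indeg, outdeg, total = degree_directed(arcs)
```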
Density
Proportion of possible edges that are observed, for e edges in a graph of n actors
 Undirected
 Without loops: e / ((n * (n - 1)) / 2)
 With loops: e / ((n * (n + 1)) / 2)
 Directed
 Without loops: e / (n * (n - 1))
 With loops: e / (n^2)
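As a rough sketch, the four density cases can be folded into one function. The function name and keyword arguments are illustrative assumptions. For the undirected case with loops, the sketch counts the n * (n - 1) / 2 unordered pairs plus n possible loops, which gives n * (n + 1) / 2 possible edges.

```python
def density(e, n, directed=False, loops=False):
    """Observed edges e divided by the number of possible edges among n actors."""
    if directed:
        possible = n * n if loops else n * (n - 1)
    else:
        # unordered pairs, plus one possible loop per node when loops are allowed
        possible = n * (n + 1) / 2 if loops else n * (n - 1) / 2
    return e / possible

# Four undirected ties among five actors, no loops: 4 / 10
example = density(4, 5)
```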
What are some different types of networks?
 Simple graph
 Multigraph
 Hypergraph
 Directed acyclic graph
 Two-mode network
 Ego networks
How can we express a social network?
 Matrix
 Edgelist
 Set notation
ℕ = {n1, n2, n3, n4, n5}
𝕃 = {l1, l2, l3, l4}
l1 = (n1, n3)
l2 = (n1, n5)
l3 = (n2, n4)
l4 = (n3, n5)
𝔾 = (ℕ, 𝕃)
How can we express a social network?
 Matrix
 Edgelist
 Set notation
 Sociogram
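To show how these representations relate, the following sketch turns the set-notation example (five nodes n1 to n5, four lines l1 to l4) into an edgelist and an adjacency matrix. It assumes the ties are undirected; the variable names are illustrative only.

```python
# Set notation: node set and line set from the example
nodes = ["n1", "n2", "n3", "n4", "n5"]
lines = {
    "l1": ("n1", "n3"),
    "l2": ("n1", "n5"),
    "l3": ("n2", "n4"),
    "l4": ("n3", "n5"),
}

# Edgelist: one (node, node) pair per tie
edgelist = sorted(lines.values())

# Adjacency matrix: rows and columns follow the order of `nodes`
index = {name: i for i, name in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edgelist:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # assumption: ties are undirected, so the matrix is symmetric

for name, row in zip(nodes, matrix):
    print(name, row)
```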
Subgraphs
A set of nodes and edges within a graph
 Node-generated subgraphs
 Edge-generated subgraphs
Network Motif
"recurring, significant patterns of interaction" (Milo et al. 2002:824)
"Significance" must be inferred through random graph comparisons
(i.e., conditional uniform graph, or CUG, tests)
Best known motifs
 Dyads
 Triads
Dyad Census
Dyad Census & Graph Properties
Undirected

Density (i.e., tie probability)
Directed
 Density (i.e., tie probability)
 Reciprocity
"You should attend funerals, because if you don't go to people's funerals, they won't go to yours."
Dyad Census & Graph Properties
Directed
 Density (i.e., tie probability)

Reciprocity
 Conceptual questions
 Are null ties reciprocal?
 Defined by edges or dyads?
 Common measurements (M = mutual, A = asymmetric, N = null dyads)
 Edgewise
 2*M / (2*M + A)
 Dyadic
 (M + N) / (M + A + N)
 Dyadic, non-null ("ratio")
 M / (M + A)
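A small sketch of the three reciprocity measures, computed from a dyad census of mutual (M), asymmetric (A), and null (N) dyads. The node and arc lists are made-up data, the function names are hypothetical, and the sketch assumes at least one tie exists (otherwise the edgewise and non-null ratios divide by zero).

```python
from itertools import combinations

def dyad_census(nodes, arcs):
    """Count mutual, asymmetric and null dyads in a directed graph without loops."""
    arc_set = set(arcs)
    mutual = asym = null = 0
    for i, j in combinations(nodes, 2):
        forward, backward = (i, j) in arc_set, (j, i) in arc_set
        if forward and backward:
            mutual += 1
        elif forward or backward:
            asym += 1
        else:
            null += 1
    return mutual, asym, null

def reciprocity(mutual, asym, null):
    """Edgewise, dyadic, and dyadic non-null reciprocity."""
    return {
        "edgewise": 2 * mutual / (2 * mutual + asym),
        "dyadic": (mutual + null) / (mutual + asym + null),
        "dyadic_nonnull": mutual / (mutual + asym),
    }

nodes = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
census = dyad_census(nodes, arcs)
scores = reciprocity(*census)
```

The dyadic measure treats null dyads as reciprocal, which is why it can be much higher than the other two in sparse networks.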
Triad Census, Undirected

Brokerage
 Characterized by only two ties among three actors
 Transitivity, "clustering," triadic closure
 Your friends are often friends with each other
 Typically measured by weak criterion
 (3*Triangles) / (Connected Triples)
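The ratio (3*Triangles) / (Connected Triples) can be sketched as follows for an undirected graph without loops. The adjacency dictionary is hypothetical example data. Each triangle is found once per centre node, so the `closed` counter already equals three times the number of triangles.

```python
from itertools import combinations

def transitivity(adj):
    """adj maps each node to the set of its neighbours (undirected, no loops)."""
    closed = 0   # closed triples; each triangle contributes three of them
    triples = 0  # connected triples: two ties meeting at a centre node
    for centre, neighbours in adj.items():
        for u, v in combinations(sorted(neighbours), 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0

# A triangle a-b-c with a pendant node d attached to c
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
score = transitivity(adj)
```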
Triad Census, Directed
 Brokerage
 i → j → k, i ↛ k, k ↛ i
 Transitivity
 Weak (most common)
 If i → j → k, then i → k
 Strong
 i → j → k if and only if i → k
 Cycles

i → j → k → i
Walks
"A walk is a sequence of nodes and lines, starting and ending with nodes, in which each node is incident with the lines following and preceding it in the sequence." (Wasserman and Faust 1994:105)
Trail
A walk such that every edge traversed is unique
(yet not necessarily every node)
Path
A trail such that every vertex traversed is distinct
There could be zero, one, or multiple walks, trails, and paths between any two vertices!
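The distinction between walks, trails, and paths can be made concrete with a short classifier. This is a sketch under stated assumptions: the graph is undirected and simple, and the function name and example edges are hypothetical.

```python
def classify_walk(node_sequence, edges):
    """Return 'path', 'trail', 'walk', or None if the sequence is not a walk."""
    edge_set = {frozenset(e) for e in edges}
    steps = [frozenset(pair) for pair in zip(node_sequence, node_sequence[1:])]
    if any(step not in edge_set for step in steps):
        return None      # some consecutive nodes are not tied, so this is not a walk
    if len(set(steps)) < len(steps):
        return "walk"    # at least one edge is traversed more than once
    if len(set(node_sequence)) < len(node_sequence):
        return "trail"   # edges are unique, but a node is revisited
    return "path"        # edges and nodes are all distinct

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(classify_walk(["a", "b", "c", "d"], edges))
print(classify_walk(["a", "b", "c", "a"], edges))
print(classify_walk(["a", "b", "a", "b"], edges))
```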
Seven Bridges of Königsberg
Problem: Find a walk that crosses every bridge exactly once
Euler (1735) proved that no such walk exists
 Land masses are nodes, bridges are edges
 A solution would need zero or two nodes of odd degree
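The degree argument can be checked directly. In this sketch the four land masses are given the hypothetical labels A to D (A being the central island), and the seven bridges are stored as a multigraph edge list.

```python
from collections import Counter

# Seven bridges between four land masses; labels are illustrative
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd_nodes = sorted(node for node, d in degree.items() if d % 2 == 1)
# A walk using every edge exactly once needs zero or two odd-degree nodes
# (this check assumes the graph is connected, which holds here)
has_solution = len(odd_nodes) in (0, 2)
```

All four land masses have odd degree, so the condition fails.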
Measurements of Distance
Pairwise
Path length
Number of edges traversed between two nodes
Geodesic
Shortest path between two nodes
Geodesic distance
Length of the shortest path between two nodes
Graph and Subgraph
Average path length
Mean geodesic distance
Diameter: Longest geodesic distance
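For an unweighted graph, geodesic distances can be found with a breadth-first search. The sketch below derives the average path length and diameter from them. The function names and the example graph are assumptions for illustration, and unreachable pairs are simply skipped.

```python
from collections import deque

def geodesic_distances(adj, source):
    """Breadth-first search: geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length_and_diameter(adj):
    """Mean and maximum geodesic distance over all reachable ordered pairs."""
    lengths = []
    for source in adj:
        dist = geodesic_distances(adj, source)
        lengths.extend(d for target, d in dist.items() if target != source)
    return sum(lengths) / len(lengths), max(lengths)

# A simple chain a - b - c - d
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
mean_distance, diameter = average_path_length_and_diameter(adj)
```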
Application: Erdős Numbers
A measurement of collaborative distance
Application: 6 Degrees of Bacon
Measurement of geodesic distance
Bacon Number  # of Actors
0             1
1             1902
2             160463
3             457231
4             111310
5             8168
6             810
7             81
8             14
(van der Hofstad, 13 May 2013:8)
Cycles
A walk "that begins and ends at the same node" and has "at least three nodes in which all lines are distinct, and all nodes except the beginning and ending node are distinct." (Wasserman and Faust 1994:107-8)
Cycles have a length
Connectivity and Components
If a path exists between each pair of vertices in a graph, then the graph is connected
 Strong connectivity: preserves path directionality
 Weak connectivity: ignores path directionality
A component is a maximal connected subgraph
An isolate is the smallest possible component: a single vertex without any ties to other vertices in the graph
Connectivity and Components
How many components?
Connectivity and Components
A bridge is an edge that, if removed, creates more components
A cutpoint is a node that, if removed, creates more components
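Components, bridges, and cutpoints can be illustrated with a brute-force sketch: count components, then remove each edge or node in turn and count again. This is deliberately simple rather than efficient, it assumes an undirected graph, and all names and data are hypothetical.

```python
def components(nodes, edges):
    """Return the components of an undirected graph as a list of node sets."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, found = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        found.append(comp)
    return found

def bridges(nodes, edges):
    """Edges whose removal increases the number of components."""
    base = len(components(nodes, edges))
    return [e for i, e in enumerate(edges)
            if len(components(nodes, edges[:i] + edges[i + 1:])) > base]

def cutpoints(nodes, edges):
    """Nodes whose removal increases the number of components."""
    base = len(components(nodes, edges))
    result = []
    for n in nodes:
        rest = [x for x in nodes if x != n]
        kept = [(u, v) for u, v in edges if n not in (u, v)]
        if len(components(rest, kept)) > base:
            result.append(n)
    return result

nodes = ["a", "b", "c", "d", "e"]          # e is an isolate
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
```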
Centrality and Centralization
Centrality: Nodal measurement
Who are the most important actors in a network?
Centralization: Graph measurement
How much difference in "importance" is there between actors within a network?
Generally, compares the observed network's centralization against the theoretical maximum
Centrality and Centralization
The Big Lebowski
Character co-appearances
Centrality and Centralization
 Degree
 Betweenness
 Closeness
 Eigenvector
(Freeman 1979; Bonacich 1987)
Cumulative Degree Distribution
Preferential Attachment

Cumulative Advantage

Matthew Effect (Merton)
"For everyone who has will be given more, and he will have an abundance. Whoever does not have, even what he has will be taken from him." (Matthew 25:29)

Friendship Paradox (Feld 1991)
P(X ≥ x) ~ x^(-alpha)
P(X ≥ x) is the probability of observing a node with degree x or greater
alpha is the scaling exponent
(Barabási and Albert 1999)
Betweenness
How many geodesics go through a node (or edge)?
Variations
Edge weighted
Edge betweenness
Proximity, Scale Long Paths, and Cutoff
Endpoints
Random walk
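The basic question, how many geodesics pass through a node, can be answered with a Brandes-style accumulation over breadth-first searches. The sketch below covers only the plain case (unweighted, undirected, node betweenness, endpoints excluded), not the variations listed above. Where several geodesics connect a pair, each one counts fractionally. Names and example data are illustrative.

```python
from collections import deque

def betweenness(adj):
    """Node betweenness for an unweighted, undirected graph given as node -> neighbour set."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}   # number of geodesics from s to each node
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on geodesics from s
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies, starting from the nodes farthest from s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was visited from both endpoints, so halve the totals
    return {v: b / 2 for v, b in score.items()}

chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
chain_scores = betweenness(chain)
square_scores = betweenness(square)
```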
Closeness
Q: What is closeness?
A: The inverse of farness!
Q: What is farness?
A: If the graph is connected, the sum of a node's geodesic distances to all other nodes
Variations:
Unconnected graphs
Edge weighted
Random walk
Ex. Kevin Bacon
1049th closest actor (of ~800k)
Sean Connery is closer!
(van der Hofstad 13 May 2013:8)
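Farness and closeness, as defined above for a connected graph, can be sketched like this. The function names and the example chain are assumptions. The sketch uses the raw inverse of farness; some software additionally rescales by the number of other nodes.

```python
from collections import deque

def farness(adj, source):
    """Sum of geodesic distances from source to all other nodes (connected graphs only)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < len(adj):
        raise ValueError("graph is not connected; farness is undefined here")
    return sum(dist.values())

def closeness(adj, source):
    """Closeness as the inverse of farness."""
    return 1 / farness(adj, source)

# A simple chain a - b - c - d: the middle nodes are closer to everyone
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
```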
Eigenvector Centrality
Power comes from associating with the powerful
 Centrality accumulates from the centralities of associated alters
 Favors large, dense subgraphs (cliques)
 Given by the leading eigenvector of the network's adjacency matrix
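One common way to approximate the leading eigenvector is power iteration: repeatedly replace each node's score with the sum of its neighbours' scores and rescale. The sketch below assumes a connected, undirected network. As a design choice it iterates with A + I instead of A, which has the same eigenvectors but also converges on bipartite graphs such as stars. Function and parameter names are hypothetical.

```python
def eigenvector_centrality(matrix, iterations=200, tol=1e-10):
    """Power iteration on A + I; scores are rescaled so the maximum is 1."""
    n = len(matrix)
    x = [1.0] * n
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbours
        new = [x[i] + sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = max(new)
        new = [value / norm for value in new]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x

# A star: node 0 is tied to nodes 1, 2 and 3
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(star)
```

In the star, each leaf ends up with a score of 1/sqrt(3) relative to the centre.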
Aren't all these usually getting at the same thing?
Often, but not necessarily (Krackhardt 1990)
Degree: (2 = 3 = 4), (1 = 5 = 6), 7
Betweenness: 4, 5, 6, (2 = 3), (7, 1)
Closeness: 4, 5, (2 = 3), 6, 1, 7
Eigenvector Centrality: (2 = 3), 4, 1, 5, 6, 7
Cohesive Subgroups
"the forces holding the individuals within the groupings in which they are" (Moreno and Jennings 1937:137)
Cohesive groups tend to
 Interact relatively frequently
 Have strong, direct ties within themselves
 Display high internal density
 Share attitudes and behaviors within themselves
 Exert pressure and social norms internally
Cliques
A maximal complete subgroup (Luce and Perry 1949)
In other words
Everyone has a tie to everyone else in the subgroup (complete)
No further actor can be added without losing completeness (maximal)
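Maximal complete subgroups can be enumerated with the basic Bron-Kerbosch recursion, sketched below for an undirected graph. Note the assumptions: the function name and data are illustrative, and the sketch reports every maximal complete subgroup, including tied pairs, whereas some definitions require at least three members.

```python
def maximal_cliques(adj):
    """Enumerate maximal complete subgroups; adj maps node -> set of neighbours."""
    cliques = []

    def expand(current, candidates, excluded):
        # current: members so far; candidates: nodes tied to all members;
        # excluded: nodes tied to all members that were already explored
        if not candidates and not excluded:
            cliques.append(current)  # nothing can be added, so current is maximal
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques

# A triangle a-b-c with a pendant node d attached to c
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
found = sorted(sorted(c) for c in maximal_cliques(adj))
```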
Alternatives to Cliques
 Geodesic-based approaches
 n-cliques, n-clans, n-clubs
 Not robust to edge deletion
 No in-group/out-group distinction
 Degree-based approaches
 k-plexes, k-cores
 No in-group/out-group distinction
 Connectivity-based approaches
 Lambda sets, Moody & White's (2003) cohesive blocks
 Nodes not necessarily directly or closely connected
 In-group/out-group distinctions
 LS sets
 Modularity-based methods
k-cores
Cohesive "seedbeds" nested within a network
k: the minimum number of ties each member of a subgroup has to other subgroup members
"Coreness" (c)
A node has coreness c if it belongs to a c-core but not a (c+1)-core
Directed graphs may measure k-cores through
 Ties going inward
 Ties going outward
 Total ties
Alvarez-Hamelin et al. (2006); Seidman (1983)
1-core
1- and 2-cores
1-, 2-, and 3-cores
1- through 4-cores
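Coreness can be computed by repeatedly peeling away the least-connected nodes. The following sketch does this for an undirected graph using total ties; the function name and example data are assumptions for illustration.

```python
def coreness(adj):
    """Return each node's coreness; adj maps node -> set of neighbours (undirected)."""
    remaining = {v: set(neighbours) for v, neighbours in adj.items()}
    core = {}
    k = 0
    while remaining:
        # Nodes with at most k remaining ties cannot belong to the (k+1)-core
        low = [v for v, neighbours in remaining.items() if len(neighbours) <= k]
        if not low:
            k += 1  # everything left has more than k ties: move to the next core
            continue
        for v in low:
            core[v] = k
            for u in remaining.pop(v):
                remaining[u].discard(v)
    return core

# A triangle a-b-c, a pendant node d attached to c, and an isolate e
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
result = coreness(adj)
```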
Community Detection
Goal: Find groups with more ties among members and fewer ties between groups than expected (conditional on degree)
Key Measurement: Modularity, Q, ranging from -0.5 to 1 (Newman 2006)
 Hierarchical Algorithms
 Top-Down
 Girvan-Newman (Newman & Girvan 2004)
 Leading Eigenvector* (Newman 2006)
 Bottom-Up
 Fast-Greedy* (Clauset et al. 2004)
 Walktrap (Pons & Latapy 2005)
 Louvain method*, ** (Blondel et al. 2008)
 Spin-Glass (Reichardt & Bornholdt 2006; Traag & Bruggeman 2008)
*Modularity optimized, **Semi-hierarchical
Choose an algorithm based upon theory, functionality, or highest modularity
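Whichever algorithm is chosen, the modularity of its result can be checked by hand. The sketch below computes Q for an undirected, unweighted graph as the sum over communities of (share of edges inside the community) minus (share of edge endpoints in the community, squared). It scores a given partition and does not search for one; the function name and example data are hypothetical.

```python
def modularity(edges, membership):
    """Q for an undirected, unweighted edge list and a node -> community mapping."""
    m = len(edges)
    within = {}      # number of edges with both endpoints in the community
    degree_sum = {}  # total degree of the community's members
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(within.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two triangles joined by a single tie
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 1 for node in two_groups}
q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Placing every node in a single community always gives Q = 0, which is a useful baseline when comparing partitions.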
Louvain Method, First Pass
Louvain Method, Second Pass
Louvain Method, Both Passes
Density Comparisons
Modularity: 0.36, 0.44
Graph Density: 0.14
Community Density
     A     B     C     D
A  0.60  0.28  0.24  0.20
B  0.28  0.42  0.24  0.20
C  0.24  0.24  0.47  0.23
D  0.20  0.20  0.23  0.32
Major Research Subjects in Brief
 Homophily
 Diffusion
 Modeling Tie Formation
Homophily
("Assortativity")
Birds of a feather flock together
Homophily
Examples?
Categorical vs. Continuous variables
Sources?
Which relationships?
Feld's Foci
Forms of homophily
 Generalized
 Differential (some groups more homophilous than others)
 Matching (some groups prefer other groups in addition to themselves)
Intervening considerations
 Population effects
 Degree-correlated attributes

Triadic closure
Diffusion
The spread of a behavior or attribute
Diffusion
Requirements
 An artifact
 A sender
 A receiver
 A channel
Diffusion
Relationship to previous adopter increases a receiving node's propensity to adopt
Diffusion
Considerations
 Account for homophily
 Theorizing channels and artifacts
 Conceptualizing time
 Adoption rate
 Decay
 Inhibitors
Modeling
How do ties form?
 Preferential attachment
 Homophily / assortativity

Blockmodels
 Small world

Network evolution models
 p* / ERGM family
Blockmodels
Focus upon positions, not actors
Composed of
 A partition of actors into discrete subsets ("positions")
 Relationships within and between positions
Potential hypotheses
 Relationship between positions and attributes
 Structure of relationships
The following examples are from Wasserman and Faust (1994:423)
Cohesive Subgroups
Center-periphery
Centralized
Hierarchy
Transitivity
Small World
Watts and Strogatz (1998)
Properties
 High clustering
 Short path lengths
Network Evolution Models
(Toivonen et al. 2009)
Tie formation follows (usually local) structures
Two families
 Growing models
 Nodes & links added until N nodes reached
 Dynamical models
 Adding & removing nodes until equilibrium reached
p* / ERGM
Discussed in greater detail later in the course
Introduction to Social Network Analysis
By Benjamin Lind
Presentation serves as an introduction to basic concepts within social network analysis. Presentation will be held on August 15, 2013 in St. Petersburg.