Introduction to Social Network Analysis

Benjamin Lind

Social Network Analysis: Internet Research

St. Petersburg, 15 August, 2013

Brief Overview and Historical Background

(Freeman 2004)

Perspective

Chief Assumption

Relationships between interacting social units matter

Additional Assumptions

Interdependence among actors and their actions
Relationships between actors allow resource flows
Network structure offers individuals opportunities & constraints
Structure emerges from patterned relationships between actors

(Wasserman and Faust 1994:4)

Features of Contemporary Social Network Analysis

Intuition of social structure as ties bonding social actors
Informed by systematic empirical data
Visualization plays a substantial role
Requires mathematical and/or computational models

Fields that develop and apply social network analysis

Anthropology, business fields, communications, computer science, ecology, economics, epidemiology, ethology, history, informatics, mathematics, physics, political science, psychology, sociology, statistics

(Freeman 2004:3, 5)

Historical Overview

(Freeman 2004)

Prehistory
Birth

Moreno & Sociometry (1930s)
Harvard

Dark Ages (1940s-1960s)
Harvard Renaissance
Organizational integration

Further Developments

Network science
Social media
Big data

What is a social network?

"A finite set or sets of actors and the relation or relations defined on them" (W&F 1994:20)

What are actors?

Actors are social entities

Actors do not necessarily have the ability to act

Actors (typically) are all of the same type

Formal terms for actors

Vertex
Node

Examples?

Actors may also have attributes (e.g., age, sex, ethnicity)

What are relations?

Social ties link pairs of actors

A relation is the collection of ties of a specific kind among group members

Related formal terms

Edges
Arcs

What are relations?

Conceptual considerations

Directed or undirected?
Weighted or unweighted?

Nominal, ordinal, interval, or ratio scale?

Signed or unsigned?
Loops?
Time sensitivity?

Static
Moving window
Real-time
Accumulation and decay

Relations may also have attributes

Two Basic Measurements

Degree

Number of edges incident upon a node

Undirected
Directed

Indegree
Outdegree
Total (Freeman) Degree
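
As a rough illustration of the directed case, the sketch below counts indegree, outdegree, and total degree from a list of arcs. The function name `degrees` and the toy arcs are assumptions made for this example only.

```python
from collections import Counter

def degrees(arcs):
    """Count indegree, outdegree, and total degree from (sender, receiver) arcs."""
    outdeg = Counter(sender for sender, _ in arcs)
    indeg = Counter(receiver for _, receiver in arcs)
    total = indeg + outdeg  # total (Freeman) degree = indegree + outdegree
    return indeg, outdeg, total

# Hypothetical toy network: a -> b, a -> c, b -> c, c -> a
arcs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
indeg, outdeg, total = degrees(arcs)
print(indeg["c"], outdeg["a"], total["a"])
```

Nodes without any ties do not appear in the arc list, so a fuller version would also take the node set as input.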

Density

Proportion of possible edges that are observed, given e observed edges in a graph of n actors

Undirected

Without loops: e / ((n * (n - 1)) / 2)
With loops: e / ((n * (n + 1)) / 2)

Directed

Without loops: e / (n * (n - 1))
With loops: e / (n^2)
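
A minimal sketch of the two loopless density formulas, assuming the edge count e and actor count n are already known. The function name `density` is hypothetical.

```python
def density(e, n, directed=False):
    """Density of a graph without loops: observed edges over possible edges."""
    possible = n * (n - 1) if directed else n * (n - 1) / 2
    return e / possible

# Four edges among five actors
undirected_density = density(4, 5)
directed_density = density(4, 5, directed=True)
print(undirected_density, directed_density)
```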

What are some different types of networks?

Simple graph
Multigraph
Hypergraph (graph by Kilom691 & Pgdx)
Directed acyclic graph
Two-mode network
Ego networks

How can we express a social network?

Matrix
Edgelist
Set Notation

ℕ = {n1, n2, n3, n4, n5}
𝕃 = {l1, l2, l3, l4}

l1 = (n1, n3)
l2 = (n1, n5)
l3 = (n2, n4)
l4 = (n3, n5)

𝔾 = (ℕ, 𝕃)

How can we express a social network?

Matrix
Edgelist
Set notation
Sociogram
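
To show how these representations relate, the sketch below turns an edgelist for a five-node, four-edge example into an adjacency matrix. It assumes the ties are undirected, and the function name `adjacency_matrix` is made up for this illustration.

```python
def adjacency_matrix(nodes, edgelist, directed=False):
    """Build a 0/1 adjacency matrix (list of rows) from an edgelist."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edgelist:
        matrix[index[u]][index[v]] = 1
        if not directed:
            matrix[index[v]][index[u]] = 1  # undirected ties are symmetric
    return matrix

nodes = ["n1", "n2", "n3", "n4", "n5"]
edgelist = [("n1", "n3"), ("n1", "n5"), ("n2", "n4"), ("n3", "n5")]

A = adjacency_matrix(nodes, edgelist)
for row in A:
    print(row)
```

The edgelist is the more compact form for sparse networks; the matrix makes row and column sums (degrees) easy to read off.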

Subgraphs

A set of nodes and edges within a graph

Node-generated subgraphs
Edge-generated subgraphs

Network Motif

"recurring, significant patterns of interaction" - Milo et al. (2002:824)

"Significance" must be inferred through random graph comparisons

(i.e., conditional uniform graph, or CUG, tests)

Best known motifs

Dyads
Triads

Dyad Census

Dyad Census & Graph Properties

Undirected

Density (i.e., tie probability)

Directed

Density (i.e., tie probability)
Reciprocity

"You should attend funerals, because if you don't go to people's funerals, they won't go to yours."

Conceptual questions

Are null ties reciprocal?
Defined by edges or dyads?

Common measurements (M = mutual dyads, A = asymmetric dyads, N = null dyads)

Edgewise

2*M / (2*M + A)

Dyadic

(M + N) / (M + A + N)

Dyadic, non-null ("ratio")

M / (M + A)
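
The three reciprocity measures can be sketched from a dyad census, where M, A, and N count mutual, asymmetric, and null dyads. The function names and the toy network are assumptions for illustration, and the sketch assumes at least one tie exists (otherwise two of the ratios divide by zero).

```python
from itertools import combinations

def dyad_census(nodes, arcs):
    """Return counts of (mutual, asymmetric, null) dyads in a directed graph."""
    arcset = set(arcs)
    mutual = asymmetric = null = 0
    for i, j in combinations(nodes, 2):
        ties = ((i, j) in arcset) + ((j, i) in arcset)
        if ties == 2:
            mutual += 1
        elif ties == 1:
            asymmetric += 1
        else:
            null += 1
    return mutual, asymmetric, null

def reciprocity(mutual, asymmetric, null):
    """Edgewise, dyadic, and dyadic non-null reciprocity."""
    return {
        "edgewise": 2 * mutual / (2 * mutual + asymmetric),
        "dyadic": (mutual + null) / (mutual + asymmetric + null),
        "dyadic_nonnull": mutual / (mutual + asymmetric),
    }

# Hypothetical network: a <-> b, a -> c, c -> d
nodes = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
census = dyad_census(nodes, arcs)
scores = reciprocity(*census)
print(census, scores)
```

The dyadic measure treats null dyads as reciprocal, which is why it is the largest of the three in sparse networks.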

Triad Census, Undirected

Brokerage

Characterized by only two ties among three actors

Transitivity, "clustering," triadic closure

Your friends are often friends with each other
Typically measured by weak criterion

(3*Triangles) / (Connected Triples)
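
A minimal sketch of this ratio for an undirected graph. Each triangle closes three connected triples (one centered on each of its nodes), so counting closed triples directly gives 3 * Triangles. The function name and toy edges are hypothetical.

```python
from itertools import combinations

def transitivity(edges):
    """Closed connected triples divided by all connected triples."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    closed = triples = 0
    for center, adjacent in nbrs.items():
        # Every pair of neighbors of `center` forms one connected triple
        for u, v in combinations(adjacent, 2):
            triples += 1
            if v in nbrs[u]:
                closed += 1  # the triple is closed by the tie u - v
    return closed / triples if triples else 0.0

# Triangle a-b-c with a pendant node d attached to c
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
score = transitivity(edges)
print(score)
```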

Triad Census, Directed

Brokerage

i → j → k, i ↛ k, k ↛ i

Transitivity

Weak (most common)

If i → j → k, then i → k

Strong

i → j → k if and only if i → k

Cycles

i → j → k → i

Walks

"A walk is a sequence of nodes and lines, starting and ending with nodes, in which each node is incident with the lines following and preceding it in the sequence." - Wasserman and Faust (1994:105)

Trail

A walk such that every edge traversed is unique

(yet not necessarily every node)

Path

A trail such that every vertex traversed is distinct

There could be zero, one, or multiple walks, trails, and paths between any two vertices!

Seven Bridges of Königsberg

Problem: Find a walk that crosses every bridge exactly once

Euler (1735) proved that no such walk exists

Land masses are nodes, bridges are edges
Such a walk requires zero or two nodes of odd degree; all four land masses have odd degree
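
The degree argument can be checked in a few lines. The land-mass labels below ("north", "south", "island", "east") are made-up names for the two river banks and the two islands, and the seven bridges are entered as a multigraph edgelist.

```python
from collections import Counter

# Seven bridges as edges between four land masses (labels are hypothetical)
bridges = [
    ("north", "island"), ("north", "island"),
    ("south", "island"), ("south", "island"),
    ("north", "east"), ("south", "east"),
    ("island", "east"),
]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd_nodes = [node for node, d in degree.items() if d % 2 == 1]
# In a connected graph, a walk using every edge once needs 0 or 2 odd-degree nodes
walk_possible = len(odd_nodes) in (0, 2)
print(dict(degree), walk_possible)
```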

(Image modified by Bogdan Giuşcă)

Measurements of Distance

Pairwise

Path length

Number of edges traversed between two nodes

Geodesic

Shortest path between two nodes

Geodesic distance

Length of the shortest path between two nodes

Graph and Subgraph

Average path length

Mean geodesic distance

Diameter: Longest geodesic distance
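
For an unweighted graph, these distance measures can be sketched with breadth-first search. The function names are hypothetical, and the sketch assumes an undirected graph stored as a dictionary of neighbor sets; unreachable pairs are simply left out of the averages.

```python
from collections import deque

def geodesic_distances(nbrs, source):
    """Geodesic distance from `source` to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_summary(nbrs):
    """Return (average path length, diameter) over all reachable ordered pairs."""
    lengths = [
        d
        for source in nbrs
        for target, d in geodesic_distances(nbrs, source).items()
        if target != source
    ]
    return sum(lengths) / len(lengths), max(lengths)

# A simple chain: a - b - c - d
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
average_path_length, diameter = path_summary(chain)
print(average_path_length, diameter)
```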

Application: Erdős Numbers

A measurement of collaborative distance

From XKCD

(See also SMBC)

Application: 6 Degrees of Bacon

Measurement of geodesic distance

Bacon Number | # of Actors (van der Hofstad, 13 May 2013:8)

0 | 1
1 | 1902
2 | 160463
3 | 457231
4 | 111310
5 | 8168
6 | 810
7 | 81
8 | 14

Find out an actor's Bacon Number

(Image by SAGIndie)

Cycles

A walk "that begins and ends at the same node" and has "at least three nodes in which all lines are distinct, and all nodes except the beginning and ending node are distinct." (Wasserman and Faust 1994:107-8)

Cycles have a length

Connectivity and Components

If a path exists between each pair of vertices in a graph, then the graph is connected

Strong connectivity: preserves path directionality
Weak connectivity: ignores path directionality

A component is a maximal connected subgraph

An isolate is the smallest possible component: a single vertex without any ties to other vertices in the graph
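
A minimal sketch of finding components in an undirected graph (or weak components, if edge direction is ignored). The function name and toy data are assumptions for illustration.

```python
def components(nodes, edges):
    """Return the components of an undirected graph as a list of node sets."""
    nbrs = {node: set() for node in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, found = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal from `start`
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(nbrs[u] - component)
        seen |= component
        found.append(component)
    return found

# Hypothetical graph: a chain of three nodes, a pair, and an isolate
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
parts = components(nodes, edges)
print(parts)
```

Here the isolate "f" forms a component of its own.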

Connectivity and Components

How many components?

Connectivity and Components

A bridge is an edge that, if removed, creates more components

A cutpoint is a node that, if removed, creates more components

Centrality and Centralization

Centrality: Nodal measurement

Who are the most important actors in a network?

Centralization: Graph measurement

How much difference in "importance" is there between actors within a network?

Generally, compares the observed network's centralization against the theoretical maximum

Centrality and Centralization

The Big Lebowski

Character co-appearances

Data from MovieGalaxies

Centrality and Centralization

Degree
Betweenness
Closeness
Eigenvector

(Freeman 1979; Bonacich 1987)

Cumulative Degree Distribution

Preferential Attachment

Cumulative Advantage
Matthew Effect (Merton)

"For everyone who has will be given more, and he will have an abundance. Whoever does not have, even what he has will be taken from him." (Matthew 25:29)

Friendship Paradox (Feld 1991)

P(X ≥ x) ~ x^(-alpha)

P(X ≥ x) is the probability of observing a node with degree x or greater

alpha is the scaling exponent

(Barabási and Albert 1999)
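
The empirical cumulative degree distribution, that is, the share of nodes with degree x or greater, can be sketched as follows. The function name and the degree sequence are made up for this example, and estimating alpha from the result is left out.

```python
from collections import Counter

def cumulative_degree_distribution(degrees):
    """Map each observed degree x to the share of nodes with degree >= x."""
    n = len(degrees)
    counts = Counter(degrees)
    distribution, remaining = {}, n
    for x in sorted(counts):
        distribution[x] = remaining / n  # nodes with degree x or greater
        remaining -= counts[x]
    return distribution

# Hypothetical degree sequence with many low-degree nodes and one hub
degree_sequence = [1, 1, 1, 1, 2, 2, 3, 5]
ccdf = cumulative_degree_distribution(degree_sequence)
print(ccdf)
```

Plotted on log-log axes, a power-law tail would appear as a roughly straight line.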

Betweenness

How many geodesics go through a node (or edge)?

Variations

Edge weighted

Edge betweenness

Proximity, Scale Long Paths, and Cutoff

Endpoints

Random walk
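
A sketch of standard node betweenness for an unweighted, undirected graph, using Brandes's accumulation scheme. When several geodesics connect a pair, each intermediate node is credited with the share of those geodesics that pass through it. The function name and the toy star graph are assumptions.

```python
from collections import deque

def betweenness(nbrs):
    """Unnormalized node betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in nbrs}
    for s in nbrs:
        # BFS from s, counting geodesics (sigma) and recording predecessors
        sigma = {v: 0 for v in nbrs}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nbrs}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in nbrs[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s
        delta = {v: 0.0 for v in nbrs}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints
    return {v: b / 2 for v, b in score.items()}

star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
bc = betweenness(star)
print(bc)
```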

Closeness

Q: What is closeness?

A: The inverse of farness!

Q: What is farness?

If connected, the sum of a node's geodesic distances to all other nodes

Variations:

Unconnected graphs

Edge weighted

Random walk

Ex. Kevin Bacon

1049th closest actor (of ~800k)

Sean Connery is closer!

(van der Hofstad 13 May 2013:8)
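
Farness and closeness for a connected, unweighted graph can be sketched with breadth-first search. The function names are hypothetical; many packages also rescale closeness by n - 1, which this sketch does not do.

```python
from collections import deque

def farness(nbrs, source):
    """Sum of geodesic distances from `source` to all other nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < len(nbrs):
        raise ValueError("graph is not connected; farness is undefined here")
    return sum(dist.values())

def closeness(nbrs, source):
    """Closeness as the inverse of farness."""
    return 1 / farness(nbrs, source)

star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
scores = {node: closeness(star, node) for node in star}
print(scores)
```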

Eigenvector Centrality

Power comes from associating with the powerful

Centrality accumulates from the centralities of associated alters
Favors large, dense subgraphs (cliques)
Given by the leading (principal) eigenvector of the network's adjacency matrix
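
One way to approximate this eigenvector is power iteration, sketched below for a connected, undirected graph. The function name, the iteration count, and the toy matrix are assumptions. Iterating on A + I rather than A is a convenience chosen here: it leaves the eigenvectors unchanged but avoids oscillation on bipartite graphs.

```python
def eigenvector_centrality(adj, iterations=200):
    """Approximate the leading eigenvector of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # Multiply by (A + I): each node keeps its score and adds its alters' scores
        new = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        largest = max(new)
        x = [value / largest for value in new]  # rescale so the maximum is 1
    return x

# Triangle among nodes 0, 1, 2 with a pendant node 3 attached to node 2
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
centrality = eigenvector_centrality(adj)
print(centrality)
```

Node 3 has one tie, yet its score is not negligible because that tie goes to the most central node.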

Aren't all these usually getting at the same thing?

Often, but not necessarily (Krackhardt 1990)

Degree: (2 = 3 = 4), (1 = 5 = 6), 7

Betweenness: 4, 5, 6, (2 = 3), (7, 1)

Closeness: 4, 5, (2 = 3), 6, 1, 7

Eigenvector Centrality: (2 = 3), 4, 1, 5, 6, 7

Cohesive Subgroups

“the forces holding the individuals within the groupings in which they are” - Moreno and Jennings (1937:137)

Cohesive groups tend to

Interact relatively frequently
Have strong, direct ties within themselves
Display high internal density
Share attitudes and behaviors within themselves
Exert pressure and social norms internally

Cliques

A maximal complete subgraph - Luce and Perry (1949)

~In other words~

Everyone has a tie to everyone else in the subgroup (complete)

No larger complete subgroup contains all of the same actors (maximal)

Alternatives to Cliques

Geodesic-based approaches

n-cliques, n-clans, n-clubs
Not robust to edge deletion
No ingroup/outgroup distinction

Degree-based approaches

k-plexes, k-cores
No ingroup/outgroup distinction

Connectivity-based approaches

Lambda sets, Moody & White's (2003) cohesive blocks
Nodes not necessarily directly or closely connected

Ingroup/outgroup distinctions

LS Sets
Modularity-based methods

k-cores

Cohesive "seedbeds" nested within a network

Minimum number of ties (k) each member of a subgroup has to other subgroup members

"Coreness" (c)

A node has coreness c if it belongs to a c-core but not a (c+1)-core

Directed graphs may measure k-cores through

Ties going inward
Ties going outward
Total ties

Alvarez-Hamelin et al. (2006); Seidman (1983)
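
Coreness can be sketched by repeatedly peeling off the node with the fewest remaining ties. This is a simple quadratic-time version for an undirected graph, not an optimized implementation; the function name and toy graph are hypothetical.

```python
def coreness(nbrs):
    """Coreness of each node in an undirected graph given as neighbor sets."""
    degree = {v: len(adjacent) for v, adjacent in nbrs.items()}
    remaining = set(nbrs)
    core = {}
    k = 0
    while remaining:
        # Remove the node with the fewest ties to nodes still remaining
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in nbrs[v]:
            if u in remaining:
                degree[u] -= 1
    return core

# Triangle a-b-c with a pendant node d attached to c
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
core = coreness(graph)
print(core)
```

For directed graphs, the same peeling could be run on indegree, outdegree, or total degree.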

1-core

1 and 2-cores

1, 2, and 3-cores

1 through 4-cores

Community Detection

Goal: Find groups with more ties among members and fewer ties between groups than expected (conditional on degree)

Key Measurement: Modularity, Q, which ranges from -0.5 to 1 (Newman 2006)
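
For an undirected, unweighted graph, Q can be sketched as a sum over communities: the share of edges falling inside the community minus the squared share of edge endpoints attached to it (the expectation conditional on degree). The function name and the example partition are assumptions.

```python
def modularity(edges, membership):
    """Modularity Q of a partition; `membership` maps node -> community label."""
    m = len(edges)
    internal = {}    # edges with both endpoints in the community
    degree_sum = {}  # total degree of the community's members
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Two triangles joined by a single edge (c - d)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
lumped = {node: 1 for node in split}
q_split = modularity(edges, split)
q_lumped = modularity(edges, lumped)
print(q_split, q_lumped)
```

Placing every node in a single community always yields Q = 0, which gives a baseline for judging a partition.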

Hierarchical Algorithms

Top-Down

Girvan-Newman (Newman & Girvan 2004)
Leading Eigenvector* (Newman 2006)

Bottom-Up

Fast-Greedy* (Clauset et al. 2004)
Walktrap (Pons & Latapy 2005)
Louvain method*, ** (Blondel et al. 2008)

Spin-Glass (Reichardt & Bornholdt 2006; Traag & Bruggeman 2008)

*Modularity optimized, **Semi-hierarchical

Choose an algorithm based upon theory, functionality, or highest modularity

Louvain Method, First Pass

Louvain Method, Second Pass

Louvain Method, Both Passes

Density Comparisons

Modularity: 0.36, 0.44

Graph Density: 0.14

	Community Density
	A	B	C	D
A	0.60	0.28	0.24	0.20
B	0.28	0.42	0.24	0.20
C	0.24	0.24	0.47	0.23
D	0.20	0.20	0.23	0.32

Major Research Subjects in Brief

Homophily
Diffusion
Modeling Tie Formation

Homophily

("Assortativity")

Birds of a feather flock together

Homophily

Examples?

Categorical vs. Continuous variables

Sources?

Which relationships?

Feld's Foci

Forms of homophily

Generalized
Differential (some groups more homophilous than others)
Matching (some groups prefer other groups in addition to themselves)

Intervening considerations

Population effects
Degree-correlated attributes
Triadic closure

Diffusion

The spread of a behavior or attribute

Diffusion

Requirements

An artifact
A sender
A receiver
A channel

Diffusion

Relationship to previous adopter increases a receiving node's propensity to adopt

Diffusion

Considerations

Account for homophily
Theorizing channels and artifacts
Conceptualizing time

Adoption rate
Decay

Inhibitors

Modeling

How do ties form?

Preferential attachment
Homophily / assortativity
Blockmodels
Small world
Network evolution models
p* / ERGM family

Blockmodels

Focus upon positions, not actors

Composed of

A partition of actors into discrete subsets ("positions")
Relationships within and between positions

Potential hypotheses

Relationship between positions and attributes
Structure of relationships

The following examples from Wasserman and Faust (1994:423)

Cohesive Subgroups

Center-periphery

Centralized

Hierarchy

Transitivity

Small World

Watts and Strogatz (1998)

Properties

High clustering
Short path lengths

Network Evolution Models

(Toivonen et al. 2009)

Tie formation follows (usually local) structures

Two families

Growing models

Nodes & links added until N nodes reached

Dynamical models

Adding & removing links among a fixed set of nodes until equilibrium reached

p* / ERGM

Discussed in greater detail later in the course