Social Network Analysis in R

Laboratory Three

Benjamin Lind

Social Network Analysis: Internet Research

St. Petersburg

21 August, 2013 (11:45-13:15)

Before we begin...

Please enter the following code into R to load the previous day's material

library(RCurl)dl.data<-getURL("http://pastebin.com/raw.php?i=ChE800ye", ssl.verifypeer=FALSE)
dl.data<-textConnection(dl.data)
source(dl.data)
rm(dl.data)

Requires a good internet connect

(may take a few minutes)

Graph-Level Indices

What's the edgewise reciprocity for Wikipedia administrator voting?

reciprocity(wikidat)

5.6% of all edges are reciprocated

Is this figure big or small?

Bigger or smaller than what?

We need to establish a basis for comparison.

Graph-Level Indices

Affected by network size and density

Example

graph.order<-seq(from=10, to=100, by=5)
graph.trans<-sapply(graph.order, function(x) replicate(1000, transitivity(erdos.renyi.game(x,.1))))
graph.apl<-sapply(graph.order, function(x) replicate(1000, average.path.length(erdos.renyi.game(x,.1))))par(mfrow=c(1,2))
plot(x=graph.order, y=apply(graph.trans, 2, mean), ylim=c(0,1), xlim=c(0,100), type="l", xlab="Order", ylab="Transitivity", main="Tie Prob=0.10; Simulations=1000")
lines(x=graph.order, y=apply(graph.trans, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")
lines(x=graph.order, y=apply(graph.trans, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")
plot(x=graph.order, y=apply(graph.apl, 2, mean), ylim=c(0,5), xlim=c(0,100), type="l", xlab="Order", ylab="Average Path Length", main="Tie Prob=0.10; Simulations=1000")
lines(x=graph.order, y=apply(graph.apl, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")
lines(x=graph.order, y=apply(graph.apl, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")

Graph-Level Indices

Affected by network size and density

graph.tieprob<-seq(from=.01, to=.30, by=.01)graph.trans<-sapply(graph.tieprob, function(x) replicate(1000, transitivity(erdos.renyi.game(100,x))))graph.apl<-sapply(graph.tieprob, function(x) replicate(1000, average.path.length(erdos.renyi.game(100, x))))par(mfrow=c(1,2))plot(x=graph.tieprob, y=apply(graph.trans, 2, mean), ylim=c(0,1), xlim=c(0,.3), type="l", xlab="Tie Probability", ylab="Transitivity", main="Order=100; Simulations=1000")lines(x=graph.tieprob, y=apply(graph.trans, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")lines(x=graph.tieprob, y=apply(graph.trans, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")plot(x=graph.tieprob, y=apply(graph.apl, 2, mean), ylim=c(0,8), xlim=c(0,.3), type="l", xlab="Tie Probability", ylab="Average Path Length", main="Order=100; Simulations=1000")lines(x=graph.tieprob, y=apply(graph.apl, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")lines(x=graph.tieprob, y=apply(graph.apl, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")

Random Graph Comparisons

Conditional Uniform Graph (CUG) test

Measure observed statistic on a graph
Generate many random graphs (~1000)

Take the same measurement from each

Compare the observed statistic to the random ones

Two families

Reassignment methods
Graph generators

Reassignment Methods

"Shuffling" ties and attributes

Permutation tests
Rewiring tests

Permutation tests

Permute an attribute to test its relationship to a nodal property
Permute a graph to test its relationship to another graph

#Do conservative blogs have more links than liberal ones?
polblogs<-nexus.get(id="polblogs") #0=liberal, 1=conservative

polblogs.totdeg<-degree(polblogs, mode="all")
obs.diff<-mean(polblogs.totdeg[V(polblogs)$LeftRight==1]) -mean(polblogs.totdeg[V(polblogs)$LeftRight==0]) 

perm.fun<-function(){
  perm.attribs<-sample(V(polblogs)$LeftRight)
  return(mean(polblogs.totdeg[perm.attribs==1]) -mean(polblogs.totdeg[perm.attribs==0]))
  }; perm.fun<-cmpfun(perm.fun)
exp.diffs<-replicate(1000, perm.fun())
mean(obs.diff>exp.diffs) #Approximately 0.86, not significantly greater than by chance assignment

"Why not just use a t-test?"

Permutation tests

E-I Index (Krackhardt and Robert Stern 1988)

(Outgroup Ties - Ingroup Ties) / Total Ties

e.i.index<-function(el, attrib){
  in.ties<-sum(attrib[el[,1]]==attrib[el[,2]])
  ex.ties<-sum(attrib[el[,1]]!=attrib[el[,2]])
  return((ex.ties-in.ties)/(ex.ties+in.ties))
  }; e.i.index<-cmpfun(e.i.index)
polblogs.el<-get.edgelist(polblogs, names=FALSE)
polblogs.attrib<-V(polblogs)$LeftRight
obs.val<-e.i.index(polblogs.el, polblogs.attrib)
exp.val<-replicate(1000, e.i.index(polblogs.el, sample(polblogs.attrib)))
mean(obs.val<exp.val) #We expect fewer outgroup ties

Rewiring Tests

One of two approaches

Shuffle edges
Shuffle edge weights

Preserves degree (and edge weight) distribution

par(mfrow=c(1,2))plot(snaspb2013, vertex.size=5, vertex.label=V(snaspb2013)$id, vertex.label.family="sans", vertex.label.cex=.5, edge.arrow.size=.25, margin=c(0,0,0,0), edge.curved=.33)
plot(rewire(snaspb2013, mode="simple", niter=10*ecount(snaspb2013)), vertex.size=5, vertex.label=V(snaspb2013)$id, vertex.label.family="sans", vertex.label.cex=.5, edge.arrow.size=.25, margin=c(0,0,0,0), edge.curved=.33)

Original (left) and Rewired (right)

Rewired Transitivity

obs.trans<-transitivity(snaspb2013)exp.trans<-replicate(1000, transitivity(rewire(snaspb2013, mode="simple", niter=10*ecount(snaspb2013))))mean(obs.trans>exp.trans) #~0.993

What's the interpretation of this finding?

Graph Generators

Erdős–Rényi model
Power Law model
Watts and Strogatz model
Exponential random graph models
...and many others

Erdős–Rényi model

Assumptions

A fixed number (or proportion) of edges are randomly assigned to a fixed set of nodes.

Variants

Exact #ties or density
Directed, undirected
Dyad census-conditioned

er.graph<-erdos.renyi.game(100, .03)plot(er.graph, vertex.size=5, vertex.label="")obs.incent<-centralization.degree(wikidat, mode="in", loops=FALSE)$centralizationexp.incent<-replicate(1000, centralization.degree( erdos.renyi.game(vcount(wikidat), ecount(wikidat), type="gnm", directed=TRUE), mode="in", loops=FALSE)$centralization)        mean(obs.incent>exp.incent)

Power Law model

Assumptions

Networks grow according to the principle of preferential attachment.

P(X=x) ~ x^(-alpha)

Nodes are of degree greater than or equal to x

P(X=x) is the probability of observing a node with degree x or greater

alpha is the scalar

(Barabási and Albert 1999)

Power Law model

wikidat.in<-degree(wikidat, mode="in")
wikidat.out<-degree(wikidat, mode="out")
wikidat.plfit.in<-power.law.fit(wikidat.in)
wikidat.plfit.out<-power.law.fit(wikidat.out)
wikidat.plfit.in$alpha
wikidat.plfit.out$alpha
exp.incent.pl<-replicate(100, centralization.degree( static.power.law.game(vcount(wikidat), ecount(wikidat), exponent.out = wikidat.plfit.out$alpha, exponent.in = wikidat.plfit.in$alpha), mode="in", loops = FALSE)$centralization)
mean(obs.incent>exp.incent.pl)
par(mfrow=c(1,2))
hist(exp.incent, main="Erdos-Renyi Indegree Centralization"); hist(exp.incent.pl, main="Power Law Indegree Centralization")

Watts and Strogatz model

Assumptions

Local neighborhoods bridged by a few actors with ties that jump across neighborhoods.

Game

Begin with a lattice

Each actor initially has the same degree, neighborhood index (nei) * 2

Rewire with a probability of p

par(mfrow=c(1,2))g1<-graph.lattice(100, nei=1)#nei=mean degree/2g1.layout<-layout.fruchterman.reingold(g1)V(g1)$x<-g1.layout[,1]; V(g1)$y<-g1.layout[,2]g1.rewire<-rewire.edges(g1, p=.1) #prob of .1 rewireplot(g1, vertex.size=5, vertex.label="", main="Lattice")plot(g1.rewire, vertex.size=5, vertex.label="", main="Rewire")

Watts and Strogatz model

Question

What's the relationship between the neighborhood index (nei) and graph density?

Watts and Strogatz model

Low values for p resemble lattices

#The following two plots should be approximately equaldev.off(); par(mfrow=c(1,2))
sw.net<-watts.strogatz.game(dim=1, size=100, nei=1, p=.1)plot(g1.rewire, vertex.size=5, vertex.label="", main="Rewired Lattice")
plot(sw.net, vertex.size=5, vertex.label="", layout=g1.layout, main="Watts-Strogatz")

High values for p become closer to Erdős–Rényi graphs

par(mfrow=c(1,2))g2.ws<-watts.strogatz.game(dim=1, size=100, nei=1, p=.9)g2.er<-erdos.renyi.game(100, (2*100/2)/(100*99/2), directed=FALSE)plot(g2.ws, vertex.size=5, vertex.label="", main="Watts-Strogatz")plot(g2.er, vertex.size=5, vertex.label="", main="Erdős–Rényi")

Exponential random graph models

Discussed in tomorrow's lab

Exercises

Recreate the first two plots in this lab using the Watts and Strogatz model. Fix the order (size) to 100 and set the density to approximately 0.02. Vary p from 0 to 1 by .01.

(Hint: mean degree equals the number of edges divided by the number of nodes.)

Recreate the power law exercise on degree centralization for the political blog network. If appropriate to model as a power law, is this network more centralized than predicted?

Refer to the documentation on centralization.betweenness(). Is the Twitter hashtag network we've been using more or less betweenness centralized than its equivalent a rewired network?

(Be sure to delete the edge weights as they aren't meaningful here.)

SNA 2013-R Lab 3

By Benjamin Lind

SNA 2013-R Lab 3

6,790

Benjamin Lind

benjaminlind

Social Network Analysis in R

Laboratory Three Benjamin Lind Social Network Analysis: Internet Research St. Petersburg 21 August, 2013 (11:45-13:15)

Before we begin...

Graph-Level Indices

Graph-Level Indices

Graph-Level Indices

Random Graph Comparisons

Reassignment Methods

Permutation tests

Permutation tests

Rewiring Tests

Original (left) and Rewired (right)

Rewired Transitivity

Graph Generators

Erdős–Rényi model

Power Law model

Power Law model

Watts and Strogatz model

Watts and Strogatz model

Watts and Strogatz model

Exponential random graph models

Exercises

SNA 2013-R Lab 3

More from Benjamin Lind

Laboratory Three

Benjamin Lind

Social Network Analysis: Internet Research

St. Petersburg

21 August, 2013 (11:45-13:15)