Social Network Analysis in R

Laboratory Three
Benjamin Lind

Social Network Analysis: Internet Research
St. Petersburg
21 August, 2013 (11:45-13:15)

Before we begin...

Please enter the following code into R to load the previous day's material
library(RCurl)dl.data<-getURL("http://pastebin.com/raw.php?i=ChE800ye", ssl.verifypeer=FALSE)
dl.data<-textConnection(dl.data)
source(dl.data)
rm(dl.data)
Requires a good internet connect
(may take a few minutes)

Graph-Level Indices

What's the edgewise reciprocity for Wikipedia administrator voting?

reciprocity(wikidat)

5.6% of all edges are reciprocated

Is this figure big or small?

Bigger or smaller than what?

We need to establish a basis for comparison.

Graph-Level Indices

Affected by network size and density
Example
graph.order<-seq(from=10, to=100, by=5)
graph.trans<-sapply(graph.order, function(x) replicate(1000, transitivity(erdos.renyi.game(x,.1))))
graph.apl<-sapply(graph.order, function(x) replicate(1000, average.path.length(erdos.renyi.game(x,.1))))par(mfrow=c(1,2))
plot(x=graph.order, y=apply(graph.trans, 2, mean), ylim=c(0,1), xlim=c(0,100), type="l", xlab="Order", ylab="Transitivity", main="Tie Prob=0.10; Simulations=1000")
lines(x=graph.order, y=apply(graph.trans, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")
lines(x=graph.order, y=apply(graph.trans, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")
plot(x=graph.order, y=apply(graph.apl, 2, mean), ylim=c(0,5), xlim=c(0,100), type="l", xlab="Order", ylab="Average Path Length", main="Tie Prob=0.10; Simulations=1000")
lines(x=graph.order, y=apply(graph.apl, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l") lines(x=graph.order, y=apply(graph.apl, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")


Graph-Level Indices

Affected by network size and density
graph.tieprob<-seq(from=.01, to=.30, by=.01)graph.trans<-sapply(graph.tieprob, function(x) replicate(1000, transitivity(erdos.renyi.game(100,x))))graph.apl<-sapply(graph.tieprob, function(x) replicate(1000, average.path.length(erdos.renyi.game(100, x))))par(mfrow=c(1,2))plot(x=graph.tieprob, y=apply(graph.trans, 2, mean), ylim=c(0,1), xlim=c(0,.3), type="l", xlab="Tie Probability", ylab="Transitivity", main="Order=100; Simulations=1000")lines(x=graph.tieprob, y=apply(graph.trans, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")lines(x=graph.tieprob, y=apply(graph.trans, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")plot(x=graph.tieprob, y=apply(graph.apl, 2, mean), ylim=c(0,8), xlim=c(0,.3), type="l", xlab="Tie Probability", ylab="Average Path Length", main="Order=100; Simulations=1000")lines(x=graph.tieprob, y=apply(graph.apl, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")lines(x=graph.tieprob, y=apply(graph.apl, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")


Random Graph Comparisons

Conditional Uniform Graph (CUG) test
  • Measure observed statistic on a graph
  • Generate many random graphs (~1000)
    • Take the same measurement from each
  • Compare the observed statistic to the random ones


Two families
  • Reassignment methods
  • Graph generators

Reassignment Methods

"Shuffling" ties and attributes

  • Permutation tests
  • Rewiring tests

Permutation tests

  • Permute an attribute to test its relationship to a nodal property
  • Permute a graph to test its relationship to another graph
#Do conservative blogs have more links than liberal ones?
polblogs<-nexus.get(id="polblogs") #0=liberal, 1=conservative

polblogs.totdeg<-degree(polblogs, mode="all") obs.diff<-mean(polblogs.totdeg[V(polblogs)$LeftRight==1]) -mean(polblogs.totdeg[V(polblogs)$LeftRight==0])
perm.fun<-function(){ perm.attribs<-sample(V(polblogs)$LeftRight) return(mean(polblogs.totdeg[perm.attribs==1]) -mean(polblogs.totdeg[perm.attribs==0])) }; perm.fun<-cmpfun(perm.fun) exp.diffs<-replicate(1000, perm.fun()) mean(obs.diff>exp.diffs) #Approximately 0.86, not significantly greater than by chance assignment
"Why not just use a t-test?"

Permutation tests

E-I Index (Krackhardt and Robert Stern 1988)
(Outgroup Ties - Ingroup Ties) / Total Ties 

e.i.index<-function(el, attrib){
  in.ties<-sum(attrib[el[,1]]==attrib[el[,2]])
  ex.ties<-sum(attrib[el[,1]]!=attrib[el[,2]])
  return((ex.ties-in.ties)/(ex.ties+in.ties))
  }; e.i.index<-cmpfun(e.i.index)
polblogs.el<-get.edgelist(polblogs, names=FALSE)
polblogs.attrib<-V(polblogs)$LeftRight
obs.val<-e.i.index(polblogs.el, polblogs.attrib)
exp.val<-replicate(1000, e.i.index(polblogs.el, sample(polblogs.attrib)))
mean(obs.val<exp.val) #We expect fewer outgroup ties

Rewiring Tests

One of two approaches
  • Shuffle edges
  • Shuffle edge weights

Preserves degree (and edge weight) distribution
par(mfrow=c(1,2))plot(snaspb2013, vertex.size=5, vertex.label=V(snaspb2013)$id, vertex.label.family="sans", vertex.label.cex=.5, edge.arrow.size=.25, margin=c(0,0,0,0), edge.curved=.33)
plot(rewire(snaspb2013, mode="simple", niter=10*ecount(snaspb2013)), vertex.size=5, vertex.label=V(snaspb2013)$id, vertex.label.family="sans", vertex.label.cex=.5, edge.arrow.size=.25, margin=c(0,0,0,0), edge.curved=.33)

Original (left) and Rewired (right)

Rewired Transitivity

obs.trans<-transitivity(snaspb2013)exp.trans<-replicate(1000, transitivity(rewire(snaspb2013, mode="simple", niter=10*ecount(snaspb2013))))mean(obs.trans>exp.trans) #~0.993
What's the interpretation of this finding?

Graph Generators

  1. Erdős–Rényi model
  2. Power Law model
  3. Watts and Strogatz model
  4. Exponential random graph models
  5. ...and many others

Erdős–Rényi model

Assumptions
A fixed number (or proportion) of edges are randomly assigned to a fixed set of nodes.
Variants
  • Exact #ties or density
  • Directed, undirected
  • Dyad census-conditioned
er.graph<-erdos.renyi.game(100, .03)plot(er.graph, vertex.size=5, vertex.label="")obs.incent<-centralization.degree(wikidat, mode="in", loops=FALSE)$centralizationexp.incent<-replicate(1000, centralization.degree( erdos.renyi.game(vcount(wikidat), ecount(wikidat), type="gnm", directed=TRUE), mode="in", loops=FALSE)$centralization)        mean(obs.incent>exp.incent)    

Power Law model

Assumptions
Networks grow according to the principle of preferential attachment.






P(X=x) ~ x^(-alpha)
Nodes are of degree greater than or equal to x
P(X=x) is the probability of observing a node with degree x or greater
alpha is the scalar
(Barabási and Albert 1999)

Power Law model


wikidat.in<-degree(wikidat, mode="in")
wikidat.out<-degree(wikidat, mode="out")
wikidat.plfit.in<-power.law.fit(wikidat.in)
wikidat.plfit.out<-power.law.fit(wikidat.out)
wikidat.plfit.in$alpha
wikidat.plfit.out$alpha
exp.incent.pl<-replicate(100, centralization.degree( static.power.law.game(vcount(wikidat), ecount(wikidat), exponent.out = wikidat.plfit.out$alpha, exponent.in = wikidat.plfit.in$alpha), mode="in", loops = FALSE)$centralization)
mean(obs.incent>exp.incent.pl)
par(mfrow=c(1,2))
hist(exp.incent, main="Erdos-Renyi Indegree Centralization"); hist(exp.incent.pl, main="Power Law Indegree Centralization") 


Watts and Strogatz model

Assumptions
Local neighborhoods bridged by a few actors with ties that jump across neighborhoods.
Game
  • Begin with a lattice
    • Each actor initially has the same degree, neighborhood index (nei) * 2
  • Rewire with a probability of p  
par(mfrow=c(1,2))g1<-graph.lattice(100, nei=1)#nei=mean degree/2g1.layout<-layout.fruchterman.reingold(g1)V(g1)$x<-g1.layout[,1]; V(g1)$y<-g1.layout[,2]g1.rewire<-rewire.edges(g1, p=.1) #prob of .1 rewireplot(g1, vertex.size=5, vertex.label="", main="Lattice")plot(g1.rewire, vertex.size=5, vertex.label="", main="Rewire")

Watts and Strogatz model

Question
What's the relationship between the neighborhood index (nei) and graph density?

Watts and Strogatz model

Low values for p resemble lattices
#The following two plots should be approximately equaldev.off(); par(mfrow=c(1,2))
sw.net<-watts.strogatz.game(dim=1, size=100, nei=1, p=.1)plot(g1.rewire, vertex.size=5, vertex.label="", main="Rewired Lattice")
plot(sw.net, vertex.size=5, vertex.label="", layout=g1.layout, main="Watts-Strogatz")
High values for p become closer to Erdős–Rényi graphs
par(mfrow=c(1,2))g2.ws<-watts.strogatz.game(dim=1, size=100, nei=1, p=.9)g2.er<-erdos.renyi.game(100, (2*100/2)/(100*99/2), directed=FALSE)plot(g2.ws, vertex.size=5, vertex.label="", main="Watts-Strogatz")plot(g2.er, vertex.size=5, vertex.label="", main="Erdős–Rényi")


Exponential random graph models

Discussed in tomorrow's lab

Exercises

Recreate the first two plots in this lab using the Watts and Strogatz model.  Fix the order (size) to 100 and set the density to approximately 0.02.  Vary p from 0 to 1 by .01.
(Hint: mean degree equals the number of edges divided by the number of nodes.)

Recreate the power law exercise on degree centralization for the political blog network.  If appropriate to model as a power law, is this network more centralized than predicted?

Refer to the documentation on centralization.betweenness().  Is the Twitter hashtag network we've been using more or less betweenness centralized than its equivalent a rewired network?
(Be sure to delete the edge weights as they aren't meaningful here.)
Made with Slides.com