Social Network Analysis in R
Laboratory Three
Benjamin Lind
Social Network Analysis: Internet Research
St. Petersburg
21 August, 2013 (11:45-13:15)
Social Network Analysis: Internet Research
St. Petersburg
21 August, 2013 (11:45-13:15)
Before we begin...
Please enter the following code into R to load the previous day's material
library(RCurl)
dl.data<-getURL("http://pastebin.com/raw.php?i=ChE800ye", ssl.verifypeer=FALSE)
dl.data<-textConnection(dl.data)
source(dl.data)
rm(dl.data)
Requires a good internet connect
(may take a few minutes)Graph-Level Indices
What's the edgewise reciprocity for Wikipedia administrator voting?
reciprocity(wikidat)
5.6% of all edges are reciprocated
Is this figure big or small?
Bigger or smaller than what?
We need to establish a basis for comparison.
Graph-Level Indices
Affected by network size and density
Example
graph.order<-seq(from=10, to=100, by=5) graph.trans<-sapply(graph.order, function(x) replicate(1000, transitivity(erdos.renyi.game(x,.1)))) graph.apl<-sapply(graph.order, function(x) replicate(1000, average.path.length(erdos.renyi.game(x,.1))))
par(mfrow=c(1,2))
plot(x=graph.order, y=apply(graph.trans, 2, mean), ylim=c(0,1), xlim=c(0,100), type="l", xlab="Order", ylab="Transitivity", main="Tie Prob=0.10; Simulations=1000") lines(x=graph.order, y=apply(graph.trans, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l") lines(x=graph.order, y=apply(graph.trans, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")
plot(x=graph.order, y=apply(graph.apl, 2, mean), ylim=c(0,5), xlim=c(0,100), type="l", xlab="Order", ylab="Average Path Length", main="Tie Prob=0.10; Simulations=1000")
lines(x=graph.order, y=apply(graph.apl, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l") lines(x=graph.order, y=apply(graph.apl, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")
Graph-Level Indices
Affected by network size and density
graph.tieprob<-seq(from=.01, to=.30, by=.01)
graph.trans<-sapply(graph.tieprob, function(x) replicate(1000, transitivity(erdos.renyi.game(100,x))))
graph.apl<-sapply(graph.tieprob, function(x) replicate(1000, average.path.length(erdos.renyi.game(100, x))))
par(mfrow=c(1,2))
plot(x=graph.tieprob, y=apply(graph.trans, 2, mean), ylim=c(0,1), xlim=c(0,.3), type="l", xlab="Tie Probability", ylab="Transitivity", main="Order=100; Simulations=1000")
lines(x=graph.tieprob, y=apply(graph.trans, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")
lines(x=graph.tieprob, y=apply(graph.trans, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")
plot(x=graph.tieprob, y=apply(graph.apl, 2, mean), ylim=c(0,8), xlim=c(0,.3), type="l", xlab="Tie Probability", ylab="Average Path Length", main="Order=100; Simulations=1000")
lines(x=graph.tieprob, y=apply(graph.apl, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")
lines(x=graph.tieprob, y=apply(graph.apl, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")
Random Graph Comparisons
Conditional Uniform Graph (CUG) test
- Measure observed statistic on a graph
- Generate many random graphs (~1000)
- Take the same measurement from each
- Compare the observed statistic to the random ones
Two families
- Reassignment methods
- Graph generators
Reassignment Methods
"Shuffling" ties and attributes
- Permutation tests
- Rewiring tests
Permutation tests
- Permute an attribute to test its relationship to a nodal property
- Permute a graph to test its relationship to another graph
#Do conservative blogs have more links than liberal ones? polblogs<-nexus.get(id="polblogs") #0=liberal, 1=conservative
polblogs.totdeg<-degree(polblogs, mode="all") obs.diff<-mean(polblogs.totdeg[V(polblogs)$LeftRight==1]) -mean(polblogs.totdeg[V(polblogs)$LeftRight==0])
perm.fun<-function(){ perm.attribs<-sample(V(polblogs)$LeftRight) return(mean(polblogs.totdeg[perm.attribs==1]) -mean(polblogs.totdeg[perm.attribs==0])) }; perm.fun<-cmpfun(perm.fun) exp.diffs<-replicate(1000, perm.fun()) mean(obs.diff>exp.diffs) #Approximately 0.86, not significantly greater than by chance assignment
"Why not just use a t-test?"
Permutation tests
E-I Index (Krackhardt and Robert Stern 1988)
(Outgroup Ties - Ingroup Ties) / Total Ties
e.i.index<-function(el, attrib){
in.ties<-sum(attrib[el[,1]]==attrib[el[,2]])
ex.ties<-sum(attrib[el[,1]]!=attrib[el[,2]])
return((ex.ties-in.ties)/(ex.ties+in.ties))
}; e.i.index<-cmpfun(e.i.index)
polblogs.el<-get.edgelist(polblogs, names=FALSE)
polblogs.attrib<-V(polblogs)$LeftRight
obs.val<-e.i.index(polblogs.el, polblogs.attrib)
exp.val<-replicate(1000, e.i.index(polblogs.el, sample(polblogs.attrib)))
mean(obs.val<exp.val) #We expect fewer outgroup ties
Rewiring Tests
One of two approaches
- Shuffle edges
- Shuffle edge weights
Preserves degree (and edge weight) distribution
par(mfrow=c(1,2))
plot(snaspb2013, vertex.size=5, vertex.label=V(snaspb2013)$id, vertex.label.family="sans", vertex.label.cex=.5, edge.arrow.size=.25, margin=c(0,0,0,0), edge.curved=.33)
plot(rewire(snaspb2013, mode="simple", niter=10*ecount(snaspb2013)), vertex.size=5, vertex.label=V(snaspb2013)$id, vertex.label.family="sans", vertex.label.cex=.5, edge.arrow.size=.25, margin=c(0,0,0,0), edge.curved=.33)
Original (left) and Rewired (right)
Rewired Transitivity
obs.trans<-transitivity(snaspb2013)
exp.trans<-replicate(1000, transitivity(rewire(snaspb2013, mode="simple", niter=10*ecount(snaspb2013))))
mean(obs.trans>exp.trans) #~0.993
What's the interpretation of this finding?
Graph Generators
- Erdős–Rényi model
- Power Law model
- Watts and Strogatz model
- Exponential random graph models
- ...and many others
Erdős–Rényi model
Assumptions
A fixed number (or proportion) of edges are randomly assigned to a fixed set of nodes.
Variants
- Exact #ties or density
- Directed, undirected
- Dyad census-conditioned
er.graph<-erdos.renyi.game(100, .03)
plot(er.graph, vertex.size=5, vertex.label="")
obs.incent<-centralization.degree(wikidat, mode="in", loops=FALSE)$centralization
exp.incent<-replicate(1000, centralization.degree( erdos.renyi.game(vcount(wikidat), ecount(wikidat), type="gnm", directed=TRUE), mode="in", loops=FALSE)$centralization) mean(obs.incent>exp.incent)
Power Law model
Assumptions
Networks grow according to the principle of preferential attachment.
P(X=x) ~ x^(-alpha)
Nodes are of degree greater than or equal to x
P(X=x) is the probability of observing a node with degree x or greater
alpha is the scalar
(Barabási and Albert 1999)
Power Law model
wikidat.in<-degree(wikidat, mode="in")
wikidat.out<-degree(wikidat, mode="out")
wikidat.plfit.in<-power.law.fit(wikidat.in)
wikidat.plfit.out<-power.law.fit(wikidat.out)
wikidat.plfit.in$alpha
wikidat.plfit.out$alpha
exp.incent.pl<-replicate(100, centralization.degree( static.power.law.game(vcount(wikidat), ecount(wikidat), exponent.out = wikidat.plfit.out$alpha, exponent.in = wikidat.plfit.in$alpha), mode="in", loops = FALSE)$centralization)
mean(obs.incent>exp.incent.pl)
par(mfrow=c(1,2))
hist(exp.incent, main="Erdos-Renyi Indegree Centralization"); hist(exp.incent.pl, main="Power Law Indegree Centralization")
Watts and Strogatz model
Assumptions
Local neighborhoods bridged by a few actors with ties that jump across neighborhoods.
Game
-
Begin with a lattice
- Each actor initially has the same degree, neighborhood index (nei) * 2
-
Rewire with a probability of
p
par(mfrow=c(1,2))
g1<-graph.lattice(100, nei=1)#nei=mean degree/2
g1.layout<-layout.fruchterman.reingold(g1)
V(g1)$x<-g1.layout[,1]; V(g1)$y<-g1.layout[,2]
g1.rewire<-rewire.edges(g1, p=.1) #prob of .1 rewire
plot(g1, vertex.size=5, vertex.label="", main="Lattice")
plot(g1.rewire, vertex.size=5, vertex.label="", main="Rewire")
Watts and Strogatz model
Question
What's the relationship between the neighborhood index (nei) and graph density?
Watts and Strogatz model
Low values for p resemble lattices
#The following two plots should be approximately equal
dev.off(); par(mfrow=c(1,2)) sw.net<-watts.strogatz.game(dim=1, size=100, nei=1, p=.1)
plot(g1.rewire, vertex.size=5, vertex.label="", main="Rewired Lattice") plot(sw.net, vertex.size=5, vertex.label="", layout=g1.layout, main="Watts-Strogatz")
High values for
p
become closer to Erdős–Rényi graphs
par(mfrow=c(1,2))
g2.ws<-watts.strogatz.game(dim=1, size=100, nei=1, p=.9)
g2.er<-erdos.renyi.game(100, (2*100/2)/(100*99/2), directed=FALSE)
plot(g2.ws, vertex.size=5, vertex.label="", main="Watts-Strogatz")
plot(g2.er, vertex.size=5, vertex.label="", main="Erdős–Rényi")
Exponential random graph models
Discussed in tomorrow's lab
Exercises
Recreate the first two plots in this lab using the Watts and Strogatz model. Fix the order (size) to 100 and set the density to approximately 0.02. Vary p from 0 to 1 by .01.
(Hint: mean degree equals the number of edges divided by the number of nodes.)
Recreate the power law exercise on degree centralization for the political blog network. If appropriate to model as a power law, is this network more centralized than predicted?
Refer to the documentation on centralization.betweenness(). Is the Twitter hashtag network we've been using more or less betweenness centralized than its equivalent a rewired network?
(Be sure to delete the edge weights as they aren't meaningful here.)
SNA 2013-R Lab 3
By Benjamin Lind
SNA 2013-R Lab 3
- 6,790