# Laboratory ThreeBenjamin Lind Social Network Analysis: Internet Research St. Petersburg 21 August, 2013 (11:45-13:15) ## Before we begin...

Please enter the following code into R to load the previous day's material
``library(RCurl)``dl.data<-getURL("http://pastebin.com/raw.php?i=ChE800ye", ssl.verifypeer=FALSE)``dl.data<-textConnection(dl.data)``source(dl.data)``rm(dl.data)``
Requires a good internet connect
(may take a few minutes)

## Graph-Level Indices

What's the edgewise reciprocity for Wikipedia administrator voting?

``reciprocity(wikidat)``

5.6% of all edges are reciprocated

Is this figure big or small?

Bigger or smaller than what?

We need to establish a basis for comparison.

## Graph-Level Indices

Affected by network size and density
Example
``````graph.order<-seq(from=10, to=100, by=5)
graph.trans<-sapply(graph.order, function(x) replicate(1000, transitivity(erdos.renyi.game(x,.1))))
graph.apl<-sapply(graph.order, function(x) replicate(1000, average.path.length(erdos.renyi.game(x,.1))))``````par(mfrow=c(1,2))
``````plot(x=graph.order, y=apply(graph.trans, 2, mean), ylim=c(0,1), xlim=c(0,100), type="l", xlab="Order", ylab="Transitivity", main="Tie Prob=0.10; Simulations=1000")
lines(x=graph.order, y=apply(graph.trans, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")
lines(x=graph.order, y=apply(graph.trans, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")````plot(x=graph.order, y=apply(graph.apl, 2, mean), ylim=c(0,5), xlim=c(0,100), type="l", xlab="Order", ylab="Average Path Length", main="Tie Prob=0.10; Simulations=1000")````lines(x=graph.order, y=apply(graph.apl, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")
lines(x=graph.order, y=apply(graph.apl, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")``````

## Graph-Level Indices

Affected by network size and density
``graph.tieprob<-seq(from=.01, to=.30, by=.01)``graph.trans<-sapply(graph.tieprob, function(x) replicate(1000, transitivity(erdos.renyi.game(100,x))))``graph.apl<-sapply(graph.tieprob, function(x) replicate(1000, average.path.length(erdos.renyi.game(100, x))))``par(mfrow=c(1,2))``plot(x=graph.tieprob, y=apply(graph.trans, 2, mean), ylim=c(0,1), xlim=c(0,.3), type="l", xlab="Tie Probability", ylab="Transitivity", main="Order=100; Simulations=1000")``lines(x=graph.tieprob, y=apply(graph.trans, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")``lines(x=graph.tieprob, y=apply(graph.trans, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")``plot(x=graph.tieprob, y=apply(graph.apl, 2, mean), ylim=c(0,8), xlim=c(0,.3), type="l", xlab="Tie Probability", ylab="Average Path Length", main="Order=100; Simulations=1000")``lines(x=graph.tieprob, y=apply(graph.apl, 2, quantile, probs=.025, na.rm=TRUE), col="red", lty=2, type="l")``lines(x=graph.tieprob, y=apply(graph.apl, 2, quantile, probs=.975, na.rm=TRUE), col="red", lty=2, type="l")``

## Random Graph Comparisons

Conditional Uniform Graph (CUG) test
• Measure observed statistic on a graph
• Generate many random graphs (~1000)
• Take the same measurement from each
• Compare the observed statistic to the random ones

Two families
• Reassignment methods
• Graph generators

## Reassignment Methods

"Shuffling" ties and attributes

• Permutation tests
• Rewiring tests

## Permutation tests

• Permute an attribute to test its relationship to a nodal property
• Permute a graph to test its relationship to another graph
``````#Do conservative blogs have more links than liberal ones?
polblogs<-nexus.get(id="polblogs") #0=liberal, 1=conservative
``````polblogs.totdeg<-degree(polblogs, mode="all")
obs.diff<-mean(polblogs.totdeg[V(polblogs)\$LeftRight==1]) -mean(polblogs.totdeg[V(polblogs)\$LeftRight==0])
``````perm.fun<-function(){
perm.attribs<-sample(V(polblogs)\$LeftRight)
return(mean(polblogs.totdeg[perm.attribs==1]) -mean(polblogs.totdeg[perm.attribs==0]))
}; perm.fun<-cmpfun(perm.fun)
exp.diffs<-replicate(1000, perm.fun())
mean(obs.diff>exp.diffs) #Approximately 0.86, not significantly greater than by chance assignment``````
"Why not just use a t-test?"

## Permutation tests

E-I Index (Krackhardt and Robert Stern 1988)
(Outgroup Ties - Ingroup Ties) / Total Ties

``````e.i.index<-function(el, attrib){
in.ties<-sum(attrib[el[,1]]==attrib[el[,2]])
ex.ties<-sum(attrib[el[,1]]!=attrib[el[,2]])
return((ex.ties-in.ties)/(ex.ties+in.ties))
}; e.i.index<-cmpfun(e.i.index)
polblogs.el<-get.edgelist(polblogs, names=FALSE)
polblogs.attrib<-V(polblogs)\$LeftRight
obs.val<-e.i.index(polblogs.el, polblogs.attrib)
exp.val<-replicate(1000, e.i.index(polblogs.el, sample(polblogs.attrib)))
mean(obs.val<exp.val) #We expect fewer outgroup ties``````

## Rewiring Tests

One of two approaches
• Shuffle edges
• Shuffle edge weights

Preserves degree (and edge weight) distribution
``par(mfrow=c(1,2))``plot(snaspb2013, vertex.size=5, vertex.label=V(snaspb2013)\$id, vertex.label.family="sans", vertex.label.cex=.5, edge.arrow.size=.25, margin=c(0,0,0,0), edge.curved=.33)``plot(rewire(snaspb2013, mode="simple", niter=10*ecount(snaspb2013)), vertex.size=5, vertex.label=V(snaspb2013)\$id, vertex.label.family="sans", vertex.label.cex=.5, edge.arrow.size=.25, margin=c(0,0,0,0), edge.curved=.33)``

## Rewired Transitivity

``obs.trans<-transitivity(snaspb2013)``exp.trans<-replicate(1000, transitivity(rewire(snaspb2013, mode="simple", niter=10*ecount(snaspb2013))))``mean(obs.trans>exp.trans) #~0.993``
What's the interpretation of this finding?

## Graph Generators

1. Erdős–Rényi model
2. Power Law model
3. Watts and Strogatz model
4. Exponential random graph models
5. ...and many others

## Erdős–Rényi model

Assumptions
A fixed number (or proportion) of edges are randomly assigned to a fixed set of nodes.
Variants
• Exact #ties or density
• Directed, undirected
``er.graph<-erdos.renyi.game(100, .03)``plot(er.graph, vertex.size=5, vertex.label="")``obs.incent<-centralization.degree(wikidat, mode="in", loops=FALSE)\$centralization``exp.incent<-replicate(1000, centralization.degree( erdos.renyi.game(vcount(wikidat), ecount(wikidat), type="gnm", directed=TRUE), mode="in", loops=FALSE)\$centralization)        mean(obs.incent>exp.incent)    ``

## Power Law model

Assumptions
Networks grow according to the principle of preferential attachment.

P(X=x) ~ x^(-alpha)
Nodes are of degree greater than or equal to x
P(X=x) is the probability of observing a node with degree x or greater
alpha is the scalar
(Barabási and Albert 1999)

## Power Law model

``````wikidat.in<-degree(wikidat, mode="in")
wikidat.out<-degree(wikidat, mode="out")
wikidat.plfit.in<-power.law.fit(wikidat.in)
wikidat.plfit.out<-power.law.fit(wikidat.out)
wikidat.plfit.in\$alpha
wikidat.plfit.out\$alpha
exp.incent.pl<-replicate(100, centralization.degree( static.power.law.game(vcount(wikidat), ecount(wikidat), exponent.out = wikidat.plfit.out\$alpha, exponent.in = wikidat.plfit.in\$alpha), mode="in", loops = FALSE)\$centralization)
mean(obs.incent>exp.incent.pl)
par(mfrow=c(1,2))
hist(exp.incent, main="Erdos-Renyi Indegree Centralization"); hist(exp.incent.pl, main="Power Law Indegree Centralization") ``````

## Watts and Strogatz model

Assumptions
Local neighborhoods bridged by a few actors with ties that jump across neighborhoods.
Game
• Begin with a lattice
• Each actor initially has the same degree, neighborhood index (nei) * 2
• Rewire with a probability of p
``par(mfrow=c(1,2))``g1<-graph.lattice(100, nei=1)#nei=mean degree/2``g1.layout<-layout.fruchterman.reingold(g1)``V(g1)\$x<-g1.layout[,1]; V(g1)\$y<-g1.layout[,2]``g1.rewire<-rewire.edges(g1, p=.1) #prob of .1 rewire``plot(g1, vertex.size=5, vertex.label="", main="Lattice")``plot(g1.rewire, vertex.size=5, vertex.label="", main="Rewire")``

## Watts and Strogatz model

Question
What's the relationship between the neighborhood index (nei) and graph density?

## Watts and Strogatz model

Low values for p resemble lattices
````#The following two plots should be approximately equal````dev.off(); par(mfrow=c(1,2))
sw.net<-watts.strogatz.game(dim=1, size=100, nei=1, p=.1)``````plot(g1.rewire, vertex.size=5, vertex.label="", main="Rewired Lattice")
plot(sw.net, vertex.size=5, vertex.label="", layout=g1.layout, main="Watts-Strogatz")``````
High values for p become closer to Erdős–Rényi graphs
``par(mfrow=c(1,2))``g2.ws<-watts.strogatz.game(dim=1, size=100, nei=1, p=.9)``g2.er<-erdos.renyi.game(100, (2*100/2)/(100*99/2), directed=FALSE)``plot(g2.ws, vertex.size=5, vertex.label="", main="Watts-Strogatz")``plot(g2.er, vertex.size=5, vertex.label="", main="Erdős–Rényi")``

## Exponential random graph models

Discussed in tomorrow's lab

## Exercises

Recreate the first two plots in this lab using the Watts and Strogatz model.  Fix the order (size) to 100 and set the density to approximately 0.02.  Vary p from 0 to 1 by .01.
(Hint: mean degree equals the number of edges divided by the number of nodes.)

Recreate the power law exercise on degree centralization for the political blog network.  If appropriate to model as a power law, is this network more centralized than predicted?

Refer to the documentation on centralization.betweenness().  Is the Twitter hashtag network we've been using more or less betweenness centralized than its equivalent a rewired network?
(Be sure to delete the edge weights as they aren't meaningful here.)

By Benjamin Lind

• 4,814