Social Network Analysis in R

Laboratory Two
Benjamin Lind

Social Network Analysis: Internet Research
St. Petersburg
20 August, 2013 (11:45-13:15)

Network Analysis Packages in R







Suite collecting the packages
  • sna
  • network
  • ergm, tergm
  • ...and others
Interdisciplinary, cross-university development team

  • General social network metrics
  • Data storage
  • Modeling and statistical inference


Originated as a fork of the sna package
Libraries for C/C++, Python, and R
Created by Gábor Csárdi

Motivations and Strengths
  • Network science tradition
  • General and complex network metrics
  • Data storage


statnet and igraph don't play nicely
  • Many identical function names
  • Different data formats

intergraph is a conversion tool
  • asIgraph()
  • asNetwork()


Created by Tore Opsahl
Provides metrics for three types of networks
  • Weighted
  • Two-mode
  • Longitudinal

Motivated by the bias introduced by coercing these networks into simple graphs

Works well with igraph


Originally Windows-based and standalone
  • Longitudinal network modeling
  • Longitudinal behavioral network modeling
  • Cross-sectional network modeling
    • p* / ERGM


R can read anything!*

  • Delimited text (comma, tab, bang, pipe, etc.)
  • Excel and Gnumeric files
  • Internet files
    • Tables from webpages
    • .json, .xml, .html files
    • Files stored online
  • Pajek files
  • SPSS and Stata files
  • Spatial data in shapefiles and KML

If possible, I'd recommend tab-delimited text files.

*It reads some file formats much better (and easier) than others
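As a minimal illustration of the tab-delimited route, base R's read.delim() (read.table() with sep="\t" and header=TRUE preset) can parse an edge list directly. The inline string below is a hypothetical stand-in for a downloaded file, with two comment lines on top that skip= drops, much as skip=6 does for the Wikipedia data.

```r
# Hypothetical tab-delimited edge list with two comment lines on top
edge.text <- paste("# header comment one",
                   "# header comment two",
                   "from\tto",
                   "1\t2",
                   "2\t3",
                   "3\t1",
                   sep = "\n")

# read.delim() is read.table() with sep = "\t" and header = TRUE preset;
# skip = 2 drops the comment lines before the header row
edges <- read.delim(text = edge.text, skip = 2)

str(edges)  # a data frame with integer columns "from" and "to"
```

The same call works with a file path or connection in place of text=.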

Import a Tab-Delimited Edge List

Wikipedia votes for site administrators
One-mode, directed, simple

(Picture courtesy of Frank Schulenburg)



#Function to read a data table from online
read.sstab<-function(theurl, ...){
  #_theurl_ refers to the location of the data
  #_..._ are parameters passed onto read.table
  require(RCurl)
  outtab<-getURL(theurl, ssl.verifypeer=FALSE)
  outtab<-textConnection(outtab)
  outtab<-read.table(outtab, sep="\t", ...)
  return(outtab)
}
wikidat<-read.sstab("", header=TRUE, skip=6)
#Convert the data to igraph
wikidat<, directed=TRUE)
summary(wikidat)
#Nodes: 7115; Edges: 103689
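To see what "convert the data to a graph" amounts to, here is a base-R sketch (not igraph's code) that turns a directed edge list into an adjacency matrix while keeping the graph simple. The node names and edges are hypothetical.

```r
# Hypothetical directed edge list with character node IDs
edges <- data.frame(from = c("a", "a", "b", "c", "c", "c"),
                    to   = c("b", "c", "c", "a", "a", "c"),
                    stringsAsFactors = FALSE)

# Keep the graph simple: drop repeated edges and self-loops
edges <- unique(edges)
edges <- edges[edges$from != edges$to, ]

# Map each distinct name to an integer index with match()
nodes <- sort(unique(c(edges$from, edges$to)))
i <- match(edges$from, nodes)
j <- match(edges$to, nodes)

# Directed adjacency matrix: A[i, j] = 1 when i sends a tie to j
A <- matrix(0L, length(nodes), length(nodes), dimnames = list(nodes, nodes))
A[cbind(i, j)] <- 1L

c(nodes = length(nodes), edges = sum(A))
```

The node and edge counts play the same role as the "Nodes" and "Edges" figures reported by summary().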

Import a Pajek File

#snaspb2013 name network
One-mode, directed, simple


download.file("", "", method="wget")
snaspb2013<-read.graph("", "pajek")


For details and formats produced by other software, see


Import Two-Mode Data into tnet

Data on membership in metal bands
Collected from

These commands will take some time.
Advisable to revisit later.

metal.bands.df<-read.sstab("", header=TRUE, skip=4,, stringsAsFactors=FALSE, strip.white=TRUE)
colnames(metal.bands.df)[c(1,2)]<-c("group", "member")
su<-function(x) return(sort(unique(x)))

(Picture by Cecil)


non.dupes<-which(duplicated(paste(metal.bands.df$group, metal.bands.df$member, sep="*"))==FALSE)
metal.bands.df<-metal.bands.df[non.dupes,c("member", "group")]
all.metal.names<-unique(c(su(metal.bands.df$member), su(metal.bands.df$group)))
#Drop blank names (safe even when there are none)
all.metal.names<-all.metal.names[all.metal.names!=""]
metal.bands.df$member<-match(metal.bands.df$member, all.metal.names)
metal.bands.df$group<-match(metal.bands.df$group, all.metal.names)
miss.rows<-which(($member) |$group))==TRUE)
if(length(miss.rows)>0) metal.bands.df<-metal.bands.df[-miss.rows,]
<-as.tnet(metal.bands.df, type="binary two-mode tnet")
rm(non.dupes, miss.rows)
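The motivation behind tnet (avoiding the bias of coercing two-mode data into simple graphs) is easy to see in a small base-R sketch. It builds a hypothetical person-by-group incidence matrix and projects it onto each mode. A weighted projection keeps how many groups two people share, while a binary projection would treat Ann-Bob (two shared groups) and Ann-Cat (one) the same. This illustrates the idea only and is not tnet's implementation.

```r
# Hypothetical membership records: which person belongs to which group
memb <- data.frame(member = c("Ann", "Ann", "Bob", "Bob", "Cat", "Ann"),
                   group  = c("G1",  "G2",  "G1",  "G2",  "G2",  "G1"),
                   stringsAsFactors = FALSE)

# Drop duplicated member-group pairs (same idea as non.dupes above)
memb <- memb[!duplicated(paste(memb$member, memb$group, sep = "*")), ]

# Incidence matrix B: B[p, g] = 1 when person p belongs to group g
members <- sort(unique(memb$member))
groups  <- sort(unique(memb$group))
B <- matrix(0L, length(members), length(groups),
            dimnames = list(members, groups))
B[cbind(match(memb$member, members), match(memb$group, groups))] <- 1L

# Weighted one-mode projections: entries count shared groups / shared members
P.people <- B %*% t(B); diag(P.people) <- 0
P.groups <- t(B) %*% B; diag(P.groups) <- 0
P.people
```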


Density and Degree

is.simple(snaspb2013) #Verify it's simple
is.directed(snaspb2013) #Verify it's directed
vcount(snaspb2013) #Number of vertices
ecount(snaspb2013) #Number of edges
graph.density(snaspb2013) #Density

V(snaspb2013)$indegree<-degree(snaspb2013, mode="in")
V(snaspb2013)$outdegree<-degree(snaspb2013, mode="out")
V(snaspb2013)$totaldegree<-degree(snaspb2013, mode="total")

all.vatts<-list.vertex.attributes(snaspb2013)
sapply(all.vatts, get.vertex.attribute, graph=snaspb2013)
summary(sapply(all.vatts[-1], get.vertex.attribute, graph=snaspb2013))
par(mfrow=c(1,2))
hist(V(snaspb2013)$indegree, main="snaspb2013", xlab="Indegree")
hist(V(snaspb2013)$outdegree, main="snaspb2013", xlab="Outdegree")
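For reference, here is what those numbers mean on a hypothetical 4-node directed graph, computed by hand in base R. Density in a simple directed graph is the number of edges divided by the n(n-1) possible ordered pairs. Outdegree is a row sum of the adjacency matrix, and indegree is a column sum.

```r
# Hypothetical 4-node directed graph; A[i, j] = 1 means i sends a tie to j
A <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 1,
              0, 0, 0, 1,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
n <- nrow(A)
m <- sum(A)                      # number of directed edges (no self-loops)

dens <- m / (n * (n - 1))        # share of the n(n-1) possible ordered pairs
outdegree   <- rowSums(A)        # ties sent
indegree    <- colSums(A)        # ties received
totaldegree <- indegree + outdegree

dens
```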

Dyads and Triads

Which measurement of reciprocity did that value refer to?
  • Edgewise?
  • Dyadic?
  • Dyadic, non-null ("ratio")?
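The three measures can be computed side by side on a hypothetical graph. As I understand the documentation, "edgewise" matches igraph's reciprocity() default and sna's grecip(measure="edgewise"), "dyadic" matches grecip(measure="dyadic"), and the non-null version matches reciprocity(mode="ratio") and grecip(measure="dyadic.nonnull"). The sketch below is base R, not either package's code.

```r
# Hypothetical 4-node directed graph: 1<->2 mutual; 1->3, 2->4, 3->4 one-way
A <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 1,
              0, 0, 0, 1,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)
n <- nrow(A)

S <- A + t(A)                  # 2 = mutual, 1 = asymmetric, 0 = null dyad
dyads <- S[upper.tri(S)]       # each unordered pair counted once
M    <- sum(dyads == 2)
Asym <- sum(dyads == 1)
N    <- sum(dyads == 0)

edgewise <- sum(A * t(A)) / sum(A)   # share of edges that are reciprocated
dyadic   <- (M + N) / choose(n, 2)   # share of dyads that are symmetric
ratio    <- M / (M + Asym)           # share of non-null dyads that are mutual

c(edgewise = edgewise, dyadic = dyadic, ratio = ratio)
```

The same graph gives three different "reciprocity" values, which is why it matters which one a function reports.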

transitivity(snaspb2013) #What does this number refer to?
V(snaspb2013)$loc.trans<-transitivity(snaspb2013, "local")
Who has the highest clustering coefficient?
Metal bonus!
clustering_tm(, subsample=.1)
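The global value from transitivity() is the share of connected triples that are closed, and the "local" value for a node is the share of pairs of its neighbors that are themselves tied. igraph treats directed graphs as undirected for this measure, so the base-R sketch below works on a hypothetical undirected matrix. It uses the fact that diag(A^3)[i] counts each triangle through i twice.

```r
# Hypothetical undirected graph: triangle 1-2-3 plus a pendant node 4 tied to 3
A <- matrix(0, 4, 4)
ties <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
A[ties] <- 1
A[ties[, 2:1]] <- 1             # symmetrize

k <- rowSums(A)                 # degree
closed <- diag(A %*% A %*% A)   # 2 x (triangles through each node)

# Global: closed triples over connected triples
global <- sum(closed) / sum(k * (k - 1))
# Local: NaN for nodes with fewer than two neighbors
local <- closed / (k * (k - 1))

global
local
```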


diameter(snaspb2013) / 100 #Bug in the code
hist(shortest.paths(snaspb2013)/100, main="Histogram of Shortest Path Lengths", xlab="Path Lengths")
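Geodesic distances in an unweighted graph come from breadth-first search. The sketch below is a base-R illustration on a hypothetical directed path, not igraph's implementation. bfs.dist() is a hypothetical helper. Unreachable pairs get Inf, and the diameter here is the longest finite geodesic.

```r
# Breadth-first search from every node of an unweighted (directed) graph.
# Returns a matrix of geodesic distances; unreachable pairs are Inf.
bfs.dist <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # nodes one step from the current frontier that are not yet visited
      nbrs <- which(colSums(A[frontier, , drop = FALSE]) > 0)
      nbrs <- nbrs[is.infinite(D[s, nbrs])]
      D[s, nbrs] <- D[s, frontier[1]] + 1
      frontier <- nbrs
    }
  }
  D
}

# Hypothetical directed path 1 -> 2 -> 3 -> 4
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
D <- bfs.dist(A)
diam <- max(D[is.finite(D)])
diam
```

A histogram of the finite, non-zero entries, hist(D[is.finite(D) & D > 0]), gives the same kind of path-length distribution as the slide.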

V(snaspb2013)$betw<-betweenness(snaspb2013)
E(snaspb2013)$eb<-edge.betweenness(snaspb2013)
hist(E(snaspb2013)$eb, main="Histogram of Edge Betweenness", sub="snaspb2013", xlab="Edge Betweenness")
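Betweenness counts, for each node, the share of shortest paths between other pairs that pass through it. The base-R sketch below follows Brandes' accumulation algorithm for unweighted directed graphs. brandes.sketch() is a hypothetical helper, not igraph's code. In the hypothetical "diamond" 1->2, 1->3, 2->4, 3->4, the two geodesics from 1 to 4 split the credit, so nodes 2 and 3 get 0.5 each.

```r
# Brandes-style betweenness for an unweighted, directed adjacency matrix
brandes.sketch <- function(A) {
  n <- nrow(A)
  cb <- numeric(n)
  for (s in seq_len(n)) {
    S <- integer(0)               # nodes in order of discovery
    P <- vector("list", n)        # predecessors on shortest paths from s
    sigma <- numeric(n); sigma[s] <- 1   # number of shortest paths from s
    d <- rep(-1, n); d[s] <- 0           # distance from s (-1 = unseen)
    Q <- s
    while (length(Q) > 0) {
      v <- Q[1]; Q <- Q[-1]
      S <- c(S, v)
      for (w in which(A[v, ] > 0)) {
        if (d[w] < 0) { Q <- c(Q, w); d[w] <- d[v] + 1 }
        if (d[w] == d[v] + 1) {
          sigma[w] <- sigma[w] + sigma[v]
          P[[w]] <- c(P[[w]], v)
        }
      }
    }
    # Accumulate dependencies, farthest nodes first
    delta <- numeric(n)
    for (w in rev(S)) {
      for (v in P[[w]]) delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb
}

# Hypothetical diamond: two equally short routes from 1 to 4
A <- matrix(0, 4, 4)
A[cbind(c(1, 1, 2, 3), c(2, 3, 4, 4))] <- 1
brandes.sketch(A)
```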

#\m/ METAL BONUS! \m/
member.geodist<-distance_tm(


#How many weak and strong components do we have?
sapply(c("weak", "strong"), function(x) return(sapply(list(snaspb2013=snaspb2013, wikidat=wikidat), no.clusters, mode=x)))
#Notice the distributions
clusters(snaspb2013, mode="weak")$csize
clusters(snaspb2013, mode="strong")$csize
clusters(wikidat, mode="weak")$csize
tail(sort(clusters(wikidat, mode="strong")$csize))

V(snaspb2013)$comp.w<-clusters(snaspb2013, mode="weak")$membership
V(snaspb2013)$comp.s<-clusters(snaspb2013, mode="strong")$membership
V(snaspb2013)$id[which(V(snaspb2013)$comp.s == which.max(clusters(snaspb2013, mode="strong")$csize))]
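The difference between weak and strong components can be reproduced from a reachability matrix. Two nodes share a strong component when each can reach the other along directed paths, and a weak component when they are connected once direction is ignored. This is a base-R sketch on a hypothetical graph. reach() and components.sketch() are hypothetical helpers, and igraph itself uses faster algorithms.

```r
# Reachability by repeated squaring of (A + I)
reach <- function(A) {
  n <- nrow(A)
  R <- ((A > 0) | diag(n) > 0) * 1   # walks of length 0 or 1
  repeat {
    R.new <- (R %*% R > 0) * 1       # doubles the walk length covered
    if (all(R.new == R)) return(R > 0)
    R <- R.new
  }
}

# Component membership, labelled 1, 2, ... in order of first appearance
components.sketch <- function(A, mode = c("weak", "strong")) {
  mode <- match.arg(mode)
  if (mode == "weak") A <- A + t(A)  # ignore tie direction
  R <- reach(A)
  same <- R & t(R)                   # i and j reach each other
  first <- apply(same, 1, function(r) min(which(r)))
  match(first, unique(first))
}

# Hypothetical graph: cycle 1 -> 2 -> 3 -> 1, a tie 3 -> 4, isolate 5
A <- matrix(0, 5, 5)
A[cbind(c(1, 2, 3, 3), c(2, 3, 1, 4))] <- 1

strong <- components.sketch(A, "strong")
weak   <- components.sketch(A, "weak")
csize  <- tabulate(strong)           # component sizes, like $csize
which(strong == which.max(csize))    # members of the largest strong component
```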


We've already reviewed degree and betweenness.
Examples of closeness and eigenvector centrality:
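As a base-R sketch of what these two measures compute, assuming a connected undirected graph: closeness is the reciprocal of a node's summed geodesic distances, and eigenvector centrality is the leading eigenvector of the adjacency matrix, rescaled so the maximum is 1 (roughly what igraph reports by default, as I understand it). geo() is a hypothetical helper, and the star graph is hypothetical.

```r
# Geodesic distances via matrix powers: the first d at which a walk of
# exactly d steps joins i and j is their distance
geo <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n); diag(D) <- 0
  P <- diag(n)
  for (d in seq_len(n - 1)) {
    P <- (P %*% A > 0) * 1           # pairs joined by a walk of d steps
    D[P > 0 & is.infinite(D)] <- d
  }
  D
}

# Hypothetical undirected star centered on node 1
A <- matrix(0, 4, 4)
A[1, 2:4] <- 1; A[2:4, 1] <- 1

closeness.sk <- 1 / rowSums(geo(A))  # 1 / sum of distances to all others

e <- eigen(A, symmetric = TRUE)      # eigenvalues in decreasing order
v <- abs(e$vectors[, 1])             # leading eigenvector; sign is arbitrary
evcent.sk <- v / max(v)              # scale so the most central node is 1

rbind(closeness = closeness.sk, evcent = evcent.sk)
```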


V(snaspb2013)$kc.undir<-graph.coreness(as.undirected(snaspb2013, mode="collapse"))
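A node's coreness is the largest k such that it belongs to a subgraph in which every node has at least k ties. The base-R sketch below finds it by peeling: repeatedly remove nodes whose remaining degree is at most k, then raise k. Like the slide, it first collapses direction. coreness.sketch() is a hypothetical helper, not graph.coreness() itself.

```r
coreness.sketch <- function(A) {
  A <- (A + t(A)) > 0                # collapse to undirected
  diag(A) <- FALSE
  n <- nrow(A)
  core  <- integer(n)
  alive <- rep(TRUE, n)
  k <- 0L
  while (any(alive)) {
    deg <- rowSums(A[, alive, drop = FALSE])  # degree within what remains
    low <- alive & deg <= k
    if (any(low)) {
      core[low]  <- k                # these nodes are in the k-core only
      alive[low] <- FALSE
    } else {
      k <- k + 1L
    }
  }
  core
}

# Hypothetical graph: triangle 1-2-3, pendant 4 tied to 3, isolate 5
A <- matrix(0, 5, 5)
A[cbind(c(1, 1, 2, 3), c(2, 3, 3, 4))] <- 1
coreness.sketch(A)
```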

#How are undirected k-cores related to centrality?
<-function(x, y=V(snaspb2013)$kc.undir){
  a<-get.vertex.attribute(snaspb2013, name=x)
  return(cor.test(a, y, method="kendall", exact=FALSE)$estimate)
}

cent.atts<-c("indegree", "outdegree", "totaldegree", "betw", "closeness", "evcent")
sapply(cent.atts,
#Directed k-cores
sapply(c("in", "out", "all"), function(y) return(sapply(cent.atts,, y=graph.coreness(snaspb2013, mode=y))))
rm(cent.atts,

Community Detection

<-which(V(snaspb2013)$comp.w == which.max(clusters(snaspb2013, mode="weak")$csize))
<-induced.subgraph(snaspb2013,
snaspb2013.comms<, mode="collapse"))
snaspb2013.comms$modularity
V($comms<-snaspb2013.comms$membership
names(snaspb2013.comms$membership)<-V($id
sort(snaspb2013.comms$membership)
snaspb2013.comms.w<, mode="collapse"), weights=max(E($eb)-E($eb)
snaspb2013.comms.w$modularity
V($comms.w<-snaspb2013.comms.w$membership
names(snaspb2013.comms.w$membership)<-V($id
sort(snaspb2013.comms.w$membership)
See also ?communities
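The $modularity value compares the share of ties falling inside communities with what a degree-preserving random graph would give. In the usual notation, Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j). The base-R sketch below evaluates that formula for a given membership vector on an undirected, unweighted graph. modularity.sketch() is a hypothetical helper, and the two joined triangles are hypothetical data.

```r
modularity.sketch <- function(A, membership) {
  A <- (A + t(A)) > 0; diag(A) <- FALSE   # undirected, simple
  A <- A * 1
  k <- rowSums(A)                         # degrees
  two.m <- sum(A)                         # 2m: each edge counted twice
  same <- outer(membership, membership, "==")
  sum((A - outer(k, k) / two.m)[same]) / two.m
}

# Hypothetical graph: triangles 1-2-3 and 4-5-6 joined by the tie 3-4
A <- matrix(0, 6, 6)
ties <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
A[ties] <- 1; A[ties[, 2:1]] <- 1

modularity.sketch(A, c(1, 1, 1, 2, 2, 2))  # two communities
modularity.sketch(A, rep(1, 6))            # one community: always 0
```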


Start simple
What could be improved?
  • Include labels
  • Less activity in the center
  • Smaller nodes, arrows
snaspb.layout<-layout.fruchterman.reingold(snaspb2013, params=list(niter=5000, area=vcount(snaspb2013)^3))
plot(snaspb2013, vertex.size=5, vertex.label=V(snaspb2013)$id,"sans", vertex.label.cex=.75, edge.arrow.size=.5, margin=c(0,0,0,0), edge.curved=.33)


What are your empirical interests?
  • Sizes
  • Colors and shading
    • Categorical variables
    • Interval and ordinal variables
  • Transparency

Parameters to Vary
  • Nodes
    • Non-continuous: Shape, node color, border color
    • Continuous: Node size, border width
  • Edges
    • Non-continuous: Color, line type, arrowhead type
    • Continuous: Width
  • Labels: Size, color, visibility
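Continuous attributes usually need rescaling before they become sizes or widths, and categorical ones need one color per category. The two small base-R helpers below are a hypothetical sketch (rescale() and cat.colors() are not igraph functions). Their output could be passed to arguments such as vertex.size or vertex.color.

```r
# Rescale a continuous attribute linearly into [lo, hi] (e.g. vertex sizes)
rescale <- function(x, lo = 3, hi = 12) {
  r <- range(x, na.rm = TRUE)
  if (diff(r) == 0) return(rep((lo + hi) / 2, length(x)))  # constant input
  lo + (x - r[1]) / diff(r) * (hi - lo)
}

# One color per category of a categorical attribute
cat.colors <- function(x, pal = rainbow) {
  f <- factor(x)
  pal(nlevels(f))[as.integer(f)]
}

sizes  <- rescale(c(0, 5, 10))
colors <- cat.colors(c("a", "b", "a"))
sizes
colors
```

For ordinal values such as k-cores, indexing a sequential palette works the same way, e.g. rev(heat.colors(max(kc)+1))[kc+1] as in the plot below.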


How would you represent:
  • k-cores?
  • Eigenvector centrality?
  • Edge betweenness?

Advisability aside, would it be possible to represent all of them at once?

png("snaspb2013.kc.png", height=8, width=11, units="in", bg="transparent", res=300)

plot(snaspb2013, vertex.size=8*(.5+V(snaspb2013)$evcent), vertex.label = V(snaspb2013)$id, edge.width = log(E(snaspb2013)$eb+1)/2, vertex.color = rev(heat.colors(max(V(snaspb2013)$kc.undir)+1))[V(snaspb2013)$kc.undir+1], vertex.label.color="white","sans", vertex.label.cex=.75, edge.arrow.size=.5, edge.curved=.33, margin=c(0, 0, 0, 0))
dev.off() #Close the device so the PNG file is written

Divergent Color Schemes

Let's try it for the communities detected!

<-layout.fruchterman.reingold(, params=list(niter=5000, area=vcount(^3))
V($x<[,1]
V($y<[,2]
rm(

plot(, vertex.size=5, vertex.color=rainbow(max(V($comms.w))[V($comms.w], vertex.label = V($id,"sans", vertex.label.cex=.75, edge.arrow.size=.5, margin=c(0,0,0,0), edge.curved=.33)


Calculate the non-null dyadic reciprocity on the Wikipedia network.

Calculate the modularity for the largest weak component in the Wikipedia network. Assign ("set") it as a graph-level attribute.
(Hint: ??"graph attribute")

What are the maximum in, out, and "all" k-core values in the Wikipedia network? Assign each node's in, out, and "all" k-core values as vertex attributes.

Plot the #snaspb2013 network so that it illustrates the weighted communities, with nodes scaled by betweenness centrality, labels scaled by local transitivity, edges scaled by edge betweenness, and a Kamada-Kawai layout.

SNA 2013-R Lab 2

By Benjamin Lind
