p* Models & Network Dynamics
Networks in the Global World 2014
9:30-11:30, 12:00-14:00 June 27, 2014
Room 67, 3rd Floor
http://goo.gl/l5XTkp
Benjamin Lind, PhD
Faculty of Sociology, Center for Advanced Studies
International Laboratory for Applied Network Research
Higher School of Economics, Moscow
What are Exponential Random Graph (p*) Models?
"Why Model Social Networks?"
Robins et al. 2007. Social Networks.
- To understand regularities in tie formation
- Can network structures emerge by chance?
- Test competing explanations for the same outcome
- Efficiently represent complex network data
- Address relationship between local and global properties
Modeling Steps
Robins et al. 2007. Social Networks.
- Each edge is a random variable
- Propose dependence hypothesis
- Only certain forms will meet dependence hypothesis
- If there are none, the model is "degenerate"
- If there are only a few, it is "near degenerate"
- Simplify
- Reduce parameters
- Homogeneity assumption
- Estimate and interpret parameters
Formula
Probability that a random graph is isomorphic to the observed
Pr(Y = y) = (1 / k) * exp{ΣA ηA*gA(Y)}
- Y = A random graph
- y = Observed graph
- A = A network statistic
- g(Y) = Vector of model statistics
-
η = Parameter vector
- k = Normalizing constant
Dependence Assumptions
(Robins et al. 2007. Social Networks.)
- Bernoulli graphs
- Dyadic dependence
- Markov random graphs
Which Network Statistics to Model?
- Edges
- Nodal attributes
- Popularity effects
- Homophily effects
- Reciprocity
- Triadic terms
- The number of triangles
- Shared partners
- Dyadic or edge
- Transitivity
- Path terms
- Cycles
- ...
Software
Be sure to use the most recent version!
We'll be using v. 3.1.0.
Software
statnet
- Suite
- network
- ergm
- sna
- tergm
- ...
- Uses "network" data objects
-
Written in both R and C
- Not compatible with other R network packages
install.packages("statnet")
library(statnet)
Example Data
Grey's Anatomy "Hook-Up" Network
Read the Data
Enter the following URLs into your browser in this order
- Sociomatrix: http://goo.gl/Qcqd3u
- Should be titled "Grey's Anatomy.tsv"
- Attributes: http://goo.gl/vVLhrR
- Should be titled "Grey's Anatomy (1).tsv"
setwd("...") # Enter the folder where the above files were downloaded
ga.mat <- read.table("Grey's Anatomy.tsv", sep= "\t", header = TRUE, row.names = 1, quote = "\"")
ga.mat <- as.matrix(ga.mat)
ga.atts <- read.table("Grey's Anatomy (1).tsv", sep="\t", header=T, quote="\"", stringsAsFactors = FALSE, strip.white = TRUE, as.is = TRUE)
ga.net <- network(ga.mat, vertex.attr = ga.atts, vertex.attrnames = colnames(ga.atts), directed=F, hyper=F, loops=F, multiple=F, bipartite=F)
Plot the Data
Verify everything is properly read
Roughly understand the network
vcol = c("blue", "pink")[1 + (get.vertex.attribute(ga.net, "sex") == "F")]
plot(ga.net, vertex.col = vcol, label = get.vertex.attribute(ga.net, "name"), label.cex = 0.75)
Hypotheses?
Documentation
# Conduct a fuzzy search through the documentation for the term "ergm"
??"ergm"
# Read the documentation for the function 'ergm()'
?ergm
# Read the documentation for ergm terms in the ergm package
?ergm::ergm.terms
Let's take a few minutes to review the documentation
Our Baseline Model
Two Most Salient Network Characteristics
- Number of edges (i.e., density effect)
- Actor's sex (i.e., heterophily)
ga.base <- ergm(ga.net ~ edges + nodematch("sex")) # Estimate the model
summary(ga.base) # Summarize the model
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex")
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -2.3003 0.1581 NA <1e-04 ***
nodematch.sex -3.1399 0.7260 NA <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 320.5 on 944 degrees of freedom
AIC: 324.5 BIC: 334.2 (Smaller is better.)
Interpretation
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex")
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -2.3003 0.1581 NA <1e-04 ***
nodematch.sex -3.1399 0.7260 NA <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 320.5 on 944 degrees of freedom
AIC: 324.5 BIC: 334.2 (Smaller is better.)
- Coefficients are conditional log-odds
- Log-odds of a tie forming between two actors
-
I.e., edges + nodematch
- Probability
- exp(edges + nodematch) / (1 + exp(edges + nodematch))
Interpretation
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex")
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -2.3003 0.1581 NA <1e-04 ***
nodematch.sex -3.1399 0.7260 NA <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 320.5 on 944 degrees of freedom
AIC: 324.5 BIC: 334.2 (Smaller is better.)
- Probability
- exp(edges + nodematch) / (1 + exp(edges + nodematch))
- Opposite-sex = exp(-2.3 + -3.1 * 0) / (1 + exp(-2.3 + -3.1 * 0))
- Same-sex = exp(-2.3 + -3.1 * 1) / (1 + exp(-2.3 + -3.1 * 1))
- Correspond to 0.09 and 0.004
Interpretation
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex")
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -2.3003 0.1581 NA <1e-04 ***
nodematch.sex -3.1399 0.7260 NA <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 320.5 on 944 degrees of freedom
AIC: 324.5 BIC: 334.2 (Smaller is better.)
- Standard errors can be used for estimate confidence
- p-values for statistical significance
- Use either AIC or BIC to gauge model parsimony
- Principle of Occam's Razor
- Better fit with fewer parameters is preferable
Plot a Simulated Model
Does it look anything like our original network?
Is it degenerate?
plot(simulate(ga.base), vertex.col=vcol)
How is it different?
Is it degenerate?
Goodness of Fit
- Simulate the network many times
- Assess with one or more unmodeled network statistic
-
Compare the observed network statistics to the simulated
ga.base.gof <- gof(ga.base) # Simulate and measure. May take a minute or so.
summary(ga.base.gof) # Print a summary
par(mfrow=c(3, 1)) # Set up the plotting window
plot(ga.base.gof) # Plot the summary
Additional Parameter
- Keep edges and nodematch(sex)
- Theoretically motivated and significant
- Account for vertices with a degree of 1
-
Theoretically meaningful and underrepresented
ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1))
summary(ga.base.d1)
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex") + degree(1)
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -1.4292 0.2466 0 <1e-04 ***
nodematch.sex -3.1766 0.7332 0 <1e-04 ***
degree1 2.0807 0.4602 0 <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 295.8 on 943 degrees of freedom
AIC: 301.8 BIC: 316.3 (Smaller is better.)
Additional Parameter
ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1))
summary(ga.base.d1)
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex") + degree(1)
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -1.4292 0.2466 0 <1e-04 ***
nodematch.sex -3.1766 0.7332 0 <1e-04 ***
degree1 2.0807 0.4602 0 <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 295.8 on 943 degrees of freedom
AIC: 301.8 BIC: 316.3 (Smaller is better.)
-
Iteration 1 of at most 20:
-
Convergence test P-value: 0e+00
-
The log-likelihood improved by 0.6971
-
Iteration 2 of at most 20: ...
Additional Parameter
ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1))
summary(ga.base.d1)
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex") + degree(1)
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -1.4292 0.2466 0 <1e-04 ***
nodematch.sex -3.1766 0.7332 0 <1e-04 ***
degree1 2.0807 0.4602 0 <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 295.8 on 943 degrees of freedom
AIC: 301.8 BIC: 316.3 (Smaller is better.)
- Previous effects still significant
- Monogamy effect is positive and significant
- Improved AIC and BIC
Assess Fit
plot(simulate(ga.base.d1), vertex.col = vcol)
Assess Fit
plot(simulate(ga.base.d1), vertex.col = vcol)
ga.base.d1.gof <- gof(ga.base.d1)
summary(ga.base.d1.gof)
par(mfrow=c(3,1))
plot(ga.base.d1.gof)
Markov Chain Monte Carlo
Did it work properly?
mcmc.diagnostics(ga.base.d1)
# ...
Are sample statistics significantly different from observed?
edges nodematch.sex degree1 Overall (Chi^2)
diff. 0.3728000 0.0201000 -0.1174000 NA
test stat. 1.0569427 0.7868791 -0.7314219 1.563627
P-val. 0.2905377 0.4313526 0.4645215 0.667665
That looks okay, moving on...
Markov Chain Monte Carlo
Did it work properly?
Chain 1
edges nodematch.sex degree1
Lag 0 1.0000000 1.00000000 1.0000000
Lag 100 0.8609309 0.46789330 0.7446234
Lag 200 0.7542179 0.25234066 0.6214649
Lag 300 0.6619003 0.15270586 0.5286271
Lag 400 0.5806592 0.08059618 0.4508049
Lag 500 0.5136180 0.04619048 0.3917378
Sample statistics burn-in diagnostic (Geweke):
Chain 1
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
edges nodematch.sex degree1
-0.6463 -1.4147 0.8218
P-values (lower = worse):
edges nodematch.sex degree1
0.5180537 0.1571526 0.4111837
- High autocorrelation across the chains
- Geweke roughly corresponds to Z-statistic
- Nodematch's p-value is a bit low.
Markov Chain Monte Carlo
Troubleshooting
- Sample statistics differ from observed
- MCMC's default sample size is 10,000
- Increase it (maybe to ~20,000 or 50,000)
- (Unnecessary here)
- Autocorrelation
- MCMC.interval's default is 100
- Increase it (here, to ~5,000)
- Improve the Geweke statistics
- MCMC.burnin's default is 10,000
- Increase it (here, to ~50,000)
Though they improve the results and help models converge, these changes increase estimation time.
Markov Chain Monte Carlo
Documentation to fine tune
?control.ergm
In addition to adjusting
- MCMC.burnin
- MCMC.samplesize
- MCMC.interval
Consider setting "parallel = 4"
- ... or however many processors you'd like
- Increases speed
- Requires library(rlecuyer)
Markov Chain Monte Carlo
?control.ergm
# If we wanted to improve the sample statistics
# ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000, MCMC.samplesize = 50000))
ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000))
summary(ga.base.d1)
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex") + degree(1)
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -1.4298 0.2483 0 <1e-04 ***
nodematch.sex -3.1697 0.7257 0 <1e-04 ***
degree1 2.0794 0.4587 0 <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 295.7 on 943 degrees of freedom
Markov Chain Monte Carlo
mcmc.diagnostics(ga.base.d1)
# ...
Are sample statistics significantly different from observed?
edges nodematch.sex degree1 Overall (Chi^2)
diff. -0.00150000 0.0174000 0 NA
test stat. -0.01690472 1.1955138 0 1.6043633
P-val. 0.98651262 0.2318864 1 0.6584009
Sample statistics auto-correlation:
Chain 1
edges nodematch.sex degree1
Lag 0 1.000000000 1.000000000 1.000000000
Lag 5000 0.015469475 0.005331195 0.006627886
Lag 10000 -0.000798288 -0.004256179 -0.004036665
Lag 15000 -0.012238958 -0.002511807 -0.008747576
Lag 20000 -0.001466805 0.004430013 0.001075269
Lag 25000 -0.026182557 -0.010818715 -0.027159351
Sample statistics burn-in diagnostic (Geweke):
# ...
edges nodematch.sex degree1
1.0343 0.5097 -1.0479
P-values (lower = worse):
edges nodematch.sex degree1
0.3010182 0.6102707 0.2946793
Alternative Theories
We now have a decent baseline model
- Approximates degree distribution
- Approximates geodesic distribution
Which theories should we now test?
-
Homophily
- Attributes: age, race, position
- Forms
- Continuous vs. categorical
- General, differential, mixture
- Attribute-based popularity
- Could challenge homophily effects
- Attributes: age, race, position, sex
Age Homophily
Continuous measure, use 'absdiff()' term
ga.base.d1.age <- ergm(ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear"), control=control.ergm(MCMC.burnin=50000, MCMC.interval=5000))
summary(ga.base.d1.age)
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear")
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -0.46543 0.28588 0 0.104
nodematch.sex -3.23784 0.73100 0 <1e-04 ***
degree1 1.99880 0.44231 0 <1e-04 ***
absdiff.birthyear -0.12465 0.02929 0 <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 270.3 on 942 degrees of freedom
AIC: 278.3 BIC: 297.7 (Smaller is better.)
Age Homophily
==========================
Summary of model fit
==========================
Formula: ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear")
Iterations: 20
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -0.46543 0.28588 0 0.104
nodematch.sex -3.23784 0.73100 0 <1e-04 ***
degree1 1.99880 0.44231 0 <1e-04 ***
absdiff.birthyear -0.12465 0.02929 0 <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 270.3 on 942 degrees of freedom
AIC: 278.3 BIC: 297.7 (Smaller is better.)
Interpretation- Effects comparable to before
- Age-based homophily present
- Improved AIC and BIC
Age Homophily
Whenever using the MCMC, check its diagnostics
mcmc.diagnostics(ga.base.d1.age)
If not...
- Which diagnostic looks wrong?
- How would we adjust the MCMC to correct it?
For sake of time, we'll pretend it's all okay.
Challenge to Age Homophily Hypothesis
What if younger characters/actors form more ties?
- Reasons
- Physiological
- Audience interest
- Cultural attitudes
- Implications if true?
- High probability of within the youth (homophily)
- If most of the cast is young, homophily more apparent
- Low probability of ties within the elderly
- Medium probability of ties between youth and elderly
Challenge to Age Homophily Hypothesis
ga.base.d1.age.covage <- ergm(ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodecov("birthyear"), control=control.ergm(MCMC.burnin=50000, MCMC.interval=5000))
summary(ga.base.d1.age.covage)
==========================
Summary of model fit
==========================
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges 21.068917 50.797099 0 0.678
nodematch.sex -3.239118 0.740633 0 <1e-04 ***
degree1 1.984977 0.443934 0 <1e-04 ***
absdiff.birthyear -0.125028 0.029186 0 <1e-04 ***
nodecov.birthyear -0.005462 0.012884 0 0.672
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 270.1 on 941 degrees of freedom
AIC: 280.1 BIC: 304.4 (Smaller is better.)
Challenge to Age Homophily Hypothesis
==========================
Summary of model fit
==========================
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges 21.068917 50.797099 0 0.678
nodematch.sex -3.239118 0.740633 0 <1e-04 ***
degree1 1.984977 0.443934 0 <1e-04 ***
absdiff.birthyear -0.125028 0.029186 0 <1e-04 ***
nodecov.birthyear -0.005462 0.012884 0 0.672
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 270.1 on 941 degrees of freedom
AIC: 280.1 BIC: 304.4 (Smaller is better.)
Youth are not more popular
- Effect not significant
- Worse AIC, BIC
Race-based Homophily
Categorical attribute
ga.base.d1.age.race <- ergm(ga.net~edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodematch("race"), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000, parallel = 4))
mcmc.diagnostics(ga.base.d1.age.race) # How do the diagnostics look?
ga.base.d1.age.race.gof <- gof(ga.base.d1.age.race)
par(mfrow=c(3, 1)) # Set up the plotting window
plot(ga.base.d1.age.race.gof)
Race-based Homophily
Categorical attribute
ga.base.d1.age.race <- ergm(ga.net~edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodematch("race"), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000, parallel = 4))
mcmc.diagnostics(ga.base.d1.age.race) # How do the diagnostics look?
ga.base.d1.age.race.gof <- gof(ga.base.d1.age.race)
par(mfrow=c(3, 1)) # Set up the plotting window
plot(ga.base.d1.age.race.gof)
plot(simulate(ga.base.d1.age.race), vertex.col=vcol) # Plot a simulated network
Race-based Homophily
summary(ga.base.d1.age.race)
==========================
Summary of model fit
==========================
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -0.98160 0.40479 0 0.0155 *
nodematch.sex -3.28508 0.72837 0 <1e-04 ***
degree1 1.95555 0.44480 0 <1e-04 ***
absdiff.birthyear -0.13297 0.03116 0 <1e-04 ***
nodematch.race 0.82943 0.36020 0 0.0215 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 263.9 on 941 degrees of freedom
AIC: 273.9 BIC: 298.1 (Smaller is better.)
-
Effects comparable to before
-
Race-based homophily effect significant
-
Better AIC, worse BIC
Race-Based Differential Homophily
-
Relax the homogeneity assumption
- Possibility for homophily to differ between groups
ga.base.d1.age.racediff <- ergm(ga.net~edges + nodematch("sex")+ degree(1) + absdiff("birthyear") + nodematch("race", diff = TRUE, keep = c(1, 3)), control = control.ergm(MCMC.burnin=50000, MCMC.interval=5000, parallel = 4))
mcmc.diagnostics(ga.base.d1.age.racediff) # Some issues here. We'll ignore solving them for now.
summary(ga.base.d1.age.racediff)
==========================
Summary of model fit
==========================
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -0.97137 0.38115 0 0.0110 *
nodematch.sex -3.32002 0.73217 0 <1e-04 ***
degree1 1.97079 0.43896 0 <1e-04 ***
absdiff.birthyear -0.13112 0.03027 0 <1e-04 ***
nodematch.race.Black 2.26102 0.74285 0 0.0024 **
nodematch.race.White 0.75956 0.35106 0 0.0307 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 259.4 on 940 degrees of freedom
AIC: 271.4 BIC: 300.5 (Smaller is better.)
Race-Based Differential Homophily
==========================
Summary of model fit
==========================
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -0.97137 0.38115 0 0.0110 *
nodematch.sex -3.32002 0.73217 0 <1e-04 ***
degree1 1.97079 0.43896 0 <1e-04 ***
absdiff.birthyear -0.13112 0.03027 0 <1e-04 ***
nodematch.race.Black 2.26102 0.74285 0 0.0024 **
nodematch.race.White 0.75956 0.35106 0 0.0307 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 1311.4 on 946 degrees of freedom
Residual Deviance: 259.4 on 940 degrees of freedom
AIC: 271.4 BIC: 300.5 (Smaller is better.)
Implications
- Effects comparable to before
- Race-based homophily significant
- Homophily stronger among Black characters than White
- 0.78 vs. 0.45 probability
- Better AIC, worse BIC
Other Models
For example
# Are men or women more popular?
ga.base.d1.age.sex <- ergm(ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodefactor("sex"), control = control.ergm(MCMC.burnin = 100000, MCMC.interval = 5000))
# Are certain occupational pairings more likely than others?
ga.base.d1.age.rolemix<-ergm(ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodemix("position", base = c(3:5, 9, 12:15, 17:21, 24, 27, 28)), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000))
Estimating Network Properties from Ego Network Data
Smith. 2012. "Macrostructure from Microstructure" Sociological Methodology.
- Uses the "target.stats" term in ergm and stergm
- Simulates network based upon ego network properties, e.g.,
- Degree distribution
- Degree correlates and demographic properties
- Homophily
- Transitivity
- Can provide confidence intervals on network measurements, e.g.,
- Average path length
- Number of components
- Cohesive subgroups
- etc
General Modeling Advice
Start very simple
Select the best models based upon the BIC
Does the model reflect reality?
-
Plot simulations for quick diagnosis
-
Compare simulated measurements to the observed
Check MCMC diagnostics when MCMC called
Extensions
- Directed networks (same, yet some different terms)
- Weighted networks
- Bipartite networks (same, yet some different terms)
- ...
- Temporal networks
Separable Temporal Exponential Random Graph Models
(Hanneke and Xing 2007, Krivitsky and Handcock 2014)
- Network model at time t can be represented as an ERGM
- It is conditioned upon the network at time t - 1
- Discrete time
- The difference between two time points represents two networks
- Tie formation network (Y+)
- Tie dissolution network (Y-)
- Separablility: if tie formation is independent of tie dissolution
- Two separate ERGM formulas for each network
Theoretical Motivation
Does the process of tie dissolution differ from tie formation? If so, what are their respective processes?
Important to Bear in Mind...
The "tie dissolution" mode refers to tie persistence.
Example Data
University Student Network
"Over the past two weeks, who in our class did you communicate with outside of school?"
- Universe
- Second-year undergraduates
- Honors sociological methods class
- Classroom size of 22 students
- Asked at three time points ~3 weeks apart
- Symmetrized
- Actor attributes
- Study group
- Sex
Read the Data
stergm can take three types of data
- network object
- networkDynamic object
- network.list object
- List of network objects
library(tergm)
?stergm
?tergm::ergm.terms
?control.stergm
student.nets <- dget("http://pastebin.com/raw.php?i=f0s1DaD9")
summary(student.nets[[1]])
plot(student.nets[[1]], vertex.col = student.nets[[1]] %v% "studygroupcol", label = student.nets[[1]] %v% "vertex.names")
plot(student.nets[[2]], vertex.col = student.nets[[2]] %v% "studygroupcol", label = student.nets[[2]] %v% "vertex.names")
plot(student.nets[[3]], vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
t1
t2
t3
Homophily Model
studmod1 <- stergm(student.nets, formation = ~edges + nodematch("studygroup"), dissolution = ~edges + nodematch("studygroup"), estimate = "CMLE", times = c(1:3))
summary(studmod1)
==============================
Summary of formation model fit
==============================
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -2.3821 0.1941 NA < 1e-04 ***
nodematch.studygroup 1.2120 0.4281 NA 0.00488 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 528.2 on 381 degrees of freedom
Residual Deviance: 240.4 on 379 degrees of freedom
AIC: 244.4 BIC: 252.3 (Smaller is better.)
================================
Summary of dissolution model fit
================================
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges 0.2412 0.4029 NA 0.5512
nodematch.studygroup 1.4118 0.5429 NA 0.0111 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null Deviance: 112.29 on 81 degrees of freedom
Residual Deviance: 83.67 on 79 degrees of freedom
AIC: 87.67 BIC: 92.46 (Smaller is better.)
Homophily + Triangle Model
studmod2 <- stergm(student.nets, formation = ~edges + nodematch("studygroup") + triangle, dissolution = ~edges + nodematch("studygroup") + triangle, estimate = "CMLE", times = c(1:3)) mcmc.diagnostics(studmod2) # Some problems here.
studmod2 <- stergm(student.nets, formation = ~edges + nodematch("studygroup") + triangle, dissolution = ~edges + nodematch("studygroup") + triangle, estimate = "CMLE", times=c(1:3), control = control.stergm(CMLE.MCMC.interval = 500, CMLE.MCMC.burnin = 50000, parallel = 4))
mcmc.diagnostics(studmod2) # Not great on tie formation, but generally better. studmod2.gof <- gof(studmod2) par(mfrow = c(1, 3)) plot(studmod2.gof$formation) plot(studmod2.gof$dissolution)
Homophily + Triangle Tie Formation
Homophily + Triangle Tie Dissolution
Homophily + Triangle Model
Plot one time slice ahead of each wave
par(mfrow = c(1, 3))
plot(simulate.stergm(studmod2, nw.start = student.nets[[1]]), vertex.col = student.nets[[1]] %v% "studygroupcol", label = student.nets[[1]] %v% "vertex.names")
plot(simulate.stergm(studmod2, nw.start = student.nets[[2]]), vertex.col = student.nets[[2]] %v% "studygroupcol", label = student.nets[[2]] %v% "vertex.names")
plot(simulate.stergm(studmod2, nw.start = student.nets[[3]]), vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
Homophily + Triangle Model
Plot three time slices ahead of the final wave
par(mfrow = c(1, 3))
studmod2.w3sim <- simulate.stergm(studmod2, nw.start = student.nets[[3]], time.slices = 1)
plot(studmod2.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
studmod2.w3sim <- simulate.stergm(studmod2, nw.start = as.network(studmod2.w3sim), time.slices = 1)
plot(studmod2.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
studmod2.w3sim <- simulate.stergm(studmod2, nw.start = as.network(studmod2.w3sim), time.slices = 1)
plot(studmod2.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
t4, t5, t6 Simulated from t3
Homophily + Triangle Model
summary(studmod2)
==============================
Summary of formation model fit
==============================
Formula: ~edges + nodematch("studygroup") + triangle
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -2.6000 0.2794 0 <1e-04 ***
nodematch.studygroup 1.0710 0.4228 0 0.0117 *
triangle 0.1801 0.1585 0 0.2566
---
Null Deviance: 528.2 on 381 degrees of freedom
Residual Deviance: 239.2 on 378 degrees of freedom
AIC: 245.2 BIC: 257.1 (Smaller is better.)
================================
Summary of dissolution model fit
================================
Formula: ~edges + nodematch("studygroup") + triangle
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -0.2232 0.4231 0 0.5993
nodematch.studygroup 0.7851 0.5204 0 0.1355
triangle 1.0938 0.4417 0 0.0154 *
---
Null Deviance: 112.29 on 81 degrees of freedom
Residual Deviance: 77.18 on 78 degrees of freedom
AIC: 83.18 BIC: 90.36 (Smaller is better.)
Homophily + Triangle Model
Interpretation
- AIC, BIC
- Worse for formation
- Better for dissolution
- Homophily by study group
- Forms ties
- No effect on preserving ties
- Triangle
- No effect on tie formation
- Preserves ties
Goodness of Fit Troubles
- Overestimated 0 edgewise shared partners in formation
- Underestimated geodesics in dissolution
"Augmented" Model
Changes
- Formation model
- Drop triangle, add esp(0)
- Dissolution model
- Drop nodematch(), add twopath
studmod3 <- stergm(student.nets, formation = ~edges + nodematch("studygroup") + esp(0), dissolution = ~edges + triangle + twopath, estimate = "CMLE", times = c(1:3), control = control.stergm(CMLE.MCMC.interval = 500, CMLE.MCMC.burnin = 50000, parallel = 4))
mcmc.diagnostics(studmod3) # Looks okay!
par(mfrow = c(1, 3))
studmod3.w3sim <- simulate.stergm(studmod3, nw.start = student.nets[[3]], time.slices = 1)
plot(studmod3.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
studmod3.w3sim <- simulate.stergm(studmod3, nw.start = as.network(studmod3.w3sim), time.slices = 1)
plot(studmod3.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
studmod3.w3sim <- simulate.stergm(studmod3, nw.start = as.network(studmod3.w3sim), time.slices = 1)
plot(studmod3.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
t4, t5, t6 Simulated from t3
"Augmented" Model
studmod3.gof <- gof(studmod3)
par(mfrow = c(1, 3))
plot(studmod3.gof$formation)
plot(studmod3.gof$dissolution)
"Augmented" Model Formation
"Augmented" Model Dissolution
"Augmented" Model Summary
summary(studmod3)
==============================
Summary of formation model fit
==============================
Formula: ~edges + nodematch("studygroup") + esp(0)
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges -2.4521 0.1854 0 < 1e-04 ***
nodematch.studygroup 1.0978 0.4190 0 0.00915 **
esp0 -0.7903 0.3546 0 0.02641 *
---
Null Deviance: 528.2 on 381 degrees of freedom
Residual Deviance: 234.0 on 378 degrees of freedom
AIC: 240 BIC: 251.9 (Smaller is better.)
================================
Summary of dissolution model fit
================================
Formula: ~edges + triangle + twopath
Monte Carlo MLE Results:
Estimate Std. Error MCMC % p-value
edges 1.6963 0.7737 0 0.031326 *
triangle 1.7505 0.4486 0 0.000201 ***
twopath -0.3670 0.1376 0 0.009285 **
---
Null Deviance: 112.29 on 81 degrees of freedom
Residual Deviance: 69.35 on 78 degrees of freedom
AIC: 75.35 BIC: 82.53 (Smaller is better.)
"Augmented" Model Summary
Interpretation
- AIC and BIC improved
- Previous effects retained
- Tie formation avoids creating edges with 0 shared partners
- Two-paths are unlikely to last
Additional Capacities of STERGM
- Cross-sectional network modeling
- Tie formation, dissolution inferred from a single network
- Requires estimate of mean relationship perseverance
- Edges' term equals log(mean perseverance - 1)
- Estimation of network properties from ego network statistics
- See discussion earlier
- Requires estimation of mean relationship perseverance
Alternative Approaches to Longitudinal Networks
Siena
Snijders, van de Bunt, Steglich. 2010. Social Networks.
Software: RSiena
- Stochastic actor-based models
- Models how actors can change their outgoing ties
- Network panel data
- Closely in line with agent-based models
- Can model social behavior in addition to tie formation
Relational Event Models
Butts. 2008. Sociological Methodology.
Software: relevent
- Events occur in real time
- Events are of a certain type
- An event's occurrence can affect the likelihood of future events
- Increase or decrease
- Tie formation can be one such event
- Events can be transitive, accumulate, reciprocal, etc.
- Highly useful for communication networks (among others)
networkDynamic
Part of statnet
- Visualization of dynamic networks
- Storing fully dynamic network data
NetModelsWS20140627
By Benjamin Lind
NetModelsWS20140627
- 3,317