Multilevel Modeling, Part 2
PSY 383
-
Assessing the assumptions of multilevel models
- Why we care
- Graphical evaluation
- What to do when assumptions are violated
- Centering
- Longitudinal applications of multilevel models
Plan for this section
Assumptions
- Some are very similar to linear regression
- Assumption 1: Normality of residuals
- Assumption 2: Linearity of relationship between predictors and outcome
-
Some are a little different with multilevel models
- Assumption 3: Homoscedasticity of residuals across groups
Assumptions
- We assume that residuals are normally distributed.
- Note that this does not mean that the variables themselves are normally distributed!
- How serious is this assumption?
- Appears to be less important at larger sample sizes.
- Very closely related to our other assumptions.
Assumption 1
- We assume that the predictors have a linear relationship to the outcome.
- How serious is this assumption?
- Fairly serious, actually.
- Misspecifying the shape of the relationship gives biased coefficients and can lead to incorrect inferences.
- "Linear in the parameters" vs. nonlinear
Assumption 2

- We assume two things about the residual variance:
- A single residual variance \(\sigma^2\) applies to all groups (i.e., homoscedasticity)
- There are no residual covariances between groups
- e.g., math scores in School 1 and School 2 are unrelated to one another.
- How serious is this assumption?
- Potentially serious.
- Misspecifying the shape of the relationship gives biased coefficients and can lead to incorrect inferences.
Assumption 3
- These assumptions can actually be relaxed
-
Homoscedasticity
- Single variance \(\sigma^2\) can be extended to group-specific variance \(\sigma_j^2\)
-
No residual covariances
- We typically cannot (and do not want to) freely model all residual covariances
- But we can test specific error structures - e.g., allowing a single parameter to summarize the covariance between schools (compound symmetry)
- Won't go over this today, but this presentation is great
Assumption 3
- For homoscedasticity, we can test whether allowing group-specific variances improves model fit using likelihood ratio tests.
- For everything else, we have a lot of graphical tools at our disposal
- Lots of judgment calls to make, just like in linear regression
How do we test these?
Centering
- Let's say it right off the bat: centering is confusing.
- And there are lots of different opinions on best practices!
- It can allow us to do two important things:
- Disaggregate between-groups and within-group variance
- Put intercepts on an interpretable scale
Centering predictors
- One option: grand mean centering
- Example from before: Percent FRL across schools
- We denote the grand-mean centered percent FRL
Centering level-2 predictors
\(PctFRL_j = \) percent FRL in school \(j\)
\(\overline{PctFRL} = \frac{\sum_{j=1}^J{PctFRL_j}}{J}\)
\(= \) the unweighted mean of percent FRL across all schools
\(PctFRL_j - \overline{PctFRL}\)
Level 2
Level 1
Returning to the intercepts-as-outcomes model
By centering the predictor, we will change the interpretation of the intercept. Currently \(\gamma_{00}\) is interpreted as the predicted value for a school where 0% of kids are eligible for free or reduced lunch.
Level 2
Level 1
Returning to the intercepts-as-outcomes model
Now \(\gamma_{00}\) is interpreted as the predicted value for a school with a(n unweighted) average number of kids eligible for free or reduced lunch.
Centering Level-1 predictors
- Two options: grand mean centering or group mean centering
- Example from before: Hours of TV watched
- \(HoursTV_{ij} = \) Number of hours of TV child \(i\), attending school \(j\), watches
Group mean
Group mean-centered
Grand mean
Grand mean-centered at Level 1
The distance between subject \(i\) and the whole sample average
The distance between subject \(i\) and the average for their school
Level 2
Level 1
Returning to the intercepts-as-outcomes model
Now \(\beta_{0j}\) is interpreted as the predicted value of math score for a child who watches the average number of hours of TV, conditional on percent free or reduced lunch.
Group mean
Group mean-centered
Grand mean
Grand mean-centered at Level 1
The distance between subject \(i\) and the whole sample average
The distance between subject \(i\) and the average for their school
Level 1
The average number of hours of TV watched by students in school \(j\)
Level 2
Group mean
Level 2
Level 1
Returning to the intercepts-as-outcomes model
Now \(\beta_{0j}\) is interpreted as the predicted value of math score for a child who watches the average amount of TV among children at their school.
...but it is a confounded estimate, in the sense that it also potentially contains between-school differences.
Level 2
Level 1
Returning to the intercepts-as-outcomes model
Now \(\beta_{0j}\) is interpreted as the predicted value of math score for a child who watches the average amount of TV among children at their school, \(\beta_{1j}\) conveys the effect of a student's difference from this average. Similarly, \(\gamma_{02}\) conveys the effect of a school's average TV-watching.
Multilevel models for longitudinal data
- Here, we consider a single person as a "group", in the sense that all time points are nested within a given person.
- We will use time as a Level-1 predictor.
- The models we're going over can be considered a special case of structural equation models.
- One piece of advice: don't get too hung up on which piece of MLM jargon (e.g., intercepts-as-outcomes, slopes-as-outcomes) each model maps onto.
Longitudinal data
- Nationally representative survey of \(N=6504\) adolescents
- Each followed up for a maximum of four time points
- A fifth wave was just collected!
- Ages at the first time point ranged from 13 to 21.
- Ages at the fourth time point ranged from 25 to 34.
- Here, we predict drinking score, a composite score of three drinking-related indicators, from adolescence through early adulthood.
Motivating Example: Add Health
Level 2
Level 1
Random-intercept model
Here, \(Age_{ij}\) and \(Drinking_{ij}\) are the age and drinking score, respectively, of subject \(j\) at time \(i\), and Note that under this formulation, only the intercept can vary by person.
Level 2
Level 1
Random-slopes model
Now, under this formulation, we can have variation in the slopes by person.

- For linear growth, we can just put age in there, or we can alter it any number of ways:
- We can subtract out the first time
- We can rescale it by some multiple (e.g., 10)
- Sometimes this helps with convergence
- We can also enter a quadratic component of time - i.e., \(age^2\).
- Or cubic (\(age^3\)), quartic (\(age^4\)), not sure if quintic (\(age^5\)) is a word but let's go with it...
- Note that if you enter in a polynomial, you must include all lower-order polynomials.
How does time enter the model?
Level 2
Level 1
Adding a quadratic component
This can help us to model change that increases and subsequently decreases or levels off. Note that we could include a random effect for that quadratic component too.

-
Level 2: We can add person-level predictors by allowing the intercept, slope, or both to vary for different people.
- Again, this is the same as the intercepts-as-outcomes and slopes-as-outcomes model, but don't worry too much about the terminology.
-
Level 1: We can add a time-level predictor, but note that the interpretation of these coefficients can be challenging.
- If a variable is contemporaneous with the outcome, it can be challenging to make causal statements.
Predictors
Level 2
Level 1
Adding predictors
Now we have the intercept of drinking being allowed to differ between males and females. We could also allow the slopes to differ.
- Note that, if time points are sufficiently close together, independence of errors may not be a reasonable assumption.
- For instance, if you are doing an EMA
- Modeling complex error structures may be possible!
- Always fit the unconditional model first to figure out the general shape of change.
A few more things about longitudinal models....
Thank you!
colev@wfu.edu
Copy of An introduction to multilevel modeling for developmental science: Longitudinal applications
By Veronica Cole
Copy of An introduction to multilevel modeling for developmental science: Longitudinal applications
- 213