Multilevel Modeling, Part 2

PSY 716

  • Assessing the assumptions of multilevel models
    • Why we care
    • Graphical evaluation
    • What to do when assumptions are violated
  • Centering
  • Longitudinal applications of multilevel models

Plan for this section

Assumptions

  • Some are very similar to linear regression
    • Assumption 1: Normality of residuals
    • Assumption 2: Linearity of relationship between predictors and outcome
  • Some are a little different with multilevel models
    • Assumption 3: Homoscedasticity of residuals across groups

Assumptions

  • We assume that residuals are normally distributed. 
  • Note that this does not mean that the variables themselves are normally distributed!
  • How serious is this assumption?
    • Appears to be less important at larger sample sizes.
    • Very closely related to our other assumptions.

Assumption 1

  • We assume that the predictors have a linear relationship to the outcome.
  • How serious is this assumption?
    • Fairly serious, actually.
    • Misspecifying the shape of the relationship gives biased coefficients and can lead to incorrect inferences.
  • "Linear in the parameters" vs. nonlinear 

Assumption 2

  • We assume two things about the residual variance:
    • A single residual variance \(\sigma^2\) applies to all groups (i.e., homoscedasticity)
    • There are no residual covariances between groups
      • e.g., math scores in School 1 and School 2 are unrelated to one another.
  • How serious is this assumption?
    • Potentially serious.
    • Ignoring heteroscedasticity or dependence among residuals can bias standard errors and lead to incorrect inferences.

Assumption 3

  • These assumptions can actually be relaxed
  • Homoscedasticity
    • Single variance \(\sigma^2\) can be extended to group-specific variance \(\sigma_j^2\)
  • No residual covariances
    • We typically cannot (and do not want to) freely model all residual covariances
    • But we can test specific error structures - e.g., allowing a single parameter to summarize the covariance between schools (compound symmetry)
    • Won't go over this today, but this presentation is great

Assumption 3

  • For homoscedasticity, we can test whether allowing group-specific variances improves model fit using likelihood ratio tests.
  • For everything else, we have a lot of graphical tools at our disposal
  • Lots of judgment calls to make, just like in linear regression

How do we test these?
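The likelihood ratio test for group-specific variances can be sketched as below. The deviance values and group count are hypothetical; in practice they come from fitting the homoscedastic and heteroscedastic models to the same data with maximum likelihood.

```python
import math

# Hypothetical deviances (-2 log-likelihood) from two fitted models
deviance_homoscedastic = 4820.6   # single residual variance sigma^2
deviance_heteroscedastic = 4807.4 # one sigma_j^2 per group

n_groups = 3
df = n_groups - 1  # extra variance parameters in the heteroscedastic model

lrt = deviance_homoscedastic - deviance_heteroscedastic

# Chi-square survival function has a closed form for even df:
# P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
def chi2_sf_even_df(x, df):
    assert df % 2 == 0
    return math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                  for k in range(df // 2))

p = chi2_sf_even_df(lrt, df)
print(f"LRT = {lrt:.1f}, df = {df}, p = {p:.4f}")
```

A small p-value here would favor the model with group-specific variances; a general-purpose chi-square routine (e.g., from a stats library) handles odd degrees of freedom as well.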

Centering

  • Let's say it right off the bat: centering is confusing.
    • And there are lots of different opinions on best practices!
  • It can allow us to do two important things:
    • Disaggregate between-groups and within-group variance
    • Put intercepts on an interpretable scale

Centering predictors

  • One option: grand mean centering
  • Example from before: Percent FRL across schools
  • We denote the grand-mean centered percent FRL as \(PctFRL_j - \overline{PctFRL}\).

Centering level-2 predictors

$$PctFRL_j = \text{percent FRL in school } j$$

$$\overline{PctFRL} = \frac{\sum_{j=1}^J PctFRL_j}{J} = \text{the unweighted mean of percent FRL across all schools}$$

$$PctFRL_j - \overline{PctFRL} = \text{the grand-mean centered percent FRL}$$
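The grand-mean centering above can be computed directly; the percent-FRL values below are made up for illustration.

```python
# Grand-mean centering a Level-2 predictor (hypothetical percent-FRL
# values, one per school).
pct_frl = [12.0, 35.0, 48.0, 60.0, 85.0]  # PctFRL_j for J = 5 schools

grand_mean = sum(pct_frl) / len(pct_frl)      # unweighted mean across schools
centered = [x - grand_mean for x in pct_frl]  # PctFRL_j - mean

print(grand_mean)  # 48.0
print(centered)    # a school at the grand mean gets exactly 0
```

Note the centered values always sum to zero, and a school sitting at the grand mean gets a centered value of 0, which is what makes the intercept interpretable.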

$$MathScore_{ij} = \beta_{0j} + \beta_{1j}HoursTV_{ij} + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{01}PctFRL_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10}$$
$$u_{0j}\sim N\left(0,\tau_{00}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Returning to the intercepts-as-outcomes model

By centering the predictor, we will change the interpretation of the intercept. Currently \(\gamma_{00}\) is interpreted as the predicted value for a school where 0% of kids are eligible for free or reduced lunch.

$$MathScore_{ij} = \beta_{0j} + \beta_{1j}HoursTV_{ij} + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{01}\left(PctFRL_j - \overline{PctFRL}\right) + u_{0j}$$
$$\beta_{1j} = \gamma_{10}$$
$$u_{0j}\sim N\left(0,\tau_{00}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Returning to the intercepts-as-outcomes model

Now \(\gamma_{00}\) is interpreted as the predicted value for a school with a(n unweighted) average percentage of kids eligible for free or reduced lunch.

Centering Level-1 predictors

  • Two options: grand mean centering or group mean centering
  • Example from before: Hours of TV watched
    • \(HoursTV_{ij} = \) Number of hours of TV child \(i\), attending school \(j\), watches
$$\overline{HoursTV}_{\bullet j} = \frac{\sum_{i=1}^{N_j} HoursTV_{ij}}{N_j}$$
$$HoursTV_{ij} - \overline{HoursTV}_{\bullet j}$$

Group mean

Group mean-centered

$$\overline{HoursTV} = \frac{\sum_{j=1}^J \sum_{i=1}^{N_j} HoursTV_{ij}}{N}$$
$$HoursTV_{ij} - \overline{HoursTV}$$

Grand mean

Grand mean-centered at Level 1

The distance between subject \(i\) and the whole sample average

The distance between subject \(i\) and the average for their school
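The two centering options can be computed side by side; the hours-of-TV values for two schools below are made up for illustration.

```python
# Group- vs. grand-mean centering of a Level-1 predictor (hypothetical
# hours-of-TV values for children in two schools).
hours_tv = {
    "school_1": [1.0, 2.0, 3.0],
    "school_2": [4.0, 5.0, 6.0],
}

all_values = [x for vals in hours_tv.values() for x in vals]
grand_mean = sum(all_values) / len(all_values)  # 3.5

group_means = {s: sum(v) / len(v) for s, v in hours_tv.items()}
# {'school_1': 2.0, 'school_2': 5.0}

# Group-mean centered: distance from the child's own school average
group_centered = {s: [x - group_means[s] for x in v]
                  for s, v in hours_tv.items()}

# Grand-mean centered: distance from the whole-sample average
grand_centered = {s: [x - grand_mean for x in v]
                  for s, v in hours_tv.items()}

# Each raw score decomposes into a within-school part plus the school mean:
for s, vals in hours_tv.items():
    for x, w in zip(vals, group_centered[s]):
        assert abs(x - (w + group_means[s])) < 1e-12
```

The final loop checks the decomposition \(HoursTV_{ij} = (HoursTV_{ij}-\overline{HoursTV}_{\bullet j}) + \overline{HoursTV}_{\bullet j}\), which is what lets us split within-school and between-school variance.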

$$MathScore_{ij} = \beta_{0j} + \beta_{1j}\left(HoursTV_{ij}-\overline{HoursTV}\right) + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{01}PctFRL_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10}$$
$$u_{0j}\sim N\left(0,\tau_{00}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Returning to the intercepts-as-outcomes model

Now \(\beta_{0j}\) is interpreted as the predicted value of math score for a child who watches the average number of hours of TV, conditional on percent free or reduced lunch.

We can decompose each child's hours of TV into a within-school component plus the school's average:

$$HoursTV_{ij} = \left(HoursTV_{ij}-\overline{HoursTV}_{\bullet j}\right) + \overline{HoursTV}_{\bullet j}$$

$$\overline{HoursTV}_{\bullet j} = \frac{\sum_{i=1}^{N_j} HoursTV_{ij}}{N_j}$$

The first term varies at Level 1: it is the distance between subject \(i\) and the average for their school. The group mean \(\overline{HoursTV}_{\bullet j}\) varies at Level 2: it is the average number of hours of TV watched by students in school \(j\).

$$MathScore_{ij} = \beta_{0j} + \beta_{1j}\left(HoursTV_{ij}-\overline{HoursTV}_{\bullet j}\right) + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{01}PctFRL_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10}$$
$$u_{0j}\sim N\left(0,\tau_{00}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Returning to the intercepts-as-outcomes model

Now \(\beta_{0j}\) is interpreted as the predicted value of math score for a child who watches the average amount of TV among children at their school.

...but it is a confounded estimate, in the sense that the intercept also potentially absorbs between-school differences in average TV watching, which we can model explicitly by adding the group mean as a Level-2 predictor.

$$MathScore_{ij} = \beta_{0j} + \beta_{1j}\left(HoursTV_{ij}-\overline{HoursTV}_{\bullet j}\right) + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{01}PctFRL_j + \gamma_{02}\overline{HoursTV}_{\bullet j} + u_{0j}$$
$$\beta_{1j} = \gamma_{10}$$
$$u_{0j}\sim N\left(0,\tau_{00}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Returning to the intercepts-as-outcomes model

Now \(\beta_{0j}\) is interpreted as the predicted value of math score for a child who watches the average amount of TV among children at their school, while \(\beta_{1j}\) conveys the effect of a student's deviation from that average. Similarly, \(\gamma_{02}\) conveys the effect of a school's average TV-watching.

Special cases

The paired samples t-test examines the difference between paired observations:

$$t = \frac{\bar{d}}{\frac{s_d}{\sqrt{n}}}$$

Where:

  • \(d_i = y_{i1} - y_{i0}\) (difference between paired observations)
  • \(\bar{d}\) is the mean of differences
  • \(s_d\) is the standard deviation of differences
  • \(n\) is the number of pairs
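The t statistic above can be computed directly from its definition; the pre/post scores for five subjects below are made up for illustration.

```python
import math

# Paired-samples t statistic from the formula above, on hypothetical scores.
pre  = [10.0, 12.0, 9.0, 14.0, 11.0]   # condition 0
post = [12.0, 15.0, 9.0, 17.0, 13.0]   # condition 1

d = [y1 - y0 for y0, y1 in zip(pre, post)]  # differences d_i
n = len(d)
d_bar = sum(d) / n                          # mean of differences
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))  # SD of differences

t = d_bar / (s_d / math.sqrt(n))
print(f"d_bar = {d_bar}, t = {t:.3f}")
```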

Paired-samples t-tests as MLMs

We can reframe this as a multilevel model, with measurements nested within subjects:

Level 1

$$y_{ij} = \beta_{0j} + \beta_{1j}G_{ij} + r_{ij}$$

Level 2

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

$$\beta_{1j} = \gamma_{10}$$

where:

  • \(y_{ij}\) is the outcome for subject \(j\) at measurement \(i\)
  • \(G_{ij}\) is the dummy-coded group variable (0 = condition 0, 1 = condition 1)
  • \(r_{ij}\) is the measurement-level residual
  • \(\gamma_{00}\) is the average outcome in condition 0
  • \(\gamma_{10}\) is the average difference between conditions
  • \(u_{0j}\) are subject-level random effects

  • \(\gamma_{10}\) is equivalent to \(\bar{d}\) in the paired t-test
  • Testing \(H_0: \gamma_{10} = 0\) is equivalent to the paired t-test
  • Critical note: standard paired-samples t-test does NOT include random slope
  • The multilevel approach allows for:
    • Missing data
    • Inclusion of covariates
    • More complex variance structures
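The equivalence \(\gamma_{10} = \bar{d}\) can be checked numerically: with complete, balanced paired data, demeaning the outcome and the condition code within each subject and running an ordinary least-squares regression recovers the mean difference exactly. The scores are the same hypothetical pre/post values as above; this is an illustration of the algebra, not a full mixed-model fit.

```python
# Check that the within-subject regression slope equals d_bar from the
# paired t-test (hypothetical pre/post scores for five subjects).
pre  = [10.0, 12.0, 9.0, 14.0, 11.0]   # condition G = 0
post = [12.0, 15.0, 9.0, 17.0, 13.0]   # condition G = 1

d_bar = sum(y1 - y0 for y0, y1 in zip(pre, post)) / len(pre)

# Subtract each subject's own mean from y and from G, then accumulate
# the OLS slope terms.
xtilde_ytilde = 0.0
xtilde_sq = 0.0
for y0, y1 in zip(pre, post):
    subj_mean_y = (y0 + y1) / 2
    subj_mean_g = 0.5                 # mean of G codes (0 and 1)
    for g, y in ((0, y0), (1, y1)):
        xt = g - subj_mean_g          # +/- 0.5
        yt = y - subj_mean_y
        xtilde_ytilde += xt * yt
        xtilde_sq += xt * xt

slope = xtilde_ytilde / xtilde_sq     # within-subject OLS slope
print(slope, d_bar)                   # identical
```

With missing data or covariates the two no longer coincide, which is exactly where the multilevel formulation earns its keep.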

Interpretation

Repeated measures ANOVA examines differences across multiple conditions within subjects:

$$ F = \frac{MS_{group}}{MS_{error}} = \frac{\frac{SS_{group}}{df_{group}}}{\frac{SS_{error}}{df_{error}}} $$

where:

  • \( SS_{group} \) is the sum of squares for group effect
  • \( SS_{error} \) is the sum of squares for error
  • \( df_{group} = k - 1 \) (\( k \) is the number of conditions)
  • \( df_{error} = (N-1)(k-1) \) (N is the number of subjects)

Key assumption: Sphericity (equal variances of differences between all pairs of conditions)
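For the special case of \(k = 2\) conditions, the repeated-measures F statistic equals the squared paired t statistic. The sketch below verifies this on hypothetical scores by partitioning the sums of squares as in the formula above.

```python
import math

# RM-ANOVA F vs. paired t^2 for k = 2 conditions (hypothetical scores).
cond0 = [10.0, 12.0, 9.0, 14.0, 11.0]
cond1 = [12.0, 15.0, 9.0, 17.0, 13.0]
n, k = len(cond0), 2

grand = (sum(cond0) + sum(cond1)) / (n * k)
mean0, mean1 = sum(cond0) / n, sum(cond1) / n
subj_means = [(a + b) / 2 for a, b in zip(cond0, cond1)]

ss_group = n * ((mean0 - grand) ** 2 + (mean1 - grand) ** 2)
ss_subj  = k * sum((m - grand) ** 2 for m in subj_means)
ss_total = sum((y - grand) ** 2 for y in cond0 + cond1)
ss_error = ss_total - ss_group - ss_subj   # interaction (error) term

F = (ss_group / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))

# Paired t on the same data
d = [b - a for a, b in zip(cond0, cond1)]
d_bar = sum(d) / n
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
t = d_bar / (s_d / math.sqrt(n))

print(F, t ** 2)  # equal, up to floating point
```

Sphericity is trivially satisfied with two conditions (there is only one pair of conditions), which is why the equivalence is exact here.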

Repeated-measures ANOVAs as MLMs

We can reframe this as a multilevel model with measurements nested within subjects:

$$y_{ij} = \beta_{0j} + \sum_{m=1}^{k-1} \beta_{mj}G_{mij} + r_{ij}$$

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

$$\beta_{mj} = \gamma_{m0}$$

where:

  • \( y_{ij} \) is the outcome for subject \( j \) at measurement occasion \( i \)
  • \( G_{mij} \) are dummy codes for conditions (reference is condition 0)
  • \( r_{ij} \) is the measurement-level residual with variance \( \sigma^2 \)
  • \( \gamma_{00} \) is the average outcome in the reference condition
  • \( \gamma_{m0} \) represents mean differences between each condition and reference
  • \( u_{0j} \) are subject-level random effects with variance \( \tau_{00} \)

Level 1

Level 2

  • Testing omnibus hypothesis \( (H_0: \gamma_{10} = \gamma_{20} = ... = \gamma_{(k-1)0} = 0) \) is equivalent to repeated measures ANOVA F-test
  • The multilevel approach offers several advantages:
    • Handles missing data appropriately
    • Allows inclusion of subject-level and time-varying covariates
    • Can model more complex variance structures beyond sphericity
    • Permits random slopes (allowing treatment effects to vary by subject)
    • Accommodates continuous predictors and more complex designs
    • Can handle unbalanced designs and irregular measurement occasions

Note: The standard RM-ANOVA assumes compound symmetry and sphericity, while multilevel models can relax these assumptions by specifying different variance-covariance structures.

Interpretation

Multilevel models for longitudinal data

  • Here, we consider a single person as a "group", in the sense that all time points are nested within a given person. 
  • We will use time as a Level-1 predictor.
  • The models we're going over can be considered a special case of structural equation models.
  • One piece of advice: don't get too hung up on which piece of MLM jargon (e.g., intercepts-as-outcomes, slopes-as-outcomes) each model maps onto.

Longitudinal data

  • Nationally representative survey of \(N=6504\) adolescents 
  • Each followed up for a maximum of five time points
  • Ages at the first time point ranged from 13 to 21.
  • Ages at the fifth time point ranged from 35 to 42.
  • We are using data from the first four time points.
  • Here, we predict drinking score, a composite score of three drinking-related indicators, from adolescence through early adulthood.

Motivating Example: Add Health

$$Drinking_{ij} = \beta_{0j} + \beta_{1j}Age_{ij} + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + u_{0j}$$
$$\beta_{1j} = \gamma_{10}$$
$$u_{0j}\sim N\left(0,\tau_{00}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Random-intercept model

Here, \(Age_{ij}\) and \(Drinking_{ij}\) are the age and drinking score, respectively, of subject \(j\) at time \(i\). Note that under this formulation, only the intercept can vary by person.
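A quick simulation shows what the random-intercept model implies. The parameter values below are made up; we generate data from the model and recover \(\tau_{00}\) and \(\sigma^2\) with simple method-of-moments estimates rather than a full likelihood fit.

```python
import random

# Simulate from a random-intercept model and recover the variance components.
random.seed(1)
J, n_per = 2000, 5            # persons ("groups") and time points per person
gamma00, tau00, sigma2 = 2.0, 1.0, 1.0   # hypothetical true values

groups = []
for _ in range(J):
    u0 = random.gauss(0, tau00 ** 0.5)   # person-level deviation
    groups.append([gamma00 + u0 + random.gauss(0, sigma2 ** 0.5)
                   for _ in range(n_per)])

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Average within-person variance estimates sigma^2
sigma2_hat = sum(var(g) for g in groups) / J

# Variance of person means estimates tau00 + sigma^2 / n_per
means = [sum(g) / n_per for g in groups]
tau00_hat = var(means) - sigma2_hat / n_per

icc = tau00_hat / (tau00_hat + sigma2_hat)   # intraclass correlation
print(sigma2_hat, tau00_hat, icc)            # near 1.0, 1.0, 0.5
```

The intraclass correlation \(\tau_{00}/(\tau_{00}+\sigma^2)\) tells us what share of the variance in drinking scores is between-person rather than within-person.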

$$Drinking_{ij} = \beta_{0j} + \beta_{1j}Age_{ij} + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + u_{1j}$$
$$\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{bmatrix}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Random-slopes model

Now, under this formulation, we can have variation in the slopes by person.

  • For linear growth, we can enter age as-is, or we can alter it in any number of ways:
    • We can subtract out the first time point (so time starts at 0)
    • We can rescale it by some multiple (e.g., 10)
      • Sometimes this helps with convergence
  • We can also enter a quadratic component of time - i.e., \(age^2\).
    • Or cubic (\(age^3\)), quartic (\(age^4\)), not sure if quintic (\(age^5\)) is a word but let's go with it...
  • Note that if you enter a polynomial term, you must include all lower-order terms as well.

How does time enter the model?
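One practical reason to center age before squaring it is collinearity: raw age and raw age squared are nearly perfectly correlated, while centered age and its square are not. The sketch below illustrates this with hypothetical ages.

```python
# Collinearity between linear and quadratic time, before and after centering
# (hypothetical ages).
ages = [13.0, 15.0, 17.0, 19.0, 21.0]

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Raw age vs. raw age^2: nearly collinear
r_raw = corr(ages, [a ** 2 for a in ages])

# Centered age vs. its square: the correlation vanishes entirely when
# the ages are symmetric around their mean
mean_age = sum(ages) / len(ages)
c = [a - mean_age for a in ages]
r_centered = corr(c, [a ** 2 for a in c])

print(r_raw, r_centered)  # ~0.998 vs 0.0
```

This is one of the "sometimes this helps with convergence" mechanisms: centering (and rescaling) time keeps the polynomial terms from fighting over the same variance.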

$$Drinking_{ij} = \beta_{0j} + \beta_{1j}Age_{ij} + \beta_{2j}Age^2_{ij} + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + u_{1j}$$
$$\beta_{2j} = \gamma_{20}$$
$$\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{bmatrix}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Adding a quadratic component

This can help us to model change that increases and subsequently decreases or levels off. Note that we could include a random effect for that quadratic component too.
  • Level 2: We can add person-level predictors by allowing the intercept, slope, or both to vary for different people.
    • Again, this is the same as the intercepts-as-outcomes and slopes-as-outcomes model, but don't worry too much about the terminology.
  • Level 1: We can add a time-level predictor, but note that the interpretation of these coefficients can be challenging.
    • If a variable is contemporaneous with the outcome, it can be challenging to make causal statements.

Predictors

$$Drinking_{ij} = \beta_{0j} + \beta_{1j}Age_{ij} + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{01}Male_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + u_{1j}$$
$$\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{bmatrix}\right)$$
$$r_{ij}\sim N\left(0,\sigma^2\right)$$

Level 2

Level 1

Adding predictors

Now we have the intercept of drinking being allowed to differ between males and females. We could also allow the slopes to differ.

  • Note that, if time points are sufficiently close together, independence of errors may not be a reasonable assumption.
    • For instance, if you are collecting ecological momentary assessment (EMA) data
    • Modeling complex error structures may be possible!
  • Always fit the unconditional model first to figure out the general shape of change.

A few more things about longitudinal models....

Thank you!

colev@wfu.edu