Multilevel Modeling Part 1

PSY 383

Introduction: What are multilevel models and why would we use them?
Between vs. within-group variance
- Fixed vs. random effects
- Intra-class correlation
- Our first multilevel model: the random-effects ANOVA
Incorporating predictors
- Random intercept regression model
- Intercepts-as-outcomes
- Slopes-as-outcomes

Multilevel models

Motivation: nested data

We use multilevel models in the presence of a nested data structure.
- Multiple students in the same school
- Multiple children of the same mother
- Multiple time points observed on the same person

Why use multilevel models?

In the presence of nested data, the assumption of independence of errors is not met.
Multilevel models allow us to...
- ...model group-level effects, if they are of substantive interest
- ...determine the relative magnitude of within-group and between-group effects
- ...generalize to a population of group effects

Independence of Error Terms

In the presence of nested data, the assumption of independence of errors is not met.
This is key, because a standard linear model assumes that differences between individual observations are the only source of random variation; it has no term for variation shared by members of the same group.
If we ignore nesting of data, we obtain biased estimates of coefficients and standard errors
- Standard errors often downwardly biased, leading to inflated Type I error rate
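The downward bias in standard errors can be seen in a small simulation. This is only a sketch under assumed values (30 groups of 20, a between-group variance of 1, a within-group variance of 4, all chosen for illustration and not taken from any dataset): it compares the standard error we would report if we treated all scores as independent with how much the sample mean actually varies across repeated samples.

```python
import random
import statistics

def one_sample(rng, n_groups=30, n_per_group=20, tau=1.0, sigma2=4.0):
    """Draw clustered scores: a shared group effect plus individual noise."""
    scores = []
    for _ in range(n_groups):
        u = rng.gauss(0, tau ** 0.5)  # effect shared by everyone in the group
        scores.extend(u + rng.gauss(0, sigma2 ** 0.5) for _ in range(n_per_group))
    return scores

def naive_se(scores):
    """Standard error of the mean that (wrongly) treats all scores as independent."""
    return statistics.stdev(scores) / len(scores) ** 0.5

rng = random.Random(383)  # fixed seed so the run is reproducible
means, naive = [], []
for _ in range(500):
    sample = one_sample(rng)
    means.append(statistics.fmean(sample))
    naive.append(naive_se(sample))

empirical_sd = statistics.stdev(means)      # how much the mean really varies
average_naive_se = statistics.fmean(naive)  # what ignoring nesting would report
print(round(empirical_sd, 3), round(average_naive_se, 3))
```

Under these assumed values the naive standard error should come out well below the empirical variability of the mean, which is the mechanism behind the inflated Type I error rate.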

Other options

Randomly sample one observation from each group
Fixed effects model
- i.e., including a regression coefficient for each grouping
Two approaches which only estimate marginal effects
- Generalized estimating equations
- Adjustments to standard errors which adjust for clustering
  - Huber-White SEs, so-called "sandwich" estimator

Why use multilevel models?

Multilevel models allow us to...
- ...model group-level effects, if they are of substantive interest
  - e.g., How does school climate affect math scale scores?
- ...determine the relative magnitude of within-group and between-group effects
  - e.g., How much do school-level factors affect math scale scores, relative to child-level factors?
- ...generalize to a population of group effects
  - e.g., How much do school-level factors affect math scale scores, relative to child-level factors, in schools beyond the ones we are currently considering?

Why use multilevel models?

Multilevel models, at least theoretically, allow us to model the interplay between these levels.

A note on nomenclature

Multilevel models go by a number of different names:
- Random coefficient models
- Mixed models
- Hierarchical linear models
Additionally, the same model is often referred to in multiple different ways
- e.g., a slopes-as-outcomes model vs. a model with a cross-level interaction
- We will try to be as general as possible but if ever you are confused, please just ask!

Between-groups variance vs. within-groups variance

Partitioning variance

Our first task is to figure out how much variance in our outcome is related to differences between groups, and how much is related to variation within groups.
- How similar are kids who go to the same school in terms of math ability?
- How similar are a teenager's self-reported depressive symptoms from one day to the next?
- How similar are externalizing problems among children of the same parent?

Motivating example: ECLS-K

Nationally representative study of children's cognitive and social development from kindergarten to eighth grade
\(N=15305\) children sampled starting in 1998
Here we look at math skills in a cross-section of students in third grade
Children nested within schools

Motivating example: ECLS-K

Questions we wish to answer
- To what extent do differences between children in math ability owe to differences between schools?
- Which child-level factors are associated with higher math scores?
- Which school-level factors are associated with higher math scores?
- Do the effects of child-level factors vary from one school to the next?

The random-effects ANOVA model

Level 1: \(MathScore_{ij} = \beta_{0j} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\)

Level 2: \(\beta_{0j} = \gamma_{00} + u_{0j}\), with \(u_{0j}\sim N\left(0,\tau_{00}\right)\)

where \(i\) indexes children and \(j\) indexes schools.

Within-group variance

Level 1: \(MathScore_{ij} = \beta_{0j} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\)

Level 2: \(\beta_{0j} = \gamma_{00} + u_{0j}\), with \(u_{0j}\sim N\left(0,\tau_{00}\right)\)

\(\beta_{0j}\) is a subject's predicted math score, given that they are a student at school \(j\).

\(r_{ij}\) is the subject-specific deviation from this predicted mean.

\(\sigma^2\) is the within-school variance.

Between-group variance

Level 1: \(MathScore_{ij} = \beta_{0j} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\)

Level 2: \(\beta_{0j} = \gamma_{00} + u_{0j}\), with \(u_{0j}\sim N\left(0,\tau_{00}\right)\)

\(\gamma_{00}\) is the grand mean math score across schools.

\(u_{0j}\) is the school-specific deviation from this grand mean.

\(\tau_{00}\) is the between-school variance.

Putting it together

Reduced form equation

\(MathScore_{ij} = \gamma_{00} + u_{0j} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\) and \(u_{0j}\sim N\left(0,\tau_{00}\right)\)
- \(\gamma_{00}\): grand mean across schools
- \(u_{0j}\): some school-specific deviation from that grand mean
- \(r_{ij}\): some child-specific deviation from the school-implied value

Total variance = \(\tau_{00} + \sigma^2\)
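The total-variance expression follows in one step from the reduced form. The derivation below is a sketch that assumes \(u_{0j}\) and \(r_{ij}\) are independent of each other (the usual assumption for this model, though not stated explicitly on the slide); \(i'\) denotes a second child in the same school.

```latex
\begin{aligned}
\operatorname{Var}(MathScore_{ij})
  &= \operatorname{Var}(\gamma_{00} + u_{0j} + r_{ij})
   = \operatorname{Var}(u_{0j}) + \operatorname{Var}(r_{ij})
   = \tau_{00} + \sigma^2 \\
\operatorname{Cov}(MathScore_{ij},\, MathScore_{i'j})
  &= \operatorname{Var}(u_{0j})
   = \tau_{00}, \qquad i \neq i'
\end{aligned}
```

The second line shows why nested observations are not independent: two children in the same school share \(u_{0j}\), so their scores covary by \(\tau_{00}\).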

How much of the variance in math scores owes to differences between schools?

We answer this question with an intraclass correlation coefficient (ICC).

\(ICC = \frac{\text{Between-Groups Variance}}{\text{Total Variance}}\)

\(ICC = \frac{\tau_{00}}{\tau_{00}+\sigma^2}\)
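As a worked illustration of the ICC formula, the sketch below estimates \(\tau_{00}\) and \(\sigma^2\) from equal-sized groups using one-way ANOVA mean squares. This is an assumption made for simplicity: multilevel software typically estimates the variance components by (restricted) maximum likelihood instead, and the function name `icc_balanced` and the toy scores are made up for the example.

```python
def icc_balanced(groups):
    """Estimate ICC = tau_00 / (tau_00 + sigma^2) from equal-sized groups."""
    n_groups = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes every group has the same size")

    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / n_groups

    # Mean square between groups and mean square within groups
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n_groups * (n - 1))

    sigma2_hat = ms_within                               # within-group variance
    tau_hat = max(0.0, (ms_between - ms_within) / n)     # between-group variance
    return tau_hat / (tau_hat + sigma2_hat)

# Toy data: three "schools" with three "children" each (illustrative only)
toy_scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
icc = icc_balanced(toy_scores)
print(icc)
```

In the toy data the group means (2, 5, 8) are far apart relative to the spread within each group, so most of the variance is attributed to differences between groups.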

Incorporating predictors

Motivating example: ECLS-K

Questions we wish to answer
- To what extent do differences between children in math ability owe to differences between schools?
- Which child-level factors are associated with higher math scores?
- Which school-level factors are associated with higher math scores?
- Do the effects of child-level factors vary from one school to the next?

Random intercept model

Level 1: \(MathScore_{ij} = \beta_{0j} + \beta_{1j}HoursTV_{ij} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\)

Level 2:
- \(\beta_{0j} = \gamma_{00} + u_{0j}\), with \(u_{0j}\sim N\left(0,\tau_{00}\right)\)
- \(\beta_{1j} = \gamma_{10}\)

\(\beta_{0j}\) is the predicted math score for a child who watches no TV, given that they are a student at school \(j\).

\(\beta_{1j}\) is the effect of hours of TV watched on math score for school \(j\). Note that it is the same for all schools here.

Random intercept model

Reduced-form equation

\(MathScore_{ij} = \gamma_{00} + \gamma_{10}HoursTV_{ij} + u_{0j} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\) and \(u_{0j}\sim N\left(0,\tau_{00}\right)\)
- fixed: \(\gamma_{00} + \gamma_{10}HoursTV_{ij}\)
- random: \(u_{0j} + r_{ij}\)

Note that this model could also be run (erroneously) as a standard linear regression by dropping the random effect \(u_{0j}\)!

Example: ECLS-K

We predict math scale score from the number of hours of TV watched after dinner, without accounting for nesting within schools.
We find a fairly precipitous drop, predicting a 2.32-point reduction in math score for each hour of TV watched.
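Why the choice to ignore nesting matters for a slope like this can be illustrated with simulated data. The sketch below does not use ECLS-K and does not fit a multilevel model; it only contrasts a pooled regression slope with a within-school slope (scores and hours centered at their school means). All numbers are assumptions picked for illustration: the true within-school effect is set to -1 point per hour, and children in lower-scoring schools are assumed to watch more TV.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def center_within(values, groups):
    """Subtract each group's mean from its members' values."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

rng = random.Random(383)  # fixed seed so the run is reproducible
school, hours, score = [], [], []
for j in range(100):                      # 100 hypothetical schools
    u = rng.gauss(0, 5)                   # school deviation u_0j
    for _ in range(20):                   # 20 children per school
        # Assumption: lower-scoring schools have children who watch more TV
        h = 3.0 - 0.1 * u + rng.gauss(0, 0.5)
        school.append(j)
        hours.append(h)
        score.append(100.0 + u - 1.0 * h + rng.gauss(0, 5))

pooled_slope = ols_slope(hours, score)    # ignores schools entirely
within_slope = ols_slope(center_within(hours, school),
                         center_within(score, school))
print(round(pooled_slope, 2), round(within_slope, 2))
```

In this setup the pooled slope absorbs the school-level differences and comes out much steeper than the within-school slope. Whether anything similar holds for the ECLS-K estimate is an empirical question that this simulation cannot answer.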

Motivating example: ECLS-K

Questions we wish to answer
- To what extent do differences between children in math ability owe to differences between schools?
- Which child-level factors are associated with higher math scores?
- Which school-level factors are associated with higher math scores?
- Do the effects of child-level factors vary from one school to the next?

Intercepts-as-outcomes model

Level 1: \(MathScore_{ij} = \beta_{0j} + \beta_{1j}HoursTV_{ij} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\)

Level 2:
- \(\beta_{0j} = \gamma_{00} + \gamma_{01}PctFRL_j + u_{0j}\), with \(u_{0j}\sim N\left(0,\tau_{00}\right)\)
- \(\beta_{1j} = \gamma_{10}\)

Here \(\gamma_{01}\) conveys the effect of \(PctFRL_j\) (the percentage of students qualifying for free or reduced lunch at school \(j\)) on the overall predicted math score for school \(j\).

Intercepts-as-outcomes model

Reduced-form equation

\(MathScore_{ij} = \gamma_{00} + \gamma_{01}PctFRL_j + \gamma_{10}HoursTV_{ij} + u_{0j} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\) and \(u_{0j}\sim N\left(0,\tau_{00}\right)\)
- fixed: \(\gamma_{00} + \gamma_{01}PctFRL_j + \gamma_{10}HoursTV_{ij}\)
- random: \(u_{0j} + r_{ij}\)

Note that even though \(PctFRL_j\) is a school-level variable and \(HoursTV_{ij}\) is a child-level variable, both are fixed effects.

Motivating example: ECLS-K

Questions we wish to answer
- To what extent do differences between children in math ability owe to differences between schools?
- Which child-level factors are associated with higher math scores?
- Which school-level factors are associated with higher math scores?
- Do the effects of child-level factors vary from one school to the next?

Slopes-as-outcomes model

Level 1: \(MathScore_{ij} = \beta_{0j} + \beta_{1j}HoursTV_{ij} + r_{ij}\), with \(r_{ij}\sim N\left(0,\sigma^2\right)\)

Level 2:
- \(\beta_{0j} = \gamma_{00} + \gamma_{01}PctFRL_j + u_{0j}\)
- \(\beta_{1j} = \gamma_{10} + \gamma_{11}PctFRL_j + u_{1j}\)

\(\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{bmatrix}\right)\)

Here \(\gamma_{01}\) conveys the effect of \(PctFRL_j\) (the percentage of students qualifying for free or reduced lunch at school \(j\)) on the overall predicted math score for school \(j\), and \(\gamma_{11}\) conveys the effect of \(PctFRL_j\) on the effect of \(HoursTV_{ij}\).

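Slopes-as-outcomes model

Reduced-form equation

\(MathScore_{ij} = \gamma_{00} + \gamma_{01}PctFRL_j + \left(\gamma_{10} + \gamma_{11}PctFRL_j\right)HoursTV_{ij} + u_{0j} + u_{1j}HoursTV_{ij} + r_{ij}\)
- fixed: \(\gamma_{00} + \gamma_{01}PctFRL_j + \left(\gamma_{10} + \gamma_{11}PctFRL_j\right)HoursTV_{ij}\)
- random: \(u_{0j} + u_{1j}HoursTV_{ij} + r_{ij}\)

\(\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{bmatrix}\right)\)

\(r_{ij}\sim N\left(0,\sigma^2\right)\)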
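To make the cross-level interaction concrete, the sketch below computes the school-specific intercept and slope implied by the level-2 equations and plugs them into level 1. The coefficient values are hypothetical, chosen only to show the arithmetic; they are not estimates from ECLS-K, and the function names are made up for the example.

```python
def school_slope(pct_frl, g10, g11, u1=0.0):
    """beta_1j: the HoursTV slope implied for a school with this PctFRL."""
    return g10 + g11 * pct_frl + u1

def predicted_math(hours_tv, pct_frl, g00, g01, g10, g11, u0=0.0, u1=0.0):
    """Slopes-as-outcomes prediction; the child-level residual r_ij is omitted."""
    intercept_j = g00 + g01 * pct_frl + u0          # beta_0j
    slope_j = school_slope(pct_frl, g10, g11, u1)   # beta_1j
    return intercept_j + slope_j * hours_tv

# Hypothetical fixed effects (illustrative values only)
coefs = dict(g00=100.0, g01=-0.2, g10=-1.0, g11=-0.02)

slope_low_frl = school_slope(10, coefs["g10"], coefs["g11"])   # school with 10% FRL
slope_high_frl = school_slope(80, coefs["g10"], coefs["g11"])  # school with 80% FRL
print(slope_low_frl, slope_high_frl)
print(predicted_math(hours_tv=2, pct_frl=10, **coefs))
```

With a negative \(\gamma_{11}\), as assumed here, the TV slope becomes steeper as \(PctFRL_j\) rises, which is what "the effect of a level-2 variable on the level-1 slope" means in practice.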

Assessing the significance of effects

Notice that we have not been examining significance tests for fixed effects.
- This makes it hard to say, e.g., whether the effect of a level-2 variable on the level-1 slope is significant
Our best strategy is to use likelihood ratio tests
- Our MO: Test whether the fit of Model A, which contains all of the relevant effects, is significantly better than Model B, which contains a subset of these effects.
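- We say that Model B is "nested" within Model A, but note that the word means something different here: it describes one model being a special case of another, not observations being grouped within higher-level units.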
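The mechanics of a likelihood ratio test can be sketched as follows. The statistic is twice the difference in log-likelihoods between Model A and Model B, compared against a chi-square distribution whose degrees of freedom equal the number of parameters Model B drops. The log-likelihood values below are hypothetical placeholders (in practice they come from the fitted models), and `chi2_sf` is a small helper written for this example that handles integer degrees of freedom only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite sum
    extra = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return the LRT statistic and its chi-square p-value."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df)

# Hypothetical log-likelihoods: Model A includes gamma_11, Model B omits it,
# so the models differ by one parameter (df = 1).
stat, p_value = likelihood_ratio_test(loglik_full=-5000.0,
                                      loglik_reduced=-5003.2, df=1)
print(round(stat, 2), round(p_value, 4))
```

Two cautions apply when using this in practice. Models that differ in their fixed effects are usually compared using full maximum likelihood rather than REML fits. And when the dropped parameter is a variance (such as \(\tau_{11}\)), the null value sits on the boundary of the parameter space, so the plain chi-square p-value tends to be conservative.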