**PHC6194 SPATIAL EPIDEMIOLOGY**

**Ecological Analyses and Mixed-Effect Model**

Hui Hu Ph.D.

*Department of Epidemiology*

*College of Public Health and Health Professions & College of Medicine*

March 14, 2018

### Introduction

### Linear Mixed-Effects Model

### Generalized Linear Mixed-Effects Model

# Introduction

**Ecological Studies**

- Based on grouped data; in a spatial context, the groups correspond to geographical areas

- Ecological studies have a long history in many disciplines besides epidemiology and public health:

  - political science, geography, sociology

- Because of aggregation, ecological studies face unique challenges, in particular the potential for ecological bias

  - the difference between the associations estimated from ecological-level and from individual-level data

- Ecological data can be used for a variety of purposes:

  - mapping: ecological bias is not a big problem (although within-area variation may be obscured by aggregation)

  - cluster detection: small-area anomalies may be washed away when data are aggregated

**Ecological Bias**

- The fundamental problem with ecological inference is that aggregation reduces information

  - this information loss usually prevents identification of the parameters of interest in the underlying individual-level model

- If there is no within-area variability in exposures and confounders, there will be no ecological bias

  - therefore, ecological bias arises from within-area variability in exposures and confounders

- This variability has two distinct consequences:

  - pure specification bias

  - confounding

**Pure Specification Bias**

- Also called model specification bias

- This bias arises because a nonlinear risk model changes its form under aggregation

- This type of bias has nothing to do with confounding

**Pure Specification Bias (cont'd)**

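- In an ecological setting, individual-level data are unavailable; instead, we observe aggregate data corresponding to the average outcome and the average exposure in each area. Define:

  - $n_i$: number of individuals in area $i$ ($i = 1, 2, \ldots, m$)

  - $Y_{ij}$: outcome for individual $j$ in area $i$ ($j = 1, \ldots, n_i$)

  - $x_{ij}$: exposure for individual $j$ in area $i$

  - individual-level risk model: $E[Y_{ij} \mid x_{ij}] = g(x_{ij}; \boldsymbol{\beta})$, where $g$ is nonlinear in $x$ (e.g., $g = e^{\beta_0 + \beta_1 x_{ij}}$)

- On aggregation, we have

  $$\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$$

**Pure Specification Bias (cont'd)**

- Aggregate to sum:

  $$E\Big[\sum_{j=1}^{n_i} Y_{ij}\Big] = \sum_{j=1}^{n_i} g(x_{ij}; \boldsymbol{\beta})$$

- Dividing the left- and right-hand sides by $n_i$:

  $$E[\bar{Y}_i] = \frac{1}{n_i}\sum_{j=1}^{n_i} g(x_{ij}; \boldsymbol{\beta}) \neq g(\bar{x}_i; \boldsymbol{\beta}) \quad \text{in general}$$

- When there is no within-area variability in exposure ($x_{ij} = \bar{x}_i$ for all $j$):

  $$E[\bar{Y}_i] = g(\bar{x}_i; \boldsymbol{\beta})$$

  The ecological model then has the same form as the individual-level model, so there is no ecological bias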

- The pure specification bias is reduced if areas are smaller, since the heterogeneity of exposures within areas is decreased
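A small numerical sketch of pure specification bias, assuming the hypothetical log-linear risk model $E[Y_{ij} \mid x_{ij}] = e^{\beta_0 + \beta_1 x_{ij}}$ with made-up coefficients ($\beta_0 = -3$, $\beta_1 = 0.5$) and made-up exposures. It compares the true mean risk in an area (the average of the individual risks) with the naive ecological value obtained by plugging the area-average exposure into the individual-level model, and then splits the heterogeneous area into two smaller, more homogeneous areas.

```python
"""Pure specification bias under a hypothetical log-linear risk model.

ASSUMPTION: E[Y_ij | x_ij] = exp(BETA0 + BETA1 * x_ij); the coefficients and
exposures below are made up for illustration only.
"""
import math
from statistics import mean

BETA0, BETA1 = -3.0, 0.5  # hypothetical individual-level coefficients


def risk(x):
    """Individual-level risk for exposure x."""
    return math.exp(BETA0 + BETA1 * x)


def area_mean_risk(xs):
    """True expected average outcome in an area: the mean of the individual risks."""
    return mean(risk(x) for x in xs)


def naive_ecological_risk(xs):
    """Individual-level model evaluated at the area-average exposure."""
    return risk(mean(xs))


def bias_ratio(xs):
    """A ratio > 1 means the naive ecological value underestimates the area's risk."""
    return area_mean_risk(xs) / naive_ecological_risk(xs)


homogeneous = [2.0, 2.0, 2.0, 2.0]      # no within-area variability
heterogeneous = [0.0, 1.0, 3.0, 4.0]    # same mean exposure (2.0), large spread
small_areas = [[0.0, 1.0], [3.0, 4.0]]  # the same people split into two smaller areas

if __name__ == "__main__":
    print("homogeneous  :", round(bias_ratio(homogeneous), 3))
    print("heterogeneous:", round(bias_ratio(heterogeneous), 3))
    for xs in small_areas:
        print("smaller area :", xs, round(bias_ratio(xs), 3))
```

With these made-up numbers the ratio is 1.0 for the homogeneous area, about 1.34 for the heterogeneous one, and about 1.03 for each smaller area: because the exponential risk function is convex, averaging exposures before applying it understates the risk, and the distortion shrinks as within-area heterogeneity shrinks.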

**Confounding**

- With only aggregated data, it is challenging to characterize the within-area joint distribution of exposures and confounders

- There are two scenarios in which confounding can be addressed with aggregated data:

  - the exposure and confounders are independent (no interaction between exposure and confounders)

  - the confounders are constant within areas (e.g., a county-level policy)

- We usually assume that samples drawn from the target population are **independent** and **identically distributed (i.i.d.)**.

- This assumption does not hold when data have a multilevel structure:

  - clustered and nested data (e.g., individuals within areas)

  - longitudinal data (e.g., repeated measurements within individuals)

  - non-nested structures (e.g., individuals within areas who also belong to other subgroups, such as occupations)

- Observations within the same group are dependent, while observations in different groups remain independent

- Two sources of variation:

  - variation within groups

  - variation between groups

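- A longitudinal study with n = 3 individuals, each measured at t = 3 time points:

  - Complete pooling: ignore the grouping and fit one regression to all observations

    - poor performance, because differences between individuals are ignored

  - No pooling: fit a separate regression for each individual

    - infeasible for large n

  - Partial pooling: a compromise between the two, achieved with a mixed-effects model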
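The three pooling strategies can be compared on a toy version of this design. This is a minimal sketch for a varying-intercept model with no predictors; the data are hypothetical, the variance components are assumed known (a fitting routine such as lmer() would estimate them), and the grand mean stands in for the population mean of the intercepts.

```python
"""Complete, no, and partial pooling of group means.

Sketch for a varying-intercept model with no predictors. The data are
hypothetical (n = 3 individuals, t = 3 measurements each), the variance
components SIGMA_Y and SIGMA_ALPHA are ASSUMED known, and the grand mean is
used as a stand-in for the population mean of the intercepts.
"""
from statistics import mean

# Hypothetical repeated measurements of an outcome for three individuals
data = {
    "A": [22.0, 23.0, 24.0],
    "B": [27.0, 28.0, 29.0],
    "C": [30.0, 31.0, 32.0],
}

SIGMA_Y = 2.0      # assumed within-individual standard deviation
SIGMA_ALPHA = 1.0  # assumed between-individual standard deviation


def grand_mean(d):
    """Mean of all observations, ignoring the grouping."""
    return mean(y for ys in d.values() for y in ys)


def complete_pooling(d):
    """Every individual gets the same estimate: the grand mean."""
    g = grand_mean(d)
    return {k: g for k in d}


def no_pooling(d):
    """Each individual is estimated from its own data only."""
    return {k: mean(ys) for k, ys in d.items()}


def partial_pooling(d, sigma_y, sigma_alpha):
    """Precision-weighted compromise between each group mean and the grand mean."""
    g = grand_mean(d)
    est = {}
    for k, ys in d.items():
        data_precision = len(ys) / sigma_y ** 2   # information in the group's own data
        prior_precision = 1.0 / sigma_alpha ** 2  # information from the group-level distribution
        est[k] = (data_precision * mean(ys) + prior_precision * g) / (
            data_precision + prior_precision
        )
    return est


if __name__ == "__main__":
    cp = complete_pooling(data)
    npool = no_pooling(data)
    pp = partial_pooling(data, SIGMA_Y, SIGMA_ALPHA)
    for k in data:
        print(k, round(cp[k], 2), round(npool[k], 2), round(pp[k], 2))
```

Each partial-pooling estimate lands between the individual's own mean and the grand mean (for A: 23.0 and 27.33, giving about 25.48). Here every individual puts the same weight, 3/7, on its own mean because all groups have the same size; with unequal sizes, smaller groups would be pulled harder toward the grand mean.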

- An alternative solution: include individual indicators (as a categorical variable) in a traditional linear regression model.

- Why do we still need mixed-effects models? They

  - account for both individual- and group-level variation when estimating group-level coefficients;

  - easily model variation among individual-level coefficients, which is especially useful when making predictions for new groups;

  - allow us to estimate coefficients for specific groups, even for groups with small n.

## Fixed and Random Effects

- Random effects: varying coefficients that are themselves modeled (assigned a probability distribution)
- Fixed effects: varying coefficients that are not themselves modeled

### How to decide whether to use fixed-effects or random-effects?

### When do mixed-effects models make a difference?


Two extreme cases:

- when the group-level variation is very small

  - the model reduces to a traditional regression model without group indicators (complete pooling)

- when the group-level variation is very large

  - the model reduces to a traditional regression model with group indicators (no pooling)

### Little risk in applying a mixed-effects model

**What's the difference between no-pooling models and mixed-effects models with only varying intercepts?**

- In no-pooling models, each group's intercept is obtained by least squares and equals the intercept fitted when the model is run separately for that group.
- In mixed-effects models, we assign a probability distribution to the random intercepts: $\alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)$, $j = 1, \ldots, J$

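**Intraclass Correlation (ICC)**

$$\text{ICC} = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_y^2}$$

where $\sigma_\alpha^2$ is the between-group variance and $\sigma_y^2$ is the within-group (residual) variance, so the ICC is the proportion of the total variation that lies between groups.

ICC ranges from 0 to 1:

- ICC -> 0: the groups give no information (complete pooling)
- ICC -> 1: all individuals within a group are identical (no pooling)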
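One way to see how the ICC splits the two sources of variation is the classical one-way ANOVA (method-of-moments) estimator for balanced data. This is a sketch under that assumption, not the REML estimator that lmer() uses by default, and the simulation settings are made up.

```python
"""ANOVA (method-of-moments) estimate of the ICC for balanced grouped data.

A sketch only: it is not the REML estimator that lmer() uses by default,
and simulate() uses made-up parameter values.
"""
import random
from statistics import mean


def anova_icc(groups):
    """groups: list of J lists, each holding n observations (balanced design)."""
    J, n = len(groups), len(groups[0])
    if J < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need a balanced design with J >= 2 groups of size n >= 2")
    grand = mean(y for g in groups for y in g)
    means = [mean(g) for g in groups]
    # Between-group and within-group mean squares from a one-way ANOVA
    msb = n * sum((m - grand) ** 2 for m in means) / (J - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (J * (n - 1))
    sigma2_alpha = max((msb - msw) / n, 0.0)  # between-group variance, truncated at 0
    sigma2_y = msw                            # within-group variance
    total = sigma2_alpha + sigma2_y
    return sigma2_alpha / total if total > 0 else 0.0


def simulate(J, n, sigma_alpha, sigma_y, mu=25.0, seed=1):
    """Draw J groups of n observations from a varying-intercept model."""
    rng = random.Random(seed)
    groups = []
    for _ in range(J):
        alpha_j = rng.gauss(mu, sigma_alpha)  # group-specific intercept
        groups.append([alpha_j + rng.gauss(0.0, sigma_y) for _ in range(n)])
    return groups


if __name__ == "__main__":
    # True ICC = 1 / (1 + 1) = 0.5 for these settings
    print(round(anova_icc(simulate(J=2000, n=5, sigma_alpha=1.0, sigma_y=1.0)), 3))
```

Truncating a negative between-group estimate at zero reproduces the ICC -> 0 limit: the groups then carry no information beyond the common mean.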

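**Intraclass Correlation (ICC) (cont'd)**

In terms of the group intercepts $\alpha_j$:

- ICC -> 0: "hard constraint" on $\alpha_j$ (all $\alpha_j$ equal the common mean $\mu_\alpha$; complete pooling)
- ICC -> 1: "no constraint" on $\alpha_j$ (each $\alpha_j$ is estimated from its own group's data; no pooling)
- Mixed-effects model: "soft constraint" on $\alpha_j$, through $\alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)$

This constraint affects groups differently:

- For groups with small n, pooling is usually strong: the estimate $\hat{\alpha}_j$ is close to the overall mean $\mu_\alpha$ (towards complete pooling)
- For groups with large n, pooling is weak: $\hat{\alpha}_j$ stays close to the group's own mean and can lie far from $\mu_\alpha$ (towards no pooling)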
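The dependence of pooling on group size and ICC can be made explicit. A sketch, assuming a varying-intercept model with no predictors and known variance components: the estimate is approximately $\hat{\alpha}_j \approx w_j \bar{y}_j + (1 - w_j)\mu_\alpha$ with $w_j = \frac{n_j/\sigma_y^2}{n_j/\sigma_y^2 + 1/\sigma_\alpha^2}$, which rearranges to $w_j = \frac{n_j\,\text{ICC}}{n_j\,\text{ICC} + 1 - \text{ICC}}$. The function names are hypothetical.

```python
"""How strongly a group intercept is pooled, as a function of group size and ICC.

Sketch for a varying-intercept model with no predictors and known variance
components, using the precision-weighted approximation
    alpha_hat_j ~= w_j * ybar_j + (1 - w_j) * mu_alpha,
    w_j = n_j * ICC / (n_j * ICC + 1 - ICC).
"""


def pooling_weight(n_j, icc):
    """Weight on the group's own mean (0 = complete pooling, 1 = no pooling)."""
    if n_j <= 0:
        raise ValueError("n_j must be positive")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return n_j * icc / (n_j * icc + (1.0 - icc))


def partial_pooling_estimate(ybar_j, mu_alpha, n_j, icc):
    """Shrink the group mean ybar_j toward the overall mean mu_alpha."""
    w = pooling_weight(n_j, icc)
    return w * ybar_j + (1.0 - w) * mu_alpha


if __name__ == "__main__":
    for n_j in (1, 5, 50):
        print(n_j, round(pooling_weight(n_j, icc=0.2), 3))
```

With ICC = 0.2, the weight on a group's own mean is 0.2 for a group with one observation, about 0.56 with five, and about 0.93 with fifty, matching the strong-versus-weak pooling described above. Setting ICC to 0 or 1 recovers complete pooling (weight 0) and no pooling (weight 1).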

# Linear Mixed-Effects Model

### git pull

### Load the Packages and Data

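Data: 1,000 participants, each with 5 repeated measurements. Variables:

- bmi: body mass index
- time: measurement occasion
- id: participant identifier
- age
- race: 1 = white, 2 = black, 3 = others
- gender: 1 = male, 2 = female
- edu: 1 = <HS, 2 = HS, 3 = >HS
- sbp: systolic blood pressure
- am: 1 = measured in the morning
- ex: number of days exercised in the past year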

### Varying-intercept Model with No Predictors

*allows intercept to vary by individual*

*estimated intercept, averaging over the individuals*

*estimated variations*
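In equation form, this model can be sketched as follows. Here $y_i$ is a placeholder for the outcome, $i$ indexes measurements, and $j[i]$ is the participant (id) to whom measurement $i$ belongs:

$$
\begin{aligned}
y_i &= \alpha_{j[i]} + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_y^2) \\
\alpha_j &\sim N(\mu_\alpha, \sigma_\alpha^2), \qquad j = 1, \ldots, 1000
\end{aligned}
$$

$\mu_\alpha$ plays the role of the estimated intercept averaged over individuals, and $\sigma_\alpha$ and $\sigma_y$ are the estimated between- and within-individual variations. In lme4 formula syntax this corresponds to `y ~ 1 + (1 | id)`.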

### Varying-intercept Model with an individual-level predictor
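A sketch of the corresponding model, adding a placeholder predictor $x_i$ measured at the lowest level (it may change across a participant's measurements):

$$
y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i, \qquad \alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2), \qquad \epsilon_i \sim N(0, \sigma_y^2)
$$

The slope $\beta$ is shared by all participants; only the intercepts vary (lme4 syntax: `y ~ x + (1 | id)`).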

### Varying-intercept Model with both individual-level and group-level predictors
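Adding a placeholder group-level predictor $u_j$, which is constant within participant $j$, lets the group-level model explain part of the variation in the intercepts (a sketch):

$$
y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i, \qquad \alpha_j \sim N(\gamma_0 + \gamma_1 u_j, \sigma_\alpha^2)
$$

$\sigma_\alpha^2$ is now the intercept variance left unexplained by $u_j$ (lme4 syntax: `y ~ x + u + (1 | id)`).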

### Varying Slopes Models

With only an individual-level predictor
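Letting the slope of the placeholder predictor $x_i$ also vary by participant gives a sketch like the following, with intercepts and slopes modeled jointly and correlated through $\rho$:

$$
y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i, \qquad
\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} \sim N\!\left(
\begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix},
\begin{pmatrix} \sigma_\alpha^2 & \rho\,\sigma_\alpha\sigma_\beta \\ \rho\,\sigma_\alpha\sigma_\beta & \sigma_\beta^2 \end{pmatrix}
\right)
$$

In lme4 syntax this is `y ~ x + (1 + x | id)`.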

### Varying Slopes Models

Add a group-level predictor
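With a placeholder group-level predictor $u_j$, both the intercepts and the slopes can be modeled at the group level (a sketch):

$$
\begin{aligned}
\alpha_j &= \gamma_0^{\alpha} + \gamma_1^{\alpha} u_j + \eta_j^{\alpha} \\
\beta_j &= \gamma_0^{\beta} + \gamma_1^{\beta} u_j + \eta_j^{\beta}
\end{aligned}
$$

where $(\eta_j^{\alpha}, \eta_j^{\beta})$ are bivariate normal with mean zero. Substituting into $y_i = \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i$ produces a cross-level interaction term $\gamma_1^{\beta} u_{j[i]} x_i$, which is why the lme4 formula takes the form `y ~ x * u + (1 + x | id)`.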

### Non-nested Models
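Non-nested (crossed) grouping, such as individuals classified by both area and occupation, can be sketched with one set of varying intercepts per grouping factor, where $j[i]$ and $k[i]$ are the two group memberships of observation $i$:

$$
y_i = \mu + \gamma_{j[i]} + \delta_{k[i]} + \epsilon_i, \qquad
\gamma_j \sim N(0, \sigma_\gamma^2), \quad \delta_k \sim N(0, \sigma_\delta^2), \quad \epsilon_i \sim N(0, \sigma_y^2)
$$

In lme4 syntax this would be `y ~ 1 + (1 | area) + (1 | occupation)`, with `area` and `occupation` as hypothetical variable names.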

# Generalized Linear Mixed-Effects Model

### Mixed-Effects Logistic Model

Empty model
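For a binary outcome $y_i$ (a placeholder), the empty model can be sketched as a varying-intercept model on the logit scale with no predictors:

$$
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\alpha_{j[i]}\right), \qquad \alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)
$$

Unlike the linear model, there is no separate residual variance term. In lme4 this corresponds to `glmer(y ~ 1 + (1 | id), family = binomial)`.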

### Mixed-Effects Logistic Model

Add bmi and race
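Adding bmi and race gives a sketch like the one below, assuming race enters as a categorical variable with white (code 1) as the reference category; $\text{black}_j$ and $\text{other}_j$ are hypothetical indicator names, and race is constant within participant $j$:

$$
\operatorname{logit} \Pr(y_i = 1) = \alpha_{j[i]} + \beta_1\,\text{bmi}_i + \beta_2\,\text{black}_{j[i]} + \beta_3\,\text{other}_{j[i]}, \qquad \alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)
$$

Because the model conditions on $\alpha_j$, $e^{\beta_1}$ is a participant-specific (conditional) odds ratio per one-unit increase in bmi rather than a population-average one. In lme4 syntax: `glmer(y ~ bmi + factor(race) + (1 | id), family = binomial)`.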

### Mixed-Effects Poisson Model
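For a count outcome $y_i$ and a predictor $x_i$ (both placeholders), a varying-intercept Poisson model can be sketched as:

$$
y_i \sim \operatorname{Poisson}(\lambda_i), \qquad \log \lambda_i = \alpha_{j[i]} + \beta x_i, \qquad \alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)
$$

$e^{\beta}$ is then a participant-specific rate ratio. In lme4 syntax: `glmer(y ~ x + (1 | id), family = poisson)`.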

### Parameter Estimation Algorithms

- ML: maximum likelihood

- REML: restricted maximum likelihood

  - default in lmer()

- PQL: pseudo- and penalized quasi-likelihood

- Laplace approximation

  - default in glmer()

- GHQ: Gauss-Hermite quadrature

- MCMC: Markov chain Monte Carlo

Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, et al. 2009. Generalized linear mixed models: A practical guide for ecology and evolution. Trends in ecology & evolution 24:127-135.
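GHQ approximates the integral over the random effects that appears in the GLMM likelihood. As an illustration, and not a description of how glmer() is implemented internally, the sketch below uses the standard 3-point Gauss-Hermite rule to compute the population-average probability $E_b[\operatorname{logit}^{-1}(\beta_0 + b)]$, $b \sim N(0, \sigma^2)$, in a random-intercept logistic model. The values of $\beta_0$ and $\sigma$ are made up. In lme4, glmer() sets the number of (adaptive) quadrature points through its `nAGQ` argument, and `nAGQ = 1` corresponds to the Laplace approximation.

```python
"""3-point Gauss-Hermite quadrature for expectations over a normal random effect.

Sketch: approximates E[f(Z)], Z ~ N(0, sigma^2), and applies it to the
marginal probability in a random-intercept logistic model with made-up values.
"""
import math

# Standard 3-point Gauss-Hermite rule for integrals of exp(-x^2) * f(x)
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6)


def expect_normal(f, sigma=1.0):
    """Approximate E[f(Z)] for Z ~ N(0, sigma^2) via the substitution z = sqrt(2) * sigma * x."""
    total = sum(w * f(math.sqrt(2) * sigma * x) for w, x in zip(WEIGHTS, NODES))
    return total / math.sqrt(math.pi)


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    beta0, sigma = 1.0, 1.5  # hypothetical fixed intercept and random-intercept SD
    marginal = expect_normal(lambda b: expit(beta0 + b), sigma)
    print("conditional (b = 0):", round(expit(beta0), 3))
    print("population average :", round(marginal, 3))
```

The rule is exact for polynomials up to degree 5 (for example, it returns $E[Z^2] = \sigma^2$). With these made-up values it gives a population-average probability of roughly 0.68, below $\operatorname{logit}^{-1}(1) \approx 0.73$; this attenuation is one reason group-specific (mixed-effects) and population-average (GEE) estimates differ.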

### Mixed-Effects Model vs. GEE

| | Mixed-Effects Model | Marginal Model with GEE |
|---|---|---|
| Distributional assumptions | Yes | No |
| Population-average estimates | Yes | Yes |
| Group-specific estimates | Yes | No |
| Estimates variance components | Yes | No |
| Performs well with small n | Yes | No |

#### PHC6194-Spring2018-Lecture9


Slides for Lecture 9, Spring 2018, PHC6194 Spatial Epidemiology
