PHC6016 Social Epidemiology

Multilevel Approaches

October 20, 2016

Hui Hu, PhD

huihu@ufl.edu

Introduction

Linear Mixed-Effects Model

Generalized Linear Mixed-Effects Model

Hotspots Mapping

Introduction

We usually assume the samples drawn from targeted population are independent and identically distributed (i.i.d.).

This assumption does not hold when we have data with multilevel structure:
- clustered and nested data (i.e. individuals within areas)
- longitudinal data (i.e. repeated measurements within individuals)
- non-nested structures (i.e. individuals within areas and belonging to some subgroups such as occupations)
Samples within each group are dependent, while samples between groups stay independent
Two sources of variations:
- variations within groups
- variations between groups

A longitudinal study:
- n = 3
- t = 3
Complete pooling
- poor performance
No pooling
- infeasible for large n
Partial pooling

An alternative solution: include categorical individual indicators in the traditional linear regression model.
Why do we still need mixed-effects models?

Account for both individual- and group-level variations when estimating group-level coefficients.
Easily model variations among individual-level coefficients, especially when making predictions for new groups.
Allow us to estimate coefficients for specific groups, even for groups with small n

Fixed and Random Effects

Random Effects: varying coefficients
Fixed Effects: varying coefficients that are not themselves modeled

How to decide whether to use fixed-effects or random-effects?

When do mixed-effects models make a difference?

Fixed and Random Effects

Two extreme cases:

when the group-level variation is very little
- reduce to traditional regression models without group indicators (complete pooling)
when the group-level variation is very large
- reduce to traditional regression models with group indicators (no-pooling)

Little risk to apply a mixed-effects model

What's the difference between no-pooling models and mixed-effects models only with varying intercepts?

In no-pooling models, the intercept is obtained by least squares estimates, which equals to the fitted intercepts in models that are run separately by group.
In mixed-effects models, we assign a probability distribution to the random intercept:

Intraclass Correlation (ICC)

shows the variation between groups

ICC ranges from 0 to 1:

ICC -> 0: the groups give no information (complete-pooling)
ICC -> 1: all individuals of a group are identical (no-pooling)

Intraclass Correlation (ICC)

ICC ranges from 0 to 1:

ICC -> 0: "hard constraint" to
ICC -> 1: "no constraint" to
Mixed-effects model: "soft constraint" to

This constraint has different effects on different groups:

For group with small n, a strong pooling is usually seen, where the value of is close to the mean (towards complete-pooling)
For group with large n, the pooling will be weak, where the value of is far away from the mean (towards no-pooling)

Linear Mixed-Effects Model

Pull the codes and dataset: https://github.com/benhhu/R-Mixed-Effects-Model

Load the Packages and Data

1,000 participants

5 repeated measurements

bmi

time

age

race: 1=white, 2=black, 3=others

gender: 1=male, 2=female

edu: 1=<HS, 2=HS, 3=>HS

sbp

am: 1=measured in morning

ex: #days exercised in the past year

Varying-intercept Model with No Predictors

allows intercept to vary by individual

estimated intercept, averaging over the individuals

estimated variations

Varying-intercept Model with an individual-level predictor

Varying-intercept Model with both individual-level and group-level predictors

Varying Slopes Models

With only an individual-level predictor

Varying Slopes Models

Add a group-level predictor

Non-nested Models

Generalized Linear Mixed-Effects Model

Mixed-Effects Logistic Model

Empty model

Mixed-Effects Logistic Model

Add bmi and race

Mixed-Effects Poisson Model

Parameter Estimation Algorithms

ML: maximum likelihood
REML: restricted maximum likelihood
- default in lmer()
PQL: pseudo- and penalized quasilikelihood
Laplace approximations
- default in glmer()
GHQ: Gauss-Hermite quadrature
McMC: Markov chain Monte Carlo

Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, et al. 2009. Generalized linear mixed models: A practical guide for ecology and evolution. Trends in ecology & evolution 24:127-135.

Mixed-Effects Model vs. GEE

	Mixed-Effects Model	Marginal Model with GEE
Distributional assumptions	Yes	No
Population average estimates	Yes	Yes
Group-specific estimates	Yes	No
Estimate variance components	Yes	No
Perform good with small n	Yes	No

Hotspots Mapping

Introduction to Spatial Data

Data Models

A geographic data model is a structure for organizing geospatial data so that it can be easily stored and retrieved.

Geographic coordinates

Tabular attributes

Spatial Data Models

Vector Model

- points, lines, polygons

Raster Model
- exhaustive regular or irregular partitioning of space

Points

Lines

Shapefiles

.shp - the file that stores the geometry of the feature

.shx - the file that stores the index of the feature geometry

.dbf - the dBASE file that stores the attribute information

.prj - the file that defines the shapefile's projection

.html, .htm, .xml - the files that usually contains metadata

.sbn and .sbx - store additional indices

ftp://ftp2.census.gov/geo/tiger/TIGER2016/BG/tl_2016_12_bg.zip

Coordinate Systems and Projections

3D sphere

Geographic Coordinate System

2D flat

Projected Coordiate System

Geographic Coordinate Systems

Longitude and latitude
Units: Degrees (DMS or DD)

Shape of the Earth

Surface: The Earth's real surface
Ellipsoid: Ideal, smooth surface
Geoid: Bumpy surface, where gravity is equal for all locations

Datum

Defines the position of the spheroid relative to the center of the earth.
Global datum:
- uses the earth's center of mass as the origin
Local datum:
- aligns its spheroid to closely fit the earth's surface in a particular area
- a point on the surface of the spheroid is matched to a particular position on the surface of the earth
- the coordinate system origin of a local datum is not at the center of the earth

Datum

Common Local Datum: North American Datum (NAD)

Common Global Datum: World Geodetic System (WGS)

Projected Coordinate Systems

A projected coordinate system is defined on a flat, two-dimensional surface
Unlike a geographic coordinate system, a projected coordinate system has constant lengths, angles, and areas across the two dimensions
A projected coordinate system is always based on a geographic coordinate system

The systematic rendering of a graticule on a flat map surface

Distortion

Converting a sphere to a flat surface results in distortion

Shape (conformal) - If a map preserves shape, then feature outlines (like county boundaries) look the same on the map as they do on the earth.
- Lambert Conformal Conic
- UTM
Area (equal-area) - If a map preserves area, then the size of a feature on a map is the same relative to its size on the earth.
- Alerts Equal Area Conic
Distance (equidistant) - An equidistant map is one that preserves true scale for all straight lines passing through a single, specified point. If a line from a to b on a map is the same distance that it is on the earth, then the map line has true scale. No map has true scale everywhere.
Direction/Azimuth (azimuthal) – An azimuthal projection is one that preserves direction for all straight lines passing through a single, specified point.

Universal Transverse Mercator Coordinate System

World divided into 60 six-degree-wide zones
From 80S to 84N
Zones numbered 1-60 (N&S), W to E, starting at 180W

Differences between Projections

Spatial Patterns

Random

Cluster

Regular

Disease Cluster

The occurrence of a greater than expected number of cases of a particular disease within a group of people, a geographic area, or a period of time.
A collection of disease occurrence:
- of sufficient size and concentation to be unlikely to have occurred by chance, or
- related to each other through some social or biological mechanism, or having a common relationship with some other events or circumstance
Spatial aggregation of disease events may only be a function of the distribution of population
Disease cluster: residual spatial variation in risk after known influence have been accounted for

Why

Confirmatory purpose
- verify if a perceived cluster exists
Exploratory purpose
- searching for spatial patterns
Identification of clusters can lead to interventions

Methods

Global clustering:
- evaluate whether clustering exist as a global phenomena throughout the study region, without pinpointing the locaiton of specific cluster
- aggregated data: Moran's I, Geary's C, etc.
- points data: K-nearest neighbour method, etc.
Local clustering:
- additionally specify the location and can be extended to specify spatial-temporal clusters

Local Clustering

Focused tests:
- investigate whether there is an increased risk of disease around a predetermined point
- e.g. Superfund site, power plant.
- Lawson Waller score test
Non-focused tests
- identify the location of all likely clusters in the study region
- LISA, Getis-Ord's local statistics, spatial scan statistics

LISA - Local Moran's I

Local indicators of spatial autocorrelation (LISA)
- show similarity with neighbors and also test its significance
Divide the study region into 5 categories:
- high-high locations: hot spots
- low-low locations: cold spots
- high-low locations: spatial outliers
- low-high locations: spatial outliers
- Locations with insignificant local autocorrelation
GeoDa

Spatial Scan Statstics

Search over a given set of spatial regions
Find those regions which are most likely to be clusters
Correctly adjust for multiple hypothesis testing
SatScan

A circular scanning window is placed at different coordinates with radius that vary from 0 to some set upper limit.
For each location and size of window
H = elevated risk within window as compared to outside of window