Introduction
Linear Mixed-Effects Model
Generalized Linear Mixed-Effects Model
Hotspots Mapping
Introduction
- We usually assume that samples drawn from the target population are independent and identically distributed (i.i.d.).
- This assumption does not hold when the data have a multilevel structure:
  - clustered and nested data (e.g. individuals within areas)
  - longitudinal data (e.g. repeated measurements within individuals)
  - non-nested structures (e.g. individuals within areas who also belong to subgroups such as occupations)
- Samples within each group are dependent, while samples from different groups remain independent.
- Two sources of variation:
  - variation within groups
  - variation between groups
- A longitudinal study:
  - n = 3
  - t = 3
- Complete pooling
  - poor performance
- No pooling
  - infeasible for large n
- Partial pooling
- An alternative solution: include categorical individual indicators in the traditional linear regression model.
- Why do we still need mixed-effects models?
  - Account for both individual- and group-level variations when estimating group-level coefficients.
  - Easily model variations among individual-level coefficients, especially when making predictions for new groups.
  - Allow us to estimate coefficients for specific groups, even for groups with small n.
Fixed and Random Effects
- Random Effects: varying coefficients that are modeled with a probability distribution
- Fixed Effects: varying coefficients that are not themselves modeled
How to decide whether to use fixed-effects or random-effects?
When do mixed-effects models make a difference?
Fixed and Random Effects
Two extreme cases:
- when the group-level variation is very small
  - the model reduces to a traditional regression model without group indicators (complete pooling)
- when the group-level variation is very large
  - the model reduces to a traditional regression model with group indicators (no pooling)
There is little risk in applying a mixed-effects model.
What's the difference between no-pooling models and mixed-effects models only with varying intercepts?
- In no-pooling models, the intercepts are obtained as least-squares estimates, which equal the fitted intercepts in models that are run separately by group.
- In mixed-effects models, we assign a probability distribution to the random intercept:
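The distribution itself appeared as an image in the original slides. A common way to write it is sketched below; the notation ($\alpha_j$ for the intercept of group $j$, $\mu_\alpha$ for the mean intercept, $\sigma_\alpha^2$ for the between-group variance) is assumed here, not taken from the slide.

```latex
\alpha_j \sim \mathcal{N}\left(\mu_\alpha,\ \sigma_\alpha^2\right), \qquad j = 1, \dots, J
```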
Intraclass Correlation (ICC)
shows the share of the total variation that lies between groups
ICC ranges from 0 to 1:
- ICC -> 0: the groups give no information (complete-pooling)
- ICC -> 1: all individuals of a group are identical (no-pooling)
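As a minimal sketch, the ICC can be computed from the two variance components of a varying-intercept model. The function name and the example numbers are illustrative assumptions, not values from the lecture dataset.

```python
def icc(var_between: float, var_within: float) -> float:
    """Intraclass correlation: between-group variance / total variance."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# No between-group variance: groups carry no information (complete pooling).
print(icc(0.0, 4.0))  # 0.0
# Most of the variance is between groups: members of a group look alike.
print(icc(3.0, 1.0))  # 0.75
```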
Intraclass Correlation (ICC)
ICC ranges from 0 to 1:
- ICC -> 0: "hard constraint" on the group-level coefficients (all groups are forced to share one value)
- ICC -> 1: "no constraint" on the group-level coefficients
- Mixed-effects model: "soft constraint" on the group-level coefficients
This constraint has different effects on different groups:
- For groups with small n, pooling is usually strong: the group's estimate is pulled close to the overall mean (towards complete pooling)
- For groups with large n, pooling is weak: the group's estimate stays close to the group's own mean and can be far from the overall mean (towards no pooling)
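The effect of group size on pooling can be illustrated with the usual precision-weighted approximation for a varying-intercept model with no predictors. This is a sketch under assumed variance components; the function name and all numbers are hypothetical.

```python
def partial_pool(group_mean, n_j, grand_mean, var_within, var_between):
    """Approximate partially pooled group intercept.

    Weighted average of the group's own mean and the overall mean, with
    weights n_j / var_within (data in the group) and 1 / var_between
    (information from the other groups).
    """
    w_group = n_j / var_within
    w_grand = 1.0 / var_between
    return (w_group * group_mean + w_grand * grand_mean) / (w_group + w_grand)


# Same group mean (30) and overall mean (25), different group sizes.
small = partial_pool(30.0, 2, 25.0, var_within=16.0, var_between=4.0)
large = partial_pool(30.0, 200, 25.0, var_within=16.0, var_between=4.0)
print(round(small, 2))  # 26.67: pulled strongly towards the overall mean
print(round(large, 2))  # 29.9: stays near the group's own mean
```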
Linear Mixed-Effects Model
Pull the code and dataset: https://github.com/benhhu/R-Mixed-Effects-Model
Load the Packages and Data
1,000 participants
5 repeated measurements
bmi
time
id
age
race: 1=white, 2=black, 3=others
gender: 1=male, 2=female
edu: 1=<HS, 2=HS, 3=>HS
sbp
am: 1=measured in the morning
ex: number of days exercised in the past year
Varying-intercept Model with No Predictors
allows intercept to vary by individual
estimated intercept, averaging over the individuals
estimated variations
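The code and output for this slide were images and are not reproduced here. As a sketch, a varying-intercept model with no predictors is commonly written as below, where $j[i]$ denotes the individual that observation $i$ belongs to. The notation is assumed; in lme4 formula syntax this kind of model is typically specified with a `(1 | id)` term. In this form, $\mu_\alpha$ corresponds to the intercept averaged over individuals, and $\sigma_\alpha$ and $\sigma_y$ to the estimated variations.

```latex
\begin{aligned}
y_i &= \alpha_{j[i]} + \epsilon_i, & \epsilon_i &\sim \mathcal{N}(0, \sigma_y^2) \\
\alpha_j &\sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2)
\end{aligned}
```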
Varying-intercept Model with an Individual-level Predictor
Varying-intercept Model with Both Individual-level and Group-level Predictors
Varying Slopes Models
With only an individual-level predictor
Varying Slopes Models
Add a group-level predictor
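One common way to write a varying-intercept, varying-slope model with an individual-level predictor $x_i$ and a group-level predictor $u_j$ is sketched below. The symbols are assumptions for illustration; dropping the $u_j$ terms gives the version with only an individual-level predictor.

```latex
\begin{aligned}
y_i &= \alpha_{j[i]} + \beta_{j[i]} x_i + \epsilon_i, & \epsilon_i &\sim \mathcal{N}(0, \sigma_y^2) \\
\alpha_j &= \gamma_0^{\alpha} + \gamma_1^{\alpha} u_j + \eta_j^{\alpha} \\
\beta_j &= \gamma_0^{\beta} + \gamma_1^{\beta} u_j + \eta_j^{\beta} \\
(\eta_j^{\alpha}, \eta_j^{\beta})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma)
\end{aligned}
```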
Non-nested Models
Generalized Linear Mixed-Effects Model
Mixed-Effects Logistic Model
Empty model
Mixed-Effects Logistic Model
Add bmi and race
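As a sketch, a mixed-effects logistic model with a varying intercept can be written as below. The binary outcome $y_i$ is not named in the extracted slides, so it is left generic; $x_i$ would hold the predictors (here bmi and indicators for race), and the empty model is the special case with no $x_i^{\top}\beta$ term. The notation is assumed.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(y_i = 1) &= \alpha_{j[i]} + x_i^{\top}\beta \\
\alpha_j &\sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2)
\end{aligned}
```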
Mixed-Effects Poisson Model
Parameter Estimation Algorithms
- ML: maximum likelihood
- REML: restricted maximum likelihood
  - default in lmer()
- PQL: pseudo- and penalized quasilikelihood
- Laplace approximations
  - default in glmer()
- GHQ: Gauss-Hermite quadrature
- MCMC: Markov chain Monte Carlo
Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, et al. 2009. Generalized linear mixed models: A practical guide for ecology and evolution. Trends in Ecology & Evolution 24:127-135.
Mixed-Effects Model vs. GEE
Characteristic | Mixed-Effects Model | Marginal Model with GEE |
---|---|---|
Distributional assumptions | Yes | No |
Population average estimates | Yes | Yes |
Group-specific estimates | Yes | No |
Estimates variance components | Yes | No |
Performs well with small n | Yes | No |
Hotspots Mapping
Introduction to Spatial Data
Data Models
A geographic data model is a structure for organizing geospatial data so that it can be easily stored and retrieved.
Geographic coordinates
Tabular attributes
Spatial Data Models
Vector Model
- points, lines, polygons
Raster Model
- exhaustive regular or irregular partitioning of space
Points
Lines
Shapefiles
.shp - the file that stores the geometry of the feature
.shx - the file that stores the index of the feature geometry
.dbf - the dBASE file that stores the attribute information
.prj - the file that defines the shapefile's projection
.html, .htm, .xml - the files that usually contain metadata
.sbn and .sbx - store additional indices
Coordinate Systems and Projections
3D sphere
Geographic Coordinate System
2D flat
Projected Coordinate System
Geographic Coordinate Systems
- Longitude and latitude
- Units: Degrees (DMS or DD)
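The two degree formats are related by simple arithmetic: decimal degrees (DD) = degrees + minutes/60 + seconds/3600, with a negative sign for the southern and western hemispheres. A minimal sketch follows; the function name and the example coordinates are illustrative assumptions.

```python
def dms_to_dd(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees-minutes-seconds (DMS) to signed decimal degrees (DD)."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    dd = abs(degrees) + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in DD.
    return -dd if hemisphere in ("S", "W") else dd


print(round(dms_to_dd(29, 39, 0.0, "N"), 4))   # 29.65
print(round(dms_to_dd(82, 19, 30.0, "W"), 4))  # -82.325
```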
Shape of the Earth
- Surface: The Earth's real surface
- Ellipsoid: Ideal, smooth surface
- Geoid: Bumpy surface on which the gravitational potential is equal at all locations
Datum
- Defines the position of the spheroid relative to the center of the earth.
- Global datum:
  - uses the earth's center of mass as the origin
- Local datum:
  - aligns its spheroid to closely fit the earth's surface in a particular area
  - a point on the surface of the spheroid is matched to a particular position on the surface of the earth
  - the coordinate system origin of a local datum is not at the center of the earth
Datum
Common Local Datum: North American Datum (NAD)
Common Global Datum: World Geodetic System (WGS)
Projected Coordinate Systems
- A projected coordinate system is defined on a flat, two-dimensional surface
- Unlike a geographic coordinate system, a projected coordinate system has constant lengths, angles, and areas across the two dimensions
- A projected coordinate system is always based on a geographic coordinate system
Map projection: the systematic rendering of a graticule on a flat map surface
Distortion
Converting a sphere to a flat surface results in distortion
- Shape (conformal) - If a map preserves shape, then feature outlines (like county boundaries) look the same on the map as they do on the earth.
  - Lambert Conformal Conic
  - UTM
- Area (equal-area) - If a map preserves area, then the size of a feature on a map is the same relative to its size on the earth.
  - Albers Equal Area Conic
- Distance (equidistant) - An equidistant map is one that preserves true scale for all straight lines passing through a single, specified point. If a line from a to b on a map is the same distance that it is on the earth, then the map line has true scale. No map has true scale everywhere.
- Direction/Azimuth (azimuthal) - An azimuthal projection is one that preserves direction for all straight lines passing through a single, specified point.
Universal Transverse Mercator Coordinate System
- World divided into 60 six-degree-wide zones
- From 80S to 84N
- Zones numbered 1-60 (N&S), W to E, starting at 180W
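Because the zones are six degrees wide and numbered eastward from 180W, the zone number follows directly from the longitude. The sketch below ignores the special-case zones around Norway and Svalbard; the function name is illustrative, and treating exactly 180E as zone 60 is a choice made for this sketch.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """UTM zone number (1-60) for a longitude in decimal degrees."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be in [-180, 180]")
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    # lon = 180 would compute as 61; this sketch assigns it to zone 60.
    return min(zone, 60)


print(utm_zone(-82.3))  # 17
print(utm_zone(0.0))    # 31
```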
Differences between Projections
Spatial Patterns
Random
Cluster
Regular
Disease Cluster
- The occurrence of a greater than expected number of cases of a particular disease within a group of people, a geographic area, or a period of time.
- A collection of disease occurrences:
  - of sufficient size and concentration to be unlikely to have occurred by chance, or
  - related to each other through some social or biological mechanism, or having a common relationship with some other event or circumstance
- Spatial aggregation of disease events may only be a function of the distribution of the population
- Disease cluster: residual spatial variation in risk after known influences have been accounted for
Why
- Confirmatory purpose
  - verify if a perceived cluster exists
- Exploratory purpose
  - searching for spatial patterns
- Identification of clusters can lead to interventions
Methods
- Global clustering:
  - evaluates whether clustering exists as a global phenomenon throughout the study region, without pinpointing the location of specific clusters
  - aggregated data: Moran's I, Geary's C, etc.
  - point data: K-nearest neighbour method, etc.
- Local clustering:
  - additionally specifies the location of clusters and can be extended to spatio-temporal clusters
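As an illustration of a global statistic for aggregated data, the sketch below computes Moran's I from a list of area values and a spatial weights matrix. The four-area "chain" neighbourhood and the values are made-up assumptions; significance testing (for example by permutation) is not shown.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Four areas in a row; each area neighbours the adjacent one(s).
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

print(round(morans_i([1, 2, 3, 4], w), 3))  # 0.333: similar values are adjacent
print(round(morans_i([1, 4, 1, 4], w), 3))  # -1.0: neighbours always differ
```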
Local Clustering
- Focused tests:
  - investigate whether there is an increased risk of disease around a predetermined point
  - e.g. Superfund site, power plant
  - Lawson-Waller score test
- Non-focused tests:
  - identify the location of all likely clusters in the study region
  - LISA, Getis-Ord's local statistics, spatial scan statistics
LISA - Local Moran's I
- Local indicators of spatial autocorrelation (LISA)
- show each location's similarity with its neighbors and test its significance
- Divide the study region into 5 categories:
  - high-high locations: hot spots
  - low-low locations: cold spots
  - high-low locations: spatial outliers
  - low-high locations: spatial outliers
  - locations with insignificant local autocorrelation
- Software: GeoDa
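The first four categories can be illustrated by comparing each location's standardized value with the average standardized value of its neighbours (the spatial lag). The sketch below assumes row-standardized weights and made-up data, and it omits the permutation-based significance test that tools such as GeoDa use to flag the fifth, insignificant category.

```python
def lisa_quadrants(values, weights):
    """Return (local Moran's I, quadrant label) for each location."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    z = [(v - mean) / sd for v in values]
    results = []
    for i in range(n):
        row_sum = sum(weights[i])
        # Row-standardized spatial lag: mean z-score of the neighbours.
        lag = (sum(weights[i][j] * z[j] for j in range(n)) / row_sum
               if row_sum > 0 else 0.0)
        own = "high" if z[i] > 0 else "low"
        nbr = "high" if lag > 0 else "low"
        results.append((z[i] * lag, own + "-" + nbr))
    return results


# Four areas in a row; each area neighbours the adjacent one(s).
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

print([label for _, label in lisa_quadrants([10, 8, 1, 2], w)])
# ['high-high', 'high-high', 'low-low', 'low-low']
```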
Spatial Scan Statistics
- Search over a given set of spatial regions
- Find those regions which are most likely to be clusters
- Correctly adjust for multiple hypothesis testing
- SaTScan
- A circular scanning window is placed at different coordinates, with a radius that varies from 0 to some set upper limit.
- For each location and size of window:
H_A = elevated risk within the window as compared to outside the window
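For count data, each candidate window is commonly scored with a likelihood ratio. The sketch below uses the Poisson-model form usually attributed to Kulldorff; the slides do not state which probability model was used, so this choice, the function name, and the numbers are assumptions. The window with the largest score is the most likely cluster, and its significance is normally assessed by Monte Carlo simulation, which is not shown.

```python
import math


def poisson_llr(cases_in: float, expected_in: float, total_cases: float) -> float:
    """Log likelihood ratio for one scanning window (Poisson model).

    cases_in: observed cases inside the window
    expected_in: cases expected inside the window under equal risk
    total_cases: observed cases in the whole study region
    """
    c, e, big_c = cases_in, expected_in, total_cases
    if not 0 < e < big_c or not 0 <= c <= big_c:
        raise ValueError("need 0 < expected_in < total_cases and 0 <= cases_in <= total_cases")
    if c <= e:
        return 0.0  # only windows with elevated risk are of interest
    inside = c * math.log(c / e)
    outside = 0.0 if c == big_c else (big_c - c) * math.log((big_c - c) / (big_c - e))
    return inside + outside


# 100 cases overall; the window holds 10% of the population (10 expected).
print(round(poisson_llr(20, 10, 100), 2))  # about 4.44
print(poisson_llr(10, 10, 100))            # 0.0
```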
Multilevel Approaches - PHC6016
By Hui Hu
Slides for the Social Epidemiology guest lecture, Fall 2016