PHC6194 SPATIAL EPIDEMIOLOGY

Integrated Nested Laplace Approximation

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

April 11, 2018

Integrated Nested Laplace Approximation

Lab: R-INLA

Integrated Nested Laplace Approximation

Advantages of the Bayesian Approach

  • The specification of prior distributions allows the formal inclusion of information that can be obtained through previous studies or from expert opinion
     
  • The posterior probability that a parameter does/does not exceed a certain threshold is easily obtained from the posterior distribution, providing a more intuitive and interpretable quantity than a frequentist p-value
     
  • Easy to specify a hierarchical structure on the data and parameters

Bayesian Approach for Epidemiological Data

  • Epidemiological data are often characterized by a spatial and/or temporal structure which needs to be taken into account in the inferential process
    -  if the data consists of aggregated counts of outcomes and covariates, typically disease mapping and ecological regression can be specified
    -  if the outcome or risk factors data are observed at point locations, then geostatistical models are considered as suitable representations of the problem
     
  • Both models can be specified in a Bayesian framework by simply extending the concept of hierarchical structure
    -  allowing to account for similarities based on the neighborhood (for areal data) or on the distance (for point-reference data)

Bayesian Inference in Spatial Epidemiological Studies

  • Bayesian inference has become very popular in spatial analyses in recent years
    -  due to the availability of computation methods to tackle fitting of spatial models
    -  e.g. Besag, York, and Mollie (1991) proposed a method to fit a spatial model using MCMC, which has been extensively used and extended to consider different types of fixed and random effects of spatial and spatio-temporal analysis
     
  • Fitting these models has been possible because of the availability of different computational techniques, e.g. MCMC
    -  however, for large models or big data sets, MCMC can be tedious
    -  reaching the required number of samples (burn-in period) can take a long time, and autocorrelation may arise and an increased number of iterations may be required

Approximation of Posterior Distribution

  • Posterior distributions may be approximated in some way
    -  however, most models are highly multivariate and approximating the full posterior distribution may not be possible in practice
     
  • The integrated nested Laplace approximation (INLA) approach has been developed as a computationally efficient alternative to MCMC
     
  • INLA focuses on the posterior marginals for latent Gaussian models
    -  although these models may seem rather restricted, they appear in a fair number of fields and are usually enough to handle most of the problems in spatial analyses
    -  ranging from generalized linear mixed to spatial and spatio-temporal models
     
  • INLA can also be combined with the Stochastic Partial Differential Equation (SPDE) approach to implement spatial and spatio-temporal models for point-reference data

Spatial Data

  • Spatial data are defined as realizations of a stochastic process indexed by space

     
  • The actual data can be represented by a collection of observations


     
  • The set {s1,...sn} indicates the spatial units at which the measurements are taken
     
  • Depending on D being a spatial surface or a countable collection of 2-dimensional spatial units, the problem can be specified as a spatially continuous or discrete random process, respectively 
Y(s)\equiv \{y(s),s\in D \}
Y(s){y(s),sD}Y(s)\equiv \{y(s),s\in D \}
y\{ y(s_1),...,y(s_n)\}
y{y(s1),...,y(sn)}y\{ y(s_1),...,y(s_n)\}

Examples

  • Spatially continuous random process (for point-reference data):
    -  a collection of air pollutant measurements obtained by monitors located in the set (s1,...sn) of n points
    -  in this case, y is a realization of the air pollution process that changes continuously in space
     
  • Spatially discrete random process (for areal data):
    -  spatial pattern of a certain health condition observed in a set (s1,...,sn) of n areas (such as census tracts or counties)
    -  in this case, y represent a suitable summary (e.g. the number of cases observed in each area)

Spatial Model within the Bayesian Framework

  • The first step is to identify a probability distribution for the observed data (likelihood)
    -  usually we use a distribution from the Exponential family, indexed by a set of parameters θ accounting for the spatial correlation
     
  • For point-referenced data:
    -  the parameters are defined as a latent stationary Gaussian field, a function of some hyper-parameters ψ associated with a suitable prior distribution p(ψ)
    -  this is equivalent to assuming that θ has a multivariate Normal distribution with mean µ=(µ1,...,µn) and spatially structured covariance matrix Σ



     
  • For areal data:
    -  it is possible to reformulate the problem in terms of the neighborhood structure
\Sigma_{ij}=Cov(\theta_i,\theta_j)=\sigma_c^2C(\Delta_{ij})
Σij=Cov(θi,θj)=σc2C(Δij)\Sigma_{ij}=Cov(\theta_i,\theta_j)=\sigma_c^2C(\Delta_{ij})
C(\Delta_{ij})={1\over {\Gamma (\lambda)2^{\lambda-1}}} {(\kappa\Delta_{ij})}^\lambda K_\lambda(\kappa\Delta_{ij})
C(Δij)=1Γ(λ)2λ1(κΔij)λKλ(κΔij)C(\Delta_{ij})={1\over {\Gamma (\lambda)2^{\lambda-1}}} {(\kappa\Delta_{ij})}^\lambda K_\lambda(\kappa\Delta_{ij})
\Delta_{ij}= \|s_i-s_j\|
Δij=sisj\Delta_{ij}= \|s_i-s_j\|

INLA

  • We are usually interested in estimating the effect of a set of relevant covariates on some function (typically the mean) of the observed data, while accounting for the spatial or spatio-temporal correlation implied in the model
\eta_i=\alpha+\sum_{m=1}^M\beta_mx_{mi}+\sum_{l=1}^Lf_l(z_{li})
ηi=α+m=1Mβmxmi+l=1Lfl(zli)\eta_i=\alpha+\sum_{m=1}^M\beta_mx_{mi}+\sum_{l=1}^Lf_l(z_{li})

the mean for the i-th unit

intercept

the coefficients quantify the effect of some covariates on the response

a collection of functions defined in terms of a set of covariates

  • We can vary the form of the functions fl() to accomodate a wide range of models, from standard and hierarchical regression, to spatial and spatio-temporal models

INLA (cont'd)

  • The INLA approach exploits the assumptions of the model to produce a numerical approximation to the posteriors of interest, based on the Laplace approximation
     
  • INLA proceeds by first exploring the marginal joint posterior for the hyper-parameters in order to locate the mode
     
  • A grid search is then performed and produces a set of relevant points together with a corresponding set of weights to give the approximation to this distribution
     
  • Each marginal posterior can be obtained using interpolation based on the computed values and correcting for skewness

Lab: R-INLA

R-INLA

  • The R-INLA package provides an interface to INLA so that models can be fitted using standard R commands
     
  • Assuming a vector of two covariates x1, x2, and a function f() indexed by a third covariate z1:

     
  • The coefficients α, β1, β2 are by default given independent prior Normal distributions with zero mean and small precision
     
  • The term f() is used to specify the structure of the function f(), using the following notation:

    -  the default choice is model="iid"
    -  the specification can be used to build standard hierarchical models
     
  • Once the model has been specified, we can run the INLA by:
formula<-y~1+x1+x2+f(z1,...)
f(z1,model="...",...)
mod<-inla(formula,family="...",data)

git pull

Spatial Areal Data: Suicides in London

  • To investigate suicide mortality in n=32 London boroughs in the period 1989-1993




     
  • Assuming a Besag-York-Mollie (BYM) specification, υi is the spatially structured residual, modeled using an intrinsic conditional autoregressive structure
y_i \sim Poisson(\lambda_i)
yiPoisson(λi)y_i \sim Poisson(\lambda_i)
\lambda_i=\rho_iE_i
λi=ρiEi\lambda_i=\rho_iE_i
\eta_i=log(\rho_i)=\alpha+\upsilon_i+\nu_i
ηi=log(ρi)=α+υi+νi\eta_i=log(\rho_i)=\alpha+\upsilon_i+\nu_i
\upsilon_i|\upsilon_{j\neq i} \sim Normal(m_i,s_i^2)
υiυjiNormal(mi,si2)\upsilon_i|\upsilon_{j\neq i} \sim Normal(m_i,s_i^2)
m_i={\sum_{j\in N(i) }\upsilon _j \over \#N(i)}
mi=jN(i)υj#N(i)m_i={\sum_{j\in N(i) }\upsilon _j \over \#N(i)}
s_i={\sigma^2_{\upsilon}\over \#N(i)}
si=συ2#N(i)s_i={\sigma^2_{\upsilon}\over \#N(i)}

the number of areas which share boundaries with the i-th one (i.e. its neighbors)

\nu_i \sim Normal(0,\sigma_\nu^2)
νiNormal(0,σν2)\nu_i \sim Normal(0,\sigma_\nu^2)