PHC6194 SPATIAL EPIDEMIOLOGY
Integrated Nested Laplace Approximation
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
April 8, 2020
Integrated Nested Laplace Approximation
Lab: RINLA
Integrated Nested Laplace Approximation
Advantages of the Bayesian Approach
 The specification of prior distributions allows the formal inclusion of information that can be obtained through previous studies or from expert opinion
 The posterior probability that a parameter does/does not exceed a certain threshold is easily obtained from the posterior distribution, providing a more intuitive and interpretable quantity than a frequentist pvalue
 Easy to specify a hierarchical structure on the data and parameters
Bayesian Approach for Epidemiological Data
 Epidemiological data are often characterized by a spatial and/or temporal structure which needs to be taken into account in the inferential process
 if the data consists of aggregated counts of outcomes and covariates, typically disease mapping and ecological regression can be specified
 if the outcome or risk factors data are observed at point locations, then geostatistical models are considered as suitable representations of the problem
 Both models can be specified in a Bayesian framework by simply extending the concept of hierarchical structure
 allowing to account for similarities based on the neighborhood (for areal data) or on the distance (for pointreference data)
Bayesian Inference in Spatial Epidemiological Studies
 Bayesian inference has become very popular in spatial analyses in recent years
 due to the availability of computation methods to tackle fitting of spatial models
 e.g. Besag, York, and Mollie (1991) proposed a method to fit a spatial model using MCMC, which has been extensively used and extended to consider different types of fixed and random effects of spatial and spatiotemporal analysis
 Fitting these models has been possible because of the availability of different computational techniques, e.g. MCMC
 however, for large models or big data sets, MCMC can be tedious
 reaching the required number of samples (burnin period) can take a long time, and autocorrelation may arise and an increased number of iterations may be required
Approximation of Posterior Distribution
 Posterior distributions may be approximated in some way
 however, most models are highly multivariate and approximating the full posterior distribution may not be possible in practice

The integrated nested Laplace approximation (INLA) approach has been developed as a computationally efficient alternative to MCMC
 INLA focuses on the posterior marginals for latent Gaussian models
 although these models may seem rather restricted, they appear in a fair number of fields and are usually enough to handle most of the problems in spatial analyses
 ranging from generalized linear mixed to spatial and spatiotemporal models
 INLA can also be combined with the Stochastic Partial Differential Equation (SPDE) approach to implement spatial and spatiotemporal models for pointreference data
Spatial Data
 Spatial data are defined as realizations of a stochastic process indexed by space
 The actual data can be represented by a collection of observations
 The set {s1,...sn} indicates the spatial units at which the measurements are taken
 Depending on D being a spatial surface or a countable collection of 2dimensional spatial units, the problem can be specified as a spatially continuous or discrete random process, respectively
Y(s)\equiv \{y(s),s\in D \}
y\{ y(s_1),...,y(s_n)\}
Examples
 Spatially continuous random process (for pointreference data):
 a collection of air pollutant measurements obtained by monitors located in the set (s1,...sn) of n points
 in this case, y is a realization of the air pollution process that changes continuously in space
 Spatially discrete random process (for areal data):
 spatial pattern of a certain health condition observed in a set (s1,...,sn) of n areas (such as census tracts or counties)
 in this case, y represent a suitable summary (e.g. the number of cases observed in each area)
Spatial Model within the Bayesian Framework
 The first step is to identify a probability distribution for the observed data (likelihood)
 usually we use a distribution from the Exponential family, indexed by a set of parameters θ accounting for the spatial correlation
 For pointreferenced data:
 the parameters are defined as a latent stationary Gaussian field, a function of some hyperparameters ψ associated with a suitable prior distribution p(ψ)
 this is equivalent to assuming that θ has a multivariate Normal distribution with mean µ=(µ1,...,µn) and spatially structured covariance matrix Σ

For areal data:
 it is possible to reformulate the problem in terms of the neighborhood structure
\Sigma_{ij}=Cov(\theta_i,\theta_j)=\sigma_c^2C(\Delta_{ij})
C(\Delta_{ij})={1\over {\Gamma (\lambda)2^{\lambda1}}} {(\kappa\Delta_{ij})}^\lambda K_\lambda(\kappa\Delta_{ij})
\Delta_{ij}= \s_is_j\
INLA
 We are usually interested in estimating the effect of a set of relevant covariates on some function (typically the mean) of the observed data, while accounting for the spatial or spatiotemporal correlation implied in the model
\eta_i=\alpha+\sum_{m=1}^M\beta_mx_{mi}+\sum_{l=1}^Lf_l(z_{li})
the mean for the ith unit
intercept
the coefficients quantify the effect of some covariates on the response
a collection of functions defined in terms of a set of covariates
 We can vary the form of the functions fl() to accomodate a wide range of models, from standard and hierarchical regression, to spatial and spatiotemporal models
INLA (cont'd)
 The INLA approach exploits the assumptions of the model to produce a numerical approximation to the posteriors of interest, based on the Laplace approximation
 INLA proceeds by first exploring the marginal joint posterior for the hyperparameters in order to locate the mode
 A grid search is then performed and produces a set of relevant points together with a corresponding set of weights to give the approximation to this distribution
 Each marginal posterior can be obtained using interpolation based on the computed values and correcting for skewness
Lab: RINLA
RINLA
 The RINLA package provides an interface to INLA so that models can be fitted using standard R commands
 Assuming a vector of two covariates x1, x2, and a function f() indexed by a third covariate z1:
 The coefficients α, β1, β2 are by default given independent prior Normal distributions with zero mean and small precision
 The term f() is used to specify the structure of the function f(), using the following notation:
 the default choice is model="iid"
 the specification can be used to build standard hierarchical models
 Once the model has been specified, we can run the INLA by:
formula<y~1+x1+x2+f(z1,...)
f(z1,model="...",...)
mod<inla(formula,family="...",data)
git pull
Spatial Areal Data: Suicides in London
 To investigate suicide mortality in n=32 London boroughs in the period 19891993
 Assuming a BesagYorkMollie (BYM) specification, υi is the spatially structured residual, modeled using an intrinsic conditional autoregressive structure
y_i \sim Poisson(\lambda_i)
\lambda_i=\rho_iE_i
\eta_i=log(\rho_i)=\alpha+\upsilon_i+\nu_i
\upsilon_i\upsilon_{j\neq i} \sim Normal(m_i,s_i^2)
m_i={\sum_{j\in N(i) }\upsilon _j \over \#N(i)}
s_i={\sigma^2_{\upsilon}\over \#N(i)}
the number of areas which share boundaries with the ith one (i.e. its neighbors)
\nu_i \sim Normal(0,\sigma_\nu^2)
PHC6194Spring2020Lecture12
By Hui Hu
PHC6194Spring2020Lecture12
Slides for Lecture 12, Spring 2020, PHC6194 Spatial Epidemiology
 260
Loading comments...