Integrated Nested Laplace Approximation

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

April 8, 2020

Integrated Nested Laplace Approximation


Integrated Nested Laplace Approximation

Advantages of the Bayesian Approach

  • The specification of prior distributions allows the formal inclusion of information that can be obtained through previous studies or from expert opinion
  • The posterior probability that a parameter does/does not exceed a certain threshold is easily obtained from the posterior distribution, providing a more intuitive and interpretable quantity than a frequentist p-value
  • Easy to specify a hierarchical structure on the data and parameters

Bayesian Approach for Epidemiological Data

  • Epidemiological data are often characterized by a spatial and/or temporal structure which needs to be taken into account in the inferential process
    -  if the data consists of aggregated counts of outcomes and covariates, typically disease mapping and ecological regression can be specified
    -  if the outcome or risk factors data are observed at point locations, then geostatistical models are considered as suitable representations of the problem
  • Both models can be specified in a Bayesian framework by simply extending the concept of hierarchical structure
    -  allowing to account for similarities based on the neighborhood (for areal data) or on the distance (for point-reference data)

Bayesian Inference in Spatial Epidemiological Studies

  • Bayesian inference has become very popular in spatial analyses in recent years
    -  due to the availability of computation methods to tackle fitting of spatial models
    -  e.g. Besag, York, and Mollie (1991) proposed a method to fit a spatial model using MCMC, which has been extensively used and extended to consider different types of fixed and random effects of spatial and spatio-temporal analysis
  • Fitting these models has been possible because of the availability of different computational techniques, e.g. MCMC
    -  however, for large models or big data sets, MCMC can be tedious
    -  reaching the required number of samples (burn-in period) can take a long time, and autocorrelation may arise and an increased number of iterations may be required

Approximation of Posterior Distribution

  • Posterior distributions may be approximated in some way
    -  however, most models are highly multivariate and approximating the full posterior distribution may not be possible in practice
  • The integrated nested Laplace approximation (INLA) approach has been developed as a computationally efficient alternative to MCMC
  • INLA focuses on the posterior marginals for latent Gaussian models
    -  although these models may seem rather restricted, they appear in a fair number of fields and are usually enough to handle most of the problems in spatial analyses
    -  ranging from generalized linear mixed to spatial and spatio-temporal models
  • INLA can also be combined with the Stochastic Partial Differential Equation (SPDE) approach to implement spatial and spatio-temporal models for point-reference data

Spatial Data

  • Spatial data are defined as realizations of a stochastic process indexed by space

  • The actual data can be represented by a collection of observations

  • The set {s1,} indicates the spatial units at which the measurements are taken
  • Depending on D being a spatial surface or a countable collection of 2-dimensional spatial units, the problem can be specified as a spatially continuous or discrete random process, respectively 
Y(s)\equiv \{y(s),s\in D \}
y\{ y(s_1),...,y(s_n)\}


  • Spatially continuous random process (for point-reference data):
    -  a collection of air pollutant measurements obtained by monitors located in the set (s1, of n points
    -  in this case, y is a realization of the air pollution process that changes continuously in space
  • Spatially discrete random process (for areal data):
    -  spatial pattern of a certain health condition observed in a set (s1,...,sn) of n areas (such as census tracts or counties)
    -  in this case, y represent a suitable summary (e.g. the number of cases observed in each area)

Spatial Model within the Bayesian Framework

  • The first step is to identify a probability distribution for the observed data (likelihood)
    -  usually we use a distribution from the Exponential family, indexed by a set of parameters θ accounting for the spatial correlation
  • For point-referenced data:
    -  the parameters are defined as a latent stationary Gaussian field, a function of some hyper-parameters ψ associated with a suitable prior distribution p(ψ)
    -  this is equivalent to assuming that θ has a multivariate Normal distribution with mean µ=(µ1,...,µn) and spatially structured covariance matrix Σ

  • For areal data:
    -  it is possible to reformulate the problem in terms of the neighborhood structure
C(\Delta_{ij})={1\over {\Gamma (\lambda)2^{\lambda-1}}} {(\kappa\Delta_{ij})}^\lambda K_\lambda(\kappa\Delta_{ij})
\Delta_{ij}= \|s_i-s_j\|


  • We are usually interested in estimating the effect of a set of relevant covariates on some function (typically the mean) of the observed data, while accounting for the spatial or spatio-temporal correlation implied in the model

the mean for the i-th unit


the coefficients quantify the effect of some covariates on the response

a collection of functions defined in terms of a set of covariates

  • We can vary the form of the functions fl() to accomodate a wide range of models, from standard and hierarchical regression, to spatial and spatio-temporal models

INLA (cont'd)

  • The INLA approach exploits the assumptions of the model to produce a numerical approximation to the posteriors of interest, based on the Laplace approximation
  • INLA proceeds by first exploring the marginal joint posterior for the hyper-parameters in order to locate the mode
  • A grid search is then performed and produces a set of relevant points together with a corresponding set of weights to give the approximation to this distribution
  • Each marginal posterior can be obtained using interpolation based on the computed values and correcting for skewness



  • The R-INLA package provides an interface to INLA so that models can be fitted using standard R commands
  • Assuming a vector of two covariates x1, x2, and a function f() indexed by a third covariate z1:

  • The coefficients α, β1, β2 are by default given independent prior Normal distributions with zero mean and small precision
  • The term f() is used to specify the structure of the function f(), using the following notation:

    -  the default choice is model="iid"
    -  the specification can be used to build standard hierarchical models
  • Once the model has been specified, we can run the INLA by:

git pull

Spatial Areal Data: Suicides in London

  • To investigate suicide mortality in n=32 London boroughs in the period 1989-1993

  • Assuming a Besag-York-Mollie (BYM) specification, υi is the spatially structured residual, modeled using an intrinsic conditional autoregressive structure
y_i \sim Poisson(\lambda_i)
\upsilon_i|\upsilon_{j\neq i} \sim Normal(m_i,s_i^2)
m_i={\sum_{j\in N(i) }\upsilon _j \over \#N(i)}
s_i={\sigma^2_{\upsilon}\over \#N(i)}

the number of areas which share boundaries with the i-th one (i.e. its neighbors)

\nu_i \sim Normal(0,\sigma_\nu^2)


By Hui Hu


Slides for Lecture 12, Spring 2020, PHC6194 Spatial Epidemiology

  • 260
Loading comments...

More from Hui Hu