Bayesian Analyses
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
February 18, 2020
Introduction to Bayesian Statistical Perspectives
Bayesian Hierarchical Model
Lab: JAGS
Introduction to Bayesian Statistical Perspectives
Statistical Perspectives
 Statistics:
 the use of analytical tools to uncover relationships in data
 different foundations in philosophy of inference
 different perspectives on probability: constant of nature vs. subjective degree of belief
 The most common statistical perspectives:
 Frequentist: Pr(y|θ), probability of seeing this data, given the parameter
 Bayesian: Pr(θ|y), probability of the parameter, given this data
[Figure: Data, Prior Belief, and Decision under the Frequentist and Bayesian perspectives]
Frequentist vs. Bayesian
Frequentist 

Data are a repeatable random sample: there is a frequency
Underlying parameters remain constant during this repeatable process 
Parameters are fixed 
Bayesian 

Data are observed from the realized sample 
Parameters are unknown and described probabilistically 
Data are fixed 
Bayes' Theorem
 2% of women at age 40 who participate in routine screening have breast cancer
 80% of women with breast cancer will get positive mammographies
 10% of women without breast cancer will also get positive mammographies
 Question:
 A woman in this age group had a positive mammography in a routine screening.
What is the probability that she actually has breast cancer?
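The screening question can be worked through directly with Bayes' theorem. The sketch below uses only the three figures given above; the function and argument names are illustrative and do not come from the lecture.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """Return P(disease | positive test) via Bayes' theorem."""
    # Numerator: P(positive | disease) * P(disease)
    true_positive = sensitivity * prior
    # P(positive | no disease) * P(no disease)
    false_positive = false_positive_rate * (1.0 - prior)
    # Denominator: total probability of a positive test
    return true_positive / (true_positive + false_positive)


p = posterior_probability(prior=0.02, sensitivity=0.80, false_positive_rate=0.10)
print(f"P(cancer | positive mammography) = {p:.3f}")  # about 0.140
```

Even with a positive result, the probability of cancer is only about 14%, because the 2% prior is small relative to the 10% false positive rate.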
Bayes' Theorem
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}$$
Posterior, $p(\theta \mid y)$:
The probability of the parameter given the data collected
Likelihood, $p(y \mid \theta)$:
The probability of collecting this data given the parameter
Prior, $p(\theta)$:
The probability of the parameter before collecting data
Marginal Likelihood, $p(y)$:
The probability of collecting this data under all possible parameters
 The prior represents our subjective beliefs, expressed as a probability statement, about likely values of the unobserved parameter before we have observed data
 Types:
 noninformative: minimal prior information (e.g. all values equally likely)
 clinical: elicited through interaction between the statistician and knowledgeable scientists
 skeptical: treats a large effect as unlikely
 enthusiastic: gives a good chance of observing an effect
 Likelihood is the term used when describing a function of a parameter given an outcome
 e.g. if a coin is flipped 10 times and it lands heads-up 10 times, what is the likelihood that the coin is fair?
 The likelihood for data {y_i}, i = 1, 2, ..., m, is defined as
$$L(\theta \mid y) = \prod_{i=1}^{m} p(y_i \mid \theta)$$
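The coin question can be answered by evaluating the likelihood at θ = 0.5. The sketch below assumes a binomial model for the number of heads; the comparison value θ = 0.9 is a hypothetical alternative, added only to show that the likelihood is a function of the parameter.

```python
from math import comb


def binomial_likelihood(theta, heads, flips):
    """Likelihood of heads probability theta given `heads` successes in `flips` tosses."""
    return comb(flips, heads) * theta**heads * (1.0 - theta) ** (flips - heads)


fair = binomial_likelihood(0.5, heads=10, flips=10)
biased = binomial_likelihood(0.9, heads=10, flips=10)
print(f"L(theta=0.5) = {fair:.6f}")    # 0.5**10, about 0.000977
print(f"L(theta=0.9) = {biased:.6f}")  # 0.9**10, about 0.348678
```

The fair coin is not impossible under these data, but a coin biased towards heads makes them far more likely.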
 Given a set of independent identically distributed data points $y = (y_1, \ldots, y_m)$, where $y_i \sim p(y_i \mid \theta)$ according to some probability distribution parameterized by θ, and θ itself is a random variable described by a distribution $\theta \sim p(\theta \mid \alpha)$:
 The marginal likelihood in general asks what the probability $p(y \mid \alpha)$ is, where θ has been marginalized out:
$$p(y \mid \alpha) = \int p(y \mid \theta)\, p(\theta \mid \alpha)\, d\theta$$
 The probability of the data under all possible parameters
 it is a constant value given the data
The posterior is proportional to the likelihood times the prior
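The proportionality can be checked numerically on a grid of parameter values. The sketch below is a minimal illustration, assuming hypothetical data (7 heads in 10 coin flips), a uniform prior, and a 1001-point grid; none of these values come from the lecture.

```python
# Hypothetical data: 7 heads in 10 flips; uniform prior on theta.
heads, flips = 7, 10
n_grid = 1001
grid = [i / (n_grid - 1) for i in range(n_grid)]

prior = [1.0 for _ in grid]  # flat prior density on [0, 1]
# Binomial kernel; the constant binomial coefficient cancels after normalization.
likelihood = [t**heads * (1 - t) ** (flips - heads) for t in grid]
unnormalized = [l * p for l, p in zip(likelihood, prior)]  # likelihood x prior

# The marginal likelihood acts as the normalizing constant (Riemann-sum approximation).
width = 1.0 / (n_grid - 1)
marginal = sum(unnormalized) * width
posterior = [u / marginal for u in unnormalized]

posterior_mean = sum(t * d for t, d in zip(grid, posterior)) * width
print(f"posterior mean ~ {posterior_mean:.3f}")  # analytic Beta(8, 4) mean is 8/12
```

Dividing by the marginal likelihood only rescales the curve, so the shape of the posterior is determined entirely by the likelihood times the prior.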
The Sunrise Problem
 The sunrise problem was introduced by Price in the comments to the article that introduced Bayes' theorem
 It concerns the probability that the sun will rise tomorrow, and the evaluation and updating of a belief (prior)
 Imagine a newborn who observed the sun on his first day
 after the sun sets, the newborn has uncertainty as to whether or not he will see the sun again
 we can represent this uncertainty with a Beta distribution
 in this example, the Beta distribution is used as a prior probability distribution, which is an expression of belief before seeing more data
The Sunrise Problem (cont'd)
 A Beta distribution has two parameters, α and β
 a Beta distribution that is specified as B(α=1,β=1) is a uniform distribution between zero and one
 The α and β are easily understandable in terms of successes and failures of an event, where α=successes+1 and β=failures+1
 therefore, zero successes and failures may be represented with a B(α=1,β=1) distribution
 the expectation of a Beta distribution is α/(α+β)
 The Beta distribution is also convenient because it is self-conjugate, meaning that when it is used as a prior distribution for a Bernoulli or binomial likelihood, the posterior distribution is also a Beta distribution
The Sunrise Problem (cont'd)
 After observing one sunrise (one success):
 the prior mean has been updated from 0.5 to the posterior mean of 0.67
The Sunrise Problem (cont'd)
 After observing two sunrises:
 the prior mean has been updated from 0.5 to the posterior mean of 0.75
The Sunrise Problem (cont'd)
 After observing ten sunrises:
 the prior mean has been updated from 0.5 to the posterior mean of 0.92
The Sunrise Problem (cont'd)
 If this example was calculated for the newborn on his 80th birthday:
 the prior mean has been updated from 0.5 to the posterior mean of 0.9999658
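The posterior means quoted on these slides follow from the Beta update rule, in which each observed sunrise adds one to α. The sketch below reproduces them; the function name is illustrative, and the 80-year figure assumes 365 days per year, since the slides do not state the day count used.

```python
def beta_posterior_mean(successes, failures=0, alpha=1.0, beta=1.0):
    """Posterior mean of a Beta(alpha, beta) prior after Bernoulli observations."""
    post_alpha = alpha + successes  # prior alpha plus observed successes
    post_beta = beta + failures     # prior beta plus observed failures
    return post_alpha / (post_alpha + post_beta)


for n in (1, 2, 10):
    print(n, round(beta_posterior_mean(n), 2))  # 0.67, 0.75, 0.92

# Assumption: 80 years of 365 days each (not stated in the slides).
days = 80 * 365
print(days, round(beta_posterior_mean(days), 7))
```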
The Sunrise Problem (cont'd)
 There are many things about this example that have been presented arbitrarily:
 the newborn may have been optimistic or pessimistic in his uncertainty, and a different prior probability distribution may have been more appropriate
 the subjective interpretation of probability allows for the prior probability distribution to differ for each person
 although a different prior would affect the posterior, the difference shrinks as more data are observed
 characterizing the prior or posterior with the mean is also an arbitrary choice
 In Bayesian inference, only the simplest examples may be demonstrated analytically. Usually, numerical approximation such as Markov chain Monte Carlo (MCMC) must be used
Conjugate Pairs
 When the prior and the posterior are from the same family of distributions, the prior and the likelihood form a conjugate pair
 Since the posterior is proportional to the prior times the likelihood, we need to pick the right combination of likelihood distribution and prior distribution to get a conjugate pair
 Conjugate pairs are important
 for non-conjugate pairs, we generally cannot write out the posterior distribution explicitly
Conjugate Pairs (cont'd)
 An example of a conjugate pair: a Binomial likelihood with a Beta prior
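The standard derivation for this pair can be sketched as follows. Here y is the number of successes in n trials; these symbols are chosen for illustration and may differ from the notation on the original slide.

```latex
\begin{aligned}
y \mid \theta &\sim \mathrm{Binomial}(n, \theta), \qquad \theta \sim \mathrm{Beta}(\alpha, \beta) \\
p(\theta \mid y) &\propto p(y \mid \theta)\, p(\theta) \\
&\propto \theta^{y}(1-\theta)^{n-y} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} \\
&= \theta^{\alpha+y-1}(1-\theta)^{\beta+n-y-1} \\
\Rightarrow\ \theta \mid y &\sim \mathrm{Beta}(\alpha + y,\ \beta + n - y)
\end{aligned}
```

The posterior has the same functional form as the prior, so updating amounts to adding the observed successes to α and the observed failures to β.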
Bayesian Hierarchical Model
Bayesian Hierarchical Model
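 A simple example of a hierarchical model that is commonly found in spatial epidemiological studies is where the data likelihood is Poisson and there is a common relative risk parameter θ with a single gamma prior distribution:
$$p(\theta \mid y) \propto L(y \mid \theta)\, g(\theta)$$
where g(θ) is a Gamma distribution with parameters α and β.
 A compact notation for this model is:
$$y_i \mid \theta \sim \mathrm{Pois}(e_i \theta), \qquad \theta \sim G(\alpha, \beta)$$
where e_i is the expected count in area i.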

α and β control the form of the prior distribution g(θ)
 these parameters can be given assumed values, but more usually we do not have a strong belief about the prior parameter values
 alternatively, since parameters within models are regarded as stochastic, these parameters must also have distributions; these distributions are known as hyperprior distributions, and the parameters are known as hyperparameters
Bayesian Hierarchical Model (cont'd)
 In this Poisson-Gamma example, there is a two-level hierarchy:
 θ has a G(α, β) distribution at the first level of the hierarchy
 α can have a hyperprior distribution (h_α), and β can have a hyperprior distribution (h_β)
 This can be written as:
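One way to write the full hierarchy is sketched below. It assumes the usual disease-mapping form in which the Poisson mean is the expected count e_i in area i multiplied by the common relative risk θ; the symbol e_i is an assumption and does not appear in the surviving slide text.

```latex
\begin{aligned}
y_i \mid \theta &\sim \mathrm{Pois}(e_i \theta) \\
\theta \mid \alpha, \beta &\sim G(\alpha, \beta) \\
\alpha &\sim h_\alpha \\
\beta &\sim h_\beta
\end{aligned}
```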
Computational Tools
 In Bayesian analyses, integration is the principal inferential operation, as opposed to optimization in classical statistics
 Historically, the need to evaluate integrals was a major stumbling block for the application of Bayesian methods, severely restricting the types of models that could be implemented
 About 30 years ago, a numerical technique known as Markov chain Monte Carlo (MCMC) was introduced to Bayesian statistics by Gelfand and Smith (1990)
 MCMC is a general method that provides inference on all the major quantities of a Bayesian analysis simultaneously
 it allows us to concentrate on modelling: use models that you believe represent the true dependence structures in the data, rather than those that are simple to compute
MCMC
 All the information we need is contained in the posterior distribution; however, it may not be expressible as a standard distribution (unless we pick specific prior and likelihood combinations to create a conjugate pair)
 MCMC simulation approximates the true posterior density by using a bag of samples drawn from the density
 the initial set of T samples is discarded, since samples drawn early in the simulation can often be far from the posterior distribution
 the time (iteration number) T is known as the burn-in
 The two most general procedures for MCMC simulation from a target distribution:
 the Gibbs sampler
 the Metropolis-Hastings algorithm: allows us to draw samples from non-conjugate pairs
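As a rough illustration of the Gibbs sampler and of burn-in, the sketch below alternates draws from two full conditional distributions in a normal model with unknown mean and precision. The model, the vague priors, and the data are all assumptions chosen for illustration; they are not taken from the lecture.

```python
import random


def gibbs_normal(y, n_iter=6000, burn_in=1000, seed=1):
    """Gibbs sampler for y_i ~ Normal(mu, 1/tau) with vague conjugate priors.

    Assumed priors (illustrative only):
    mu ~ Normal(0, variance 1/1e-4), tau ~ Gamma(shape 0.01, rate 0.01).
    """
    rng = random.Random(seed)
    n, total = len(y), sum(y)
    prior_prec, a, b = 1e-4, 0.01, 0.01
    mu, tau = 0.0, 1.0  # arbitrary starting values
    draws = []
    for t in range(n_iter):
        # Full conditional of mu given tau is Normal (prior mean is 0).
        prec = n * tau + prior_prec
        mu = rng.gauss(tau * total / prec, prec ** -0.5)
        # Full conditional of tau given mu is Gamma(shape, rate).
        rate = b + 0.5 * sum((v - mu) ** 2 for v in y)
        tau = rng.gammavariate(a + n / 2, 1.0 / rate)  # second argument is the scale
        if t >= burn_in:  # keep only draws made after the burn-in period
            draws.append((mu, tau))
    return draws


data = [4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.0]  # hypothetical measurements
draws = gibbs_normal(data)
mu_mean = sum(m for m, _ in draws) / len(draws)
print(f"posterior mean of mu ~ {mu_mean:.2f}")
```

Each parameter is updated from a known conditional distribution, which is why the Gibbs sampler pairs naturally with conjugate structure.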
Metropolis-Hastings Algorithm
 The Metropolis-Hastings algorithm can be embedded within the Gibbs sampler to generate samples from conditional distributions that have no known standard form
 Although we can use this algorithm to take samples from any distribution, we usually avoid it when the posterior distribution is known, since the samples it generates may introduce autocorrelation into the Gibbs sampler
 We do not cover the technical details of this algorithm since it has been implemented in JAGS
 but for more complex models that cannot be implemented in JAGS or that need further optimization, we do need to program the Gibbs sampler and implement the Metropolis-Hastings algorithm ourselves
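For readers who want to see the idea in code, the sketch below is a random-walk Metropolis sampler, the special case of Metropolis-Hastings with a symmetric proposal. It targets the posterior of a coin's heads probability under a uniform prior; the data (7 heads in 10 flips), step size, and iteration counts are hypothetical, and this is not how JAGS implements its samplers.

```python
import math
import random


def log_posterior(theta, heads, flips):
    """Unnormalized log posterior: binomial likelihood with a uniform Beta(1, 1) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero density outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(heads, flips, n_iter=20000, burn_in=2000, step=0.1, seed=42):
    rng = random.Random(seed)
    theta = 0.5  # starting value
    current = log_posterior(theta, heads, flips)
    samples = []
    for t in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_posterior(proposal, heads, flips)
        # Accept with probability min(1, posterior ratio); the proposal density cancels.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if t >= burn_in:  # discard the burn-in iterations
            samples.append(theta)
    return samples


samples = metropolis(heads=7, flips=10)  # hypothetical data
estimate = sum(samples) / len(samples)
print(f"MCMC posterior mean ~ {estimate:.2f}; analytic Beta(8, 4) mean = {8 / 12:.2f}")
```

Only the unnormalized posterior is needed, so the marginal likelihood never has to be computed; this is what makes the approach usable for non-conjugate pairs.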
Lab: JAGS
BUGS and JAGS
 BUGS: Bayesian inference Using Gibbs Sampling
 JAGS (Just Another Gibbs Sampler)
Syntax
Syntax (cont'd)
Syntax Differences in BUGS/JAGS Compared with R
 Statement order is not important because the statements are compiled simultaneously
 [], [,], [,,] are needed when defining arrays
 Matrices are filled row-wise
 Defining a vector/matrix prior to filling its elements is not necessary
Model Construction
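 Likelihood:
for (i in 1:n){
  x[i] ~ dexp(l[i])     # stochastic node: exponential with rate l[i]
  l[i] <- mu[i]*x[i]    # deterministic node
}
 Prior:
for (j in 1:m){
  mu[j] ~ dexp(3)       # exponential prior with rate 3
}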
git clone https://github.com/benhhu/RBayesian.git
Bayesian Analyses  Guest Lecture
By Hui Hu