**April 5: Hierarchical Bayes for mixed modeling**

*The R interface for
WinBUGS/OpenBUGS*

* *

R users can have all the optimization advantages of WinBUGS or OpenBUGS without having to leave the R interface, using the R2WinBUGS library (if using just WinBUGS) or R2WinBUGS with the BRugs library (if using the more recent OpenBUGS). You must first have WinBUGS or OpenBUGS installed in the C:/Program Files/ directory (although there is an option for changing this path in R). The next steps are actually on par or easier than the initialization steps in the BUGS programs themselves (here I'm borrowing from Andrew Gelman's webpage at Columbia: http://www.stat.columbia.edu/~gelman/bugsR/runningbugs.html):

1) install and attach the BRugs and R2WinBUGS libraries in R

2) write a
BUGS model; this can either be in a standard text file using BUGS model syntax
and saved in the current working directory (preferred) or writing an equivalent
R function and translating it to BUGS syntax using the **write.model**
function

3) prepare inputs, including dataset (variables as a list), the function for initial values, and a character vector of which variables to monitor during chaining

4) call
the BUGS program using the **bugs** function, specifying necessary BUGS
arguments

5) analyze
the results using print and plot functions, and accessing the posteriors
themselves using **attach.bugs**

Example with species distribution data

We can start by building off the BUGS code for a regression with a normal likelihood function for Fraser magnolia. We begin as usual with a dataset of abundances and absences:

dat =
read.csv(file.choose()) #grab the treedata.csv
file

dat2 = subset(dat,dat$species=="Magnolia
fraseri") #Fraser's magnolia

ro.plots =
unique(as.character(dat2$plotID)) #plots with mag fra

u.plots = unique(as.character(dat$plotID)) #all plots

nob.plots = u.plots[is.element(u.plots,ro.plots)==F] #plots without
mag fra

dat3 = subset(dat,is.element(as.character(dat$plotID),nob.plots))
#dataset of no mag fra

dat4 =
subset(dat3,duplicated(dat3$plotID)==F) #one row per plot

dat4$cover = 0 #cover
of red oak is zero in these plots

mf =
na.exclude(rbind(dat2,dat4)) #new dataframe of
presences and absences

Install and attach the R-BUGS interface
packages (here we'll use OpenBUGS, but the WinBUGS routine is basically the
same, without the need for BRugs or the *program="openbugs"*
argument to the **bugs** function):

install.packages(c("BRugs","R2WinBUGS"))

library(BRugs)

library(R2WinBUGS)

And before we go any further we need to construct the BUGS model, including deterministic and likelihood (stochastic) components and the distributions of all priors. Picking up where we left off with the March 8 exercise (see Megan Skrip's pdf), we can start with a three-parameter model of magnolia cover with respect to an intercept, elevation coefficient, and normal-based precision (1/variance). Save this as a text file called "model1.bug" in your R working directory:

model

{

a ~ dnorm(0,1.0E-6) #flat prior for the intercept

b ~ dnorm(0,1.0E-6) #flat prior for the slope

prec ~ dgamma(0.001,0.001) #flat gamma prior for the precision

for(j in 1:J) { #J = number of observations, j is our looping variable

mu[j] <- a + b*elev[j] #deterministic function (linear)

cover[j] ~ dnorm(mu[j],prec) #stochastic (data) model, here normal with precision=prec

}

}

Back in R, we prepare a list of our variables needed for the above model and define a function for initial values for each chain (replicate run). Our response variable is cover, our fixed predictor variable is elev, and we also have to define J:

J = nrow(mf) #total
observations, here J=991

cover = mf$cover #response variable

elev = mf$elev/1000 #predictor variable: again,
easier in units of km rather than m

bugsdata = list ("J", "cover",
"elev") #data goes into BUGS as a list

The next bit is slightly tricky: for each estimated
parameter, we need to provide starting values to initialize each chain, as we
do directly in OpenBUGS. The easiest option is to use the argument inits=NULL
in the bugs function, telling OpenBUGS to sample randomly from the prior
distribution for starting values. Or you could define a list of 'n.chains'
elements, each being a vector of single starting values for each parameter;
subsequent runs (chains) will use the starting values from each separate vector
you supply in the list. The third and most complicated, but probably most
effective, option is to generate starting values randomly for each chain based
on a function you supply. You can then specify the number of chains later, and
each run will use an independent starting point depending on your function.
For example:

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10),prec=rlnorm(1,1,1))

Note the distribution for precision is a lognormal so that only positive values are produced. The final preparatory step is to define a character vector for the variables we want to 'save' (monitor) with the Gibbs sampler:

parameters = c("a","b","prec")

We're now ready to execute the chains in OpenBUGS using
the **bugs** function. There are many possible arguments to **bugs**;
check ?bugs for the full details. You must include the arguments for the
dataset list, the initial values, the monitored parameters, and the model file;
for OpenBUGS (rather than WinBUGS) you must include
program="openbugs". You can optionally change the number of chains
(n.chains), the total number of iterations in each chain (n.iter), and the
number of initial iterations that are thrown away as burn-in (n.burnin):

sim1 =
bugs(bugsdata, inits, parameters, "model1.bug", n.chains=3,
n.iter=1000, n.burnin=500, program="openbugs")

model is syntactically correct

data loaded

model compiled

Initializing chain 1: initial values loaded but this or another chain contain uninitialized variables

Initializing chain 2: initial values loaded but this or another chain contain uninitialized variables

Initializing chain 3: model is initialized

model is already initialized

Sampling has been started ...

500 updates took 0 s

deviance set

monitor set for variable 'a'

monitor set for variable 'b'

monitor set for variable 'prec'

monitor set for variable 'deviance'

500 updates took 1 s

The bugs function executes OpenBUGS in the background
without opening up the software itself. The sim1 object contains basic output
for evaluating model fit, convergence of the optimization, and summary
statistics of the posterior distributions of each monitored parameter.

print(sim1)

Inference for Bugs model at "model1.bug", fit using OpenBUGS,

3 chains, each with 1000 iterations (first 500 discarded)

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 2.0 0.2 1.6 1.9 2.0 2.1 2.3 1 260

b -0.7 0.2 -1.0 -0.8 -0.7 -0.6 -0.4 1 230

prec 0.3 0.0 0.3 0.3 0.3 0.3 0.3 1 1500

deviance 4088.2 2.4 4085.3 4086.4 4087.6 4089.4 4094.1 1 1100

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)

pD = 3.1 and DIC = 4091.0

DIC is an estimate of expected predictive error (lower deviance is better).

The output is based on
3 chains, each with 500 saved iterations, for a total of 1500 MCMC samples
saved to define the posterior distributions of each parameter (above, '1500
iterations saved'). (Note for iterations greater than 2000 bugs 'thins' the
output to save memory; you can change how it does this with the *n.thin*
argument.) The table lists the summary statistics for each estimated parameter
(N=1500), including critical quantiles (the 2.5% and 97.5% stats represent a
95% Bayesian credible interval, which states that there is a 95% chance that
the real parameter value lies within this range). The 'Rhat' value is a
measure of convergence that takes into account the different starting values
for the different chains; this should be 1 or close to it. 'n.eff' is a sample
size statistic that takes convergence criteria into account. Deviance (2*-LL)
is automatically monitored as a parameter with its own posterior, and can be
used to evaluate model goodness of fit, either via the traditional AIC (here,
roughly 4088.2+2*3; 'roughly' because we'd actually need the deviance at the
MLEs for each parameter, rather than the mean of its posterior) or via the
Deviance Information Criterion (DIC). DIC is very similar to AIC, except that
1) deviance is calculated slightly differently, using the mean of the posterior
distribution rather than the likelihood at the MLEs; and 2) it uses the
'estimated' number of parameters (pD, above), which is usually not an integer.
The need for 'estimated' parameters for goodness of fit tests in HB models
results from the conditional nature of the optimization, the existence of
priors, and (often) random effects terms. With symmetrical posteriors and flat
priors in simple cases, AIC and DIC converge. But for most HB applications DIC
is a more robust way to compare models, which is why it is built into OpenBUGS.
You can access the summary table itself using sim1$summary. A summary plot of
these results can be obtained with:

plot(sim1)

Convergence statistics
for each parameter are on the left, and parameter summaries are on the right. This
is often of significantly less utility than the printed table. Fortunately you
have access to the entire output, including all saved iterations of each fitted
parameter, with the **attach.bugs** command:

attach.bugs(sim1)

The posterior
distributions can now be accessed with calls to each parameter, which are
separate vectors of length equal to the number of saved iterations (here,
1500). For example, we can examine the posterior of the elevation coefficient
and plot the Bayesian credible interval:

plot(density(b))

abline(v=quantile(b,probs=c(.025,.975)),col="red",lwd=2)

Consistent with our above summary table, the interval is
between -0.98 and -0.39; actually the above table rounds up, and the more
precise values are:

sim1$summary

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 1.9622642 0.16143985 1.6445501 1.8550253 1.9601085 2.0692011 2.2865161 1.007875 260

b -0.6823751 0.15005087 -0.9782953 -0.7877308 -0.6857629 -0.5753468 -0.3911635 1.008950 230

prec 0.2767917 0.01299819 0.2528816 0.2675235 0.2763662 0.2854939 0.3024640 1.000860 1500

deviance 4088.1885700 2.41943098 4085.3293335 4086.3523560 4087.5738525 4089.4056396 4094.1349608 1.001863 1100

Inference from posteriors is a breeze, accommodating a full range of statistical questions that are difficult with frequentist techniques. For example, we can calculate odds of a parameter being within certain bounds directly, such as, what is the probability that the rate of magnolia cover class decline with elevation is greater than 1 class per km?

cF = ecdf(b) #empirical cumulative distribution function

cF(-1) #proportion of values than are less than -1 class per km

[1] 0.01866667

Very unlikely. What is the probability the value is between -1 and
-0.5?

cF(-0.5) - cF(-1)

[1] 0.86

Extremely likely. Just to check our function is working, what is the probability of the value between -0.978 and -0.391?

cF(-0.391) - cF(-0.978)

[1] 0.9493333

Hopefully this is 95%, because these values were our credible range
limits. The function **ecdf** can be generally used to make posterior-based
inferences on parameter estimates.

Before we
attempt a more complicated model note that we could have written our BUGS model
as an R function, and then used the **write.model** function to translate
this to OpenBUGS mode file. Here's the equivalent R function:

bugsmodel <- function() {

a ~ dnorm(0,1.0E-6)

b ~ dnorm(0,1.0E-6)

prec ~ dgamma(0.001,0.001)

for(j in 1:J) {

mu[j] <- a + b*elev[j]

cover[j] ~ dnorm(mu[j],prec)

}

}

write.model(bugsmodel,con="model2.bug")

sim2 = bugs(bugsdata, inits, parameters, "model2.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

You should get the same results as using model1.bug. The annoying bit is that you have to use the OpenBUGS syntax ('<-' and '~' instead of '='), so this method is basically equivalent to working in a separate text file in the R directory.

Let's make our stochastic function more appropriate for the cover class response variable by changing our normal likelihood function to a Poisson process, using a log link function. Save this as text file called 'model3.bug':

model

{

a ~ dnorm(0,1.0E-6) #flat prior for the intercept

b ~ dnorm(0,1.0E-6) #flat prior for the slope

for(j in 1:J) { #J = number of observations, j is our looping variable

log(mu[j]) <- a + b*elev[j] #deterministic function with log link

cover[j] ~ dpois(mu[j]) #poisson likelihood

}

}

As we've lost one parameter (precision, not used by the Poisson process
where variance and mean are equal), we redefine our initial values and
monitored parameters before calling OpenBUGS:

parameters
= c("a","b")

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10))

sim3 = bugs(bugsdata, inits, parameters, "model3.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

print(sim3)

Inference for Bugs model at "model3.bug", fit using OpenBUGS,

3 chains, each with 1000 iterations (first 500 discarded)

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 0.8 0.1 0.6 0.7 0.8 0.8 0.9 1 990

b -0.6 0.1 -0.7 -0.6 -0.6 -0.5 -0.4 1 960

deviance 3793.7 1.9 3791.8 3792.3 3793.1 3794.5 3798.7 1 1500

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)

pD = 1.9 and DIC = 3796.0

DIC is an estimate of expected predictive error (lower deviance is better).

The model is significantly better (compare DICs of 3796 and
4091), and we might be interested in how this changed our estimate of the slope
coefficient. We can save separate copies of the 'b' vector from each run:

attach.bugs(sim3)

b3 = b

attach.bugs(sim1)

b1 = b

plot(density(b3),col="red",xlim=c(-1,0))

lines(density(b1),col="blue")

Our credible interval for b has shrunk considerably with a more appropriate error structure.

*Adding a hierarchy*

Up to this point our model is not strictly hierarchical, because we are assuming that all observations have the same error structure. As an example of adding a random effect based on a grouping variable, we can break up the observations into different groups based on which project each plot is associated with. The 'plotID' variable contains this information in the its first component:

table(as.factor(substr(mf$plotID,1,4)))

ATBN ATBP DELL DLAP DSLM FRID MILL PITT UFRL ULRY WHCA WISR

76 13 32 5 69 54 25 43 348 208 105 13

The dataset is actually a conglomeration of 12 vegetation
studies conducted after 1975. Studies were conducted with different methods
and in different areas of the park, so it is likely that the abundance of any
particular species varies with respect to project type in some unknown
fashion. To create a numeric grouping variable called 'project' we add to the
mf dataset:

project = factor(substr(mf$plotID,1,4))

mf$project = as.numeric(project)

We now have to decide how such
a grouping variable will contribute to the variance in our model construction.
We expect 'project' to add random variation to our expected cover value, with
the difference among projects determined by a random draw from a normal
distribution of mean=0 and unknown variance. This added variance should appear
in the regression that determines the expected value for the Poisson
likelihood. For example (“model4.bug”):

model

{

a ~ dnorm(0,1.0E-6) #flat prior for the intercept

b ~ dnorm(0,1.0E-6) #flat prior for the slope

for(j in 1:12) { #loop over each project (N=12)

d[j] ~ dnorm(0,prec1) #projects differ according to a random draw from a normal distribution with precision=prec1

}

prec1 ~ dgamma(0.001,0.001) #flat prior for among-project precision

for(j in 1:J) { #J = number of observations, j is our looping variable

log(mu[j]) <- a + b*elev[j] + d[project[j]] #add project-level variance

cover[j] ~ dpois(mu[j]) #poisson likelihood #no change in likelihood function

}

}

project = mf$project

bugsdata = list ("J", "cover", "elev", "project") #add project to our data list

parameters = c("a","b","prec1") #add prec1 as a monitored parameter

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10),prec1=rlnorm(1,1,1))

sim4 = bugs(bugsdata, inits, parameters, "model4.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

Inference for Bugs model at "model4.bug", fit using OpenBUGS,

3 chains, each with 1000 iterations (first 500 discarded)

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 8.8 6.2 -0.1 0.4 12.0 14.1 14.6 50.2 3

b -0.2 0.1 -0.4 -0.3 -0.2 -0.2 0.0 1.1 47

prec1 0.4 0.6 0.0 0.0 0.0 0.7 1.8 9.8 3

deviance 3543.3 5.6 3534.7 3539.2 3542.7 3546.6 3556.0 1.2 15

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)

pD = 13.5 and DIC = 3557.0

DIC is an estimate of expected predictive error (lower deviance is better).

Our deviance (and DIC) has decreased markedly but we have
convergence issues (Rhat) with a and prec1 parameters. I increase the number
of iterations to 50,000 (**WARNING**: this took about ~20 minutes on my machine):

sim4 = bugs(bugsdata, inits, parameters, "model4.bug",
n.chains=3, n.iter=50000, n.burnin=500, program="openbugs")

print(sim4)

Inference for Bugs model at "model4.bug", fit using OpenBUGS,

3 chains, each with 50000 iterations (first 500 discarded), n.thin = 148

n.sims = 1002 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 0.7 2.8 -0.9 -0.2 0.0 0.3 12.1 1.4 16

b -0.2 0.1 -0.4 -0.3 -0.2 -0.2 -0.1 1.0 270

prec1 0.8 0.5 0.0 0.5 0.7 1.1 2.0 1.3 17

deviance 3544.8 5.5 3535.9 3540.9 3543.8 3548.2 3556.8 1.0 210

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)

pD = 13.3 and DIC = 3558.0

DIC is an estimate of expected predictive error (lower deviance is better).

The convergence is much
improved (although not perfect), and the DIC is about the same but with better
parameter estimates. We can again ask about the elevation coefficient
posterior:

attach.bugs(sim4)

b4 = b

plot(density(b4),col="red",xlim=c(-1,0))

lines(density(b3),col="blue")

The elevation coefficient is very sensitive to our addition of a random effect based on project.

*Hierarchical logistic
regression*

Let’s continue model building for Fraser magnolia by demonstrating
a model with a binary (presence-absence) response variable (‘pres’) and a
submodel for imperfect species detection. We’ll add several new predictor
variables to the mix, and define their coefficients with a vector that makes
specifying priors more convenient. For the sake of optimization speed we’ll
ignore the project-based random effect, but the above lines for the random
effect could be inserted here as well. Instead we’ll suppose that our response
variable (presence) results from both our process model (the regression based
on environmental variables) and an additional submodel that considers detection
probability (observation error) as a function of plot size (the bigger the
quadrat, the more likely we are to detect the species).

In regression cases like this,
McCarthy (2007, *Bayesian Methods for Ecology*) suggests the (MCMC)
optimization works more efficiently if the predictor variables are used after
subtracting their means; if the variable is untransformed this has no effect on
the coefficient value (except for the intercept, which can be back-transformed
if necessary for predictions). We’ll do this in R before feeding the data to
OpenBUGS.

pres = as.numeric(mf$cover>0) #binary response

elev = mf$elev-mean(mf$elev) #substract mean for elev predictor values

tci = mf$tci-mean(mf$tci) #ditto tci

strdst
= log(mf$streamdist+1)-mean(log(mf$streamdist+1)) #same for streamdist, but
log-transform first

beers = mf$beers-mean(mf$beers) #ditto radiation

plotsize = log(mf$plotsize)-mean(log(mf$plotsize)) #ditto plot size

J = nrow(mf) #total observations, here J=991

bugsdata = list ("J","pres","elev","tci","strdst","beers","plotsize") #data as list

We’ve already constructed BUGS models that involve
assigning a probability to a cover value based on predictor variables (e.g.,
the Poisson model above). Adding detection probability to the mix assumes we
have independent information on the probability of observing a species *given
it occurs *in a particular location. Here we can use variation in plot size
(which we assume varies independently of the predictor variables) in our model
of detection probability and the environmental variables as our model of
occurrence probability. We record an occurrence as the independent outcome of
these two probabilities (detection x occurrence). This also serves to
illustrate how we can break down what may seem to be a complex model into
easily specified parts, and have the conditional parameter optimization via
MCMC handle the difficult task of keeping track of the conditional
relationships. This is “model5.bug”:

model

{

for(j in 1:J) { #J = number of observations

true.occ[j] ~ dbern(prob[j]) #prob of true occupancy (latent variable)

logit(prob[j]) <- beta[1] + beta[2]*elev[j] + beta[3]*(elev[j]*elev[j]) + beta[4]*tci[j] + beta[5]*strdst[j] + beta[6]*beers[j] #’real’ occurrence model with a logit link

logit(detect[j]) <- d1 + d2*plotsize[j] #prob of detection model based on plot size with logit link

p1[j] <- true.occ[j]*detect[j]

pres[j] ~ dbern(p1[j]) #observed presence is a binomial (Bernoulli, N=1) probability of P(occurrence) * P(detection)

}

#prior distributions

d1 ~ dnorm(0,1.0E-6) #flat prior for intercept of detection model

d2 ~ dnorm(0,1.0E-6) #flat prior for slope of detection model

for(j in 1:6) {

beta[j] ~ dnorm(0,1.0E-6)#flat priors for each regression coefficient

}

#derived (and monitored) variable of number of true occurrences

actual.occ <- sum(true.occ[])

}

Back in R we define our monitored parameters and our initialization conditions before executing the OpenBUGS code. Note the initial values for the ‘beta’ coefficients can be specified as a vector instead of single elements.

parameters = c("beta","d1","d2","actual.occ") #monitored parameters

inits = function() list(beta=rnorm(6,0,10),d1=rnorm(1,0,10),d2=rlnorm(1,1,1))

sim5 = bugs(bugsdata, inits, parameters, "model5.bug", n.chains=1, n.iter=100, n.burnin=50, program="openbugs")

The model also illustrates how we can establish inference on a ‘latent’ (unobserved) variable that is nonetheless of interest, here being the ‘real’ occurrences (those we observed, plus those we didn’t observe because of imperfect detection: “actual.occ”).