April 5: Hierarchical Bayes for mixed modeling

The R interface for WinBUGS/OpenBUGS

 

R users can have all the optimization advantages of WinBUGS or OpenBUGS without having to leave the R interface, using the R2WinBUGS library (if using just WinBUGS) or R2WinBUGS with the BRugs library (if using the more recent OpenBUGS).  You must first have WinBUGS or OpenBUGS installed in the C:/Program Files/ directory (although there is an option for changing this path in R).  The remaining steps are on par with, or easier than, the initialization steps in the BUGS programs themselves (here I'm borrowing from Andrew Gelman's webpage at Columbia: http://www.stat.columbia.edu/~gelman/bugsR/runningbugs.html):

            1) install and attach the BRugs and R2WinBUGS libraries in R

            2) write a BUGS model; this can either be a standard text file using BUGS model syntax, saved in the current working directory (preferred), or an equivalent R function translated to BUGS syntax using the write.model function

            3) prepare inputs, including dataset (variables as a list), the function for initial values, and a character vector of which variables to monitor during chaining

            4) call the BUGS program using the bugs function, specifying necessary BUGS arguments

            5) analyze the results using print and plot functions, and accessing the posteriors themselves using attach.bugs


Example with species distribution data

            We can start by building off the BUGS code for a regression with a normal likelihood function for Fraser magnolia.  We begin as usual with a dataset of abundances and absences:

dat = read.csv(file.choose())     #grab the treedata.csv file
dat2 = subset(dat,dat$species=="Magnolia fraseri") #Fraser's magnolia
ro.plots = unique(as.character(dat2$plotID))    #plots with mag fra

u.plots = unique(as.character(dat$plotID)) #all plots
nob.plots = u.plots[is.element(u.plots,ro.plots)==F] #plots without mag fra
dat3 = subset(dat,is.element(as.character(dat$plotID),nob.plots)) #dataset of no mag fra
dat4 = subset(dat3,duplicated(dat3$plotID)==F)  #one row per plot
dat4$cover = 0     #cover of Fraser magnolia is zero in these plots
mf = na.exclude(rbind(dat2,dat4))     #new dataframe of presences and absences
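As a quick optional sanity check, we can confirm the new dataframe has the expected structure before modeling:

```r
nrow(mf)            #total number of plots (the J used below)
table(mf$cover>0)   #FALSE = absences, TRUE = presences of Fraser magnolia
```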

            Install and attach the R-BUGS interface packages (here we'll use OpenBUGS, but the WinBUGS routine is basically the same, without the need for BRugs or the program="openbugs" argument to the bugs function):

install.packages(c("BRugs","R2WinBUGS"))

library(BRugs)

library(R2WinBUGS)

And before we go any further we need to construct the BUGS model, including deterministic and likelihood (stochastic) components and the distributions of all priors.  Picking up where we left off with the March 8 exercise (see Megan Skrip's pdf), we can start with a three-parameter model of magnolia cover with respect to an intercept, elevation coefficient, and normal-based precision (1/variance).  Save this as a text file called "model1.bug" in your R working directory:

model

{

     a ~ dnorm(0,1.0E-6) #flat prior for the intercept

     b ~ dnorm(0,1.0E-6) #flat prior for the slope

     prec ~ dgamma(0.001,0.001) #flat gamma prior for the precision

     for(j in 1:J) {      #J  = number of observations, j is our looping variable

           mu[j] <- a + b*elev[j] #deterministic function (linear)

           cover[j] ~ dnorm(mu[j],prec) #stochastic (data) model, here normal with precision=prec

     }

}

Back in R, we prepare a list of our variables needed for the above model and define a function for initial values for each chain (replicate run).  Our response variable is cover, our fixed predictor variable is elev, and we also have to define J:

J = nrow(mf)       #total observations, here J=991
cover = mf$cover   #response variable
elev = mf$elev/1000 #predictor variable: again, easier in units of km rather than m
bugsdata = list ("J", "cover", "elev") #data goes into BUGS as a list
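As an aside, the bugs function also accepts the data as a named list of objects rather than a character vector of object names; the two forms are equivalent here:

```r
bugsdata = list(J=J, cover=cover, elev=elev)   #equivalent named-list form
```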

The next bit is slightly tricky: for each estimated parameter, we need to provide starting values to initialize each chain (replicate run), just as we would directly in OpenBUGS.  There are three options.  The easiest is to use the argument inits=NULL in the bugs function, telling OpenBUGS to sample starting values randomly from the prior distributions.  Alternatively, you can define a list of n.chains elements, each itself a list of single starting values for each parameter; each chain will then use the starting values from the corresponding element you supply.  The third and most complicated, but probably most effective, option is to generate starting values randomly for each chain with a function you supply.  You can then specify the number of chains later, and each run will start from an independent point generated by your function.  For example:
inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10),prec=rlnorm(1,1,1))
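For comparison, the second option above (explicit starting values for, say, three chains) would look like this; the particular values are arbitrary illustrations:

```r
inits = list(
     list(a=0, b=0, prec=1),       #chain 1
     list(a=1, b=-1, prec=0.5),    #chain 2
     list(a=-1, b=1, prec=2)       #chain 3
)
```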

Note that in the inits function the distribution for precision is a lognormal, so that only positive values are produced.  The final preparatory step is to define a character vector for the variables we want to 'save' (monitor) with the Gibbs sampler:

parameters = c("a","b","prec")

We're now ready to execute the chains in OpenBUGS using the bugs function.  There are many possible arguments to bugs; check ?bugs for the full details.  You must include the arguments for the dataset list, the initial values, the monitored parameters, and the model file; for OpenBUGS (rather than WinBUGS) you must include program="openbugs".  You can optionally change the number of chains (n.chains), the total number of iterations in each chain (n.iter), and the number of initial iterations that are thrown away as burn-in (n.burnin):

sim1 = bugs(bugsdata, inits, parameters, "model1.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")
model is syntactically correct

data loaded

model compiled

Initializing chain 1: initial values loaded but this or another chain contain   uninitialized variables

Initializing chain 2: initial values loaded but this or another chain contain   uninitialized variables

Initializing chain 3: model is initialized

model is already initialized

Sampling has been started ...

500 updates took 0 s

deviance set

monitor set for variable 'a'

monitor set for variable 'b'

monitor set for variable 'prec'

monitor set for variable 'deviance'

500 updates took 1 s

The bugs function executes OpenBUGS in the background without opening up the software itself.  The sim1 object contains basic output for evaluating model fit, convergence of the optimization, and summary statistics of the posterior distributions of each monitored parameter. 
print(sim1)

Inference for Bugs model at "model1.bug", fit using OpenBUGS,

 3 chains, each with 1000 iterations (first 500 discarded)

 n.sims = 1500 iterations saved

           mean  sd   2.5%    25%    50%    75%  97.5% Rhat n.eff

a           2.0 0.2    1.6    1.9    2.0    2.1    2.3    1   260

b          -0.7 0.2   -1.0   -0.8   -0.7   -0.6   -0.4    1   230

prec        0.3 0.0    0.3    0.3    0.3    0.3    0.3    1  1500

deviance 4088.2 2.4 4085.3 4086.4 4087.6 4089.4 4094.1    1  1100

 

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

 

DIC info (using the rule, pD = Dbar-Dhat)

pD = 3.1 and DIC = 4091.0

DIC is an estimate of expected predictive error (lower deviance is better).
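These DIC quantities are also stored in the fitted object, so we can extract them and compute a rough AIC by hand (a sketch, assuming the sim1 object from the bugs call above):

```r
sim1$pD                     #estimated number of parameters (pD)
sim1$DIC                    #DIC = mean posterior deviance + pD
sim1$mean$deviance + 2*3    #rough AIC: mean posterior deviance + 2*(number of parameters)
```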

The output is based on 3 chains, each with 500 saved iterations, for a total of 1500 MCMC samples saved to define the posterior distribution of each parameter (above, '1500 iterations saved').  (Note that for more than 2000 iterations bugs 'thins' the output to save memory; you can control this with the n.thin argument.)  The table lists the summary statistics for each estimated parameter (N=1500), including critical quantiles: the 2.5% and 97.5% statistics bound a 95% Bayesian credible interval, which states that there is a 95% chance that the real parameter value lies within this range.  The 'Rhat' value is a measure of convergence that takes into account the different starting values of the different chains; this should be 1 or close to it.  'n.eff' is an effective sample size statistic that takes convergence criteria into account.

Deviance (-2 times the log-likelihood) is automatically monitored as a parameter with its own posterior, and can be used to evaluate model goodness of fit, either via the traditional AIC (here, roughly 4088.2+2*3; 'roughly' because we'd actually need the deviance at the MLEs of each parameter, rather than the mean of its posterior) or via the Deviance Information Criterion (DIC).  DIC is very similar to AIC, except that 1) deviance is calculated slightly differently, using the mean of the posterior distribution rather than the likelihood at the MLEs; and 2) it uses the 'estimated' number of parameters (pD, above), which is usually not an integer.  The need for 'estimated' parameters in goodness-of-fit tests for HB models results from the conditional nature of the optimization, the existence of priors, and (often) random effects terms.  With symmetrical posteriors and flat priors in simple cases, AIC and DIC converge.  But for most HB applications DIC is a more robust way to compare models, which is why it is built into OpenBUGS.

You can access the summary table itself using sim1$summary.  A summary plot of these results can be obtained with:
plot(sim1)

Convergence statistics for each parameter are on the left, and parameter summaries are on the right.  This is often of significantly less utility than the printed table.  Fortunately you have access to the entire output, including all saved iterations of each fitted parameter, with the attach.bugs command:
attach.bugs(sim1)

The posterior distributions can now be accessed with calls to each parameter, which are separate vectors of length equal to the number of saved iterations (here, 1500).  For example, we can examine the posterior of the elevation coefficient and plot the Bayesian credible interval:
plot(density(b))

abline(v=quantile(b,probs=c(.025,.975)),col="red",lwd=2)

Consistent with our summary table above, the interval is between -0.98 and -0.39; actually the table rounds these values, and the more precise ones are:
sim1$summary

                 mean         sd         2.5%          25%          50%          75%        97.5%     Rhat n.eff

a           1.9622642 0.16143985    1.6445501    1.8550253    1.9601085    2.0692011    2.2865161 1.007875   260

b          -0.6823751 0.15005087   -0.9782953   -0.7877308   -0.6857629   -0.5753468   -0.3911635 1.008950   230

prec        0.2767917 0.01299819    0.2528816    0.2675235    0.2763662    0.2854939    0.3024640 1.000860  1500

deviance 4088.1885700 2.41943098 4085.3293335 4086.3523560 4087.5738525 4089.4056396 4094.1349608 1.001863  1100
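Because sim1$summary is a matrix with named rows and columns, individual entries can be pulled out directly, for example the exact 95% credible limits for the slope:

```r
sim1$summary["b", c("2.5%","97.5%")]   #exact credible limits for b
```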

Inference from posteriors is a breeze, accommodating a full range of statistical questions that are difficult with frequentist techniques.  For example, we can directly calculate the odds of a parameter being within certain bounds, such as: what is the probability that the rate of magnolia cover class decline with elevation is greater than 1 class per km?

cF = ecdf(b)  #empirical cumulative distribution function

cF(-1)         #proportion of values that are less than -1 class per km

[1] 0.01866667
Very unlikely.  What is the probability the value is between -1 and -0.5?
cF(-0.5) - cF(-1)
[1] 0.86

Extremely likely.  Just to check that our function is working, what is the probability that the value lies between -0.978 and -0.391?

cF(-0.391) - cF(-0.978)
[1] 0.9493333
Hopefully this is about 95%, because these values were our credible interval limits.  The ecdf function can generally be used to make posterior-based inferences on parameter estimates.

            Before we attempt a more complicated model, note that we could have written our BUGS model as an R function, and then used the write.model function to translate it to an OpenBUGS model file.  Here's the equivalent R function:

bugsmodel <- function() {

     a ~ dnorm(0,1.0E-6)

     b ~ dnorm(0,1.0E-6)

     prec ~ dgamma(0.001,0.001)

     for(j in 1:J) {

          mu[j] <- a + b*elev[j]

          cover[j] ~ dnorm(mu[j],prec)

     }
}
write.model(bugsmodel,con="model2.bug")

sim2 = bugs(bugsdata, inits, parameters, "model2.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

You should get the same results as using model1.bug.  The annoying bit is that you have to use the OpenBUGS syntax ('<-' and '~' instead of '='), so this method is basically equivalent to working in a separate text file in the R directory.

            Let's make our stochastic function more appropriate for the cover class response variable by changing our normal likelihood function to a Poisson process, using a log link function.  Save this as text file called 'model3.bug':

model

{

     a ~ dnorm(0,1.0E-6) #flat prior for the intercept

     b ~ dnorm(0,1.0E-6) #flat prior for the slope

     for(j in 1:J) {    #J  = number of observations, j is our looping variable

          log(mu[j]) <- a + b*elev[j] #deterministic function with log link

          cover[j] ~ dpois(mu[j]) #poisson likelihood

     }

}
As we've lost one parameter (the precision, which is not needed by the Poisson likelihood, whose variance equals its mean), we redefine our initial values and monitored parameters before calling OpenBUGS:
parameters = c("a","b")

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10))

sim3 = bugs(bugsdata, inits, parameters, "model3.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

print(sim3)

Inference for Bugs model at "model3.bug", fit using OpenBUGS,

 3 chains, each with 1000 iterations (first 500 discarded)

 n.sims = 1500 iterations saved

           mean  sd   2.5%    25%    50%    75%  97.5% Rhat n.eff

a           0.8 0.1    0.6    0.7    0.8    0.8    0.9    1   990

b          -0.6 0.1   -0.7   -0.6   -0.6   -0.5   -0.4    1   960

deviance 3793.7 1.9 3791.8 3792.3 3793.1 3794.5 3798.7    1  1500

 

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

 

DIC info (using the rule, pD = Dbar-Dhat)

pD = 1.9 and DIC = 3796.0

DIC is an estimate of expected predictive error (lower deviance is better).

The model is significantly better (compare DICs of 3796 and 4091), and we might be interested in how this changed our estimate of the slope coefficient.  We can save separate copies of the 'b' vector from each run:
attach.bugs(sim3)
b3 = b
attach.bugs(sim1)
b1 = b

plot(density(b3),col="red",xlim=c(-1,0))
lines(density(b1),col="blue")

Our credible interval for b has shrunk considerably with a more appropriate error structure. 
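We can quantify that shrinkage by comparing the two intervals directly (using the b1 and b3 vectors saved above):

```r
quantile(b1, probs=c(.025,.975))   #95% interval under the normal likelihood
quantile(b3, probs=c(.025,.975))   #95% interval under the Poisson likelihood (narrower)
```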

 

Adding a hierarchy

            Up to this point our model is not strictly hierarchical, because we are assuming that all observations have the same error structure.  As an example of adding a random effect based on a grouping variable, we can break the observations into groups according to the project each plot is associated with.  The 'plotID' variable contains this information in its first four characters:

table(as.factor(substr(mf$plotID,1,4)))

ATBN ATBP DELL DLAP DSLM FRID MILL PITT UFRL ULRY WHCA WISR

  76   13   32    5   69   54   25   43  348  208  105   13

The dataset is actually a conglomeration of 12 vegetation studies conducted after 1975.  Studies were conducted with different methods and in different areas of the park, so it is likely that the abundance of any particular species varies with respect to project type in some unknown fashion.  To create a numeric grouping variable called 'project' we add to the mf dataset:
project = factor(substr(mf$plotID,1,4))

mf$project = as.numeric(project)
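It is worth verifying the factor-to-integer coding, since the BUGS model will index the project effects by these numbers:

```r
levels(project)    #project codes, in the alphabetical order of their integer coding
table(mf$project)  #integer codes 1 through 12 and their plot counts
```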

We now have to decide how such a grouping variable will contribute to the variance in our model construction.  We expect 'project' to add random variation to our expected cover value, with the difference among projects determined by a random draw from a normal distribution of mean=0 and unknown variance.  This added variance should appear in the regression that determines the expected value for the Poisson likelihood.  For example (“model4.bug”):
model

{

     a ~ dnorm(0,1.0E-6) #flat prior for the intercept

     b ~ dnorm(0,1.0E-6) #flat prior for the slope

     for(j in 1:12) {   #loop over each project (N=12)

          d[j] ~ dnorm(0,prec1)   #projects differ according to a random draw from a normal distribution with precision=prec1

     }

     prec1 ~ dgamma(0.001,0.001)  #flat prior for among-project precision

     for(j in 1:J) {    #J  = number of observations, j is our looping variable

          log(mu[j]) <- a + b*elev[j] + d[project[j]] #add project-level variance

          cover[j] ~ dpois(mu[j]) #poisson likelihood #no change in likelihood function

     }

}

project = mf$project

bugsdata = list ("J", "cover", "elev", "project")    #add project to our data list

parameters = c("a","b","prec1")   #add prec1 as a monitored parameter

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10),prec1=rlnorm(1,1,1))

sim4 = bugs(bugsdata, inits, parameters, "model4.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

Inference for Bugs model at "model4.bug", fit using OpenBUGS,

 3 chains, each with 1000 iterations (first 500 discarded)

 n.sims = 1500 iterations saved

           mean  sd   2.5%    25%    50%    75%  97.5% Rhat n.eff

a           8.8 6.2   -0.1    0.4   12.0   14.1   14.6 50.2     3

b          -0.2 0.1   -0.4   -0.3   -0.2   -0.2    0.0  1.1    47

prec1       0.4 0.6    0.0    0.0    0.0    0.7    1.8  9.8     3

deviance 3543.3 5.6 3534.7 3539.2 3542.7 3546.6 3556.0  1.2    15

 

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

 

DIC info (using the rule, pD = Dbar-Dhat)

pD = 13.5 and DIC = 3557.0

DIC is an estimate of expected predictive error (lower deviance is better).

Our deviance (and DIC) has decreased markedly, but we have convergence problems (note the Rhat values) for the a and prec1 parameters.  We can increase the number of iterations to 50,000 (WARNING: this took about 20 minutes on my machine):
sim4 = bugs(bugsdata, inits, parameters, "model4.bug", n.chains=3, n.iter=50000, n.burnin=500, program="openbugs")

print(sim4)

Inference for Bugs model at "model4.bug", fit using OpenBUGS,

 3 chains, each with 50000 iterations (first 500 discarded), n.thin = 148

 n.sims = 1002 iterations saved

           mean  sd   2.5%    25%    50%    75%  97.5% Rhat n.eff

a           0.7 2.8   -0.9   -0.2    0.0    0.3   12.1  1.4    16

b          -0.2 0.1   -0.4   -0.3   -0.2   -0.2   -0.1  1.0   270

prec1       0.8 0.5    0.0    0.5    0.7    1.1    2.0  1.3    17

deviance 3544.8 5.5 3535.9 3540.9 3543.8 3548.2 3556.8  1.0   210

 

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

 

DIC info (using the rule, pD = Dbar-Dhat)

pD = 13.3 and DIC = 3558.0

DIC is an estimate of expected predictive error (lower deviance is better).

The convergence is much improved (although not perfect), and the DIC is about the same but with better parameter estimates.  We can again ask about the elevation coefficient posterior:
attach.bugs(sim4)

b4 = b

plot(density(b4),col="red",xlim=c(-1,0))
lines(density(b3),col="blue")

The elevation coefficient is very sensitive to our addition of a random effect based on project. 
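If the project effects themselves are of interest, we could add 'd' to the monitored parameters and rerun the model (a sketch, not run here; sim4d is a hypothetical object name):

```r
parameters = c("a","b","prec1","d")   #also monitor the 12 project deviations
sim4d = bugs(bugsdata, inits, parameters, "model4.bug", n.chains=3,
     n.iter=50000, n.burnin=500, program="openbugs")
sim4d$summary                         #now includes rows d[1] through d[12]
```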

 

Hierarchical logistic regression
            Let’s continue model building for Fraser magnolia by demonstrating a model with a binary (presence-absence) response variable (‘pres’) and a submodel for imperfect species detection.  We’ll add several new predictor variables to the mix, and define their coefficients with a vector that makes specifying priors more convenient.  For the sake of optimization speed we’ll ignore the project-based random effect, but the above lines for the random effect could be inserted here as well.  Instead we’ll suppose that our response variable (presence) results from both our process model (the regression based on environmental variables) and an additional submodel that considers detection probability (observation error) as a function of plot size (the bigger the quadrat, the more likely we are to detect the species).

In regression cases like this, McCarthy (2007, Bayesian Methods for Ecology) suggests the (MCMC) optimization works more efficiently if the predictor variables are used after subtracting their means; if the variable is untransformed this has no effect on the coefficient value (except for the intercept, which can be back-transformed if necessary for predictions).  We’ll do this in R before feeding the data to OpenBUGS.
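As an aside on that back-transformation: if the model is fit with a centered predictor, so that the expected value is a + b*(x - mean(x)), the intercept on the original scale of x is simply a - b*mean(x).  Applied draw-by-draw to the posterior vectors produced by attach.bugs, this gives a full posterior for the original-scale intercept (a hypothetical illustration):

```r
a.orig = a - b*mean(mf$elev)            #posterior draws of the original-scale intercept
quantile(a.orig, probs=c(.025,.975))    #its 95% credible interval
```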

pres = as.numeric(mf$cover>0)     #binary response

elev = mf$elev-mean(mf$elev) #subtract mean for elev predictor values

tci = mf$tci-mean(mf$tci)    #ditto tci

strdst = log(mf$streamdist+1)-mean(log(mf$streamdist+1)) #same for streamdist, but log-transform first
beers = mf$beers-mean(mf$beers)   #ditto radiation
plotsize = log(mf$plotsize)-mean(log(mf$plotsize))   #ditto plot size  

J = nrow(mf)       #total observations, here J=991  

bugsdata = list ("J","pres","elev","tci","strdst","beers","plotsize")     #data as list

We’ve already constructed BUGS models that involve assigning a probability to a cover value based on predictor variables (e.g., the Poisson model above).  Adding detection probability to the mix assumes we have independent information on the probability of observing a species given it occurs in a particular location.  Here we can use variation in plot size (which we assume varies independently of the predictor variables) in our model of detection probability and the environmental variables as our model of occurrence probability.  We record an occurrence as the independent outcome of these two probabilities (detection x occurrence).  This also serves to illustrate how we can break down what may seem to be a complex model into easily specified parts, and have the conditional parameter optimization via MCMC handle the difficult task of keeping track of the conditional relationships.  This is “model5.bug”:

model

{

     for(j in 1:J) {    #J  = number of observations

          true.occ[j] ~ dbern(prob[j]) #prob of true occupancy (latent variable)

          logit(prob[j]) <- beta[1] + beta[2]*elev[j] + beta[3]*(elev[j]*elev[j]) + beta[4]*tci[j] + beta[5]*strdst[j] + beta[6]*beers[j] #’real’ occurrence model with a logit link

          logit(detect[j]) <- d1 + d2*plotsize[j]    #prob of detection model based on plot size with logit link

          p1[j] <- true.occ[j]*detect[j]

          pres[j] ~ dbern(p1[j]) #observed presence is a binomial (Bernoulli, N=1) probability of P(occurrence) * P(detection)

     }

     #prior distributions

     d1 ~ dnorm(0,1.0E-6)    #flat prior for intercept of detection model

     d2 ~ dnorm(0,1.0E-6)    #flat prior for slope of detection model

     for(j in 1:6) {

     beta[j] ~ dnorm(0,1.0E-6)#flat priors for each regression coefficient

     }

     #derived (and monitored) variable of number of true occurrences

     actual.occ <- sum(true.occ[])

}

Back in R we define our monitored parameters and our initialization conditions before executing the OpenBUGS code. Note the initial values for the ‘beta’ coefficients can be specified as a vector instead of single elements.

parameters = c("beta","d1","d2","actual.occ")   #monitored parameters

inits = function() list(beta=rnorm(6,0,10),d1=rnorm(1,0,10),d2=rlnorm(1,1,1))

sim5 = bugs(bugsdata, inits, parameters, "model5.bug", n.chains=1, n.iter=100,   n.burnin=50, program="openbugs")

The model also illustrates how we can establish inference on a ‘latent’ (unobserved) variable that is nonetheless of interest, here being the ‘real’ occurrences (those we observed, plus those we didn’t observe because of imperfect detection: “actual.occ”).
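Once sim5 has been run for enough iterations to converge (far more than the quick test above), we can compare the posterior of actual.occ with the observed number of presences; the difference estimates how many occupied plots went undetected (a sketch):

```r
attach.bugs(sim5)
sum(pres)                                     #number of observed presences
quantile(actual.occ, probs=c(.025,.5,.975))   #posterior median and 95% interval for true occurrences
```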