Bayesian methods in R and BUGS

Toward Bayesian models

So much has been written on Bayesian vs. non-Bayesian analysis in ecology in the last decade that I will limit our discussion to implementation of Bayesian models in R and leave the 'why or why not' concerns to more able statisticians (highly recommended: McCarthy's 2007 book Bayesian Methods for Ecology, also Clark 2005 Ecology Letters 8:2-14).  Essentially, what a simple Bayesian approach (as opposed to hierarchical Bayes, which we'll approach later) facilitates is the translation of a conditional statement about data (given a hypothesis) to a conditional statement about a hypothesis (given a dataset), using Bayes' theorem.  Rev. Thomas Bayes figured this out in the 18th century: he was interested in calculating inverse probabilities (e.g., given data of heads and tails, what can you tell about the coin?).  In practice, we do this by multiplying our already established likelihood function [P(data|model)] by a so-called prior distribution, which is the suspected probability of the model before we collect new data.  This product of a likelihood and a prior is proportional to the posterior distribution [P(model|data)], which becomes the basis for statistical inference.  Once you've established the posterior distribution, model selection and parameter estimation are relatively easy and based on the more intuitive notion of calculating actual probabilities of different hypotheses.  The difficulties of Bayesian modeling are how you get to the posterior: you need to specify a prior (sometimes admitting that you know nothing about the model initially: a so-called uninformative or flat prior) and then you need to estimate the shape of the posterior distribution.  The analytical problem in this is that all the posterior probabilities must sum to 1 (the prior and posterior distributions are probability density functions), and thus the product of the likelihood function and the prior distribution must be standardized by a sum of probabilities of all possible outcomes.  In most cases we can't figure this out analytically and so must estimate the shape of the posterior using a numerical random sampling technique (Markov Chain Monte Carlo simulation, or MCMC). In many ways this is similar to the optimization routines we needed to calculate MLEs: like the simulated annealing algorithm, MCMC works sequentially to find new parameter values using random jumps through parameter space. Of the several common MCMC variants used to 'solve' (actually sample from) the posterior distribution, perhaps the most popular is called Gibbs sampling.  This algorithm (often called the Gibbs sampler) is particularly good for estimating the value of one parameter conditional on values of a host of other parameters, which is especially effective for fitting hierarchical Bayesian models.  Unfortunately, the most effective software that uses the Gibbs sampler, BUGS (Bayesian inference Using Gibbs Sampling), is outside the realm of R, although it is integrated using the R library R2WinBUGS (WinBUGS being the Windows version of BUGS).

A simple WinBUGS example

Here's an example of a simple regression to demonstrate how WinBUGS works, following the example of Box 5.1 in McCarthy 2007.  I'll use the newer version of WinBUGS called OpenBUGS, available here:
http://www.openbugs.info/w/

Browse to the download page and grab the executable for your operating system. It takes a minute or two to install. Open the program and open a new file. In this file we specify a model that contains the 1) deterministic function; 2) stochastic function; and 3) prior distributions of each parameter to be estimated.  After the model we specify the data and (optionally) a list of starting values for fitted parameters.  For a simple regression with normal error, we're estimating three parameters (intercept, slope, and variance); note BUGS wants normal distributions described by a mean and precision term, which is the inverse of variance, and there are other minor differences between BUGS and R code.  Here's our first regression example:

model

{

     a ~ dnorm(0,1.0E-6) #flat prior for the intercept (extremely low precision), note "~" means "is distributed as"

     b ~ dnorm(0,1.0E-6) #flat prior for the slope (ditto)

     p ~ dgamma(0.001,0.001)  #flat gamma prior for the precision, because restricted to positive values

     for(i in 1:9) #BUGS only accepts for loops rather than vector-based functions, so we loop over each observation

     {

          mean[i] <- a + b*elev[i]     #deterministic function (linear); note equal sign in BUGS is "<-" rather than "="

          cover[i] ~ dnorm(mean[i],p)  #stochastic (data) model, here normal with precision=p

     }

}

list(elev=c(100,110,320,500,830,120,330,100,500),cover=c(0,4,5,7,8,2,2,4,4))   #data

list(a=0,b=0,p=1)  #initial values for parameters

The final line is a list with two elements, each named for the variables named in model (cover and elev).  The above can be copied and pasted in the OpenBUGS new file window.  The next step is making sure that there are no errors or omissions in the model specification.  Double-click 'model' to highlight it and then select 'Specification' under the Model menu. The button 'check model' should indicate 'model is syntactically correct' in the bottom status line, or indicate lines that need attention.  You can then use the 'load data' button to specify the dataset to BUGS, while highlighting the 'list' term in the final line of the code.  Next hit the 'compile' button to set up the information needed for the Gibbs sampler, and then tell BUGS the starting values for each parameter (like we did for optim), either by having BUGS pick values from the priors ('gen inits' button) or specifying your line using a line in the code: here highlight 'list' in the last line and then click 'load inits'.  Generally, it is recommended you specify your own starting values.

At this point WinBUGS is ready to sample from the posterior distribution. Use the ‘Update Tool’ under the Model menu to run a given number of MCMC iterations (‘updates’, with a default of 1000).  To gauge the state of our variables after a chain of 1000, select the ‘Sample Monitor Tool’ under the Inference menu and type in our parameter names (a, b, and p) in the ‘node’ box (after typing in each separately, click ‘set’ to save it).  To view what’s happening with all the parameters at once, type ‘*’ in the node box and click ‘trace’.  A graph window should open with iteration on the x axis and the fitted parameter value on the y axis for all three parameters. Taking more updates (Update Tool box) will move these trace windows in real time. As they move through the window the curves should look like white noise centered on the posterior mode: these are random samples of the posterior distributions of each parameter (to see the entire history of iterations, select ‘history’ in the Sample Monitor Tool). To see the accumulated sample for each posterior distribution, click ‘density’ in the Sample Monitor Tool window (note all graphs can be resized via click-and-drag).  The ‘stats’ button shows summary statistics (including the mid-95% of each posterior, called the Bayesian credible interval, the confidence interval analogue in Bayesian statistics).  It is typically recommended that early samples (‘burn in’ samples) are thrown away because these remain correlated with the initial state of parameter values; you can get rid of these by changing the ‘beg’ box value in the Sample Monitor Tool to 1000 or some other reasonably large value. The ‘auto cor’ button gives you an idea as to the correlation structure of the MCMC values of each parameter (here, values separated by 10 or more iterations appear to be uncorrelated). As you take many samples, the results should conform to what we established using likelihood, given we used uninformative priors.

The R interface for WinBUGS/OpenBUGS

 

R users can have all the optimization advantages of WinBUGS or OpenBUGS without having to leave the R interface, using the R2WinBUGS library (if using just WinBUGS) or R2WinBUGS with the BRugs library (if using the more recent OpenBUGS).  You must first have WinBUGS or OpenBUGS installed in the C:/Program Files/ directory (although there is an option for changing this path in R).  The next steps are actually on par or easier than the initialization steps in the BUGS programs themselves (here I'm borrowing from Andrew Gelman's webpage at Columbia: http://www.stat.columbia.edu/~gelman/bugsR/runningbugs.html):

            1) install and attach the BRugs and R2WinBUGS libraries in R

            2) write a BUGS model; this can either be in a standard text file using BUGS model syntax and saved in the current working directory (preferred) or writing an equivalent R function and translating it to BUGS syntax using the write.model function

            3) prepare inputs, including dataset (variables as a list), the function for initial values, and a character vector of which variables to monitor during chaining

            4) call the BUGS program using the bugs function, specifying necessary BUGS arguments

            5) analyze the results using print and plot functions, and accessing the posteriors themselves using attach.bugs


Example with species distribution data

            We can start by building off the BUGS code for a regression with a normal likelihood function for Fraser magnolia.  We begin as usual with a dataset of abundances and absences:

dat = read.csv(file.choose())     #grab the treedata.csv file
dat2 = subset(dat,dat$species=="Magnolia fraseri") #Fraser's magnolia
ro.plots = unique(as.character(dat2$plotID))    #plots with mag fra

u.plots = unique(as.character(dat$plotID)) #all plots
nob.plots = u.plots[is.element(u.plots,ro.plots)==F] #plots without mag fra
dat3 = subset(dat,is.element(as.character(dat$plotID),nob.plots)) #dataset of no mag fra
dat4 = subset(dat3,duplicated(dat3$plotID)==F)  #one row per plot
dat4$cover = 0     #cover of red oak is zero in these plots
mf = na.exclude(rbind(dat2,dat4))     #new dataframe of presences and absences

            Install and attach the R-BUGS interface packages (here we'll use OpenBUGS, but the WinBUGS routine is basically the same, without the need for BRugs or the program="openbugs" argument to the bugs function):

install.packages(c("BRugs","R2WinBUGS"))

library(BRugs)

library(R2WinBUGS)

And before we go any further we need to construct the BUGS model, including deterministic and likelihood (stochastic) components and the distributions of all priors.  We can start with a three-parameter model of magnolia cover with respect to an intercept, elevation coefficient, and normal-based precision (1/variance).  Save this as a text file called "model1.bug" in your R working directory:

model     
     a ~ dnorm(0,1.0E-6) #flat prior for the intercept

     b ~ dnorm(0,1.0E-6) #flat prior for the slope

     prec ~ dgamma(0.001,0.001) #flat gamma prior for the precision

     for(j in 1:J) {      #J  = number of observations, j is our looping variable

           mu[j] <- a + b*elev[j] #deterministic function (linear)

           cover[j] ~ dnorm(mu[j],prec) #stochastic (data) model, here normal with precision=prec

     }

}

Back in R, we prepare a list of our variables needed for the above model and define a function for initial values for each chain (replicate run).  Our response variable is cover, our fixed predictor variable is elev, and we also have to define J:

J = nrow(mf)       #total observations, here J=991
cover = mf$cover   #response variable
elev = mf$elev/1000 #predictor variable: again, easier in units of km rather than m
bugsdata = list ("J", "cover", "elev") #data goes into BUGS as a list

The next bit is slightly tricky: for each estimated parameter, we need to provide starting values to initialize each chain, as we do directly in OpenBUGS.  The easiest option is to use the argument inits=NULL in the bugs function, telling OpenBUGS to sample randomly from the prior distribution for starting values.  Or you could define a list of 'n.chains' elements, each being a vector of single starting values for each parameter; subsequent runs (chains) will use the starting values from each separate vector you supply in the list.  The third and most complicated, but probably most effective, option is to generate starting values randomly for each chain based on a function you supply.  You can then specify the number of chains later, and each run will use an independent starting point depending on your function.  For example:
inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10),prec=rlnorm(1,1,1))

Note the distribution for precision is a lognormal so that only positive values are produced.  The final preparatory step is to define a character vector for the variables we want to 'save' (monitor) with the Gibbs sampler:

parameters = c("a","b","prec")

We're now ready to execute the chains in OpenBUGS using the bugs function.  There are many possible arguments to bugs; check ?bugs for the full details.  You must include the arguments for the dataset list, the initial values, the monitored parameters, and the model file; for OpenBUGS (rather than WinBUGS) you must include program="openbugs".  You can optionally change the number of chains (n.chains), the total number of iterations in each chain (n.iter), and the number of initial iterations that are thrown away as burn-in (n.burnin):

sim1 = bugs(bugsdata, inits, parameters, "model1.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")
model is syntactically correct

data loaded

model compiled

Initializing chain 1: initial values loaded but this or another chain contain   uninitialized variables

Initializing chain 2: initial values loaded but this or another chain contain   uninitialized variables

Initializing chain 3: model is initialized

model is already initialized

Sampling has been started ...

500 updates took 0 s

deviance set

monitor set for variable 'a'

monitor set for variable 'b'

monitor set for variable 'prec'

monitor set for variable 'deviance'

500 updates took 1 s

The bugs function executes OpenBUGS in the background without opening up the software itself.  The sim1 object contains basic output for evaluating model fit, convergence of the optimization, and summary statistics of the posterior distributions of each monitored parameter. 
print(sim1)

Inference for Bugs model at "model1.bug", fit using OpenBUGS,

 3 chains, each with 1000 iterations (first 500 discarded)

 n.sims = 1500 iterations saved

           mean  sd   2.5%    25%    50%    75%  97.5% Rhat n.eff

a           2.0 0.2    1.6    1.9    2.0    2.1    2.3    1   260

b          -0.7 0.2   -1.0   -0.8   -0.7   -0.6   -0.4    1   230

prec        0.3 0.0    0.3    0.3    0.3    0.3    0.3    1  1500

deviance 4088.2 2.4 4085.3 4086.4 4087.6 4089.4 4094.1    1  1100

 

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

 

DIC info (using the rule, pD = Dbar-Dhat)

pD = 3.1 and DIC = 4091.0

DIC is an estimate of expected predictive error (lower deviance is better).

The output is based on 3 chains, each with 500 saved iterations, for a total of 1500 MCMC samples saved to define the posterior distributions of each parameter (above, '1500 iterations saved').  (Note for iterations greater than 2000 bugs 'thins' the output to save memory; you can change how it does this with the n.thin argument.)  The table lists the summary statistics for each estimated parameter (N=1500), including critical quantiles (the 2.5% and 97.5% stats represent a 95% Bayesian credible interval, which states that there is a 95% chance that the real parameter value lies within this range).  The 'Rhat' value is a measure of convergence that takes into account the different starting values for the different chains; this should be 1 or close to it.  'n.eff' is a sample size statistic that takes convergence criteria into account.  Deviance (2*-LL) is automatically monitored as a parameter with its own posterior, and can be used to evaluate model goodness of fit, either via the traditional AIC (here, roughly 4088.2+2*3; 'roughly' because we'd actually need the deviance at the MLEs for each parameter, rather than the mean of its posterior) or via the Deviance Information Criterion (DIC).  DIC is very similar to AIC, except that 1) deviance is calculated slightly differently, using the mean of the posterior distribution rather than the likelihood at the MLEs; and 2) it uses the 'estimated' number of parameters (pD, above), which is usually not an integer.  The need for 'estimated' parameters for goodness of fit tests in HB models results from the conditional nature of the optimization, the existence of priors, and (often) random effects terms.  With symmetrical posteriors and flat priors in simple cases, AIC and DIC converge.  But for most HB applications DIC is a more robust way to compare models, which is why it is built into OpenBUGS.  You can access the summary table itself using sim1$summary.  A summary plot of these results can be obtained with:
plot(sim1)

Convergence statistics for each parameter are on the left, and parameter summaries are on the right.  This is often of significantly less utility than the printed table.  Fortunately you have access to the entire output, including all saved iterations of each fitted parameter, with the attach.bugs command:
attach.bugs(sim1)

The posterior distributions can now be accessed with calls to each parameter, which are separate vectors of length equal to the number of saved iterations (here, 1500).  For example, we can examine the posterior of the elevation coefficient and plot the Bayesian credible interval:
plot(density(b))

abline(v=quantile(b,probs=c(.025,.975)),col="red",lwd=2)

Consistent with our above summary table, the interval is between -0.98 and -0.39; actually the above table rounds up, and the more precise values are:
sim1$summary

                 mean         sd         2.5%          25%          50%          75%        97.5%     Rhat n.eff

a           1.9622642 0.16143985    1.6445501    1.8550253    1.9601085    2.0692011    2.2865161 1.007875   260

b          -0.6823751 0.15005087   -0.9782953   -0.7877308   -0.6857629   -0.5753468   -0.3911635 1.008950   230

prec        0.2767917 0.01299819    0.2528816    0.2675235    0.2763662    0.2854939    0.3024640 1.000860  1500

deviance 4088.1885700 2.41943098 4085.3293335 4086.3523560 4087.5738525 4089.4056396 4094.1349608 1.001863  1100

Inference from posteriors is a breeze, accommodating a full range of statistical questions that are difficult with frequentist techniques.  For example, we can calculate odds of a parameter being within certain bounds directly, such as, what is the probability that the rate of magnolia cover class decline with elevation is greater than 1 class per km?

cF = ecdf(b)  #empirical cumulative distribution function

cF(-1)         #proportion of values than are less than -1 class per km

[1] 0.01866667
Very unlikely.  What is the probability the value is between -1 and -0.5?
cF(-0.5) - cF(-1)
[1] 0.86

Extremely likely.  Just to check our function is working, what is the probability of the value between -0.978 and -0.391?

cF(-0.391) - cF(-0.978)
[1] 0.9493333
Hopefully this is 95%, because these values were our credible range limits.  The function ecdf can be generally used to make posterior-based inferences on parameter estimates.

            Before we attempt a more complicated model note that we could have written our BUGS model as an R function, and then used the write.model function to translate this to OpenBUGS mode file.  Here's the equivalent R function:

bugsmodel <- function() {

     a ~ dnorm(0,1.0E-6)

     b ~ dnorm(0,1.0E-6)

     prec ~ dgamma(0.001,0.001)

     for(j in 1:J) {

          mu[j] <- a + b*elev[j]

          cover[j] ~ dnorm(mu[j],prec)

     }
}
write.model(bugsmodel,con="model2.bug")

sim2 = bugs(bugsdata, inits, parameters, "model2.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

You should get the same results as using model1.bug.  The annoying bit is that you have to use the OpenBUGS syntax ('<-' and '~' instead of '='), so this method is basically equivalent to working in a separate text file in the R directory.

            Let's make our stochastic function more appropriate for the cover class response variable by changing our normal likelihood function to a Poisson process, using a log link function.  Save this as text file called 'model3.bug':

model

{

     a ~ dnorm(0,1.0E-6) #flat prior for the intercept

     b ~ dnorm(0,1.0E-6) #flat prior for the slope

     for(j in 1:J) {    #J  = number of observations, j is our looping variable

          log(mu[j]) <- a + b*elev[j] #deterministic function with log link

          cover[j] ~ dpois(mu[j]) #poisson likelihood

     }

}
As we've lost one parameter (precision, not used by the Poisson process where variance and mean are equal), we redefine our initial values and monitored parameters before calling OpenBUGS:
parameters = c("a","b")

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10))

sim3 = bugs(bugsdata, inits, parameters, "model3.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

print(sim3)

Inference for Bugs model at "model3.bug", fit using OpenBUGS,

 3 chains, each with 1000 iterations (first 500 discarded)

 n.sims = 1500 iterations saved

           mean  sd   2.5%    25%    50%    75%  97.5% Rhat n.eff

a           0.8 0.1    0.6    0.7    0.8    0.8    0.9    1   990

b          -0.6 0.1   -0.7   -0.6   -0.6   -0.5   -0.4    1   960

deviance 3793.7 1.9 3791.8 3792.3 3793.1 3794.5 3798.7    1  1500

 

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

 

DIC info (using the rule, pD = Dbar-Dhat)

pD = 1.9 and DIC = 3796.0

DIC is an estimate of expected predictive error (lower deviance is better).

The model is significantly better (compare DICs of 3796 and 4091), and we might be interested in how this changed our estimate of the slope coefficient.  We can save separate copies of the 'b' vector from each run:
attach.bugs(sim3)
b3 = b
attach.bugs(sim1)
b1 = b

plot(density(b3),col="red",xlim=c(-1,0))
lines(density(b1),col="blue")

Our credible interval for b has shrunk considerably with a more appropriate error structure. 

 

Adding a hierarchy

            Up to this point our model is not strictly hierarchical, because we are assuming that all observations have the same error structure.  As an example of adding a random effect based on a grouping variable, we can break up the observations into different groups based on which project each plot is associated with.  The 'plotID' variable contains this information in the its first component:

table(as.factor(substr(mf$plotID,1,4)))

ATBN ATBP DELL DLAP DSLM FRID MILL PITT UFRL ULRY WHCA WISR

  76   13   32    5   69   54   25   43  348  208  105   13

The dataset is actually a conglomeration of 12 vegetation studies conducted after 1975.  Studies were conducted with different methods and in different areas of the park, so it is likely that the abundance of any particular species varies with respect to project type in some unknown fashion.  To create a numeric grouping variable called 'project' we add to the mf dataset:
project = factor(substr(mf$plotID,1,4))

mf$project = as.numeric(project)

We now have to decide how such a grouping variable will contribute to the variance in our model construction.  We expect 'project' to add random variation to our expected cover value, with the difference among projects determined by a random draw from a normal distribution of mean=0 and unknown variance.  This added variance should appear in the regression that determines the expected value for the Poisson likelihood.  For example (“model4.bug”):
model

{

     a ~ dnorm(0,1.0E-6) #flat prior for the intercept

     b ~ dnorm(0,1.0E-6) #flat prior for the slope

     for(j in 1:12) {   #loop over each project (N=12)

          d[j] ~ dnorm(0,prec1)   #projects differ according to a random draw from a normal distribution with precision=prec1

     }

     prec1 ~ dgamma(0.001,0.001)  #flat prior for among-project precision

     for(j in 1:J) {    #J  = number of observations, j is our looping variable

          log(mu[j]) <- a + b*elev[j] + d[project[j]] #add project-level variance

          cover[j] ~ dpois(mu[j]) #poisson likelihood #no change in likelihood function

     }

}

project = mf$project

bugsdata = list ("J", "cover", "elev", "project")    #add project to our data list

parameters = c("a","b","prec1")   #add prec1 as a monitored parameter

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10),prec1=rlnorm(1,1,1))

sim4 = bugs(bugsdata, inits, parameters, "model4.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

Inference for Bugs model at "model4.bug", fit using OpenBUGS,

 3 chains, each with 1000 iterations (first 500 discarded)

 n.sims = 1500 iterations saved

           mean  sd   2.5%    25%    50%    75%  97.5% Rhat n.eff

a           8.8 6.2   -0.1    0.4   12.0   14.1   14.6 50.2     3

b          -0.2 0.1   -0.4   -0.3   -0.2   -0.2    0.0  1.1    47

prec1       0.4 0.6    0.0    0.0    0.0    0.7    1.8  9.8     3

deviance 3543.3 5.6 3534.7 3539.2 3542.7 3546.6 3556.0  1.2    15

 

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

 

DIC info (using the rule, pD = Dbar-Dhat)

pD = 13.5 and DIC = 3557.0

DIC is an estimate of expected predictive error (lower deviance is better).

Our deviance (and DIC) has decreased markedly but we have convergence issues (Rhat) with a and prec1 parameters.  I increase the number of iterations to 50,000 (WARNING: this took about ~20  minutes on my machine):
sim4 = bugs(bugsdata, inits, parameters, "model4.bug", n.chains=3, n.iter=50000, n.burnin=500, program="openbugs")

print(sim4)

Inference for Bugs model at "model4.bug", fit using OpenBUGS,

 3 chains, each with 50000 iterations (first 500 discarded), n.thin = 148

 n.sims = 1002 iterations saved

           mean  sd   2.5%    25%    50%    75%  97.5% Rhat n.eff

a           0.7 2.8   -0.9   -0.2    0.0    0.3   12.1  1.4    16

b          -0.2 0.1   -0.4   -0.3   -0.2   -0.2   -0.1  1.0   270

prec1       0.8 0.5    0.0    0.5    0.7    1.1    2.0  1.3    17

deviance 3544.8 5.5 3535.9 3540.9 3543.8 3548.2 3556.8  1.0   210

 

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

 

DIC info (using the rule, pD = Dbar-Dhat)

pD = 13.3 and DIC = 3558.0

DIC is an estimate of expected predictive error (lower deviance is better).

The convergence is much improved (although not perfect), and the DIC is about the same but with better parameter estimates.  We can again ask about the elevation coefficient posterior:
attach.bugs(sim4)

b4 = b

plot(density(b4),col="red",xlim=c(-1,0))
lines(density(b3),col="blue")

The elevation coefficient is very sensitive to our addition of a random effect based on project. 

 

Hierarchical logistic regression
            Let’s continue model building for Fraser magnolia by demonstrating a model with a binary (presence-absence) response variable (‘pres’) and a submodel for imperfect species detection.  We’ll add several new predictor variables to the mix, and define their coefficients with a vector that makes specifying priors more convenient.  For the sake of optimization speed we’ll ignore the project-based random effect, but the above lines for the random effect could be inserted here as well.  Instead we’ll suppose that our response variable (presence) results from both our process model (the regression based on environmental variables) and an additional submodel that considers detection probability (observation error) as a function of plot size (the bigger the quadrat, the more likely we are to detect the species).

In regression cases like this, McCarthy (2007, Bayesian Methods for Ecology) suggests the (MCMC) optimization works more efficiently if the predictor variables are used after subtracting their means; if the variable is untransformed this has no effect on the coefficient value (except for the intercept, which can be back-transformed if necessary for predictions).  We’ll do this in R before feeding the data to OpenBUGS.

pres = as.numeric(mf$cover>0)     #binary response

elev = mf$elev-mean(mf$elev) #substract mean for elev predictor values

tci = mf$tci-mean(mf$tci)    #ditto tci

strdst = log(mf$streamdist+1)-mean(log(mf$streamdist+1)) #same for streamdist, but log-transform first
beers = mf$beers-mean(mf$beers)   #ditto radiation
plotsize = log(mf$plotsize)-mean(log(mf$plotsize))   #ditto plot size  

J = nrow(mf)       #total observations, here J=991  

bugsdata = list ("J","pres","elev","tci","strdst","beers","plotsize")     #data as list

We’ve already constructed BUGS models that involve assigning a probability to a cover value based on predictor variables (e.g., the Poisson model above).  Adding detection probability to the mix assumes we have independent information on the probability of observing a species given it occurs in a particular location.  Here we can use variation in plot size (which we assume varies independently of the predictor variables) in our model of detection probability and the environmental variables as our model of occurrence probability.  We record an occurrence as the independent outcome of these two probabilities (detection x occurrence).  This also serves to illustrate how we can break down what may seem to be a complex model into easily specified parts, and have the conditional parameter optimization via MCMC handle the difficult task of keeping track of the conditional relationships.  This is “model5.bug”:

model

{

     for(j in 1:J) {    #J  = number of observations

          true.occ[j] ~ dbern(prob[j]) #prob of true occupancy (latent variable)

          logit(prob[j]) <- beta[1] + beta[2]*elev[j] + beta[3]*(elev[j]*elev[j]) + beta[4]*tci[j] + beta[5]*strdst[j] + beta[6]*beers[j] #’real’ occurrence model with a logit link

          logit(detect[j]) <- d1 + d2*plotsize[j]    #prob of detection model based on plot size with logit link

          p1[j] <- true.occ[j]*detect[j]

          pres[j] ~ dbern(p1[j]) #observed presence is a binomial (Bernoulli, N=1) probability of P(occurrence) * P(detection)

     }

     #prior distributions

     d1 ~ dnorm(0,1.0E-6)    #flat prior for intercept of detection model

     d2 ~ dnorm(0,1.0E-6)    #flat prior for slope of detection model

     for(j in 1:6) {

     beta[j] ~ dnorm(0,1.0E-6)#flat priors for each regression coefficient

     }

     #derived (and monitored) variable of number of true occurrences

     actual.occ <- sum(true.occ[])

}

Back in R we define our monitored parameters and our initialization conditions before executing the OpenBUGS code. Note the initial values for the ‘beta’ coefficients can be specified as a vector instead of single elements.

parameters = c("beta","d1","d2","actual.occ")   #monitored parameters

inits = function() list(beta=rnorm(6,0,10),d1=rnorm(1,0,10),d2=rlnorm(1,1,1))

sim5 = bugs(bugsdata, inits, parameters, "model5.bug", n.chains=1, n.iter=100,   n.burnin=50, program="openbugs")

The model also illustrates how we can establish inference on a ‘latent’ (unobserved) variable that is nonetheless of interest, here being the ‘real’ occurrences (those we observed, plus those we didn’t observe because of imperfect detection: “actual.occ”). 

 

A final HB example

 

First I create a set of predictor variables, including elevation, TCI, plot size (10, 100, or 1000 m2), and project number (there were three different research projects in the area providing data for this study).  A total of 200 plots were sampled.  The sample function is a way of sampling uniformly from an integer vector.

#simulated predictor variables

N = 200       #number of plots

elev = sample(200:2000,N,replace=T)/1000        #elevation in km

tci = runif(N,min=1,max=25)                     #tci

size = sample(c(10,100,1000),N,replace=T)       #plot sizes

project = sample(c(1,2,3),N,replace=T,prob=c(.2,.2,.6)) #vegetation project

Next I use this set of predictors to simulate a species occurrence dataset, based on 1) an environmental deterministic function (elevation and tci); 2) detection probability, which increases with plot size; and 3) a random offset for project, which describes some unknown way that species occurrence varies by project. 

#simulated response variable

       #deterministic env process parameters

       b1 = .5     #overall intercept

       b2 = 1      #elev coefficient

       b3 = -.01   #elev^2 coefficient

       b4 = -.3    #tci coefficient

 

       #detection process parameters

       d1 = 0      #detection intercept

       d2 = .7     #plot size coefficient

       tau = 1     #precision (1/variance) of random effect

 

       #actual occurrence is a function of env variables and a random effect for project

       re = rnorm(3,0,1/tau) #random effect varies by project

       lp = b1 + b2*elev + b3*(elev^2) + b4*tci + re[project]

       env.occ = 1 / (1+exp(-lp))     #logit transformation of occurrence probability

 

       #real occurrence includes bernoulli process error

       real.occ = rbinom(N,size=1,prob=env.occ)

 

       #probability of detection is a function of log plot size

       dp = d1 + d2*log(size)

       detect.prob = 1 / (1+exp(-dp)) #logit transformation of detection probability

 

       #observation of occurrence depends on actual occurrence and detection

       obs.occ = rbinom(N,size=1,prob=detect.prob*real.occ)

 

#input data frame

df = data.frame(elev,tci,size,obs.occ,project)

 

Now we have a dataset of known properties.  We fit an HB model via BUGS in R:

library(BRugs)

library(R2WinBUGS)

 

J = nrow(df)       #total observations

M = 3              #number of groups (projects)

obs = df$obs.occ   #response variable

elev = df$elev     #predictor variable

tci = df$tci       #predictor variable

size = log(df$size) #predictor variable

project = df$project    #projects

bugsdata = list("J","M","obs","elev","tci","size","project")

parameters = c("beta1","beta2","beta3","beta4","delta1","delta2","tau")

#initialization of parameters via random functions

inits = function() list(beta1=rnorm(1,.5,10),beta2=rnorm(1,1,10),

     beta3=rnorm(1,-.01,10),beta4=rnorm(1,3,10),

     delta1=rnorm(1,0,10),delta2=rnorm(1,.7,10),tau=rlnorm(1,1,1))

We can specify the BUGS model in R and write it to a text file for BUGS input:

sink("bugsmod1.txt")

cat("model {

 

for (i in 1:J) {   #loop over J observations

 

     #occurrence model, including random effect u

    logit(p[i]) <- beta1+beta2*elev[i]+beta3*elev[i]*elev[i]+beta4*tci[i]+u[project[i]]

 

     #detection model

     logit(d[i]) <- delta1 + delta2*size[i]

 

     #observation depends on independent probabilities of detection and occurrence

     op[i] <- p[i]*d[i]

     obs[i] ~ dbern(op[i])

     }

    

     # random effects for M groups

     for (j in 1:M) {

          u[j] ~ dnorm(0,tau) #tau is precision (1/variance) between projects

     }

 

     # priors

     delta1 ~ dnorm(0.0,1.0E-6)

     delta2 ~ dnorm(0.0,1.0E-6)

     beta1 ~ dnorm(0.0,1.0E-6)

     beta2 ~ dnorm(0.0,1.0E-6)

     beta3 ~ dnorm(0.0,1.0E-6)

     beta4 ~ dnorm(0.0,1.0E-6)

     # 'hyperprior'

     tau ~ dgamma(0.001,0.001)

}

",fill=TRUE)

sink()

 

We're now ready to evaluate the model in OpenBUGS and examine the result. You can of course tweak the chaining parameters below.

#send to OpenBUGS

sim1 = bugs(bugsdata, inits, parameters,"bugsmod1.txt",n.chains=3,

     n.iter=1000,n.burnin=500,program="openbugs")

print(sim1)

attach.bugs(sim1)

sim1$summary

There is some reason for concern that the above model is not entirely correct, because it involves only one Bernoulli process (which has a probability equal to the product of occurrence and detection probabilities) rather than two: one concerning the occurrence model (whether the species is actually there or not), and one concerning the conditional probability of detection given the species is present (which has a probability equal to the detection probability when a species is present, zero otherwise).  Consider the following BUGS model:

sink("bugsmod2.txt")

cat("model {

 

for (i in 1:J) {   #loop over J observations

 

     #model of occurrences as a Bernoulli trial

    logit(p[i]) <- beta1+beta2*elev[i]+beta3*elev[i]*elev[i]+beta4*tci[i]+u[project[i]]

    actual[i] ~ dbern(p[i])  #actual presences

   

     #detection model, including logit transformation

    d[i] <- delta1 + delta2*size[i]   #process model for detectability

    dd[i] <- exp(d[i])/(1+exp(d[i]))  #another form of the logit formula

   

     #observation depends on detectability and actual occurrence

    detect[i] <- dd[i]*actual[i] #detectability for presences, zero otherwise

     obs[i] ~ dbern(detect[i]) #actual detections

 

     # random effects for M groups

     for (j in 1:M) {

          u[j] ~ dnorm(0,tau) #tau is precision (1/variance) between projects

     }

 

     # priors

     delta1 ~ dnorm(0.0,1.0E-6)

     delta2 ~ dnorm(0.0,1.0E-6)

     beta1 ~ dnorm(0.0,1.0E-6)

     beta2 ~ dnorm(0.0,1.0E-6)

     beta3 ~ dnorm(0.0,1.0E-6)

     beta4 ~ dnorm(0.0,1.0E-6)

     # 'hyperprior'

     tau ~ dgamma(0.001,0.001)

}

",fill=TRUE)

sink()

The only difference in bugsmod2.txt is that there is an extra Bernoulli trial.  The problem is that BUGS won't fit this model, or at least wouldn't fit it for me regardless of my simulated data parameterization or initial values.