**Bayesian methods in R and BUGS **

*Toward Bayesian models*

So much has been written on Bayesian vs. non-Bayesian analysis
in ecology in the last decade that I will limit our discussion to
implementation of Bayesian models in R and leave the 'why or why not' concerns
to more able statisticians (highly recommended: McCarthy's 2007 book *Bayesian
Methods for Ecology*, also Clark 2005 Ecology Letters 8:2-14). Essentially,
what a simple Bayesian approach (as opposed to hierarchical Bayes, which we'll
approach later) facilitates is the translation of a conditional statement about
data (given a hypothesis) to a conditional statement about a hypothesis (given a
dataset), using Bayes' theorem. Rev. Thomas Bayes figured this out in the 18^{th}
century: he was interested in calculating inverse probabilities (e.g., given
data of heads and tails, what can you tell about the coin?). In practice, we
do this by multiplying our already established likelihood function [P(data|model)]
by a so-called *prior* distribution, which is the suspected probability of
the model before we collect new data. This product of a likelihood and a prior
is proportional to the *posterior* distribution [P(model|data)], which
becomes the basis for statistical inference. Once you've established the
posterior distribution, model selection and parameter estimation are relatively
easy and based on the more intuitive notion of calculating actual probabilities
of different *hypotheses*. The difficulties of Bayesian modeling are how
you get to the posterior: you need to specify a prior (sometimes admitting that
you know nothing about the model initially: a so-called *uninformative *or*
flat prior*) and then you need to estimate the shape of the posterior
distribution. The analytical problem in this is that all the posterior probabilities
must sum to 1 (the prior and posterior distributions are probability density
functions), and thus the product of the likelihood function and the prior
distribution must be standardized by a sum of probabilities of all possible
outcomes. In most cases we can't figure this out analytically and so must
estimate the shape of the posterior using a numerical random sampling technique
(Markov Chain Monte Carlo simulation, or MCMC). In many ways this is similar to
the optimization routines we needed to calculate MLEs: like the simulated
annealing algorithm, MCMC works sequentially to find new parameter values using
random jumps through parameter space. Of the several common MCMC variants used
to 'solve' (actually sample from) the posterior distribution, perhaps the most
popular is called *Gibbs sampling*. This algorithm (often called the
Gibbs sampler) is particularly good for estimating the value of one parameter
conditional on values of a host of other parameters, which is especially
effective for fitting hierarchical Bayesian models. Unfortunately, the most
effective software that uses the Gibbs sampler, BUGS (Bayesian inference Using
Gibbs Sampling), is outside the realm of R, although it is integrated using the
R library R2WinBUGS (WinBUGS being the Windows version of BUGS).

*A simple WinBUGS example*

Here's an example of a simple regression to demonstrate how
WinBUGS works, following the example of Box 5.1 in McCarthy 2007. I'll use the
newer version of WinBUGS called OpenBUGS, available here:

http://www.openbugs.info/w/

Browse to the download page and grab the executable for your
operating system. It takes a minute or two to install. Open the program and
open a new file. In this file we specify a *model* that contains the 1)
deterministic function; 2) stochastic function; and 3) prior distributions of
each parameter to be estimated. After the model we specify the data and
(optionally) a list of starting values for fitted parameters. For a simple
regression with normal error, we're estimating three parameters (intercept,
slope, and variance); note BUGS wants normal distributions described by a mean
and precision term, which is the inverse of variance, and there are other minor
differences between BUGS and R code. Here's our first regression example:

model

{

a ~ dnorm(0,1.0E-6) #flat prior for the intercept (extremely low precision), note "~" means "is distributed as"

b ~ dnorm(0,1.0E-6) #flat prior for the slope (ditto)

p ~ dgamma(0.001,0.001) #flat gamma prior for the precision, because restricted to positive values

for(i in 1:9) #BUGS only accepts for loops rather than vector-based functions, so we loop over each observation

{

mean[i] <- a + b*elev[i] #deterministic function (linear); note equal sign in BUGS is "<-" rather than "="

cover[i] ~ dnorm(mean[i],p) #stochastic (data) model, here normal with precision=p

}

}

list(elev=c(100,110,320,500,830,120,330,100,500),cover=c(0,4,5,7,8,2,2,4,4)) #data

list(a=0,b=0,p=1) #initial values for parameters

The final line is a list with two elements, each named for
the variables named in **model** (cover and elev). The above can be copied
and pasted in the OpenBUGS new file window. The next step is making sure that
there are no errors or omissions in the model specification. Double-click
'model' to highlight it and then select 'Specification' under the Model menu.
The button 'check model' should indicate 'model is syntactically correct' in
the bottom status line, or indicate lines that need attention. You can then
use the 'load data' button to specify the dataset to BUGS, while highlighting
the 'list' term in the final line of the code. Next hit the 'compile' button
to set up the information needed for the Gibbs sampler, and then tell BUGS the
starting values for each parameter (like we did for **optim**), either by
having BUGS pick values from the priors ('gen inits' button) or specifying your
line using a line in the code: here highlight 'list' in the last line and then
click 'load inits'. Generally, it is recommended you specify your own starting
values.

At this point WinBUGS is ready to sample from the posterior
distribution. Use the ‘Update Tool’ under the Model menu to run a given number
of MCMC iterations (‘updates’, with a default of 1000). To gauge the state of
our variables after a chain of 1000, select the ‘Sample Monitor Tool’ under the
Inference menu and type in our parameter names (a, b, and p) in the ‘node’ box
(after typing in each separately, click ‘set’ to save it). To view what’s
happening with all the parameters at once, type ‘*’ in the node box and click
‘trace’. A graph window should open with iteration on the x axis and the
fitted parameter value on the y axis for all three parameters. Taking more
updates (Update Tool box) will move these trace windows in real time. As they
move through the window the curves should look like white noise centered on the
posterior mode: these are random samples of the posterior distributions of each
parameter (to see the entire history of iterations, select ‘history’ in the
Sample Monitor Tool). To see the accumulated sample for each posterior
distribution, click ‘density’ in the Sample Monitor Tool window (note all
graphs can be resized via click-and-drag). The ‘stats’ button shows summary
statistics (including the mid-95% of each posterior, called the *Bayesian*
*credible interval*, the confidence interval analogue in Bayesian
statistics). It is typically recommended that early samples (‘burn in’
samples) are thrown away because these remain correlated with the initial state
of parameter values; you can get rid of these by changing the ‘beg’ box value
in the Sample Monitor Tool to 1000 or some other reasonably large value. The
‘auto cor’ button gives you an idea as to the correlation structure of the MCMC
values of each parameter (here, values separated by 10 or more iterations
appear to be uncorrelated). As you take many samples, the results should
conform to what we established using likelihood, given we used uninformative
priors.

*The R interface for
WinBUGS/OpenBUGS*

* *

R users can have all the optimization advantages of WinBUGS or OpenBUGS without having to leave the R interface, using the R2WinBUGS library (if using just WinBUGS) or R2WinBUGS with the BRugs library (if using the more recent OpenBUGS). You must first have WinBUGS or OpenBUGS installed in the C:/Program Files/ directory (although there is an option for changing this path in R). The next steps are actually on par or easier than the initialization steps in the BUGS programs themselves (here I'm borrowing from Andrew Gelman's webpage at Columbia: http://www.stat.columbia.edu/~gelman/bugsR/runningbugs.html):

1) install and attach the BRugs and R2WinBUGS libraries in R

2) write a
BUGS model; this can either be in a standard text file using BUGS model syntax
and saved in the current working directory (preferred) or writing an equivalent
R function and translating it to BUGS syntax using the **write.model** function

3) prepare inputs, including dataset (variables as a list), the function for initial values, and a character vector of which variables to monitor during chaining

4) call
the BUGS program using the **bugs** function, specifying necessary BUGS
arguments

5) analyze
the results using print and plot functions, and accessing the posteriors
themselves using **attach.bugs**

Example with species distribution data

We can start by building off the BUGS code for a regression with a normal likelihood function for Fraser magnolia. We begin as usual with a dataset of abundances and absences:

dat =
read.csv(file.choose()) #grab the treedata.csv file

dat2 = subset(dat,dat$species=="Magnolia
fraseri") #Fraser's magnolia

ro.plots =
unique(as.character(dat2$plotID)) #plots with mag fra

u.plots = unique(as.character(dat$plotID)) #all plots

nob.plots = u.plots[is.element(u.plots,ro.plots)==F] #plots without
mag fra

dat3 = subset(dat,is.element(as.character(dat$plotID),nob.plots))
#dataset of no mag fra

dat4 =
subset(dat3,duplicated(dat3$plotID)==F) #one row per plot

dat4$cover = 0 #cover
of red oak is zero in these plots

mf =
na.exclude(rbind(dat2,dat4)) #new dataframe of
presences and absences

Install and attach the R-BUGS interface
packages (here we'll use OpenBUGS, but the WinBUGS routine is basically the
same, without the need for BRugs or the *program="openbugs"* argument to the **bugs** function):

install.packages(c("BRugs","R2WinBUGS"))

library(BRugs)

library(R2WinBUGS)

And before we go any further we need to construct the BUGS model, including deterministic and likelihood (stochastic) components and the distributions of all priors. We can start with a three-parameter model of magnolia cover with respect to an intercept, elevation coefficient, and normal-based precision (1/variance). Save this as a text file called "model1.bug" in your R working directory:

modela ~ dnorm(0,1.0E-6) #flat prior for the intercept

b ~ dnorm(0,1.0E-6) #flat prior for the slope

prec ~ dgamma(0.001,0.001) #flat gamma prior for the precision

for(j in 1:J) { #J = number of observations, j is our looping variable

mu[j] <- a + b*elev[j] #deterministic function (linear)

cover[j] ~ dnorm(mu[j],prec) #stochastic (data) model, here normal with precision=prec

}

}

Back in R, we prepare a list of our variables needed for the above model and define a function for initial values for each chain (replicate run). Our response variable is cover, our fixed predictor variable is elev, and we also have to define J:

J = nrow(mf) #total
observations, here J=991

cover = mf$cover #response variable

elev = mf$elev/1000 #predictor variable: again,
easier in units of km rather than m

bugsdata = list ("J", "cover",
"elev") #data goes into BUGS as a list

The next bit is slightly tricky: for each estimated
parameter, we need to provide starting values to initialize each chain, as we
do directly in OpenBUGS. The easiest option is to use the argument inits=NULL
in the bugs function, telling OpenBUGS to sample randomly from the prior
distribution for starting values. Or you could define a list of 'n.chains'
elements, each being a vector of single starting values for each parameter;
subsequent runs (chains) will use the starting values from each separate vector
you supply in the list. The third and most complicated, but probably most
effective, option is to generate starting values randomly for each chain based
on a function you supply. You can then specify the number of chains later, and
each run will use an independent starting point depending on your function.
For example:

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10),prec=rlnorm(1,1,1))

Note the distribution for precision is a lognormal so that only positive values are produced. The final preparatory step is to define a character vector for the variables we want to 'save' (monitor) with the Gibbs sampler:

parameters = c("a","b","prec")

We're now ready to execute the chains in OpenBUGS using
the **bugs** function. There are many possible arguments to **bugs**;
check ?bugs for the full details. You must include the arguments for the
dataset list, the initial values, the monitored parameters, and the model file;
for OpenBUGS (rather than WinBUGS) you must include
program="openbugs". You can optionally change the number of chains
(n.chains), the total number of iterations in each chain (n.iter), and the
number of initial iterations that are thrown away as burn-in (n.burnin):

sim1 =
bugs(bugsdata, inits, parameters, "model1.bug", n.chains=3,
n.iter=1000, n.burnin=500, program="openbugs")

model is syntactically correct

data loaded

model compiled

Initializing chain 1: initial values loaded but this or another chain contain uninitialized variables

Initializing chain 2: initial values loaded but this or another chain contain uninitialized variables

Initializing chain 3: model is initialized

model is already initialized

Sampling has been started ...

500 updates took 0 s

deviance set

monitor set for variable 'a'

monitor set for variable 'b'

monitor set for variable 'prec'

monitor set for variable 'deviance'

500 updates took 1 s

The bugs function executes OpenBUGS in the background
without opening up the software itself. The sim1 object contains basic output
for evaluating model fit, convergence of the optimization, and summary
statistics of the posterior distributions of each monitored parameter.

print(sim1)

Inference for Bugs model at "model1.bug", fit using OpenBUGS,

3 chains, each with 1000 iterations (first 500 discarded)

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 2.0 0.2 1.6 1.9 2.0 2.1 2.3 1 260

b -0.7 0.2 -1.0 -0.8 -0.7 -0.6 -0.4 1 230

prec 0.3 0.0 0.3 0.3 0.3 0.3 0.3 1 1500

deviance 4088.2 2.4 4085.3 4086.4 4087.6 4089.4 4094.1 1 1100

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)

pD = 3.1 and DIC = 4091.0

DIC is an estimate of expected predictive error (lower deviance is better).

The output is based on
3 chains, each with 500 saved iterations, for a total of 1500 MCMC samples
saved to define the posterior distributions of each parameter (above, '1500
iterations saved'). (Note for iterations greater than 2000 bugs 'thins' the
output to save memory; you can change how it does this with the *n.thin* argument.) The table lists the summary statistics for each estimated parameter
(N=1500), including critical quantiles (the 2.5% and 97.5% stats represent a
95% Bayesian credible interval, which states that there is a 95% chance that
the real parameter value lies within this range). The 'Rhat' value is a
measure of convergence that takes into account the different starting values
for the different chains; this should be 1 or close to it. 'n.eff' is a sample
size statistic that takes convergence criteria into account. Deviance (2*-LL)
is automatically monitored as a parameter with its own posterior, and can be
used to evaluate model goodness of fit, either via the traditional AIC (here,
roughly 4088.2+2*3; 'roughly' because we'd actually need the deviance at the
MLEs for each parameter, rather than the mean of its posterior) or via the
Deviance Information Criterion (DIC). DIC is very similar to AIC, except that
1) deviance is calculated slightly differently, using the mean of the posterior
distribution rather than the likelihood at the MLEs; and 2) it uses the
'estimated' number of parameters (pD, above), which is usually not an integer.
The need for 'estimated' parameters for goodness of fit tests in HB models
results from the conditional nature of the optimization, the existence of
priors, and (often) random effects terms. With symmetrical posteriors and flat
priors in simple cases, AIC and DIC converge. But for most HB applications DIC
is a more robust way to compare models, which is why it is built into OpenBUGS.
You can access the summary table itself using sim1$summary. A summary plot of
these results can be obtained with:

plot(sim1)

Convergence statistics
for each parameter are on the left, and parameter summaries are on the right. This
is often of significantly less utility than the printed table. Fortunately you
have access to the entire output, including all saved iterations of each fitted
parameter, with the **attach.bugs** command:

attach.bugs(sim1)

The posterior
distributions can now be accessed with calls to each parameter, which are
separate vectors of length equal to the number of saved iterations (here,
1500). For example, we can examine the posterior of the elevation coefficient
and plot the Bayesian credible interval:

plot(density(b))

abline(v=quantile(b,probs=c(.025,.975)),col="red",lwd=2)

Consistent with our above summary table, the interval is
between -0.98 and -0.39; actually the above table rounds up, and the more
precise values are:

sim1$summary

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 1.9622642 0.16143985 1.6445501 1.8550253 1.9601085 2.0692011 2.2865161 1.007875 260

b -0.6823751 0.15005087 -0.9782953 -0.7877308 -0.6857629 -0.5753468 -0.3911635 1.008950 230

prec 0.2767917 0.01299819 0.2528816 0.2675235 0.2763662 0.2854939 0.3024640 1.000860 1500

deviance 4088.1885700 2.41943098 4085.3293335 4086.3523560 4087.5738525 4089.4056396 4094.1349608 1.001863 1100

Inference from posteriors is a breeze, accommodating a full range of statistical questions that are difficult with frequentist techniques. For example, we can calculate odds of a parameter being within certain bounds directly, such as, what is the probability that the rate of magnolia cover class decline with elevation is greater than 1 class per km?

cF = ecdf(b) #empirical cumulative distribution function

cF(-1) #proportion of values than are less than -1 class per km

[1] 0.01866667

Very unlikely. What is the probability the value is between -1 and
-0.5?

cF(-0.5) - cF(-1)

[1] 0.86

Extremely likely. Just to check our function is working, what is the probability of the value between -0.978 and -0.391?

cF(-0.391) - cF(-0.978)

[1] 0.9493333

Hopefully this is 95%, because these values were our credible range
limits. The function **ecdf** can be generally used to make posterior-based
inferences on parameter estimates.

Before we
attempt a more complicated model note that we could have written our BUGS model
as an R function, and then used the **write.model** function to translate
this to OpenBUGS mode file. Here's the equivalent R function:

bugsmodel <- function() {

a ~ dnorm(0,1.0E-6)

b ~ dnorm(0,1.0E-6)

prec ~ dgamma(0.001,0.001)

for(j in 1:J) {

mu[j] <- a + b*elev[j]

cover[j] ~ dnorm(mu[j],prec)

}

}

write.model(bugsmodel,con="model2.bug")

sim2 = bugs(bugsdata, inits, parameters, "model2.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

You should get the same results as using model1.bug. The annoying bit is that you have to use the OpenBUGS syntax ('<-' and '~' instead of '='), so this method is basically equivalent to working in a separate text file in the R directory.

Let's make our stochastic function more appropriate for the cover class response variable by changing our normal likelihood function to a Poisson process, using a log link function. Save this as text file called 'model3.bug':

model

{

a ~ dnorm(0,1.0E-6) #flat prior for the intercept

b ~ dnorm(0,1.0E-6) #flat prior for the slope

for(j in 1:J) { #J = number of observations, j is our looping variable

log(mu[j]) <- a + b*elev[j] #deterministic function with log link

cover[j] ~ dpois(mu[j]) #poisson likelihood

}

}

As we've lost one parameter (precision, not used by the Poisson process
where variance and mean are equal), we redefine our initial values and
monitored parameters before calling OpenBUGS:

parameters
= c("a","b")

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10))

sim3 = bugs(bugsdata, inits, parameters, "model3.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

print(sim3)

Inference for Bugs model at "model3.bug", fit using OpenBUGS,

3 chains, each with 1000 iterations (first 500 discarded)

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 0.8 0.1 0.6 0.7 0.8 0.8 0.9 1 990

b -0.6 0.1 -0.7 -0.6 -0.6 -0.5 -0.4 1 960

deviance 3793.7 1.9 3791.8 3792.3 3793.1 3794.5 3798.7 1 1500

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)

pD = 1.9 and DIC = 3796.0

DIC is an estimate of expected predictive error (lower deviance is better).

The model is significantly better (compare DICs of 3796 and
4091), and we might be interested in how this changed our estimate of the slope
coefficient. We can save separate copies of the 'b' vector from each run:

attach.bugs(sim3)

b3 = b

attach.bugs(sim1)

b1 = b

plot(density(b3),col="red",xlim=c(-1,0))

lines(density(b1),col="blue")

Our credible interval for b has shrunk considerably with a more appropriate error structure.

*Adding a hierarchy*

Up to this point our model is not strictly hierarchical, because we are assuming that all observations have the same error structure. As an example of adding a random effect based on a grouping variable, we can break up the observations into different groups based on which project each plot is associated with. The 'plotID' variable contains this information in the its first component:

table(as.factor(substr(mf$plotID,1,4)))

ATBN ATBP DELL DLAP DSLM FRID MILL PITT UFRL ULRY WHCA WISR

76 13 32 5 69 54 25 43 348 208 105 13

The dataset is actually a conglomeration of 12 vegetation
studies conducted after 1975. Studies were conducted with different methods
and in different areas of the park, so it is likely that the abundance of any
particular species varies with respect to project type in some unknown
fashion. To create a numeric grouping variable called 'project' we add to the
mf dataset:

project = factor(substr(mf$plotID,1,4))

mf$project = as.numeric(project)

We now have to decide how such
a grouping variable will contribute to the variance in our model construction.
We expect 'project' to add random variation to our expected cover value, with
the difference among projects determined by a random draw from a normal
distribution of mean=0 and unknown variance. This added variance should appear
in the regression that determines the expected value for the Poisson
likelihood. For example (“model4.bug”):

model

{

a ~ dnorm(0,1.0E-6) #flat prior for the intercept

b ~ dnorm(0,1.0E-6) #flat prior for the slope

for(j in 1:12) { #loop over each project (N=12)

d[j] ~ dnorm(0,prec1) #projects differ according to a random draw from a normal distribution with precision=prec1

}

prec1 ~ dgamma(0.001,0.001) #flat prior for among-project precision

for(j in 1:J) { #J = number of observations, j is our looping variable

log(mu[j]) <- a + b*elev[j] + d[project[j]] #add project-level variance

cover[j] ~ dpois(mu[j]) #poisson likelihood #no change in likelihood function

}

}

project = mf$project

bugsdata = list ("J", "cover", "elev", "project") #add project to our data list

parameters = c("a","b","prec1") #add prec1 as a monitored parameter

inits = function() list(a=rnorm(1,0,10),b=rnorm(1,0,10),prec1=rlnorm(1,1,1))

sim4 = bugs(bugsdata, inits, parameters, "model4.bug", n.chains=3, n.iter=1000, n.burnin=500, program="openbugs")

Inference for Bugs model at "model4.bug", fit using OpenBUGS,

3 chains, each with 1000 iterations (first 500 discarded)

n.sims = 1500 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 8.8 6.2 -0.1 0.4 12.0 14.1 14.6 50.2 3

b -0.2 0.1 -0.4 -0.3 -0.2 -0.2 0.0 1.1 47

prec1 0.4 0.6 0.0 0.0 0.0 0.7 1.8 9.8 3

deviance 3543.3 5.6 3534.7 3539.2 3542.7 3546.6 3556.0 1.2 15

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)

pD = 13.5 and DIC = 3557.0

DIC is an estimate of expected predictive error (lower deviance is better).

Our deviance (and DIC) has decreased markedly but we have
convergence issues (Rhat) with a and prec1 parameters. I increase the number
of iterations to 50,000 (**WARNING**: this took about ~20 minutes on my machine):

sim4 = bugs(bugsdata, inits, parameters, "model4.bug",
n.chains=3, n.iter=50000, n.burnin=500, program="openbugs")

print(sim4)

Inference for Bugs model at "model4.bug", fit using OpenBUGS,

3 chains, each with 50000 iterations (first 500 discarded), n.thin = 148

n.sims = 1002 iterations saved

mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff

a 0.7 2.8 -0.9 -0.2 0.0 0.3 12.1 1.4 16

b -0.2 0.1 -0.4 -0.3 -0.2 -0.2 -0.1 1.0 270

prec1 0.8 0.5 0.0 0.5 0.7 1.1 2.0 1.3 17

deviance 3544.8 5.5 3535.9 3540.9 3543.8 3548.2 3556.8 1.0 210

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)

pD = 13.3 and DIC = 3558.0

DIC is an estimate of expected predictive error (lower deviance is better).

The convergence is much
improved (although not perfect), and the DIC is about the same but with better
parameter estimates. We can again ask about the elevation coefficient
posterior:

attach.bugs(sim4)

b4 = b

plot(density(b4),col="red",xlim=c(-1,0))

lines(density(b3),col="blue")

The elevation coefficient is very sensitive to our addition of a random effect based on project.

*Hierarchical logistic
regression*

Let’s continue model building for Fraser magnolia by demonstrating
a model with a binary (presence-absence) response variable (‘pres’) and a
submodel for imperfect species detection. We’ll add several new predictor
variables to the mix, and define their coefficients with a vector that makes
specifying priors more convenient. For the sake of optimization speed we’ll
ignore the project-based random effect, but the above lines for the random
effect could be inserted here as well. Instead we’ll suppose that our response
variable (presence) results from both our process model (the regression based
on environmental variables) and an additional submodel that considers detection
probability (observation error) as a function of plot size (the bigger the
quadrat, the more likely we are to detect the species).

In regression cases like this,
McCarthy (2007, *Bayesian Methods for Ecology*) suggests the (MCMC)
optimization works more efficiently if the predictor variables are used after
subtracting their means; if the variable is untransformed this has no effect on
the coefficient value (except for the intercept, which can be back-transformed
if necessary for predictions). We’ll do this in R before feeding the data to
OpenBUGS.

pres = as.numeric(mf$cover>0) #binary response

elev = mf$elev-mean(mf$elev) #substract mean for elev predictor values

tci = mf$tci-mean(mf$tci) #ditto tci

strdst
= log(mf$streamdist+1)-mean(log(mf$streamdist+1)) #same for streamdist, but
log-transform first

beers = mf$beers-mean(mf$beers) #ditto radiation

plotsize = log(mf$plotsize)-mean(log(mf$plotsize)) #ditto plot size

J = nrow(mf) #total observations, here J=991

bugsdata = list ("J","pres","elev","tci","strdst","beers","plotsize") #data as list

We’ve already constructed BUGS models that involve
assigning a probability to a cover value based on predictor variables (e.g.,
the Poisson model above). Adding detection probability to the mix assumes we
have independent information on the probability of observing a species *given
it occurs *in a particular location. Here we can use variation in plot size
(which we assume varies independently of the predictor variables) in our model
of detection probability and the environmental variables as our model of
occurrence probability. We record an occurrence as the independent outcome of
these two probabilities (detection x occurrence). This also serves to
illustrate how we can break down what may seem to be a complex model into
easily specified parts, and have the conditional parameter optimization via
MCMC handle the difficult task of keeping track of the conditional
relationships. This is “model5.bug”:

model

{

for(j in 1:J) { #J = number of observations

true.occ[j] ~ dbern(prob[j]) #prob of true occupancy (latent variable)

logit(prob[j]) <- beta[1] + beta[2]*elev[j] + beta[3]*(elev[j]*elev[j]) + beta[4]*tci[j] + beta[5]*strdst[j] + beta[6]*beers[j] #’real’ occurrence model with a logit link

logit(detect[j]) <- d1 + d2*plotsize[j] #prob of detection model based on plot size with logit link

p1[j] <- true.occ[j]*detect[j]

pres[j] ~ dbern(p1[j]) #observed presence is a binomial (Bernoulli, N=1) probability of P(occurrence) * P(detection)

}

#prior distributions

d1 ~ dnorm(0,1.0E-6) #flat prior for intercept of detection model

d2 ~ dnorm(0,1.0E-6) #flat prior for slope of detection model

for(j in 1:6) {

beta[j] ~ dnorm(0,1.0E-6)#flat priors for each regression coefficient

}

#derived (and monitored) variable of number of true occurrences

actual.occ <- sum(true.occ[])

}

Back in R we define our monitored parameters and our initialization conditions before executing the OpenBUGS code. Note the initial values for the ‘beta’ coefficients can be specified as a vector instead of single elements.

parameters = c("beta","d1","d2","actual.occ") #monitored parameters

inits = function() list(beta=rnorm(6,0,10),d1=rnorm(1,0,10),d2=rlnorm(1,1,1))

sim5 = bugs(bugsdata, inits, parameters, "model5.bug", n.chains=1, n.iter=100, n.burnin=50, program="openbugs")

The model also illustrates how we can establish inference on a ‘latent’ (unobserved) variable that is nonetheless of interest, here being the ‘real’ occurrences (those we observed, plus those we didn’t observe because of imperfect detection: “actual.occ”).

*A final HB example *

First I create a set of predictor variables, including
elevation, TCI, plot size (10, 100, or 1000 m^{2}), and project number
(there were three different research projects in the area providing data for
this study). A total of 200 plots were
sampled. The **sample** function is a way of sampling uniformly from an integer
vector.

#simulated
predictor variables

N = 200 #number of plots

elev = sample(200:2000,N,replace=T)/1000 #elevation in km

tci = runif(N,min=1,max=25) #tci

size = sample(c(10,100,1000),N,replace=T) #plot sizes

project =
sample(c(1,2,3),N,replace=T,prob=c(.2,.2,.6))
#vegetation project

Next I use this set of predictors to simulate a species occurrence dataset, based on 1) an environmental deterministic function (elevation and tci); 2) detection probability, which increases with plot size; and 3) a random offset for project, which describes some unknown way that species occurrence varies by project.

#simulated
response variable

#deterministic env process parameters

b1 = .5 #overall
intercept

b2 = 1 #elev coefficient

b3 = -.01 #elev^2 coefficient

b4 = -.3 #tci coefficient

#detection process parameters

d1 = 0 #detection
intercept

d2 = .7 #plot
size coefficient

tau = 1 #precision
(1/variance) of random effect

#actual occurrence is a function of env variables and a random effect for project

re = rnorm(3,0,1/tau) #random
effect varies by project

lp = b1 + b2*elev + b3*(elev^2) + b4*tci + re[project]

env.occ = 1 /
(1+exp(-lp)) #logit transformation of occurrence probability

#real occurrence includes bernoulli process error

real.occ = rbinom(N,size=1,prob=env.occ)

#probability of detection is a function of log
plot size

dp = d1 + d2*log(size)

detect.prob =
1 / (1+exp(-dp)) #logit transformation
of detection probability

#observation of occurrence depends on
actual occurrence and detection

obs.occ = rbinom(N,size=1,prob=detect.prob*real.occ)

#input
data frame

df = data.frame(elev,tci,size,obs.occ,project)

Now we have a dataset of known properties. We fit an HB model via BUGS in R:

library(BRugs)

library(R2WinBUGS)

J = nrow(df) #total observations

M = 3 #number
of groups (projects)

obs = df$obs.occ #response
variable

elev = df$elev #predictor
variable

tci = df$tci #predictor
variable

size = log(df$size) #predictor variable

project = df$project #projects

bugsdata = list("J","M","obs","elev","tci","size","project")

parameters
=
c("beta1","beta2","beta3","beta4","delta1","delta2","tau")

#initialization
of parameters via random functions

inits = function() list(beta1=rnorm(1,.5,10),beta2=rnorm(1,1,10),

beta3=rnorm(1,-.01,10),beta4=rnorm(1,3,10),

delta1=rnorm(1,0,10),delta2=rnorm(1,.7,10),tau=rlnorm(1,1,1))

We can specify the BUGS model in R and write it to a text file for BUGS input:

sink("bugsmod1.txt")

cat("model {

for (i in 1:J) { #loop over J observations

#occurrence model, including random effect u

logit(p[i]) <- beta1+beta2*elev[i]+beta3*elev[i]*elev[i]+beta4*tci[i]+u[project[i]]

#detection model

logit(d[i]) <- delta1 + delta2*size[i]

#observation depends on independent
probabilities of detection and occurrence

op[i] <- p[i]*d[i]

obs[i] ~ dbern(op[i])

}

# random effects
for M groups

for (j in 1:M) {

u[j] ~ dnorm(0,tau) #tau is
precision (1/variance) between projects

}

# priors

delta1 ~
dnorm(0.0,1.0E-6)

delta2 ~
dnorm(0.0,1.0E-6)

beta1 ~
dnorm(0.0,1.0E-6)

beta2 ~
dnorm(0.0,1.0E-6)

beta3 ~
dnorm(0.0,1.0E-6)

beta4 ~
dnorm(0.0,1.0E-6)

# 'hyperprior'

tau ~
dgamma(0.001,0.001)

}

",fill=TRUE)

sink()

We're now ready to evaluate the model in OpenBUGS and examine the result. You can of course tweak the chaining parameters below.

#send to OpenBUGS

sim1 = bugs(bugsdata, inits,
parameters,"bugsmod1.txt",n.chains=3,

n.iter=1000,n.burnin=500,program="openbugs")

print(sim1)

attach.bugs(sim1)

sim1$summary

There is some reason for concern that the above model is not entirely correct, because it involves only one Bernoulli process (which has a probability equal to the product of occurrence and detection probabilities) rather than two: one concerning the occurrence model (whether the species is actually there or not), and one concerning the conditional probability of detection given the species is present (which has a probability equal to the detection probability when a species is present, zero otherwise). Consider the following BUGS model:

sink("bugsmod2.txt")

cat("model {

for (i in 1:J) { #loop over J observations

#model of occurrences as a Bernoulli trial

logit(p[i]) <- beta1+beta2*elev[i]+beta3*elev[i]*elev[i]+beta4*tci[i]+u[project[i]]

actual[i] ~ dbern(p[i]) #actual presences

#detection model, including logit transformation

d[i] <- delta1 + delta2*size[i] #process model
for detectability

dd[i] <- exp(d[i])/(1+exp(d[i])) #another form of
the logit formula

#observation depends on detectability and actual occurrence

detect[i] <- dd[i]*actual[i] #detectability for
presences, zero otherwise

obs[i] ~ dbern(detect[i]) #actual detections

# random effects
for M groups

for (j in 1:M) {

u[j] ~ dnorm(0,tau) #tau is
precision (1/variance) between projects

}

# priors

delta1 ~
dnorm(0.0,1.0E-6)

delta2 ~
dnorm(0.0,1.0E-6)

beta1 ~
dnorm(0.0,1.0E-6)

beta2 ~
dnorm(0.0,1.0E-6)

beta3 ~
dnorm(0.0,1.0E-6)

beta4 ~
dnorm(0.0,1.0E-6)

# 'hyperprior'

tau ~ dgamma(0.001,0.001)

}

",fill=TRUE)

sink()

The only difference in bugsmod2.txt is that there is an extra Bernoulli trial. The problem is that BUGS won't fit this model, or at least wouldn't fit it for me regardless of my simulated data parameterization or initial values.