May 3: A final hierarchical Bayesian example

 

First I create a set of predictor variables, including elevation, TCI, plot size (10, 100, or 1000 m^2), and project number (there were three different research projects in the area providing data for this study).  A total of 200 plots were sampled.  The sample function draws randomly (uniformly, unless probabilities are supplied) from a vector, here with replacement.

#simulated predictor variables

N = 200       #number of plots
elev = sample(200:2000,N,replace=T)/1000        #elevation in km
tci = runif(N,min=1,max=25)                     #tci
size = sample(c(10,100,1000),N,replace=T)       #plot sizes
project = sample(c(1,2,3),N,replace=T,prob=c(.2,.2,.6)) #vegetation project

Next I use this set of predictors to simulate a species occurrence dataset, based on 1) an environmental deterministic function (elevation and tci); 2) detection probability, which increases with plot size; and 3) a random offset for project, which describes some unknown way that species occurrence varies by project. 

#simulated response variable

       #deterministic env process parameters
       b1 = .5     #overall intercept
       b2 = 1      #elev coefficient
       b3 = -.01   #elev^2 coefficient
       b4 = -.3    #tci coefficient

       #detection process parameters
       d1 = 0      #detection intercept
       d2 = .7     #plot size coefficient
       tau = 1     #precision (1/variance) of random effect

       #actual occurrence is a function of env variables and a random effect for project
       re = rnorm(3,0,sqrt(1/tau)) #random effect varies by project; rnorm takes a standard deviation, so convert precision to sd
       lp = b1 + b2*elev + b3*(elev^2) + b4*tci + re[project]
       env.occ = 1 / (1+exp(-lp))     #inverse logit: occurrence probability

       #real occurrence includes bernoulli process error
       real.occ = rbinom(N,size=1,prob=env.occ)

       #probability of detection is a function of log plot size
       dp = d1 + d2*log(size)
       detect.prob = 1 / (1+exp(-dp)) #inverse logit: detection probability

       #observation of occurrence depends on actual occurrence and detection
       obs.occ = rbinom(N,size=1,prob=detect.prob*real.occ)

#input data frame
df = data.frame(elev,tci,size,obs.occ,project)
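As a quick sanity check (my addition, not part of the original simulation): because observation requires two independent Bernoulli successes, the observed occurrence rate should converge to the product of the occurrence and detection probabilities. A minimal sketch with hypothetical probabilities:

```r
#observation = presence * detection, so P(obs=1) = p*d for independent trials
set.seed(1)
p = 0.6                      #hypothetical occurrence probability
d = 0.8                      #hypothetical detection probability
z = rbinom(100000, 1, p)     #true presences
y = rbinom(100000, 1, d*z)   #observations: detection only where present
mean(y)                      #should be close to p*d = 0.48
```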

 

Now we have a dataset of known properties.  We fit a hierarchical Bayesian model via BUGS from within R:

library(BRugs)
library(R2WinBUGS)

J = nrow(df)       #total observations
M = 3              #number of groups (projects)
obs = df$obs.occ   #response variable
elev = df$elev     #predictor variable
tci = df$tci       #predictor variable
size = log(df$size) #predictor variable (log plot size)
project = df$project    #projects
bugsdata = list("J","M","obs","elev","tci","size","project")
parameters = c("beta1","beta2","beta3","beta4","delta1","delta2","tau")

#initialization of parameters via random functions
inits = function() list(beta1=rnorm(1,.5,10),beta2=rnorm(1,1,10),
     beta3=rnorm(1,-.01,10),beta4=rnorm(1,-.3,10),
     delta1=rnorm(1,0,10),delta2=rnorm(1,.7,10),tau=rlnorm(1,1,1))

We can specify the BUGS model in R and write it to a text file for BUGS input:

sink("bugsmod1.txt")
cat("model {

for (i in 1:J) {   #loop over J observations

     #occurrence model, including random effect u
     logit(p[i]) <- beta1+beta2*elev[i]+beta3*elev[i]*elev[i]+beta4*tci[i]+u[project[i]]

     #detection model
     logit(d[i]) <- delta1 + delta2*size[i]

     #observation depends on independent probabilities of detection and occurrence
     op[i] <- p[i]*d[i]
     obs[i] ~ dbern(op[i])
     }

     # random effects for M groups
     for (j in 1:M) {
          u[j] ~ dnorm(0,tau) #tau is precision (1/variance) between projects
     }

     # priors
     delta1 ~ dnorm(0.0,1.0E-6)
     delta2 ~ dnorm(0.0,1.0E-6)
     beta1 ~ dnorm(0.0,1.0E-6)
     beta2 ~ dnorm(0.0,1.0E-6)
     beta3 ~ dnorm(0.0,1.0E-6)
     beta4 ~ dnorm(0.0,1.0E-6)
     # 'hyperprior'
     tau ~ dgamma(0.001,0.001)
}
",fill=TRUE)
sink()

 

We're now ready to evaluate the model in OpenBUGS and examine the result. The MCMC settings below (number of chains, iterations, burn-in) can of course be adjusted.

#send to OpenBUGS
sim1 = bugs(bugsdata, inits, parameters,"bugsmod1.txt",n.chains=3,
     n.iter=1000,n.burnin=500,program="openbugs")
print(sim1)
attach.bugs(sim1)
sim1$summary

There is some reason for concern that the above model is not entirely correct: it involves only one Bernoulli process, whose probability equals the product of the occurrence and detection probabilities, rather than two.  The first process would model occurrence itself (whether the species is actually there or not); the second would model detection conditional on occurrence (with probability equal to the detection probability when the species is present, and zero otherwise).  Consider the following BUGS model:
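One way to see the relationship between the two formulations (a note of mine, not from the original): the two differ in whether the latent presence is represented explicitly, not in the implied probability of an observation. Marginalizing over the latent presence in the two-stage model recovers the single-trial observation probability:

```r
#two-stage model: actual ~ Bern(p), then obs | actual ~ Bern(d*actual)
#marginal over actual: P(obs=1) = p*d + (1-p)*0, i.e. the single-trial probability
p = 0.7; d = 0.5              #hypothetical occurrence and detection probabilities
marginal = p*d + (1-p)*0      #sum over actual = 1 and actual = 0
marginal                      #equals p*d
```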

sink("bugsmod2.txt")
cat("model {

for (i in 1:J) {   #loop over J observations

     #model of occurrences as a Bernoulli trial
     logit(p[i]) <- beta1+beta2*elev[i]+beta3*elev[i]*elev[i]+beta4*tci[i]+u[project[i]]
     actual[i] ~ dbern(p[i])  #actual presences

     #detection model, including inverse logit transformation
     d[i] <- delta1 + delta2*size[i]   #process model for detectability
     dd[i] <- exp(d[i])/(1+exp(d[i]))  #inverse logit written out by hand

     #observation depends on detectability and actual occurrence
     detect[i] <- dd[i]*actual[i] #detectability for presences, zero otherwise
     obs[i] ~ dbern(detect[i]) #actual detections
     }

     # random effects for M groups
     for (j in 1:M) {
          u[j] ~ dnorm(0,tau) #tau is precision (1/variance) between projects
     }

     # priors
     delta1 ~ dnorm(0.0,1.0E-6)
     delta2 ~ dnorm(0.0,1.0E-6)
     beta1 ~ dnorm(0.0,1.0E-6)
     beta2 ~ dnorm(0.0,1.0E-6)
     beta3 ~ dnorm(0.0,1.0E-6)
     beta4 ~ dnorm(0.0,1.0E-6)
     # 'hyperprior'
     tau ~ dgamma(0.001,0.001)
}
",fill=TRUE)
sink()

The only difference in bugsmod2.txt is the extra Bernoulli trial for the latent presences.  The problem is that BUGS won't fit this model, or at least wouldn't fit it for me, regardless of how I parameterized the simulated data or chose initial values.
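A common workaround for this kind of failure in BUGS/JAGS occupancy models (my suggestion, not something from these notes) is to supply initial values for the latent presence vector actual. The sampler can fail to start if it generates actual[i] = 0 at a plot where the species was observed, since that gives obs[i] a Bernoulli probability of exactly zero. Initializing every latent presence at 1 is consistent with any observed data; a hypothetical inits function along those lines:

```r
#hypothetical inits adding the latent presence vector 'actual';
#J is the number of observations, as defined earlier
J = 200
inits2 = function() list(beta1=rnorm(1,0,1), beta2=rnorm(1,0,1),
     beta3=rnorm(1,0,1), beta4=rnorm(1,0,1),
     delta1=rnorm(1,0,1), delta2=rnorm(1,0,1), tau=rlnorm(1),
     actual=rep(1,J))   #start all latent presences at 1 (never contradicts obs)
```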