**March 22: Intro to hierarchical models**

The model approaches we have considered up to this point
involve estimating how the expected value (mean response) changes with respect
to some deterministic function (*process
model*), with all of the variation around the mean produced by the same
error function (*data model*). We considered model *variance* only in the sense that we wanted an appropriate stochastic
function for a given response variable (e.g., a Poisson process for count
data)—that is, assuming all the variance came from one source (usually from the
ecological process). In reality,
observations in any ecological dataset come from several potential sources,
including observation error (for both response and predictor variables) and different
error structures in the process model.

Consider
the simple example of a model of tree sizes (dbh). If we only want to describe the mean
expectation without regard to explanatory variables, we have a one-line model
of dbh ~ normal(μ,σ^{2}), and we estimate two parameters
(mean, variance). This model assumes
that our trees come from a single population defined by a constant
variance. But what if our observations
come from several different tree populations?
Even if we had no reason to suspect the mean size would be different
among populations, it seems logical to suspect that the populations might be
different, and accounting for such differences might improve the fit of the
model. One clever way of doing this is
to think about the mean (intercept) as not a 'fixed effect'—a parameter whose
value does not vary regardless of the observation—but as a variable whose value
varies depending on which population it is sampled from ('random effect'). For example, we might suppose the intercept is
sampled from a normal distribution of a given mean and variance, which is
allowed to vary for observations sampled from different populations. So our full model would be a composite
statement of a deterministic model (here, the null model of only the intercept),
a stochastic model (describing the lowest level variance, here within
populations), and a *parameter model*
(describing the expected mean and variance of the overall intercept, which
varies between populations). Our
original model of dbh ~ normal(μ,σ^{2}) had two fixed
effects, and our revised model has three fixed effects and one random effect:

dbh
~ normal(μ,σ^{2})

μ
~ normal(μ_{p},σ_{p}^{2})

Because the first μ is itself modeled stochastically,
it has become a random effect, and you can see that our model is now a
hierarchy: at least one model parameter is itself the response variable of
another model. The addition of this
'parameter model' completes the overall menu of model building: process models
describe ecological processes, stochastic models describe how error enters the
relationships (and observations), and parameter models link deterministic and
stochastic functions to different parts of an overall hierarchy of data and
process models. (Of course, the
parameters in parameter models can themselves be modeled, producing
'hyperparameters' and an abundance of model levels). In classical statistics models that include
parameter models (which include new sources of variance) are called 'mixed
effects' models because some predictor variables are fixed (unmodeled) while
others aren't (random); mixed effect models are equivalent to the more recent
jargon of multi-level, multistratum, nested, and hierarchical models. The unifying principle is that all such
models include some modeling of the variance, rather than assuming a single
(constant) variance parameter. Also,
these terms only apply to model construction, as evaluation can be via any
statistical method (least squares, ML, Bayesian). Models where observations are not independent
(including temporal and spatial autocorrelation) are similar qualitatively
because they include variance-covariance structures (more on this next time). The mixed model above is the simplest variant
of a class of models called *block*
models that allow the intercept to vary randomly according to some grouping
factor.

To
illustrate construction of hierarchical models in R, we’ll use a dataset that
involves clustered spatial sampling (blocks) and multiple observations (through
time) per ‘individual’. Such a structure
typifies many species distribution datasets, but here we’ll use data from a
temperature logger network in the Smokies (Tf2.RData is the dataset on the
class website; bring it into R with the **load**
command). These data were used in a
project to create daily landscape models for ground-level temperatures; here
I’ve pared down the full dataset to save us some computation by taking only
daily temperature data for July 2006.
The dataset describes minimum and maximum daily temperatures from 101
data loggers located 1m above the ground, arranged in 12 linear
‘cover-to-ridge’ transects at different overall elevations and aspects. Existence of the 12 transects means we can
construct a ‘block’ model, which may include a random effect for Transect. We also have variation of temperatures within
each logger itself (31 days in July), so we can also include an ‘individual’
submodel, and test for a random effect of ‘logger’. So we might envision this
model building exercise as one of three separate sources of variation: the
day-to-day variation of temperature within each logger; the average variation
between loggers of the same transect (up to 10 per group); and broad-scale
spatial variation of temperature between the 12 Transects.

The are
two standard packages for creating hierarchical models in R based on maximum
likelihood via optimization: **nlme** is
older and is better documented, while **lme4**
replaces most of the nlme functionality and adds the critical family argument
to include the exponential family of error structures (Poisson, binomial, etc.). The approaches are similar but there are some
differences in syntax. It is worthwhile
to learn both, and we’ll start with **nlme**.

load(file.choose()) #grab the Tf2.RData dataset

install.packages(c("nlme","lme4"))

library(nlme)

The
function for linear mixed models in the nlme library is **lme**, and it works like lm or glm except you need to specify fixed
and random components separately. We’ll
start with a null model that includes only an intercept term, but one that is
allowed normal variation within both Transect and logger. We’ll concentrate on
minimum daily temperature (minT):

lme1 =
lme(fixed=minT~1,random=~1|Transect/logger,method="ML",na.action=na.exclude,data=Tf2)

summary(lme1)

Linear mixed-effects model fit by
maximum likelihood

Data: Tf2

AIC
BIC logLik

21533.11 21559.77 -10762.55

Random effects:

Formula: ~1 | Transect

(Intercept)

StdDev: 1.911162

Formula: ~1 | logger %in% Transect

(Intercept) Residual

StdDev: 0.5425797 1.513821

Fixed effects: minT ~ 1

Value Std.Error DF
t-value p-value

(Intercept) 16.66494 0.5549976
5696 30.02705 0

Standardized Within-Group
Residuals:

Min Q1 Med Q3 Max

-4.1771081 -0.5028408 0.1852666
0.6500676 3.3828615

Number of Observations: 5797

Number of Groups:

Transect logger %in% Transect

12 101

Most critical here is to understand how the random effects
are specified: here we have “random=~1|Transect/logger”, where the “|” sign
signifies that what comes after is the structure of the groups, with latter
terms nested within earlier terms (here loggers are nested within Transects).
You don’t repeat the response variable in the random effects formula. The “1” indicates that we’re only interested
in modeling the intercept with respect to normal distributions for the groups
(Transect and logger). Also note that there are two methods for fitting these
models: “ML” (full [regular] maximum likelihood) is required if we’re to
compare models with different fixed effects (using AIC or LRT). “REML” is
restricted or residual ML, and actually works by fitting the model to the least
squares residuals after including the fixed effects terms, and so only involves
ML fitting of model variance components.
This is advantageous when you’re most interested in models comparing
different variance structures (random effects), and in fact you can’t compare
models with different fixed effects using REML.
Again note that normal error is assumed in **lme**.

At this
point we should examine the qualitative nature of the data within our imposed
hierarchical framework. A basic question is how much variance exists at each
level: does temperature vary more day-to-day, within transects, or between
them? The random effects output above
includes estimates of the standard deviation for each level: 1.191 between
transects, 0.542 between loggers within transects, and 1.514 within loggers. We can put this in terms of percent variance
at each level:

sigmas = c(1.911,.542,1.514)

sigmas^2 / sum(sigmas^2)

[1] 0.58544256 0.04709356
0.36746389

You can also extract the variance of each level (variance
components) using the VarCorr function:

VarCorr(lme1)

Variance StdDev

Transect = pdLogChol(1)

(Intercept) 3.6525412 1.9111623

logger = pdLogChol(1)

(Intercept) 0.2943928 0.5425797

Residual 2.2916555
1.5138215

The variance components indicate that day-to-day variation
in July is important (37% of the variance) but that spatial variance is higher,
and most of this is between transects (59%). The next question is how to
specify our fixed effects terms. This
dataset includes many predictors, but let’s focus on two that are likely
candidates for explaining within vs. between transect variance, elevation
(ELEV) and stream distance (here log-transformed, LOG.STRDST). To make graph inspection easier we can create
simplified datasets that specify the grouping factor of interest, called a
‘groupedData’ object. For example:

gD = groupedData(minT~1|Transect,data=Tf2) #transect-level variation only (means of each
logger)

plot(gD)

gD = groupedData(minT~1|logger,data=Tf2) #logger-level
variation only (means of each transect)

plot(gD)

gD = groupedData(minT~ELEV|Transect,data=Tf2) #transect-level
variation modeled with elevation fixed effect

plot(gD)

gD =
groupedData(minT~LOG.STRDST|Transect,data=Tf2) #transect-level
variation modeled with stream distance fixed effect

plot(gD)

There is clearly more variation in STRDST within transects,
and more variation in ELEV between transects (this is, of course no
accident—this is how logger deployment was structuring from the
beginning). Based with this information
we update our previous model with each new fixed effect, starting with ELEV:

lme2 =
update(lme1,fixed=minT~ELEV)

lme3 = update(lme1,fixed=minT~LOG.STRDST)

lme4 =
update(lme1,fixed=minT~ELEV+LOG.STRDST)

Now let’s examine what happened to the variance of each
level. I write a quick function to extract the variance components from VarCorr
and summarize each as a percentage:

VC = function(model) {

vars
= as.numeric(VarCorr(model)[,1][c(2,4,5)])

round(vars/sum(vars),2)

}

out = data.frame(c("between transects",
"between loggers", "within loggers"),VC(lme1),VC(lme2),VC(lme3),VC(lme4))

names(out) = c("level",
"null","elevation","log.streamdist","elev+log.strdist")

out

level null elevation
log.streamdist elev+log.strdist

1 between transects 0.59 0.04 0.60 0.03

2
between loggers 0.05
0.14 0.04 0.07

3 within loggers 0.37 0.82 0.36 0.91

The output shows that when we added elevation to the model,
we explained a vast proportion of between-transect level variation. Conversely,
stream distance was mostly helpful at the between-logger level (the absolute
variance went down about 25% at the between logger level). Adding both
variables left essentially all of the variance at the within-logger level. We
can of course use the tools of ML to investigate whether the decreases in
variance were significant:

anova(lme1,lme2,lme3,lme4)

Model df AIC
BIC logLik Test
L.Ratio p-value

lme1 1 4
21533.11 21559.77 -10762.55

lme2 2 5
21522.50 21555.82 -10756.25 1 vs 2 12.61125
4e-04

lme3 3 5
21515.21 21548.53 -10752.60

lme4 4 6
21450.89 21490.88 -10719.44 3 vs 4 66.32371
<.0001

The default output for
**anova** in nlme guesses which nested
contrasts you want; here we see lme4 is substantially better than the other
models and we can confirm with p values (LRT) using:

anova(lme2,lme4)

Model df AIC
BIC logLik Test L.Ratio p-value

lme2 1 5
21522.50 21555.82 -10756.25

lme4 2 6
21450.89 21490.88 -10719.44 1 vs 2 73.6117
<.0001

anova(lme3,lme4)

Model df AIC
BIC logLik Test
L.Ratio p-value

lme3 1 5
21515.21 21548.53 -10752.60

lme4 2 6
21450.89 21490.88 -10719.44 1 vs 2 66.32371
<.0001

which tests the two
fixed-effect term both against each single-term model. What can we do for day-to-day variance? The variable minSYN is derived from regional
weather stations corrected for lapse rate:

lme5 =
update(lme4,fixed=minT~ELEV+LOG.STRDST+minSYN)

VC(lme5)

[1] 0.07 0.22 0.72

anova(lme4,lme5)

Model df AIC
BIC logLik Test
L.Ratio p-value

lme4 1 6
21450.89 21490.88 -10719.44

lme5 2 7
14290.76 14337.42 -7138.38 1 vs 2
7162.125 <.0001

We can continue to exploit the power of hierarchical
modeling by asking whether the effects of our predictor variables (elevation,
stream distance, regional temperature) should themselves vary by transect or
logger. Our current model only varies our overall intercept with respect to
transect and logger. Of course, now that
we have fixed effects associated with each level, do we actually need these
random effects?

lme6 =
update(lme5,random=~1|Transect) #remove
the random effect for logger

lme7 = update(lme5,random=~1|logger) #remove
the random effect for transect

anova(lme5,lme6)

Model df AIC
BIC logLik Test
L.Ratio p-value

lme5 1 7
14290.76 14337.42 -7138.380

lme6 2 6
15393.36 15433.35 -7690.679 1 vs 2 1104.598
<.0001

anova(lme5,lme7)

Model df AIC
BIC logLik Test
L.Ratio p-value

lme5 1 7
14290.76 14337.42 -7138.380

lme7 2 6
14297.56 14337.55 -7142.781 1 vs 2 8.800888
0.003

Indeed there are still factors that produce variation
between both transects and loggers that our fixed effects haven’t fully
accounted for. We already saw in our groupedData graph for LOG.STRDST that,
although the effect was generally positive (temperatures increased with
distance-from-stream), the slope wasn’t perfectly consistent between
transects. We could allow the slope
itself (the coefficient form LOG.STRDST we already estimated as a fixed effect)
to vary randomly (via normal error) with Transect, like this (note I had to
change LOG.STRDST slightly by adding a constant +1 to it, because other the
model fit wouldn’t otherwise converge when the term included zeros):

lme8 =
update(lme5,random=list(~1|logger,~I(LOG.STRDST+1)|Transect))

anova(lme5,lme8)

Model df AIC
BIC logLik Test
L.Ratio p-value

lme5 1 7 14290.76 14337.42 -7138.380

lme8 2 9 14303.56 14363.55 -7142.781 1 vs 2
8.800893 0.0123

Here we broke up the random effects term into two separate formulas for each
nested level. Interestingly, the model
suggests that the variation in the slope value for LOG.STRDST is not enough
between transects to merit a random effect. What about regional temperature—is
this effect the same regardless of where the transect is located?

lme9 = update(lme5,random=list(~1|logger,~minSYN|Transect))

anova(lme5,lme9)

Model df AIC
BIC logLik Test L.Ratio p-value

lme5 1 7
14290.76 14337.42 -7138.380

lme9 2 9
14291.46 14351.44 -7136.729 1 vs 2 3.30166
0.1919

Again, the random effect is not enough to merit inclusion
the model. This is intellectually
satisfying, since it means the fixed effect predictors work the same regardless
of where we are on the landscape. Of course, we still have some work to do: in
particular we may be worried about the non-independence of logger values
through time, or their spatial correlation (although note the random effects
for spatial structure take care of some of this problem). We’ll examine this next time.

It is
worthwhile to demonstrate this model structure using the more recent **lmer** function in the lme4 library:

library(lme4)

lmer5 = lmer(minT~ELEV+LOG.STRDST+minSYN+(1|Transect)+(1|logger),REML=F,data=Tf2,na.action=na.exclude,family=gaussian)

summary(lmer5)

Linear mixed model fit by maximum
likelihood

Formula: minT ~ ELEV + LOG.STRDST
+ minSYN + (1 | Transect) + (1 | logger)

Data: Tf2

AIC BIC logLik deviance REMLdev

14291 14337
-7138 14277 14308

Random effects:

Groups
Name Variance Std.Dev.

logger (Intercept) 0.196541 0.44333

Transect (Intercept) 0.062834 0.25067

Residual 0.651855 0.80738

Number of obs: 5797, groups:
logger, 101; Transect, 12

Fixed effects:

Estimate Std. Error t value

(Intercept) 3.5941512
0.2799954 12.84

ELEV -0.0012356 0.0002009
-6.15

LOG.STRDST 0.3169141
0.0298608 10.61

minSYN 0.8361858 0.0069850
119.71

Correlation of Fixed Effects:

(Intr) ELEV LOG.ST

ELEV -0.726

LOG.STRDST -0.253 -0.288

minSYN -0.498
0.138 0.001

The important differences are how to specify the random
effects components (in parentheses), “method” here is whether REML=T or not,
and we can specify the glm family of error distributions. The output is
slightly different (with variance components more easily read in a table) but
the results look identical to lme5. Like
lme, comparing models of different fixed effects requires REML=F.

Finally,
we’d be remiss not to try a GAM mixed model example, which is fit with the **gamm** function in Wood’s familiar mgcv
library. The syntax for random effects is again slightly different but the rest
is built on **lme** (although the gamm4
package apparently does the same with lmer):

gamm5 = gamm(minT~s(ELEV)+s(LOG.STRDST)+s(minSYN),random=list(Transect=~1,logger=~1),data=Tf2,na.action=na.exclude)

gamm5

Although there does not appear to be anova or AIC functionality in this
package, the reported log L is -7072.89, making the deviance=14146 on 13.73
estimated DF. The lme5 deviance is 14277
using up 14 parameters, so (unsurprisingly) the gamm appears to be a more
efficient model. There appears to be AIC methods for the gamm4 package.

*Exercise*

Compose the likelihood function that corresponds to model lme5, including both
fixed and random effects, and use optim to find the parameter estimates. Do
they correspond to what we found with both lme and lmer?

*Zero-truncated and zero-inflated models*

A
convenient class of hierarchical models for species distribution modeling are
those that can deal with count-like response variables that would otherwise be
associated with a Poisson data model, but have either 1) no zeros (a dataset
that does not include absences but presences are positive integers), or 2) an
overabundance of zeros (many more than expected from a Poisson process). The former situation can be modeled with *zero-truncated* models that adjust either
Poisson or negative binomial functions such that zeros are not possible, and
are available in the **VGAM**
library. The latter situation can be
dealt with using* zero-inflated* models
that either model absences separately from the positive integers (so-called
'two-part' or *zero-altered* models),
or zero-inflated mixture models that also use two sub-models but absences can
enter both (one accounts for 'false' zeros, where the species cannot occur, and
the other accounts for 'real' zeros, where the species can indeed occur but, by
chance, doesn't in your dataset).
Zero-inflated model functions are available in the **pscl** library. A very
readable introduction to these models is Chapter 11 of Zuur et al. (2009), *Mixed effects models and extensions in
ecology with R*, which I've used as the basis for the following examples.

Our
species cover data that don't include absences may be a decent candidate for a **zero-truncated model**. We can return to our Fraser magnolia example:

dat =
read.csv(file.choose()) #grab the treedata.csv
file

dat2 = subset(dat,dat$species=="Magnolia
fraseri") #Fraser's magnolia

install.packages("VGAM")

library(VGAM) #lookout: VGAM overwrites
some key functions from the stats package

The function implementing the
zero-truncated model is **vglm**. The
family argument has many options; here we are most interested whether the zero
truncation approach fits a better model than a simple Poisson fit. The
zero-truncated Poisson model is *family=pospoisson*,
and the (regular) poisson is *family=poissonff*. I construct a simple deterministic model
involving elevation and stream distance:

vglm1 =
vglm(cover~elev+I(elev^2)+streamdist+elev*streamdist,family=pospoisson,data=dat2)

vglm2 = vglm(cover~elev+I(elev^2)+streamdist+elev*streamdist,family=poissonff,data=dat2)

AIC(vglm1)

[1] 1490.624

AIC(vglm2)

[1] 1532.216

Unfortunately the **anova**
function is not yet implemented in the VGAM library, but AICs suggest we have a
vast improvement in model building with the zero-truncated model. Also note that
Poisson counts assume equal mean and variance.
If the variance increases faster than the mean, a negative binomial
model is more appropriate, because it includes an additional parameter for the
added variance (overdispersion). Zero-truncated
models have an extension to the negative binomial case using the *family=posnegbinomial* argument. Unfortunately I couldn't get this routine to
work in the present example, but it makes sense to compare Poisson and NB
versions of model building when constructing zero-truncated models. There is also a GAM version using **vgam**:

vgam1
=
vgam(cover~elev+I(elev^2)+streamdist+elev*streamdist,family=pospoisson,data=dat2)

although I couldn't get this to do much of anything beyond the glm
version. Still, they merit exploration.

**Zero-inflated models** come in two
flavors, each with a Poisson or negative binomial variant: ZAP and ZANB on the
one hand (zero-altered Poisson, zero-altered NB) and ZIP and ZINB on the other
(zero-inflated *thinking about the nature of an absence*:
is the site completely outside of a species' range or niche (a 'naughty naught'),
and thus should be eliminated on the basis of a simple binomial function, or is
the site acceptable but, simply due to Poisson (or NB) chance, the species was
absent? ZIP and ZINB models allow
absences to enter each submodel (one binomial, the other Poisson or NB). Because both submodels can have complicated
deterministic components (one predicting which sites are 'off limits', the
other predicting species abundance within the range or niche), ZIP/ZINB models
can be a logical choice for distribution modeling with response variables that
include information on absences and abundances.
The ability of the model to differentiate between the two types of
absences, however, depends on the quality of the data, so beware.

To
demonstrate the use of ZIP and ZAP models we'll add absences to the magnolia
dataset:

ro.plots
= unique(as.character(dat2$plotID)) #plots with mag fra

u.plots = unique(as.character(dat$plotID)) #all plots

nob.plots = u.plots[is.element(u.plots,ro.plots)==F] #plots without
mag fra

dat3 =
subset(dat,is.element(as.character(dat$plotID),nob.plots)) #dataset of no mag
fra

dat4 =
subset(dat3,duplicated(dat3$plotID)==F) #one row per plot

dat4$cover = 0 #cover
of red oak is zero in these plots

mf =
na.exclude(rbind(dat2,dat4)) #new dataframe of
presences and absences

We presume this new cover vector is highly overdispersed
in the sense of inflated zeros:

hist(mf$cover,nclass=13)

We'll start with a ZAP model. Note the
two different kinds of overdispersion we're potentially dealing with:
overdispersion in the zeros only, which the ZAP or ZIP models will take care
of; or overdispersion from both the zeros and the counts (e.g., if there were
also too many 1s for a Poisson process), which the NB variants of ZANB and ZINB
will handle. You likely won't know a
priori which of any of these would be the better choices (or just a NB model
alone without a binomial component), but we can compare each approach with
likelihood methods (e.g., AIC). Using
our above deterministic function, let's start with un-mixed models that involve
Poisson and NB error, knowing full well that the inflated zeros will favor a NB
stochastic function:

glm1 = glm(cover~elev+I(elev^2)+streamdist+elev*streamdist,family=poisson,data=mf)

library(MASS)

glm.nb1 = glm.nb(cover~elev+I(elev^2)+streamdist+elev*streamdist,data=mf) #negative binomial GLM from the MASS
library

AIC(glm1,glm.nb1)

df AIC

glm1 5 3456.315

glm.nb1 6 2847.853

No surprise there. Now we ask whether a mixed (ZAP) model that divides up the
data into two groups—zeros and nonzeros, fitting binomial error to the former
and Poisson error to the latter—is a better model even though it requires another
parameter. For this we use the hurdle
function in the pscl library:

install.packages("pscl")

library(pscl)

zap1 = hurdle(cover~elev|elev+I(elev^2)+streamdist+elev*streamdist,data=mf,dist="poisson",zero.dist="binomial",link="logit",control=hurdle.control(method="BFGS"))

summary(zap1)

Call:

hurdle(formula
= cover ~ elev | elev + I(elev^2) + streamdist + elev * streamdist, data = mf,
dist = "poisson", zero.dist = "binomial", link =
"logit", control = hurdle.control(method = "Nelder-Mead"))

Pearson
residuals:

Min
1Q Median 3Q
Max

-1.0436
-0.7001 -0.3469 0.4502 6.0524

Count
model coefficients (truncated poisson with log link):

Estimate Std. Error z value
Pr(>|z|)

(Intercept) 6.755e-01 1.051e-01
6.427 1.30e-10 ***

elev
4.651e-04 9.552e-05 4.869 1.12e-06 ***

Zero
hurdle model coefficients (binomial with logit link):

Estimate Std. Error z value
Pr(>|z|)

(Intercept)
-4.782e+00 2.150e-01 -22.24
<2e-16 ***

elev 1.231e-02 1.929e-04
63.82 <2e-16 ***

I(elev^2)
-6.950e-06 NA NA
NA

streamdist
-2.927e-03 1.479e-05 -197.84 <2e-16 ***

elev:streamdist 1.866e-06 NA
NA NA

---

Signif.
codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

Number
of iterations in Nelder-Mead optimization: 362

Log-likelihood:
-1328 on 7 Df

Warning
message:

In
sqrt(diag(object$vcov)) : NaNs produced

The formula for **hurdle** can
take two components: the left side of the "|" is the model for the
count component, and the right side (without repeating the response variable)
is the model for the binomial predictor of absences. Including one formula component only means
the same deterministic function is used for both submodels. Here I made the presence-absence component
our above deterministic function, and the count (abundance) component a simple
function of elevation. The output shows
that the parameters vary strongly according to whether counts or absences are
being predicted (note the coefficients signs are often opposite because the
latter model is predicting absences, not presences). The model fitting procedure did not like the
second-order components for some reason, so we get the parameter values but not
any estimate of variance. Also note the **hurdle** function passes the hard work of
model fitting to optim, and you can access optim parameters (like method) using
the hurdle.control argument. Before
asking whether the ZAP model is better than the standard (un-mixed) NB model,
we can try the equivalent ZANB model:

zanb1 = hurdle(cover~elev|elev+I(elev^2)+streamdist+elev*streamdist,data=mf,dist="negbin",zero.dist="binomial",link="logit",control=hurdle.control(method="BFGS"))

AIC(glm.nb1,zap1,zanb1)

df AIC

glm.nb1 6 2847.853

zap1 7 2669.024

zanb1 8 2667.635

We gained a lot by using the ZA model structure, but the Poisson and NB
count models are similar. Of course, at this point you would want to compare
other possibilities for deterministic components of each submodel.

Finally,
we can compare the ZAP/ZANB approach to one that allows absences to enter both
submodels, the so-called 'mixture' models (ZIP and ZINB). The function for ZIP/ZINB is zeroinfl in the
same **pscl** package, and the formula
syntax for submodels is the same as for hurdle. Because only a binomial model
is used for the zeros-only component, you can specify the link but there is no *zero.dist *argument:

zip1 = zeroinfl(cover~elev|elev+I(elev^2)+streamdist+elev*streamdist,data=mf,dist="poisson",link="logit")

zinb1 = zeroinfl(cover~elev|elev+I(elev^2)+streamdist+elev*streamdist,data=mf,dist="negbin",link="logit")

AIC(zap1,zanb1,zip1,zinb1)

df AIC

zap1 7 2669.024

zanb1 8 2667.635

zip1 7 2667.738

zinb1 8 2666.023

Here we see that the ZINB model is almost a clear frontrunner, although not
clearly better than ZIP or ZANB models.
Although there is also no **anova**
(LRT) functionality within the **pscl**
package, the **lmtest** library contains
the **lrtest** function that is able to
extract –LL and df from nested models (like Poisson and NB) for the chi-squared
test:

install.packages("lmtest")

library(lmtest)

lrtest(zip1,zinb1)

Likelihood ratio test

Model
1: cover ~ elev | elev + I(elev^2) + streamdist + elev * streamdist

Model
2: cover ~ elev | elev + I(elev^2) + streamdist + elev * streamdist

#Df
LogLik Df Chisq
Pr(>Chisq)

1 7 -1326.9

2 8 -1325.0
1 3.7147 0.05394 .

As a way to compare how the presence-absence vs.
abundance models affect the overall prediction of magnolia abundance, we can
use the **predict** function with
special arguments for **zeroinfl**
objects that specify output of either submodel or the full composite model, and
graph the predictions for one or more predictor variables. I'll use predictions over an elevation gradient
and will fix the stream distance input at 100 m:

x = seq(0,2000) #new
elevation input

y.absence =
predict(zinb1,newdata=list(elev=x,streamdist=rep(100,length(x))),type="zero") #predictions for the bimomial submodel

y.count = predict(zinb1,newdata=list(elev=x,streamdist=rep(100,length(x))),type="count") #predictions for the NB submodel

y.all =
predict(zinb1,newdata=list(elev=x,streamdist=rep(100,length(x))),type="response") #predictions for the full model

par(mfrow=c(3,1)) #three-panel graph

plot(x,y.all,xlab="Elevation",ylab="Predicted abundance")

plot(x,1-y.absence,xlab="Elevation",ylab="Predicted
occurrence") #note 1-y.absence
predicts occurrences rather than absences

plot(x,y.count, xlab="Elevation",ylab="Predicted
abundance")

With so many zeros dominating the cover vector, it is not surprising that the presence-absence model dominates the overall predicted cover response. Our deterministic model for abundance given occurrence is simply a monotonically increasing function of elevation because elevation was our only predictor. Clearly this is just the start of such an exercise; ZIP and ZINB models allow for complex sets of hypotheses involving factors that influence whether a species occurs in a location versus how abundant it gets once it's there, using up a minimal amount of degrees of freedom.