Generalized linear models

Generalized linear models (GLMs) extend the linear modeling capability of R to scenarios that involve non-normal error distributions or heteroscedasticity. All other classic assumptions (particularly independent observations) still apply. The idea here is that the expected value of the response is related to a linear function of the predictors through a link function; the model is fit on this transformed scale (using an iterative routine based on weighted least squares), but the expected variance is calculated on the original scale of the response. Simple examples of link functions are log(y) [which linearizes exp(x)], sqrt(y) [x^2], and 1/y [1/x]. More particularly, GLMs work for the so-called 'exponential' family of error distributions: Poisson, binomial, Gamma, and normal (Gaussian). Each of these has a standard (default) link function:
Error family   Default link   Inverse of link   Use for:
gaussian       identity       identity          normally distributed error
poisson        log            exp               counts (many zeros, various integers)
binomial       logit          1/(1+1/exp(x))    proportions or binary (0,1) data
Gamma          inverse        1/x               continuous data with non-constant error (constant CV)
You don't have to use the default link. For example, for binary data (e.g., presence-absence data) it may be more appropriate to use the 'complementary log-log' link function, specified as family=binomial(link="cloglog"). The link transformations are particularly handy when it comes to model comparisons, because the GLM approach produces a response deviance on the same scale for all models (e.g., AICs are comparable and likelihood ratio tests are appropriate for nested models). This allows for flexibility in exploring which link function is most appropriate: take the one with the lowest residual deviance (-2*log-likelihood).
Note that parameter values returned by coef() or summary() are on the scale of the link function, not the scale of the original response. Use the inverse of the link function to get parameter values back on the response scale, or use the function predict with the type="response" argument.
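As a quick sanity check on the binomial row of the table above, the logit link and its inverse can be verified directly in R (a minimal sketch; plogis is R's built-in inverse logit and qlogis the logit itself):

```r
# The inverse logit written as in the table, 1/(1+1/exp(x)),
# matches R's built-in plogis; qlogis is the logit link itself
x <- c(-2, 0, 1.5)
isTRUE(all.equal(plogis(x), 1/(1 + 1/exp(x))))   # the two inverse-link forms agree
isTRUE(all.equal(qlogis(plogis(x)), x))          # the link undoes the inverse link
```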
Count (or count-like) response variables
We'll first try some examples of using glm with count data. Strictly speaking, the provided tree dataset does not include a vector of counts, but the abundance data, measured as cover classes (integers between 1 and 10), may fit the essential criteria of 1) being discrete, and 2) having variance that generally increases with the mean (you can consider this from first principles: if a species had a mean abundance around 1, the variance has to be low because you can't get any lower than this given our data; a mean of 5, however, could have high variance). A Poisson process is one whose mean and variance are equal; we can check whether this is true for our data. Taking eastern hemlock as an example:
dat = read.csv(file.choose())
dat2 = subset(dat,dat$species=="Tsuga canadensis")
mean(dat2$cover)
[1] 4.659517
var(dat2$cover)
[1] 4.471835
Not perfect but close. Let's look at the distribution of counts themselves:
table(dat2$cover)

  1   2   3   4   5   6   7   8   9  10
 39  71 166 110  92  80 108  60  19   1
If these counts were distributed exactly from a Poisson process, what would they look like, assuming the same mean (and variance)?
bar1 = barplot(as.vector(table(dat2$cover)),names.arg=seq(1:10))
points(bar1,dpois(seq(1,10),4.66)*sum(table(dat2$cover)),
  cex=2,type="b",col="sienna",lwd=2,pch=19)
A couple things about what we did here: rather than use hist, I plotted a histogram using the table function output, because I wanted to see exactly the number of counts in each bin from 1 to 10 (this can be frustrating to do exactly with hist: try it). Then I overlaid the expected values from a Poisson distribution with the same mean (=4.66) using the dpois function. Because dpois returns a probability density, I had to multiply these values by the overall count sum to translate them into the expected frequencies we see on the bar chart. As you can see, the data look Poisson-ish but they're not perfect. (One reason you've probably already guessed: our data are bounded at 10, so variance should actually go down for the highest values.)
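If you want to convince yourself of the mean-variance property itself, a quick simulation (not part of the tree dataset; the mean of 4.66 and sample size of 746 are just borrowed from above) shows that Poisson draws have variance close to their mean:

```r
# Simulated Poisson counts with the hemlock-like mean of 4.66
set.seed(11)                         # arbitrary seed for reproducibility
y <- rpois(746, lambda = 4.66)
c(mean = mean(y), var = var(y))      # the two should be close
```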
Now let's fit a GLM to these data with just an intercept (overall mean):
glm1 = glm(cover~1,data=dat2,family=poisson)
summary(glm1)

Call:
glm(formula = cover ~ 1, family = poisson, data = dat2)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0594  -0.8229  -0.3132   1.0085   2.1430

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.53891    0.01696   90.73   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 749.25  on 745  degrees of freedom
Residual deviance: 749.25  on 745  degrees of freedom
AIC: 3212.2

Number of Fisher Scoring iterations: 4
The output is similar to lm output, and the standard summary and other extractor functions (coef, confint, resid, fitted, etc.) apply. One major difference is how the fitted coefficients are scaled: we were expecting a mean of 4.66 (calculated above), but instead got back 1.54. Here and in all other cases where the link is not the identity function, the fitted coefficients are returned on the scale of the link function, not the scale of the original data. To get back the appropriately scaled coefficient we apply the inverse of the link function, in this case exp (as the default link for Poisson is log):
exp(coef(glm1))
(Intercept)
4.659517
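The same round trip can be rehearsed on simulated counts (a minimal sketch with made-up data): for an intercept-only Poisson GLM, the back-transformed intercept is the sample mean.

```r
# Intercept-only Poisson GLM on simulated counts:
# the fitted intercept is log(mean), so exp() recovers the mean
set.seed(1)
y <- rpois(500, lambda = 4.66)
m <- glm(y ~ 1, family = poisson)
coef(m)                                   # on the log (link) scale
isTRUE(all.equal(unname(exp(coef(m))), mean(y), tolerance = 1e-6))
```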
Let's add a continuous predictor variable like elevation to generate a simple Poisson regression. First we'll graph it:
with(dat2,plot(elev,cover,main="Hemlock cover vs. elevation", cex=1.5,col="cyan",pch=19))
And then we'll fit the new glm and test it against a model with only an intercept:
glm2 = glm(cover~elev,data=dat2,family=poisson)
anova(glm1,glm2,test="Chisq")

Analysis of Deviance Table

Model 1: cover ~ 1
Model 2: cover ~ elev
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1       745     749.25
2       744     749.23  1 0.023849    0.8773
Note the chi-squared test is typically recommended for models with known dispersion (Poisson and binomial). Here the model with elevation adds no explanatory power (fairly obvious from the graph), but we can still add the predicted trend line to our graph:
x = seq(0,1660)
lines(x,predict(glm2,newdata=list(elev=x),type="response"),lwd=2,col="orange")
What is crucial here is the type argument to predict: "response" returns predictions on the scale of the original response variable, rather than the scale of the link function. Let's now try an ANOVA with Poisson error, using disturbance as our predictor:
glm3 = glm(cover~disturb,data=dat2,family=poisson)
summary(glm3)

Call:
glm(formula = cover ~ disturb, family = poisson, data = dat2)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1794  -0.7763  -0.1980   0.8523   2.2006

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    1.48367    0.03838  38.661   <2e-16 ***
disturbLTSEL   0.03204    0.04685   0.684   0.4941
disturbSETTLE  0.08957    0.05485   1.633   0.1025
disturbVIRGIN  0.12184    0.05277   2.309   0.0210 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 749.25  on 745  degrees of freedom
Residual deviance: 742.32  on 742  degrees of freedom
AIC: 3211.3

Number of Fisher Scoring iterations: 4
anova(glm1,glm3,test="Chisq")

Analysis of Deviance Table

Model 1: cover ~ 1
Model 2: cover ~ disturb
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1       745     749.25
2       742     742.32  3    6.926    0.0743 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is minor support for the disturbance variable. The glm coefficient table works just like the summary for ANOVA produced by lm: the level alphabetically first (CORPLOG) is used as an intercept, and all subsequent levels are tested against the first (thus the remaining coefficients are differences from the mean, not means themselves). There appears to be some support for higher hemlock cover in 'virgin' forest, and the overall model is of borderline significance (AICs are also similar). Would it help to add an interaction with elevation?
glm4 = glm(cover~disturb*elev,data=dat2,family=poisson)
summary(glm4)

Call:
glm(formula = cover ~ disturb * elev, family = poisson, data = dat2)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.4042  -0.7782  -0.2072   0.8090   2.0888

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         1.445e+00  1.396e-01  10.352   <2e-16 ***
disturbLTSEL        2.546e-01  1.639e-01   1.554   0.1203
disturbSETTLE       3.702e-01  2.275e-01   1.628   0.1036
disturbVIRGIN       9.283e-02  2.625e-01   0.354   0.7236
elev                3.540e-05  1.243e-04   0.285   0.7758
disturbLTSEL:elev   2.788e-04  1.651e-04   1.689   0.0912 .
disturbSETTLE:elev  7.319e-04  2.943e-04   2.487   0.0129 *
disturbVIRGIN:elev  2.278e-05  2.269e-04   0.100   0.9200
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 749.25  on 745  degrees of freedom
Residual deviance: 729.06  on 738  degrees of freedom
AIC: 3206

Number of Fisher Scoring iterations: 4
step(glm4)
Start:  AIC=3206
cover ~ disturb * elev

               Df Deviance  AIC
<none>              729.06 3206
- disturb:elev  3   742.06 3213

Call:  glm(formula = cover ~ disturb * elev, family = poisson, data = dat2)

Coefficients:
       (Intercept)        disturbLTSEL       disturbSETTLE       disturbVIRGIN                elev   disturbLTSEL:elev
         1.445e+00           2.546e-01           3.702e-01           9.283e-02           3.540e-05           2.788e-04
disturbSETTLE:elev  disturbVIRGIN:elev
         7.319e-04           2.278e-05

Degrees of Freedom: 745 Total (i.e. Null);  738 Residual
Null Deviance:      749.2
Residual Deviance: 729.1        AIC: 3206
The ANOVA contrasts suggest a significant difference in slope with elevation in the plots of prior settlement; the step function shows that the interaction is necessary (large increase in AIC when the interaction is removed).
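The drop-one logic used by step can be rehearsed on simulated data, where we know the predictor matters (a sketch; the names x, m0, and m1 are invented here):

```r
# Nested Poisson GLMs on simulated data with a true log-linear effect of x
set.seed(2)
x <- runif(300)
y <- rpois(300, lambda = exp(0.5 + 1.0 * x))
m0 <- glm(y ~ 1, family = poisson)
m1 <- glm(y ~ x, family = poisson)
anova(m0, m1, test = "Chisq")   # deviance drop tested against chi-squared
AIC(m0, m1)                     # m1 should have the lower AIC
```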
GLM extensions for overdispersed (discrete) data
The above exponential family of error functions cannot account for data overdispersion, where the variance is higher than expected from a Poisson or binomial process. To account for these cases glm includes 'quasi' families that add a parameter phi (φ) to the expected variance equation (Poisson example: variance of φλ rather than λ). The expectation is that φ will be fit at greater than 1 (<1 indicates underdispersion). The 'quasipoisson' family object can thus account for count data with overdispersion (e.g., many zeros but also many very high values), and the 'quasibinomial' family object can do the same for binomial data. The fitted φ value is returned in the summary. One sign of overdispersion is a residual deviance substantially higher than the residual degrees of freedom. That's not true of our glm4 fitted model, but we'll see what happens with a quasipoisson fit anyway:
glm4.qp = update(glm4,family=quasipoisson)
summary(glm4.qp)

Call:
glm(formula = cover ~ disturb * elev, family = quasipoisson,
    data = dat2)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.4042  -0.7782  -0.2072   0.8090   2.0888

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         1.445e+00  1.357e-01  10.649   <2e-16 ***
disturbLTSEL        2.546e-01  1.593e-01   1.598   0.1104
disturbSETTLE       3.702e-01  2.211e-01   1.674   0.0945 .
disturbVIRGIN       9.283e-02  2.552e-01   0.364   0.7161
elev                3.540e-05  1.208e-04   0.293   0.7697
disturbLTSEL:elev   2.788e-04  1.605e-04   1.738   0.0827 .
disturbSETTLE:elev  7.319e-04  2.861e-04   2.558   0.0107 *
disturbVIRGIN:elev  2.278e-05  2.205e-04   0.103   0.9178
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 0.9449323)

    Null deviance: 749.25  on 745  degrees of freedom
Residual deviance: 729.06  on 738  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4
In our case this looks to be a waste of time, since the overdispersion parameter was fit at <1. Note these error objects are not fit with actual likelihoods, so the standard likelihood ratio test for comparing nested models is not available.
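A quick way to gauge dispersion without refitting is the sum of squared Pearson residuals over the residual degrees of freedom, which is essentially what the quasipoisson φ estimate is. A sketch on deliberately overdispersed (negative binomial) counts misfit as Poisson:

```r
# Simulate counts whose variance far exceeds the mean, fit as Poisson,
# then estimate the dispersion from the Pearson residuals
set.seed(42)
x <- runif(300)
y <- rnbinom(300, mu = exp(1 + x), size = 1)
m <- glm(y ~ x, family = poisson)
phi <- sum(residuals(m, type = "pearson")^2) / df.residual(m)
phi   # well above 1, signalling overdispersion
```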
Another, often better way of dealing with overdispersion that retains the nice characteristics of likelihood (AICs, likelihood ratio tests, use of step or stepAIC) is using a negative binomial (NB) model. NB models are handy for discrete data like count data, but where the variance increases much faster than the mean (as opposed to the Poisson, where they are equal). Where the Poisson model has one parameter (lambda = mean = variance), NB contains an additional parameter k that accounts for 'clumping', which is particularly handy for count data where there is a preponderance of zeros. NB is not accommodated in the base glm function (it is not part of the exponential family of distributions), but the MASS library includes the otherwise-identical glm.nb function. Here's a try at fitting a NB model to our standard cover data:
library(MASS)
nb1 = glm.nb(cover~disturb*elev,data=dat2)

Warning messages:
1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace >  0) :
  iteration limit reached
2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace >  0) :
  iteration limit reached

summary(nb1)
Call:
glm.nb(formula = cover ~ disturb * elev, data = dat2, init.theta = 48556.6389,
    link = log)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.4041  -0.7781  -0.2072   0.8089   2.0887

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         1.445e+00  1.396e-01  10.351   <2e-16 ***
disturbLTSEL        2.546e-01  1.639e-01   1.553   0.1203
disturbSETTLE       3.702e-01  2.275e-01   1.627   0.1036
disturbVIRGIN       9.283e-02  2.625e-01   0.354   0.7236
elev                3.540e-05  1.243e-04   0.285   0.7758
disturbLTSEL:elev   2.788e-04  1.651e-04   1.689   0.0912 .
disturbSETTLE:elev  7.319e-04  2.943e-04   2.487   0.0129 *
disturbVIRGIN:elev  2.278e-05  2.269e-04   0.100   0.9200
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(48556.64) family taken to be 1)

    Null deviance: 749.18  on 745  degrees of freedom
Residual deviance: 728.99  on 738  degrees of freedom
AIC: 3208

Number of Fisher Scoring iterations: 1

              Theta:  48557
          Std. Err.:  373192
Warning while fitting theta: iteration limit reached

 2 x log-likelihood:  -3190.003
The overdispersion parameter is called 'theta' here (=k). We should be a bit wary of the results since the fit did not meet the default convergence limits. Nonetheless, the fit is very similar to our Poisson model (glm4), and unlike the quasipoisson fit we can compare the models via AIC:
AIC(glm4,nb1)
     df      AIC
glm4  8 3205.998
nb1   9 3208.003
There is no support for a NB model here. Given the glm4 model did not suggest overdispersion, this makes sense. But what if we allowed absences (zero abundance) in our model of hemlock abundance? Surely there must be hundreds of plots that don't contain hemlock, and this might drastically change our model based on Poisson assumptions. We can start by creating a new dataset, where plots that do not contain hemlock are added in with zero cover values:
h.plots = unique(as.character(dat2$plotID))
u.plots = unique(as.character(dat$plotID))
noh.plots = u.plots[is.element(u.plots,h.plots)==F]
The first line is a vector of plot IDs containing hemlock, the second is a vector of all the plots, and the third vector is all plots that do not contain hemlock. Now we create a data frame of these plots with unique plotIDs (ignoring species):
dat3 = subset(dat,is.element(as.character(dat$plotID),noh.plots))
dat4 = subset(dat3,duplicated(dat3$plotID)==F)
And finally we add these rows to dat2 (containing hemlock) after changing the cover values to zero:
dat4$cover = 0
dat5 = rbind(dat2,dat4)
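As an aside, the is.element() filter used above to find the no-hemlock plots is equivalent to a set difference; a toy sketch (made-up plot IDs):

```r
# Anti-join two ways: logical indexing with is.element(), or setdiff()
h.demo <- c("p1", "p2")                  # plots with the species
u.demo <- c("p1", "p2", "p3", "p4")      # all plots
noh.demo <- u.demo[is.element(u.demo, h.demo) == FALSE]
identical(noh.demo, setdiff(u.demo, h.demo))   # TRUE: same plots either way
```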
Now let's look at our cover distribution:
bar1 = barplot(as.vector(table(dat5$cover)),names.arg=as.character(seq(0,10)))
points(bar1,dpois(seq(0,10),mean(dat5$cover))*sum(table(dat5$cover)),
  cex=2,type="b",col="lightblue",lwd=2,pch=19)
points(bar1,dnbinom(seq(0,10),mu=mean(dat5$cover),
  size=(mean(dat5$cover)^2)/(var(dat5$cover)-mean(dat5$cover)))*sum(table(dat5$cover)),
  type="b",cex=2,col="salmon",lwd=2,pch=19)
The Poisson curve (blue) doesn't cut it, and note that the variance of the counts is now nearly twice the mean. The salmon-colored curve is that fit by the built-in dnbinom function, with specified values of the mean (calculated from the cover vector) and the overdispersion parameter k, which is defined as the mean squared divided by (variance minus mean). We still have way too many zeros even for NB, but is NB a better model here?
nb2 = glm.nb(cover~disturb*elev,data=dat5)
glm5 = glm(cover~disturb*elev,data=dat5,family=poisson)
AIC(nb2,glm5)
     df      AIC
nb2   9 4788.591
glm5  8 5287.842
No contest. But we still have a major problem in that our plot diagnostics (and the residual deviance compared to the residual df) suggest that we've only partly dealt with overdispersion:
plot(nb2)
It's no surprise that the left-hand tail of our residuals is significantly higher than expected.
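The moment definition of k used with dnbinom above, k = mean^2/(variance - mean), can be checked against simulated negative binomial draws (a sketch; dnbinom's size argument is this same k):

```r
# NB variance is mu + mu^2/k; the moment estimator recovers k
set.seed(3)
mu <- 4; k <- 1.5
y <- rnbinom(50000, mu = mu, size = k)
var(y)                            # close to mu + mu^2/k = 14.67
mean(y)^2 / (var(y) - mean(y))    # moment estimate of k, close to 1.5
```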
Occupancy (presence-absence) data involve a response variable defined by one of two states; in statistical parlance this is a Bernoulli trial (heads or tails), or a binomial process where N=1. In this case we're interested in the probability of 'success' (presence) given values of one or more independent variables. In R this is done via a glm with family=binomial, with the link function either taken as the default (link="logit") or the user-specified 'complementary log-log' (link="cloglog").
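As a sketch of the two link options on made-up presence-absence data (the variable names and the simulated elevation effect are invented for illustration):

```r
# Fit the same Bernoulli response with logit and cloglog links,
# then compare residual deviances
set.seed(8)
elev.sim <- runif(500, 0, 1500)
pres <- rbinom(500, 1, plogis(-2 + 0.003 * elev.sim))  # presence rises with elevation
m.logit <- glm(pres ~ elev.sim, family = binomial)                   # default logit
m.clog  <- glm(pres ~ elev.sim, family = binomial(link = "cloglog"))
c(logit = deviance(m.logit), cloglog = deviance(m.clog))
```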
Let's take our overdispersed hemlock count data and convert all abundances to 1, thereby creating a presence-absence vector:
dat5$pa = dat5$cover
dat5$pa[dat5$pa>0] = 1
We've defined a new column in dataset 'dat5' named 'pa', containing only 1s and 0s for presences and absences. Let's pick up our above thread and fit an ANCOVA with disturbance class and elevation:
bin.glm1 = glm(pa~disturb*elev,data=dat5,family=binomial)
summary(bin.glm1)

Call:
glm(formula = pa ~ disturb * elev, family = binomial, data = dat5)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.6619  -0.6417   0.5740   0.7278   1.8803

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         2.4435687  0.5939061   4.114 3.88e-05 ***
disturbLTSEL        1.2989502  0.7147168   1.817  0.06915 .
disturbSETTLE       0.7848815  0.8840129   0.888  0.37461
disturbVIRGIN       7.0985933  1.2561157   5.651 1.59e-08 ***
elev                0.0013617  0.0004864   2.800  0.00511 **
disturbLTSEL:elev   0.0016463  0.0006325   2.603  0.00924 **
disturbSETTLE:elev  0.0007611  0.0010745   0.708  0.47875
disturbVIRGIN:elev  0.0054999  0.0009433   5.830 5.53e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1307.5  on 1067  degrees of freedom
Residual deviance: 1063.7  on 1060  degrees of freedom
AIC: 1079.7

Number of Fisher Scoring iterations: 5
Plenty of evidence that both factors matter. Let's test this model with sequential removal of terms:
step(bin.glm1)
Start:  AIC=1079.67
pa ~ disturb * elev

               Df Deviance    AIC
<none>              1063.7 1079.7
- disturb:elev  3   1113.0 1123.0

Call:  glm(formula = pa ~ disturb * elev, family = binomial, data = dat5)

Coefficients:
       (Intercept)        disturbLTSEL       disturbSETTLE       disturbVIRGIN                elev
         2.4435687           1.2989502           0.7848815           7.0985933           0.0013617
 disturbLTSEL:elev  disturbSETTLE:elev  disturbVIRGIN:elev
         0.0016463           0.0007611           0.0054999

Degrees of Freedom: 1067 Total (i.e. Null);  1060 Residual
Null Deviance:      1308
Residual Deviance: 1064         AIC: 1080
Strong support for including the interaction. Now let's make sure the logit link function gives the lowest deviance.
bin.glm2 = update(bin.glm1,family=binomial(link="cloglog"))
anova(bin.glm1,bin.glm2)
Analysis
of Deviance Table
Model 1:
pa ~ disturb * elev
Model 2:
pa ~ disturb * elev
Resid. Df Resid. Dev Df Deviance
1 1060
1063.7
2 1060 1086.0 0 22.346
The default logit link is much better (measured as lowest deviance; because the same number of parameters are used in both models, this is equivalent to comparing their AICs). Let's see what the fitted model looks like on an occupancy graph against elevation for different subsets of disturbance classes. We'll do this separately by disturbance type:
x = seq(250,2000)
with(dat5[dat5$disturb=="VIRGIN",],plot(elev,pa,cex=1.5,col="darkgreen",
  xlim=c(250,2000)))
lines(x,predict(bin.glm1,newdata=list(elev=x,disturb=factor(rep("VIRGIN",length(x)))),
  type="response"),lwd=2,col="darkgreen")
This is the predicted logistic curve for the elevation response in VIRGIN stands. We first made a new x-axis vector, then plotted the presence-elevation data for a hemlock dataset restricted to VIRGIN plots, and then overlaid the predicted values from our fitted bin.glm1 model at each x value, using only the coefficients specific to VIRGIN plots. Don't forget that the 'response' type argument ensures our fitted values are on the scale of the original data rather than the logit link function. Let's compare this distribution to that for CORPLOG plots:
with(dat5[dat5$disturb=="CORPLOG",],points(elev,pa+.02,cex=1.5,col="blue"))
lines(x,predict(bin.glm1,newdata=list(elev=x,disturb=factor(rep("CORPLOG",length(x)))),
  type="response"),lwd=2,col="blue")
Apparently a history of heavy logging has blurred the relationship between hemlock presence and elevation. Does this remain true with a history of selective cutting?
with(dat5[dat5$disturb=="LTSEL",],points(elev,pa-.02,cex=1.5,col="darkred"))
lines(x,predict(bin.glm1,newdata=list(elev=x,disturb=factor(rep("LTSEL",length(x)))),
  type="response"),lwd=2,col="darkred")
The response is intermediate, which seems intuitive.
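A useful self-check on what type="response" does for a logistic model: predictions on the response scale are just the inverse logit (plogis) of predictions on the link scale. A sketch with simulated data (all names invented):

```r
# Response-scale predictions equal plogis() of link-scale predictions
set.seed(9)
x <- runif(200)
y <- rbinom(200, 1, plogis(-1 + 2 * x))
m <- glm(y ~ x, family = binomial)
eta <- predict(m, type = "link")       # linear predictor (logit scale)
p   <- predict(m, type = "response")   # fitted probabilities
isTRUE(all.equal(unname(p), unname(plogis(eta))))  # TRUE: same thing
```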
A note on proportion data
Although not a common case with species distribution data, you may run into a dependent variable where observations are actually proportions derived from a set of trials with a certain number of successes and failures, i.e., a binomial process where N>1. You could simply model this as percentages (a continuous variable bounded by 0 and 1), which would be an appropriate case of using glm with family=binomial with the default logit link. But if your observations differed in number of trials, you'd be throwing away the N associated with each datum. Observations derived from more samples would seem intuitively more important in the regression than those from small samples, even though their proportions may be the same. R can handle this using glm with the binomial(link="logit") family, with a dependent variable that is actually a two-column object, the first column being the number of 'successes' and the second the number of 'failures'.
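A minimal sketch of this two-column response (all names and numbers invented): simulate sites with differing numbers of trials and fit the proportion model with cbind(successes, failures):

```r
# Binomial GLM with unequal trials per observation
set.seed(4)
n.trials <- sample(20:100, 30, replace = TRUE)   # unequal effort per site
x <- runif(30)
succ <- rbinom(30, n.trials, plogis(-0.5 + 2 * x))
m <- glm(cbind(succ, n.trials - succ) ~ x, family = binomial)
coef(m)   # estimates near the true values (-0.5, 2)
```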
One application of this to species distribution data is when surveys include differential survey effort per observation. For example, in tree inventories, plots may vary in the total number of trees (individuals) sampled, due to differences in quadrat size (or naturally due to environmental or historical factors). If you have information on the total number of samples per site and a similar measure of abundance for individual species, you can model abundance as a binomial process of 'successes' (individuals of the focal species) and 'failures' (individuals of other species). Essentially R will then treat this as a weighted regression, with sites of more individuals influencing parameter estimates more than small samples.
Here's an example by modeling hemlock cover with respect to total cover. The example is a bit forced (as these are not actual samples of individuals) but you get the idea. We start by generating the summed cover of all species for each plot:
sumcover = tapply(dat$cover,dat$plotID,sum)
coverdat = data.frame(names(sumcover),sumcover)
Now we merge these data with our dat5 hemlock object, based on plotID:
dat6 = merge(dat5,coverdat,all.x=T,by.x=1,by.y=1)
And create our response variables of hemlock cover (successes) and total cover minus hemlock cover (failures):
cover.y = cbind(dat6$cover,dat6$sumcover-dat6$cover)
We can now model these data as a binomial process of varying number of trials per observation:
glm6 = glm(cover.y~disturb*elev,data=dat6,family=binomial)
summary(glm6)

Call:
glm(formula = cover.y ~ disturb * elev, family = binomial, data = dat6)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5.1115  -1.4730  -0.4089   1.4116   5.1429

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         2.528e+00  1.385e-01  18.258  < 2e-16 ***
disturbLTSEL        2.766e-01  1.625e-01   1.702 0.088828 .
disturbSETTLE       2.065e+00  2.217e-01   9.312  < 2e-16 ***
disturbVIRGIN       3.722e-01  2.217e-01   1.679 0.093245 .
elev                8.036e-05  1.228e-04   0.655 0.512737
disturbLTSEL:elev   6.051e-04  1.628e-04   3.716 0.000202 ***
disturbSETTLE:elev  2.941e-03  2.882e-04  10.207  < 2e-16 ***
disturbVIRGIN:elev  2.089e-04  1.908e-04   1.095 0.273590
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4068  on 1067  degrees of freedom
Residual deviance: 3866  on 1060  degrees of freedom
AIC: 6241.6

Number of Fisher Scoring iterations: 5
Note coefficient values are on the logit-transformed scale, as above. You can also get overdispersion with proportion data. The family for this is quasibinomial, and it works just like quasipoisson.
Note that there will be occasions where you may wonder whether it makes more sense to model observations as binary (0s and 1s) or lump them to form counts summarized by particular factors (e.g., 47 presences, 80 absences in a particular disturbance history class). If your predictor variables are factor variables that are common to all occurrences, it may make more sense to model these as proportion data based on counts. But if each occurrence has a unique value of a continuous variable (e.g., in an ANCOVA with both factor and continuous variables), a binary response is logical.
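The lumping point can be verified directly: with a single factor predictor, a Bernoulli (0/1) fit and the aggregated two-column fit give the same coefficient estimates (a sketch with simulated data; all names invented):

```r
# Binary fit vs. aggregated binomial fit of the same data
set.seed(5)
grp <- rep(c("A", "B"), each = 100)
y <- rbinom(200, 1, ifelse(grp == "A", 0.3, 0.6))
m.binary <- glm(y ~ grp, family = binomial)        # one row per observation
succ <- tapply(y, grp, sum)                        # lump to successes per group
n    <- tapply(y, grp, length)
m.lumped <- glm(cbind(succ, n - succ) ~ factor(names(succ)), family = binomial)
isTRUE(all.equal(unname(coef(m.binary)), unname(coef(m.lumped)), tolerance = 1e-6))
```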
Two more points on when and when not to use GLMs with species distribution data:

1) If you have abundance measured in percent cover (where sample effort is equal for each observation),

2) Because the use of link functions, even using the Gaussian (normal) family, can incorporate transformations of the response variable (such as log(y)), one might wonder whether it makes more sense to use lm with log(y) or glm with link="log". This is apparently something of an art, but