**General linear models (least squares) in R**

Bolker (Chap. 9) provides a nice summary of *general* linear models as opposed to more modern techniques. Essentially, general linear models (NOT general*ized* linear models) are the old-school models assuming normally distributed residuals, independent observations, homoscedasticity, and no observation error. In R this includes everything the **lm** function does (simple and multiple least-squares regression), plus ANOVA and ANCOVA. The only difference among these is whether the model includes only continuous variables (regression), only factor variables (ANOVA), or both (ANCOVA).

All modeling approaches in R use the same basic structure of "response ~ explanatory variable(s)". Here's an example of a regression model of the abundance (cover) of beech on elevation:

dat = read.csv(file.choose())

lm1 = lm(cover~elev,data=dat,subset=species=="Fagus grandifolia")

This simple linear regression fits both an intercept and a slope. A summary of the fitted model object is available with the **summary** function:

summary(lm1)

Call:
lm(formula = cover ~ elev, data = dat, subset = species == "Fagus grandifolia")

Residuals:
    Min      1Q  Median      3Q     Max
-3.9054 -1.8398 -0.1664  1.6433  5.4512

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.9620373  0.3437750   5.707 2.21e-08 ***
elev        0.0019596  0.0002971   6.596 1.31e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.1 on 409 degrees of freedom
Multiple R-squared: 0.09613,  Adjusted R-squared: 0.09393
F-statistic: 43.5 on 1 and 409 DF,  p-value: 1.313e-10

All fitted model objects can be explored with **summary**. For **lm**, you get a restatement of the model, the range and quartiles of the model residuals, a table of fitted coefficients along with their precision (SE) and significance (t tests), the residual SE (square root of the error variance), the R-squared or coefficient of determination (fraction of variance explained), the adjusted R-squared (which applies a penalty for additional fitted parameters), and the F statistic evaluating the significance of the full model. There are many attributes of lm1, most of which are not shown. For example, you can retrieve a vector of fitted (predicted) values with lm1$fitted (or **fitted**(lm1)), a vector of residuals with lm1$resid (or **resid**(lm1)), and the coefficients with **coef**(lm1). Alternatively, note that this summary is itself an object with components that can be accessed by indexing. For example:

summary(lm1)[[4]]

               Estimate   Std. Error  t value     Pr(>|t|)
(Intercept) 1.962037256 0.3437749892 5.707330 2.206867e-08
elev        0.001959647 0.0002971168 6.595544 1.312590e-10

returns the 4th element of summary(lm1), which is stored as a list. Here this element is the coefficient table. We could also show these results as an ANOVA-type table:

summary.aov(lm1)

             Df  Sum Sq Mean Sq F value    Pr(>F)
elev          1  191.76  191.76  43.501 1.313e-10 ***
Residuals   409 1802.91    4.41

In fact, all **lm** fitted objects can be returned in both summary forms (**summary.lm** and **summary.aov**). Other helpful related items include a report of the confidence intervals of each parameter:

confint(lm1)

                 2.5 %      97.5 %
(Intercept) 1.28625089 2.637823622
elev        0.00137558 0.002543713

and associated AIC:

AIC(lm1)

[1] 1780.058

Don't forget that AIC values are specific to a given response vector: any change to the response values (e.g., transformation, removing an outlier) yields AIC values that are not comparable with the original model's.
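This pitfall can be illustrated with a small simulation; the sketch below uses made-up 'cover' and 'elev' values (the course dataset is not reproduced here), not the real data:

```r
# AIC is only comparable among models fitted to the *same* response vector.
set.seed(1)
elev  <- runif(100, 300, 1800)
cover <- pmax(round(1 + 0.002 * elev + rnorm(100, sd = 2)), 1)  # fake cover scores

m_raw  <- lm(cover ~ elev)        # response: cover
m_sqrt <- lm(sqrt(cover) ~ elev)  # response: sqrt(cover) -- a different vector!

AIC(m_raw)   # on the scale of the raw response
AIC(m_sqrt)  # NOT comparable to the value above
```

The two AIC values differ by a large amount that reflects the change of scale, not any difference in model quality.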

Is our model a good model? One easy way to examine model results is with the **plot** function:

plot(lm1)

Here you get four graphs (click to advance from one panel to the next): 1) residuals vs. fitted values (a good model shows no pattern); 2) the qqnorm plot we saw above (values should fall on the dashed line); 3) the scale-location graph, which indicates heteroscedasticity; and 4) standardized residuals vs. leverage, with Cook's distance, which is handy for identifying outliers. The first thing we notice in our data is that the y values are not continuous (hence the cross-hatched patterns in the residual graphs) and that the residuals are probably not normally distributed (try **shapiro.test**(lm1$resid) to confirm). Would a square root transformation help? We can compare the fit of a new model (lm2) with a transformed cover variable, either by calling **lm** again with new arguments or by using the **update** function:

lm2 = update(lm1,formula=sqrt(cover)~elev)

plot(lm2)

The residuals don't look any better and the R-squared has gone down. To do better we'll have to move to a generalized linear model (**glm**).

Now we'll try an ANOVA. Let's ask whether beech is differentially
sensitive to disturbance history:

aov1 = lm(cover~disturb,data=dat,subset=species=="Fagus grandifolia")

summary(aov1)

Call:
lm(formula = cover ~ disturb, data = dat, subset = species ==
    "Fagus grandifolia")

Residuals:
   Min     1Q Median     3Q    Max
-3.333 -2.053 -0.229  1.947  5.771

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     4.2290     0.1920  22.030   <2e-16 ***
disturbLT-SEL   0.1043     0.2759   0.378   0.7055
disturbSETTLE  -0.8199     0.3828  -2.142   0.0328 *
disturbVIRGIN  -0.1759     0.2821  -0.624   0.5332
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.197 on 407 degrees of freedom
Multiple R-squared: 0.01499,  Adjusted R-squared: 0.007725
F-statistic: 2.064 on 3 and 407 DF,  p-value: 0.1044

This output is more confusing than regression output; for one thing, coefficients are shown for only three of the four levels of disturbance, plus the overall intercept. This is because the default contrast structure for ANOVA in R is 'treatment contrasts'. The factor level that is alphabetically first (here, CORPLOG) is used as the intercept, so this estimate tests whether the mean of that level differs from zero (here, it clearly does, because there are no zero cover values in our dataset). Each remaining factor level is then tested against this first level; i.e., the second parameter estimate (next to "disturbLT-SEL") is the difference in beech cover between LT-SEL and CORPLOG plots. So this output shows that SETTLE plots have significantly lower abundances of beech than CORPLOG plots (by about 0.8 cover units). The first-row coefficient is an actual sample mean (mean beech cover of all CORPLOG plots), while the remaining coefficients are differences in mean cover, not means themselves. These are of course only a few of all possible contrasts; you can specify a particular contrast structure with the function **contrasts**.
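For instance, here is a hedged sketch with a made-up four-level factor (the real dat$disturb would work the same way) showing how to switch from treatment contrasts to sum-to-zero contrasts, where each coefficient is a deviation from the grand mean rather than a difference from the reference level:

```r
# Simulated stand-ins for the real disturb and cover variables
set.seed(10)
disturb <- factor(rep(c("CORPLOG", "LT-SEL", "SETTLE", "VIRGIN"), each = 25))
cover   <- rnorm(100, mean = 4)

contrasts(disturb)                  # default: treatment contrasts
contrasts(disturb) <- contr.sum(4)  # switch to sum-to-zero contrasts

fit <- lm(cover ~ disturb)
coef(fit)  # intercept = grand mean; other terms = deviations from it
```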

We could instead have R fit no intercept at all, and determine whether the mean cover of each group differs from zero. In our case this is a bit silly, but it is easily accomplished by adding '-1' to the right side of the formula:

aov2 = lm(cover~disturb-1,data=dat,subset=species=="Fagus grandifolia")

A call to summary(aov2) will confirm that cover is indeed greater than zero for all groups. Not fitting an intercept is usually more useful in regression, when we want the line to pass through the origin.
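A minimal sketch of such a regression through the origin, with simulated data:

```r
# Simulated data with a known slope of 0.5 and no intercept
set.seed(2)
x <- 1:50
y <- 0.5 * x + rnorm(50)

m0 <- lm(y ~ x - 1)  # '-1' drops the intercept
coef(m0)             # a single slope; the fitted line passes through (0, 0)
```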

Finally, note that the **lm** output for our model differs from that produced by **aov**:

summary.aov(aov1)

             Df  Sum Sq Mean Sq F value Pr(>F)
disturb       3   29.89  9.9635  2.0639 0.1044
Residuals   407 1964.78  4.8275

You can see why from the summary table: the **lm** output produces a set of treatment contrasts, each tested with a t test, whereas **aov** produces the overall ANOVA result, which compares the variance of the cover response between and within the levels of the grouping variable; comparisons of two variances are done via the F test. The fact that these tests give us different results suggests that the difference between groups is an issue of variance as well as of the mean:

plot(dat$disturb[dat$species=="Fagus grandifolia"],dat$cover[dat$species=="Fagus grandifolia"],notch=T)

The problem is evident here: a general linear model is not a good choice because of heteroscedasticity (the SETTLE variance is much lower). This would be a good occasion to use a resampling-based model (like a permutational ANOVA) rather than one based on classical assumptions.
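The idea behind a permutational ANOVA can be sketched in a few lines of base R (simulated data here, since it is only an illustration): shuffle the observations relative to the group labels many times, and compare the observed F statistic to its permutation distribution instead of to the theoretical F distribution:

```r
set.seed(3)
grp <- factor(rep(c("A", "B", "C"), each = 20))
y   <- rnorm(60) + as.numeric(grp)  # simulated response with a real group effect

# Observed F statistic from a standard one-way ANOVA
f_obs <- summary(aov(y ~ grp))[[1]][["F value"]][1]

# F statistics from 999 random permutations of the response
f_perm <- replicate(999, summary(aov(sample(y) ~ grp))[[1]][["F value"]][1])

# Permutation p-value: proportion of F values at least as large as observed
p_perm <- mean(c(f_obs, f_perm) >= f_obs)
p_perm
```

Because the null distribution is built from the data themselves, no normality or homoscedasticity assumptions are needed.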

Multiple comparisons can be accomplished with various other tests in R, such as the popular Tukey honest significant difference (HSD) test. The argument to the **TukeyHSD** function must be an **aov** object rather than an **lm** one:

TukeyHSD(aov(aov1))

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = aov1)

$disturb
                     diff        lwr        upr     p adj
LT-SEL-CORPLOG  0.1043257 -0.6073034 0.81595478 0.9815546
SETTLE-CORPLOG -0.8199167 -1.8075190 0.16768552 0.1417827
VIRGIN-CORPLOG -0.1759103 -0.9035981 0.55177748 0.9244785
SETTLE-LT-SEL  -0.9242424 -1.9198871 0.07140225 0.0796909
VIRGIN-LT-SEL  -0.2802360 -1.0188019 0.45832992 0.7616165
VIRGIN-SETTLE   0.6440064 -0.3631787 1.65119155 0.3521145

Unlike the t tests produced by the **lm** summary, the more conservative Tukey HSD tests suggest that beech cover does not differ significantly between any two disturbance classes.

We have already noted that plots of different disturbance classes have very different elevation distributions. We can determine whether elevation influences the above ANOVA results (or, equivalently, whether the elevation effect varies by disturbance class) with an ANCOVA:

ancova1 = lm(cover~disturb*elev,data=dat,subset=species=="Fagus grandifolia")

summary(ancova1)

Call:
lm(formula = cover ~ disturb * elev, data = dat, subset = species ==
    "Fagus grandifolia")

Residuals:
    Min      1Q  Median      3Q     Max
-4.3054 -1.5585 -0.1539  1.7687  5.3592

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         1.517e+00  6.635e-01   2.287   0.0227 *
disturbLT-SEL       3.248e-01  8.653e-01   0.375   0.7076
disturbSETTLE       1.688e+00  1.546e+00   1.092   0.2754
disturbVIRGIN       2.092e-01  1.338e+00   0.156   0.8758
elev                2.340e-03  5.503e-04   4.252 2.64e-05 ***
disturbLT-SEL:elev  2.112e-05  7.402e-04   0.029   0.9772
disturbSETTLE:elev -2.041e-03  2.073e-03  -0.984   0.3255
disturbVIRGIN:elev -4.867e-04  1.065e-03  -0.457   0.6478
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.094 on 403 degrees of freedom
Multiple R-squared: 0.1139,  Adjusted R-squared: 0.09848
F-statistic: 7.398 on 7 and 403 DF,  p-value: 2.249e-08

This again is the default contrast form of model results, where we see that elevation is really driving the relationship, with no support for including disturbance history or its interaction with elevation. The significance test associated with the main effect of disturbance (rather than its treatment contrasts) is found in the **anova**-style output:

anova(ancova1)

Analysis of Variance Table

Response: cover
              Df  Sum Sq Mean Sq F value    Pr(>F)
disturb        3   29.89   9.964  2.2717   0.07972 .
elev           1  191.92 191.924 43.7589 1.184e-10 ***
disturb:elev   3    5.32   1.775  0.4047   0.74971
Residuals    403 1767.53   4.386
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This again shows no support for disturbance or the interaction. We can test the fit of models with and without the disturbance terms by defining a second ANCOVA model with elevation only (the period below means 'the existing model', the minus signs remove the listed terms, and the colon signifies the interaction):

ancova2 = update(ancova1, ~.-disturb-disturb:elev)

and then using the **anova** function to test whether the simpler model explains significantly less variance:

anova(ancova1,ancova2)

Analysis of Variance Table

Model 1: cover ~ disturb * elev
Model 2: cover ~ elev
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    403 1767.5
2    409 1802.9 -6   -35.381 1.3445 0.2361

The models are not significantly different in explanatory power (p=0.2361), so the simpler one with only elevation (model 2) is preferred. AIC agrees:

> AIC(ancova1,ancova2)

        df      AIC
ancova1  9 1783.912
ancova2  3 1780.058

There is an even easier (but not always correct) way to perform model simplification (backwards elimination from a maximal model) in R, using the **step** function:

> step(ancova1)

Start:  AIC=615.54
cover ~ disturb * elev

               Df Sum of Sq    RSS    AIC
- disturb:elev  3    5.3248 1772.9 610.78
<none>                      1767.5 615.54

Step:  AIC=610.78
cover ~ disturb + elev

           Df Sum of Sq    RSS    AIC
<none>                  1772.9 610.78
- disturb   3    30.057 1802.9 611.69
- elev      1   191.924 1964.8 651.03

Call:
lm(formula = cover ~ disturb + elev, data = dat, subset = species == "Fagus grandifolia")

Coefficients:
  (Intercept)  disturbLT-SEL  disturbSETTLE  disturbVIRGIN           elev
     1.649949       0.335265       0.244698      -0.390918       0.002225

The output is not as complicated as it seems. Starting from the maximal model (~disturb*elev), the function compares the AIC that results from removing the most complicated (last) term in the model, here the interaction between disturb and elev. Since removal gives a substantially better AIC, the function keeps the simpler model and then tries removing the terms from the next tier (here, both main effects). The model with disturb removed is essentially of equal fit to the one including it (AIC goes up less than 1 unit), but taking elev out causes a massive loss of explanatory power (AIC up about 40 units), so the best model includes only elev. The AIC values reported by **step** do not match those from **AIC** because **step** uses **extractAIC**, which drops an additive constant; differences between models are unaffected. Note that **step** works for forward selection as well as backward, and you can select both directions simultaneously. (It works with **glm**, too.) Try using **step** with a more complicated ANCOVA, such as one including the tci variable along with disturb and elev, plus all interactions. Beware that the object saved from **step** will be the best-fitting model according to raw AIC (the lower AIC wins, even when the actual difference is less than two). For example:

step.ancova = step(ancova1)

summary(step.ancova)

Call:
lm(formula = cover ~ disturb + elev, data = dat, subset = species ==
    "Fagus grandifolia")

Residuals:
     Min       1Q   Median       3Q      Max
-4.24965 -1.67849 -0.09873  1.81710  5.37934

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.6499485  0.4297313   3.839 0.000143 ***
disturbLT-SEL  0.3352653  0.2646654   1.267 0.205970
disturbSETTLE  0.2446982  0.3979475   0.615 0.538965
disturbVIRGIN -0.3909178  0.2702366  -1.447 0.148787
elev           0.0022252  0.0003357   6.630 1.07e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.09 on 406 degrees of freedom
Multiple R-squared: 0.1112,  Adjusted R-squared: 0.1024
F-statistic: 12.7 on 4 and 406 DF,  p-value: 9.535e-10

So disturbance is retained here, even though there is not sufficient AIC support (the difference is less than two units).
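The forward and both-direction uses of **step** mentioned above can be sketched with simulated data (the variable names below are invented for illustration); *scope* gives the largest model **step** may consider:

```r
set.seed(4)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 2 * d$x1 + rnorm(100)  # only x1 truly affects y

# Start from the null (intercept-only) model and let step() add or
# drop terms in either direction, up to the scope y ~ x1 + x2
null_model <- lm(y ~ 1, data = d)
best <- step(null_model, scope = ~ x1 + x2, direction = "both", trace = 0)
formula(best)  # should include x1, the true predictor
```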

**Nonlinear Least Squares (nls)**

Let's limit the focus of our analysis to one species (say, sweet birch) using the **subset** function:

dat2 = subset(dat,dat$species=="Betula lenta")

This gives us a data frame of 795 rows (use **str** to confirm). One example of a relationship that may be nonlinear is that between elevation and tci (the topographic convergence index):

with(dat2,plot(elev,tci))

Clearly there is more to the relationship than a simple linear function, but nonetheless let's try fitting an asymptotic model to these data. The least-squares function for this in R is **nls**. Unlike **lm**, minimization is done via an optimization algorithm, which requires extra input on your part: initial values for each parameter. This also means **nls** can sometimes be frustrating, due to lack of convergence or slow performance with very large datasets. One two-parameter asymptotic model is the Michaelis-Menten (MM) function, used in **nls** here:

model1 = nls(tci ~ a*elev / (b+elev), start=list(a=500,b=5), data=dat2)

The *start* argument is a list of each fitted parameter and its starting value. Here a is the asymptote and b is the x value where y=a/2. Note that the MM function is usually used for monotonically increasing data, whereas here the trend is downward, so our b value will be negative. Access parameter estimates with the **summary** function:

summary(model1)

Formula: tci ~ a * elev/(b + elev)

Parameters:
   Estimate Std. Error t value Pr(>|t|)
a    4.9742     0.1564  31.814  < 2e-16 ***
b -109.3603    16.9142  -6.466 1.76e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.255 on 793 degrees of freedom

Number of iterations to convergence: 8
Achieved convergence tolerance: 1.309e-06

Indeed the asymptote was fit at about a=5, and b at -109. The t and associated p values indicate whether the parameter values differ significantly from zero. (This is different from deciding whether this is the best model, or even whether both terms belong in it.) You also get an indication of the model fit and df, plus information about the optimization result. Let's see what the regression line looks like:

x = seq(0,2000)

lines(x,predict(model1,list(elev=x)),col="red")

Here we use **predict** to generate new (predicted) values from our fitted model, based on the new vector x. The **predict** function is generic and works with most modeling functions.
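For linear models, **predict** can additionally return confidence or prediction intervals via its *interval* argument; a minimal sketch with simulated data:

```r
# Simulated straight-line data: y = 2 + 3x + noise
set.seed(5)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)
fit <- lm(y ~ x)

# Predictions at two new x values, with 95% confidence intervals
newdat <- data.frame(x = c(2.5, 7.5))
predict(fit, newdata = newdat, interval = "confidence")
```

The result is a matrix with columns fit, lwr, and upr; interval = "prediction" gives the (wider) interval for a single new observation instead.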

R provides an easier way to use **nls** for many common nonlinear functions: 'self-starting' functions that do not require initial parameter values. (It is also possible to write your own; see **selfStart**.) The self-starting version of the MM model looks like this:

model2 = nls(tci ~ SSmicmen(elev,a,b), data=dat2)

Fortunately we get basically the same model fit as before:

summary(model2)

Formula: tci ~ SSmicmen(elev, a, b)

Parameters:
   Estimate Std. Error t value Pr(>|t|)
a    4.9742     0.1564  31.814  < 2e-16 ***
b -109.3588    16.9144  -6.465 1.76e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.255 on 793 degrees of freedom

Number of iterations to convergence: 4
Achieved convergence tolerance: 2.352e-06

Maybe a power function fits the data better?

model3 = nls(tci ~ c*elev^z, start=list(c=20,z=1), data=dat2)

lines(x,predict(model3,list(elev=x)),col="blue")

The fit is similar. If the functions were nested (i.e., setting a parameter to zero in one yields the other), we could compare the models with a likelihood ratio test via the **anova** function. Here they are not, so we appeal to AIC:

AIC(model2,model3)

       df      AIC
model2  3 3552.757
model3  3 3546.405

There is substantially more support for the power model (lower AIC). We'll talk more about this later. For more useful deterministic nonlinear functions, see **gnls** in the nlme library. Finally, note that for relatively simple nonlinear relationships it is often more productive to transform the variables so that a linear model applies:

- log y vs. x for exponential relationships

- log y vs. log x for power relationships

- exp(y) vs. x for logarithmic relationships

- 1/y vs. 1/x for asymptotic relationships

- log(p/(1-p)) vs. x for proportion data

- sqrt(y) to homogenize variance for count data

- arcsin(y) to homogenize variance for percentage data
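For example, the log-log trick for a power relationship can be checked with simulated data (a sketch; the constants are arbitrary). A power function y = c*x^z becomes linear on the log-log scale, so **lm** recovers log(c) as the intercept and z as the slope:

```r
# Simulated power-law data: y = 5 * x^0.7, with multiplicative noise
set.seed(6)
x <- runif(100, 1, 100)
y <- 5 * x^0.7 * exp(rnorm(100, sd = 0.1))

fit <- lm(log(y) ~ log(x))
exp(coef(fit)[1])  # estimate of c (should be near 5)
coef(fit)[2]       # estimate of z (should be near 0.7)
```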