Bio793 notes, Feb 2

A few more basics from last time

Let’s input the tree dataset again:

We can explore properties of individual variables graphically. For example, the distribution of elevation (elev) can be assessed with a histogram:

> hist(dat\$elev)

with control over the number of bins via the breaks argument:

> hist(dat\$elev,breaks=10)

We can compare this to an estimate of the continuous probability density by overlaying the density on a histogram scaled to proportions rather than frequency:

> hist(dat\$elev,breaks=10,freq=F)

> lines(density(dat\$elev),lwd=2,col="red")

Some elevation values seem to be over-represented, leading to a choppy distribution. Perhaps this is because some plots were sampled multiple times? To get a single elevation value per plot, we can take out plot duplicates using the duplicated function:

> dat2 = dat[duplicated(dat\$plotID)==F,]

> plot(density(dat2\$elev),lwd=2,col="salmon")

This is clearly better, although there still a hump around 1400 m. Note how duplicated returns "false" if an entry is NOT duplicated, so we changed those not duplicated by converting all T to F and vice versa with "==F".

Is elevation normally distributed?

> qqnorm(dat2\$elev)

> qqline(dat2\$elev,col="purple",lwd=2,lty=2)

The 'qq' plot (quantile-quantile against a normal distribution) for a normal distribution should lie on the qqline we plotted, but the tails are way off.  We can test for this with a Shapiro-Wilk normality test (see Crawley p. 282):

> shapiro.test(dat2\$elev)

Shapiro-Wilk normality test

data:  dat2\$elev

W = 0.962, p-value = 7.174e-15

The infinitesimal p value suggests our elevation distribution is anything but normal.  Here's what perfect normality looks like:

> curve(dnorm)

This curve function works for any formula or distribution object in R.  For example:

> curve(x+x^2+x^3)

Two other handy plotting functions are the pairs function, which plots all variables of a data frame against one another in multi-panel matrix format, and coplot, which shows bivariate relationships conditioned on a third variable.

Writing functions

Much of the power of R involves writing your own functions to perform complex calculations and simplify repetitive tasks. Writing a function just requires assigning a name to the function function, along with desired function parameter names (and default values, if desired) and a specification of what the function is supposed to do. To call the function, just type its name with the necessary parameter values in parentheses. Here's an example for calculating the standard error (se) of a vector:

> se = function(x) {sqrt(var(x)/length(x))

Now that the function 'se' is defined, we can use it to calculate the standard error of a variable like elevation:

> se(dat2\$elev)

 12.78305

Note we told to the function to include only one argument (x); we don't have to specify that dat\$elev is actually x because this is the only argument to se. You also don't have to specify which argument is which if they are listed in the same order as you when you created the function.

You can pass arguments to other (nested) functions. For example we can write our own function for variance within the se function:

> se2 = function (x) {

variance = function(X=x) {sum((X-mean(X))^2)/(length(X)-1)}

return ( sqrt(variance(x)/length(x)) )

}

> se2(dat2\$elev)

 12.78305

Here our vector input x is passed onto the 'variance' function with X=x.  It is generally recommended to put new statements on new lines within functions.  R knows the function is still open because the original bracket '{' has not yet been closed ('}'). The use of return tells the function which value to end with if there are many; otherwise it will return last one evaluated.

Here's another example of writing a function to return a bootstrapped confidence interval for our vector x, which demonstrates the handy function sample.

> CI.boot = function(x,nperm,lower,upper) {

list1 = vector("list",nperm)

vec1 = sapply(list1,function(y)mean(sample(x,replace=T)))

return ( quantile(vec1,c(lower,upper)) )

}

> CI.boot(c(1:100),10000,.025,.975)

2.5% 97.5%

44.75 56.20

Here we created the function 'CI.boot' with 4 arguments: the vector for which the CI is to be calculated, the number of desired permutations, and the desired lower and upper confidence quantiles (e.g., 2.5%, 97.5%).  The next two lines are tricky: the first creates a list object, with 'nperm' empty elements; the second then populates each element in the list with a sampled (with replacement), random version of our initial vector x, takes the mean of each permutation, and the returns these mean values in a simple vector form.  So we end up with a population of sampled means (vec1), and we then ask for the desired lower and upper quantiles of this population (the quantile function, good for summarizing vectors generally).  Note with sample you can specify different probabilities of selecting each element, and can sample with and without replacement.

Using tapply and other apply functions

The CI.boot function also contained another bit of magic about R: vector-based calculations.  In one line of code we created 10,000 different randomized versions of the vector x, without using a for loop, and all in a few seconds.  This is usually accomplished with one of the apply family of functions, of which tapply will be used most commonly.  To understand what apply (or sapply) does, note that we first created an empty object (in this case a list of 'nperm' elements), and then performed the same function on each element.  Rather than do this sequentially with a for loop, the sapply function applied the 'mean(sample…)' function simultaneously to all elements.  It may be easiest to see this with apply function on a matrix.

> matrix1 = matrix(seq(1:100),nrow=10,ncol=10)

> apply(matrix1,2,mean)

  5.5 15.5 25.5 35.5 45.5 55.5 65.5 75.5 85.5 95.5

Here we created a 10x10 matrix populated with numbers 1 to 100, filled down columns first so that each subsequent column gets higher numbers (check it by calling 'matrix1').  Then we used apply to calculate the mean of each column, which was returned in vector form.  The second argument in apply refers to summarizing by columns (the 2nd dimension) rather than rows, and the final argument is the function we wish to apply to each column.  This last part is where we can use apply to do complicated calculations.  For example, we can use our own se function instead of the mean:

> apply(matrix1,2,function(x)se(x))

 0.9574271 0.9574271 0.9574271 0.9574271 0.9574271 0.9574271 0.9574271 0.9574271 0.9574271 0.9574271

The standard errors of each column are identical.  The 'function(x)…' argument means that whatever comes next is a function that uses 'x', where 'x' stands in for each successive column.  You can put here anything that will return a value, including nested functions, calls to other functions, many lines of code (in brackets), etc.

Finally, we can use tapply to apply functions to vectors in a data frame, separated by other grouping variables.  For example, let's look at the mean elevation for vegetation plots in each of the disturbance categories:

> dist.mean = tapply(dat2\$elev,dat2\$disturb,mean)

> dist.mean

CORPLOG    LT-SEL    SETTLE    VIRGIN

1109.0728  839.9964  672.8473 1355.6132

And the standard errors:

> dist.se = tapply(dat2\$elev,dat2\$disturb,function(x)se(x))

> dist.se

CORPLOG   LT-SEL   SETTLE   VIRGIN

24.14951 16.37239 17.20755 19.94275

And finally a bar graph with error bars:

> bar1 = barplot(dist.mean,names.arg=names(dist.mean),ylim=c(0,1500),ylab="Elevation")

> arrows(bar1,dist.mean,bar1,dist.mean+dist.se,angle=90,lwd=2)

Functions sapply and lapply are similar to tapply, but use elements of a list (or vector) rather than grouped variables or matrix dimensions.

Creating models: where to from here

- basic strategy for the way I'm presenting modeling techniques:

- everything before spring break (basically) concerns 'single-level' models: that is, only one type of variability

- as single-level models get complicated, we're modeling how the mean changes: the error distribution (residuals) will not vary among samples

- in multi-level models, we model changes in the distribution of variance (error) in addition to the mean

- these get analytically intractable, so we use likelihood techniques with optimization

- for now, we're only concerned with modeling the correct error distribution and specifying appropriate process models

- later, we'll see how all model components can themselves be modeled with their own stochastic distributions

General linear models (least squares) in R

Bolker (Chap. 9) provides a nice summary of general linear models as opposed to the more modern techniques.  Essentially general linear models (NOT generalized linear models) are the old-school models of normal residual distributions, independent observations, homoscedasticity, and (assumed) lack of any observation error.  In R this includes everything that the lm function does (simple and multiple least-squares regression), ANOVA, and ANCOVA.  The only difference between these is whether the model includes only continuous variables (regression), only factor variables (ANOVA), or both (ANCOVA).

All modeling approaches in R use the same basic structure of "response ~ explanatory variable(s)".  Here's an example of a regression model of the abundance (cover) of beech on elevation:

> lm1 = lm(cover~elev,data=dat,subset=species=="Fagus grandifolia")

This simple linear regression fits both an intercept and a slope.  A summary of the fitted model object is available with the summary function:

> summary(lm1)

Call:

lm(formula = cover ~ elev, data = dat, subset = species == "Fagus grandifolia")

Residuals:

Min      1Q  Median      3Q     Max

-3.9054 -1.8398 -0.1664  1.6433  5.4512

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 1.9620373  0.3437750   5.707 2.21e-08 ***

elev        0.0019596  0.0002971   6.596 1.31e-10 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.1 on 409 degrees of freedom

Multiple R-squared: 0.09613,    Adjusted R-squared: 0.09393

F-statistic:  43.5 on 1 and 409 DF,  p-value: 1.313e-10

All fitted model objects can be explored with summary.  For lm, you get a restatement of the model, the range and quantile estimates of the model residuals, a table of fitted coefficients along with their precision (SE) and significance (t tests), the residual SE (square root of the error variance), the R2 or coefficient of determination (fraction of variance explained), the adjusted-R2 (applies a penalty for more fitted parameters), and the associated F statistic evaluating the significance of the full model.  There are many attributes of lm1, most of which are not shown.  For example, you can retrieve a vector of fitted (predicted) values with lm1\$fitted (or fitted(lm1)), a vector of residuals with lm1\$resid (or resid(lm1)), and return the coefficients with coef.  Alternatively, note that this summary is itself an object, with components that can be accessed with indexing.  For example:

> summary(lm1)[]

Estimate   Std. Error  t value     Pr(>|t|)

(Intercept) 1.962037256 0.3437749892 5.707330 2.206867e-08

elev        0.001959647 0.0002971168 6.595544 1.312590e-10

returns the 4th element of summary(lm1), which is stored as a list.  Here this element is the coefficient table.  We could also show these results as an ANOVA-type table:

> summary.aov(lm1)

Df  Sum Sq Mean Sq F value    Pr(>F)

elev          1  191.76  191.76  43.501 1.313e-10 ***

Residuals   409 1802.91    4.41

In fact, all lm fitted objects can be returned in both summary forms (summary.lm and summary.aov).  Other helpful related items are a report of the confidence intervals of each parameter:

> confint(lm1)

2.5 %      97.5 %

(Intercept) 1.28625089 2.637823622

elev        0.00137558 0.002543713

and associated AIC:

> AIC(lm1)

 1780.058

Don't forget that AIC values are particular to a given response vector: a change in any of the values of the response (e.g., transformation, removing an outlier, etc.) results in a non-comparable vector.

Is our model a good model?  One easy way to examine model results is with the plot function:

> plot(lm1)

Here you get four graphs (click to go from one panel to the next): 1) residuals vs. fitted values (a good model will show no pattern); 2) the qqnorm plot we saw above (values should be on the dashed line); 3) scale-location graph, indicating heteroscedasticity ; and 4) standardized residuals vs. leverage and Cook's distance, which is handy for identifying outliers.  The first thing we notice in our data is that the y data are not continuous (thus the cross-hatched patterns in residual graphs) and that the residuals are probably not normally distributed (try shapiro.test(lm1\$resid) to confirm).  Would square root transformation help?  We could compare the fit of a new model (lm2) with a transformed cover variable, by either calling lm again with new arguments or using the update function:

> lm2 = update(lm1,formula=sqrt(cover)~elev)

> plot(lm2)

The residuals don't look any better and the R2 has gone down.  To do better we'll have to move to a generalized linear model (glm).

Now we'll try an ANOVA.  Let's ask whether beech is differentially sensitive to disturbance history:

> aov1 = lm(cover~disturb,data=dat,subset=species=="Fagus grandifolia")

> summary(aov1)

Call:

lm(formula = cover ~ disturb, data = dat, subset = species ==

"Fagus grandifolia")

Residuals:

Min     1Q Median     3Q    Max

-3.333 -2.053 -0.229  1.947  5.771

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)     4.2290     0.1920  22.030   <2e-16 ***

disturbLT-SEL   0.1043     0.2759   0.378   0.7055

disturbSETTLE  -0.8199     0.3828  -2.142   0.0328 *

disturbVIRGIN  -0.1759     0.2821  -0.624   0.5332

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.197 on 407 degrees of freedom

Multiple R-squared: 0.01499,    Adjusted R-squared: 0.007725

F-statistic: 2.064 on 3 and 407 DF,  p-value: 0.1044

This output is more confusing than regression output; for one thing coefficients are only shown for 3 of four levels of disturbance, plus the overall intercept.  This is because the default contrast structure for ANOVA in R is to use 'treatment contrasts' and ignore a raw intercept.  The factor level that is alphabetically first (here, "disturbCORPLOG") is used as the overall intercept, so this estimate is used to test whether the overall mean differs from zero (here, it clearly does, because there are no zero cover values in our dataset).  Each remaining factor level is then tested against this first level; i.e., the second parameter estimate (here next to "disturbLT-SEL") is the difference in cover of beech in CORPLOG and LT-SEL plots.  So what this output is showing us is that SETTLE plots have significantly lower abundances of beech than CORPLOG plots (by -.8 cover units).  The first row coefficient is an actual mean of a sample (mean beech cover of all CORPLOG plots), while the remaining coefficients are differences in mean cover, not actual means themselves.  These are of course only a few of all possible contrasts; you can specify a particular contrast structure with the function contrasts.

We could optionally choose to have R not fit an intercept at all, but instead determine whether the mean cover of each group is significantly different from zero.  In our case this is a bit silly, but it is easily accomplished using '-1' in on right side of the equation:

> aov2 = lm(cover~disturb-1,data=dat,subset=species=="Fagus grandifolia")

A call to summary(aov2) will confirm that indeed cover is greater than zero for all groups.  Not fitting an intercept is usually more handy in regressions when we want the line to go through the origin.

Finally, note that the lm output for our model is different than that produced by aov:

> summary.aov(aov1)

Df  Sum Sq Mean Sq F value Pr(>F)

disturb       3   29.89  9.9635  2.0639 0.1044

Residuals   407 1964.78  4.8275

You can see why from the summary table: the lm output produces a set of treatment contrasts (the mean of each group), each tested with pairwise t tests.  But here aov produces the overall ANOVA result, which is a simple comparison of variances of the cover response variable between and within the grouping variable; comparisons of two variances are done via the F test.  The fact that these tests give us different results suggests than in fact the difference between groups is an issue of variance as well as the mean:

> plot(dat\$disturb[dat\$species=="Fagus grandifolia"],dat\$cover[dat\$species=="Fagus grandifolia"],notch=T)

The problem is evident here: a general linear model is not a good choice because of heteroscedasticity (SETTLE variance is much lower).  This would be a good occasion to use a bootstrapped model (like a permutational ANOVA) rather than something based on classical assumptions.

Multiple comparisons can be accomplished with various other tests in R, such as the popular Tukey honest significant difference (HSD) test.  The argument for the TukeyHSD function must be an aov object rather than a lm one:

> TukeyHSD(aov(aov1))

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = aov1)

\$disturb

LT-SEL-CORPLOG  0.1043257 -0.6073034 0.81595478 0.9815546

SETTLE-CORPLOG -0.8199167 -1.8075190 0.16768552 0.1417827

VIRGIN-CORPLOG -0.1759103 -0.9035981 0.55177748 0.9244785

SETTLE-LT-SEL  -0.9242424 -1.9198871 0.07140225 0.0796909

VIRGIN-LT-SEL  -0.2802360 -1.0188019 0.45832992 0.7616165

VIRGIN-SETTLE   0.6440064 -0.3631787 1.65119155 0.3521145

Unlike the t tests produced by the lm summary, the more conservative Tukey HSD tests suggest that beech cover does not vary between any two disturbance classes.

We have already noted that plots of different disturbance classes have a very different elevation distribution.  We can determine whether elevation influences the above ANOVA results (or equivalently, whether the elevation effect varies by disturbance class) with an ANCOVA:

> ancova1 = lm(cover~disturb*elev,data=dat,subset=species=="Fagus grandifolia")

> summary(ancova1)

Call:

lm(formula = cover ~ disturb * elev, data = dat, subset = species ==

"Fagus grandifolia")

Residuals:

Min      1Q  Median      3Q     Max

-4.3054 -1.5585 -0.1539  1.7687  5.3592

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)         1.517e+00  6.635e-01   2.287   0.0227 *

disturbLT-SEL       3.248e-01  8.653e-01   0.375   0.7076

disturbSETTLE       1.688e+00  1.546e+00   1.092   0.2754

disturbVIRGIN       2.092e-01  1.338e+00   0.156   0.8758

elev                2.340e-03  5.503e-04   4.252 2.64e-05 ***

disturbLT-SEL:elev  2.112e-05  7.402e-04   0.029   0.9772

disturbSETTLE:elev -2.041e-03  2.073e-03  -0.984   0.3255

disturbVIRGIN:elev -4.867e-04  1.065e-03  -0.457   0.6478

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.094 on 403 degrees of freedom

Multiple R-squared: 0.1139,     Adjusted R-squared: 0.09848

F-statistic: 7.398 on 7 and 403 DF,  p-value: 2.249e-08

This again is the default contrast form of model results, where we see that elevation is really driving the relationship, with no support for including disturbance history or its interaction with elevation.  The significance tests associated with only the main effect of disturbance (rather than treatment contrasts is found in the anova-style output:

> anova(ancova1)

Analysis of Variance Table

Response: cover

Df  Sum Sq Mean Sq F value    Pr(>F)

disturb        3   29.89   9.964  2.2717   0.07972 .

elev           1  191.92 191.924 43.7589 1.184e-10 ***

disturb:elev   3    5.32   1.775  0.4047   0.74971

Residuals    403 1767.53   4.386

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Which again shows no support for disturbance or the interaction.  We can test the fit of models with and without the disturbance variance by defining a second ANCOVA model with elevation only (the period below followed by the minus sign is translated as 'the full model minus these terms, with the colon signifying the interaction):

> ancova2 = update(ancova1, ~.-disturb-disturb:elev)

and then using the useful anova function to compare whether the explained variance is significantly less from the simpler model:

> anova(ancova1,ancova2)

Analysis of Variance Table

Model 1: cover ~ disturb * elev

Model 2: cover ~ elev

Res.Df    RSS Df Sum of Sq      F Pr(>F)

1    403 1767.5

2    409 1802.9 -6   -35.381 1.3445 0.2361

The models are not significantly different in explanatory power (p=0.2361), so the simpler one with only elevation (model 2) is better. AIC agrees:

> AIC(ancova1,ancova2)

df      AIC

ancova1  9 1783.912

ancova2  3 1780.058

There is an even easier (but not always correct) way to perform model simplification (backwards elimination from a maximal model) in R using the step function:

> step(ancova1)

Start:  AIC=615.54

cover ~ disturb * elev

Df Sum of Sq    RSS    AIC

- disturb:elev  3    5.3248 1772.9 610.78

<none>                      1767.5 615.54

Step:  AIC=610.78

cover ~ disturb + elev

Df Sum of Sq    RSS    AIC

<none>                 1772.9 610.78

- disturb  3    30.057 1802.9 611.69

- elev     1   191.924 1964.8 651.03

Call:

lm(formula = cover ~ disturb + elev, data = dat, subset = species ==     "Fagus grandifolia")

Coefficients:

(Intercept)  disturbLT-SEL  disturbSETTLE  disturbVIRGIN           elev

1.649949       0.335265       0.244698      -0.390918       0.002225

The output is not as complicated as it seems.  Starting from the maximal model (~disturb*elev), the function compares the resulting AIC value from taking away the most complicated (last) term in the model, which here is the interaction between disturb and elev.  With a substantially better AIC, the function keeps this simpler model and then takes away the terms from the next tier (here, both main effects).  The model with disturb removed is essentially of equal fit to having it in (AIC goes up 1 unit), but taking elev out results in a massive loss of explanatory power (up 41 units).  So the best model only includes elev. I do not know why the AICs reported here do not correspond to that produced by AIC.  Note that step works for forward selection as well as backward, and you can select to do both simultaneously.  (It works with glm, too).  Try using step with a more complicated ANCOVA, such including the tci variable along with disturb and elev, along with all interactions.  Beware that the saved object associated with step will include the best-fitting model according to raw AIC (better AIC wins, no matter whether the actual difference is less than two).  For example:

> step.ancova = step(ancova1)

> summary(step.ancova)

Call:

lm(formula = cover ~ disturb + elev, data = dat, subset = species ==

"Fagus grandifolia")

Residuals:

Min       1Q   Median       3Q      Max

-4.24965 -1.67849 -0.09873  1.81710  5.37934

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)    1.6499485  0.4297313   3.839 0.000143 ***

disturbLT-SEL  0.3352653  0.2646654   1.267 0.205970

disturbSETTLE  0.2446982  0.3979475   0.615 0.538965

disturbVIRGIN -0.3909178  0.2702366  -1.447 0.148787

elev           0.0022252  0.0003357   6.630 1.07e-10 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.09 on 406 degrees of freedom

Multiple R-squared: 0.1112,     Adjusted R-squared: 0.1024

F-statistic:  12.7 on 4 and 406 DF,  p-value: 9.535e-10

So disturbance is retained here, even though there is not sufficient AIC support.

Nonlinear Least Squares (nls)

Let's limit the focus of our analysis to one species (say, sweet birch) using the subset function:

> dat2 = subset(dat,dat\$species=="Betula lenta")

Which gives us a data frame of 795 rows (use str to confirm). An example of one relationship that may be nonlinear is that between elevation and tci (topographic convergence index).

> with(dat2,plot(elev,tci))

Clearly there is more to the relationship than a simple linear function, but nonetheless let's try fitting an asymptotic model to these data.  The function for this in R using least squares is nls.  Unlike lm, minimization is done via an optimization algorithm, which thus requires extra input on your part in the form of initial values for each parameter in the optimization.  This also means nls can sometimes be frustrating, due to lack of convergence or slow performance with very large datasets.  One two-parameter asymptotic model is the Michaelis-Menten (MM) function, used in nls here:

> model1 = nls(tci ~ a*elev / (b+elev), start=list(a=500,b=5), data=dat2)

The start argument is a list of each fitted parameter and its starting value.  Here a is the asymptote, and b is the x value where y=a/2.  Note usually the MM function is used for monotonically increasing data, whereas here the trend is downward, so our b value will be negative.  Access parameter estimates with the summary function:

> summary(model1)

Formula: tci ~ a * elev/(b + elev)

Parameters:

Estimate Std. Error t value Pr(>|t|)

a    4.9742     0.1564  31.814  < 2e-16 ***

b -109.3603    16.9142  -6.466 1.76e-10 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.255 on 793 degrees of freedom

Number of iterations to convergence: 8

Achieved convergence tolerance: 1.309e-06

Indeed the guess about the asymptote (a=5) was right on, and b was fit to -109.  The t and associated p values indicate whether the parameter values are significantly different from zero.  (This is different from deciding whether or not this is the best model, or even whether these terms both belong in this model.)  You also get an indication of the model fit and df, plus information about the optimization result.  Let's see what the regression line looks like:

> x = seq(0,2000)

> lines(x,predict(model1,list(elev=x)),col="red")

Here we use predict to generate new (predicted) values based on our fitted model, using the new vector x.  The predict function is generic and should work with any modeling function.

R provides an easier way to implement nls for many common nonlinear functions via 'self-starting functions' that do not require specification of initial parameter values.  (It is also possible to write your own functions for this; see selfStart.)  The one for the MM model looks like this:

> model2 = nls(tci ~ SSmicmen(elev,a,b), data=dat2)

Fortunately we get basically the same model fit as before:

> summary(model2)

Formula: tci ~ SSmicmen(elev, a, b)

Parameters:

Estimate Std. Error t value Pr(>|t|)

a    4.9742     0.1564  31.814  < 2e-16 ***

b -109.3588    16.9144  -6.465 1.76e-10 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.255 on 793 degrees of freedom

Number of iterations to convergence: 4

Achieved convergence tolerance: 2.352e-06

Maybe a power function fits the data better?

> model3 = nls(tci ~ c*elev^z, start=list(c=20,z=1), data=dat2)

> lines(x,predict(model3,list(elev=x)),col="blue")

The fit is similar.  If the functions were nested (i.e,. setting a parameter to zero yields the other function), we could use a likelihood ratio test to compare models with the anova function.  Here they're not, so we can appeal to AIC:

> AIC(model2,model3)

df      AIC

model2  3 3552.757

model3  3 3546.405

There is substantially more support for the power model (lower AIC).  We'll talk more about this later.  For more on useful deterministic nonlinear functions, see Crawley p. 662 or Bolker p. 88.  You can incorporate grouping (factor) variables with nonlinear least squares using other functions like gnls in the nlme library.  Finally, note that for relatively simple nonlinear relationships it is often more productive to simply transform the dependent variable rather than worry about nonlinearity.  Crawley (p. 205) lists these common transformations:

- log y vs x for exponential relationship

- log y vs log x for power

- exp(y) vs. x for logarithmic

- 1/y vs 1/x for asymptotic

- log(p/(1-p)) vs x for proportion data

- sqrt(y) for count data for variance homogenization

- arcsin(y) for percentage data for variance homogenization