April 26: GRASS and R

 

Starting the R environment from within a GRASS session gives you a powerful approach for spatial modeling and is one of the clear advantages for using GRASS over Arc and other GIS software.  In UNIX/Mac, run R from the GRASS shell environment by starting up a GRASS session in the desired LOCATION and MAPSET, then at the GRASS prompt type 'R' to start an R session.  The R package required to interface with GRASS depends on which version of GRASS you're using: for GRASS 5.x, use the "GRASS" library; for GRASS 6.x, use the "spgrass6" library.  For example:
install.packages("spgrass6", dependencies = TRUE)

Although the way R interfaces with GRASS is different with version 6 (supposedly faster), the commands are similar regardless of version.  For Windows users, the situation looks a bit dire, as I can find very little in the way of recent web updating for integrating with R via the new Cygwin changes, and I am unaware of a reasonable integration of winGRASS with R.  (Those wanting to try can explore old GRASS-R posts on the web; please let me know of any successes.)  A general introduction to GRASS-R is here:

http://grass.osgeo.org/statsgrass/grass_geostats.html

 

Preliminaries

            In this example I'm starting GRASS 5.4 from our BIO793_GSMNP2 location and the PERMANENT mapset.  After the R GRASS (or spgrass6) package is installed and R is running from the GRASS environment, load the library and then use the gmeta command to load the current GRASS environment parameters into R:

library(GRASS) #or library(spgrass6)

G = gmeta()

This creates an object of class=grassmeta, which tells R which versions of the generic functions to use (like plot), and it defines the spatial environment.  A summary command specific to GRASS meta-objects shows:
summary(G)

Data from GASS 5.0 LOCATION BIO793_GSMNP2 with 2913 columns and 1348 rows; UTM, zone: 17.

The west-east range is 227845, 315235, and the south-north: 3922854, 3963294;
West-east cell sizes are 30 units, and south-north 30 units.

The str(G) command shows that G is just a list containing spatial information.  Note that all GRASS environment commands are available in R using the system command.  For example, to see a list of available rasters use the g.list command as if in GRASS:
system("g.list rast")

-----------------------------------------------------

raster files available in mapset PERMANENT:

aspect    elev slope     tci

-----------------------------------------------------

To manipulate GRASS rasters in R, the rast.get command converts a raster into a large R list object.  The c(F) argument tells R that you want the raw raster values rather than a categorical dataset (you can also use catlabels=F), and the G argument is needed to specify the GRASS spatial environment:

elev = rast.get(G,"elev",c(F))

str(elev)

List of 1

 $ elev: num [1:3926724] NA NA NA NA NA NA NA NA …

The NAs describe missing cells.  Note that the raster values are a vector nested inside a list, so to access particular values you need to specify both the list name and the vector name.  Note the difference:
length(elev)

[1] 1

length(elev$elev)

[1] 3926724

As one single vector, there is no X or Y information in our R object.  To create an entire data frame with X, Y, and Z (here, elevation) data, we extract the 'east' and 'north' elements of our G object:

elev.frame = data.frame(east(G),north(G),elev$elev)

str(elev.frame)

'data.frame': 3926724 obs. of 3 variables:

 $ east.G.    : num     227860 227890 227920 227950 227980 …

 $ north.G.   : num     3963279 3963279 3963279 3963279 3963279 …

 $ elev.elev  : num     NA NA NA NA NA NA NA NA NA ...

The naming conventions are a bit silly, so change via:
names(elev.frame) = c("x","y","elev")

Depending on the objectives of the analysis, it may make sense to create one data frame for the region that contains all rasters (although beware of creating very large objects).  You can grab multiple rasters with rast.get and then put them each as a separate column in a data frame:
rast.list = rast.get(G, rlist=c("elev","aspect","slope","tci","rad171"), catlabels=as.logical(rep(F,5)))

rf = data.frame(east(G),north(G),rast.list$elev,rast.list$aspect,rast.list$slope,rast.list$tci,rast.list$rad171)

names(rf) = c("x","y","elev","aspect","slope","tci","rad")

str(rf)

Now you can access all raster variables as you normally do those in data frames.  For example, compare elevation summary statistics with GRASS output:
summary(rf$elev)

system("r.univar elev")

The generic plotting function for GRASS objects is a 3D image, where you specify the region (G) and the R vector for a particular raster:
plot(G,rf$elev,col=terrain.colors(30))

R has functions for specifying certain color palettes: the above is for basic terrain (the argument is for the number of categories to display) and you can also try topo.colors(), heat.colors(), and rainbow().  There is a built-in function for making contour lines for raster in GRASS-R 5 called contour.G, which we can superimpose our on image:

contour.G(G,rf$elev,add=T)

 

Spatial analysis

            One you have a data frame of X, Y, and various 'Z' coordinates, spatial analysis via a variety of R packages is straightforward.  Given the number of calculations involved (for example, using pairwise comparisons of millions of pixels), you often want to take randomly sampled rows of the spatial data frame rather than the entire set, and it's generally a good idea to remove missing values before the analysis.  A polynomial trend surface fitted by least squares (or generalized least squares using surf.gls) can be fit to a raster using the surf.ls function in the spatial library, and displayed with a GRASS 5 wrapper called trmat.G:

library(spatial)

xyz = na.omit(rf[,1:3]) #remove missing values of elevation

xyz2 = xyz[sample(1:2292309),10000),] #10,000 random points (not NA)

names(xyz2) = c("x","y","z")

trend = surf.ls(4,xyz2) #4th-order polynomial

plot(G,rf$elev)

contour.G(G,trmat.G(G,trend),add=T)   #superpose contour lines of trend model on raster

The spatial package includes a function for creating variograms that takes the fitted trend-surface as input, along with desired number of distance bins.  correlogram is the equivalent function for distance-based covariances.
variogram(trend,nint=30)

 

Mapping predictions from a species distribution model

            Predictions from a fitted species distribution model can be easily mapped in the GRASS-R interface using the predict function for any model building exercise and a data frame of raster variables that were used in the model.  Here's an example where we return to our Smokies tree dataset and are interested in examining the probability of Fraser magnolia occurrence across the entire Park, based on elevation, moisture (tci), and site radiation that we calculated for June 20.  Make sure that the treedata.csv file is in the working directory.  I additionally need to import the radiation raster (rad171.txt):

system("r.in.arc input=rad171.txt output=rad171")

rad171 = rast.get(G,"rad171",c(F))

rf$rad = rad171$rad171

Let's create the dataset for magnolia:

dat = read.csv("treedata.csv")    

dat2 = subset(dat,dat$species=="Magnolia fraseri")  

b.plots = unique(as.character(dat2$plotID)) #plots with magnolia

u.plots = unique(as.character(dat$plotID)) #all plots

nob.plots = u.plots[is.element(u.plots,b.plots)==F] #plots without magnolia

dat3 = subset(dat,is.element(as.character(dat$plotID),nob.plots)) #dataset of no magnolia

dat4 = subset(dat3,duplicated(dat3$plotID)==F)  #one row per plot

dat4$cover = 0     #cover of magnolia is zero in these plots

dat5 = rbind(dat2,dat4) #new dataframe of presences and absences

We need to extract values of our new variable rad corresponding to presences and absences from our existing raster. We can use the utme and utmn coordinates to subset the raster values in R, but note they don't exactly equal our X and Y values.  An easier option is to route it through GRASS using the r.what command:
frame1 = data.frame(dat5$utme,dat5$utmn)   #data frame of only x and y coordinates

write.table(frame1,file="magout.txt",sep="\t",row.names=F,col.names=F)     #write a text file to working directory for r.what to access

system("r.what input=rad171 <magout.txt >output1.txt")    #create a new text file with appended rad171 values

rad1 = read.table("output1.txt",sep="|")   #read new text file

dat5$rad = as.numeric(as.character(rad1[,4]))        #final column of text file is appended rad171 value; beware it is imported as a factor rather than numeric values

            Now we construct a GAM model with binomial error:

library(mgcv)

gam1 = gam(cover>0~s(elev)+s(tci)+s(rad),family=binomial,data=dat5,na.action=na.exclude)

summary(gam1)

Using the predict function we can insert a another column into our raster data frame:
rf$predict1 = as.numeric(predict(gam1,newdata=list(elev=rf$elev,tci=rf$tci,rad=rf$rad),type="response"))

Each pixel in the new raster predict1 is now a value between 0 and 1 representing our predictions about magnolia presence:

hist(rf$predict1)

plot(G,rf$predict1,col=heat.colors(30))

We can save our prediction map using the rast.put function, for further display elsewhere (such as in GRASS or exported into Arc via r.out.arc):
rast.put(G,lname=gampred,layer=rf$predict1,DCELL=T)  #DCELL=T keeps 2-decimal precision

Display of the map is faster in native GRASS, which we can again access with the system command. The colors of the map in GRASS can be manipulated with the r.colors command.

system("g.list rast")

system("r.colors map=gampred color=gyr")

system("d.mon start=x1")

system("d.rast gampred")
system("d.legend gampred")