April 19: GRASS GIS

 

The movement away from proprietary GIS packages like Arc that are convenient but expensive and toward an open-source program like GRASS has not quite achieved the level of SAS/SPLUS/STATA/etc to R, but the analogy holds: there is a large and growing community of users that are continually providing new packages in a free software product that is harder to learn (at first) but ultimately more powerful and satisfying.  And the best bit is that GRASS is now designed to integrate (relatively) smoothly with R (R runs within GRASS), allowing powerful statistical analysis of spatial data and basically one-stop shopping for species distribution modeling.  We'll just cover the basics of starting with GRASS and interfacing with R, but I recommended these resources for continuing on:

Neteler, M. and Mitasova, H. 2008. Open source GIS: a grass GIS approach (3rd Ed.). Springer.

Hengl, T. 2009. A practical guide to geostatistical mapping. Open Access (FREE!):

http://blog.revolution-computing.com/2010/04/a-free-book-on-geostatistical-mapping-with-r.html

 

Installing GRASS GIS

As of this writing, you have 3 basic options for working with GRASS: 1) you can run GRASS in its native UNIX environment on a Linux box or Mac (recommended; find a linux box and run everything via a remote terminal is the best option, thereby not worrying about local computer performance); 2) you can be a guinea pig for the *new* native Windows GRASS, available here:

http://grass.itc.it/grass64/binary/mswindows/native/

which does not require a unix emulator but is still in beta stage and has no easy integration with R; or 3) install the Cygwin linux-like environment for Windows and run GRASS from Cygwin, which used to be a decent alternative to the UNIX version but recent Cygwin changes have made installation and particularly R integration difficult.  As much as I wanted to make things easy using a Windows install, I have had so many problems this week working with winGRASS and the new Cygwin changes to GRASS that I'm instead going to show you the Linux version, which at this point seems to be the only option for smooth R integration.  For intrepid Windows junkies I post how I proceeded with the latest Cygwin interface in the box below.  Mac and Linux users can download GRASS binaries here:

http://grass.osgeo.org/download/software.php

and Linux and skip the box below.

 

---------------------------------------------------------------------------------------------------------------------------------------------------------------

GRASS on Windows via Cygwin

            The page supposedly describing GRASS 6.3 installation via Cygwin is here: http://grass.osgeo.org/grass63/binary/mswindows/cygwin/#prereq
            What they don't tell you on that page is that the default package installer for GRASS does not get all the components you need for visualization (that is, the Windows X server that allows you to open new frames for map display).  So I recommend you start here:

http://x.cygwin.com/docs/ug/setup-cygwin-x-installing.html

Follow these instructions carefully, in particular the X packages listed in step 15.  During the download and installation process you may get notices that the download could not be completed: choose another download mirror site and try it again.  The install process remembers what you've already successfully downloaded, so it doesn't take as long in repeated attempts (although repeated cycles through the process is a bit frustrating).  At the last step, choose to create the icons and then fire up Cygwin from the desktop icon.

            You now have an environment that simulates a bash shell UNIX environment.  All the files that Cygwin can 'see' are embedded in your home directory under C:\cygwin.  To start the X windows server, type 'startxwin' at the bash prompt.  A new white background dialogue box appears that will run GRASS.  However, you don't have it yet.  Put the 'setup.exe' installer you downloaded at the beginning in the C:\ directory (if it's not there already), and from the DOS command line (Start-Run: Cmd), type "setup.exe –X".  The Cygwin installer starts again, and this time at step #3 add this location to the list of sites:
http://grass.ibiblio.org/grass63/binary/mswindows/cygwin/

and select it (just it, no other mirror necessary).  Now under the 'Database' package folder, select "grass: Geographic…", and click Next until the installer finishes.  You now hopefully have grass63 installed (or possibly grass62; some mirrors have different versions available).  From your X window, type "grass63" (or grass62), and now we're ready to get started, you can proceed with defining a LOCATION AND MAPSET below.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

 

Making the Arc-GRASS transition

            I will assume you already have at least a little GIS know-how and have worked with maps in Arc (not essential if you haven't), and are in the position of having an existing raster dataset in Arc that defines your region of interest (e.g., a DEM).  You can port your own raster from Arc to GRASS by saving the Arc raster as a ESRI ASCII grid file (same input for Maxent).  (In ArcToolbox: Conversion Tools -> From Raster -> Raster to ASCII).  This should be a .txt file.  The file is put in the GRASS data directory (see below for Linux; in Cygwin this is created in your home directory (mine is C:\cygwin\home\Fridley\grassdata).  Here we'll use the elev.asc file from the Maxent session, renamed to "elev.txt". 

There are other tips for importing Arc vector data, etc here:

http://grass.osgeo.org/wiki/Tips_for_Arc_users

 

Defining a LOCATION and MAPSET

            Because of its size and formatting complexity, working with GIS data is always a hassle, but the advantage of GRASS is that the hassle is mostly front-loaded: it takes some time to start work in a new region and upload new files, but once they're in the projection and formatting-related problems disappear.  (Anyone who works in Arc knows that the Arc situation is the opposite: easy to get files in, sometimes impossible to get them to talk to each other in an analysis.)  Once you get the hang of it creating new projects is easy, but you do need to know a few basic parameters of your region of interest to get started.

            Whether you're in Linux or Cygwin or winGRASS, GRASS projects have a nested directory structure.  All objects are stored in a central database directory (e.g., /home/user/grassdata), which is where you put maps to be imported (here, our elev.txt file).  Nested within this directory are separate directories for different projects called LOCATIONs.  A Location is both a folder and a geographical region defined by standard spatial properties of spatial extent, resolution, measurement unit (e.g., UTMs), and projection/datum, all of which must be defined when a new Location is created.  A MAPSET is a directory of maps for a particular Location.  Mapsets are most useful in a multi-user environment, where a central set of read-only maps (often called PERMANENT) can be copied and manipulation by individual users, each with their own Mapset folder (/user1, /user2, etc).  I typically create one Mapset called PERMANENT and don’t worry about creating others.  Finally, within the Mapset folder are separate folders for different data types (rasters, vectors, volume maps), which fortunately you don't have to worry about because these are all visible to in a standard GRASS session.  This diagram from the GRASS Quickstart Wiki (http://grass.osgeo.org/grass64/manuals/html64_user/helptext.html) summarizes how GRASS projects are organized:

 

Creating a new Location and Mapset for your region of interest is straightforward once you have basic spatial information for it.  Here is an example working from the command line of GRASS 5.4 in Linux (instructions for the GUI interface for winGRASS are in the above link).

Start grass:

$ grass54

The define location screen opens (this is always the first screen).  Type in a new location name:

LOCATION: BIO793_GSMNP

Keep the Mapset as PERMANENT, and indicate ('y') that you would indeed to create this Location, and give it a one-line description.  For the Smokies most geocoordinate data are in UTMs (option C).  For datum, specify nad27 (type 'list' to see options). A geodetic datum is a particular spatial reference grid; NAD27 is the older North American Datum, and also common is WGS84 (World Geodetic System of 1984) that covers the globe, which is basically equal to NAD83.  Specify option 1 for datum transformation (contiguous North America).  Our UTM zone is 17 (north).

            Next we specify the UTM boundaries of the location.  You can get this info using the spatial attributes property on an existing map in Arc, or you can examine the header information for elev.txt, which lists the X,Y coordinates of the lower-left (ll) corner and calculate E-W-N-S boundaries by multiplying the number of rows or columns by the resolution (here, 30 m).  Our west edge is 227845, south edge is 3922854, north edge is 3963294 (=3922854+[30*1348]), and east edge is 315235 (=227845+[30*2913]).  We'll work from a spatial resolution of 30 m, which matches our input DEM.  After accepting these parameters, our new location is created.  GRASS then returns to the define Location and Mapset page; select our new location and GRASS starts from the command line.  Note the default GRASS data directory on the define location screen; this is again where maps to be imported should be placed (here, elev.txt).  That's it—the hard part is now over.

 

Importing rasters

            Standard GIS tasks in GRASS are usually very simple.  For example, to import our elevation raster, type:
> r.in.arc input=elev.txt output=elev

which creates the raster file elev and puts it in our raster folder in our PERMANENT Mapset.  Note that the syntax for GRASS commands is based on UNIX: a particular command function (r.in.arc) is followed by required and optional arguments (here, two required arguments are input and output).  If you omit required arguments, GRASS will usually prompt you for more information.  Raster-based commands start with 'r.', vector commands with 'v.', and general commands (like file manipulation, location information, etc.) with 'g.'.  For the latest version (GRASS 6.4), the list of available functions with links for each are here:
http://grass.osgeo.org/grass64/manuals/html64_user/index.html

For map display in both Linux and Cygwin, GRASS uses the standard X windows display drivers, initiated by:

> d.mon start=x1

which opens a monitor window.  You can open more windows (x2, x3, etc) as needed, and close them with replacing start with stop.  To display our imported elevation map:

> d.rast elev

You can add a legend with

> d.legend elev

or try just d.legend without arguments to see the range of formatting options.  A handy tool for summarizing a raster is r.univar, which lists the total cells and simple stats of the distribution of raster values.

 

Topographic calculations

            Aside from interfacing directly with R, a main advantage of GRASS is the built-in functionality for using DEMs to create standard topographic indices that are often used as proxy variables for critical environmental factors like water and radiation.  For example, the r.slope.aspect function uses a DEM to create slope and aspect maps, as well as other DEM derivatives (curvature):

> r.slope.aspect elevation=elev slope=slope aspect=aspect

Generation of new raster maps 'slope' and 'aspect' took a few seconds on my Linux box.

            For hydrologic variables, make sure your region includes all upslope contributing area.  A typical example is the topographic convergence index (TCI) of Beven and Kirkby (1979, A physically-based variable contributing area model of basin hydrology, Hydrol. Sci. Bull. 1: 43–69), which calculates 'potential wetness' as the total upslope contributing area corrected by the steepness of the local slope: TCI = log(upslope area / tan(slope angle)).  This is a one-line command in GRASS, requiring only an input DEM and specification of the output file:
> r.topidx intput=elev output=tci

Despite our DEM containing over 2 million pixels, this took only about two minutes on my Linux box.  You can do the same calculations using GRID in Arc (or Arc Toolbox), but it takes several steps and (in my experience) much longer.  Users with decent rainfall and ET data can create dynamic (spatial water flow) maps using the TOPMODEL hydrologic model with the r.topmodel command.  There are many other raster functions related to hydrologic modeling: r.watershed can delimit separate watersheds from a DEM; r.terraflow creates flow accumulation and direction maps (used to estimate stream locations or find ponding basins), and see the raster command manual above for others. 

            Another very useful function in GRASS is r.sun, a solar irradiance model that uses a DEM along with earth-sun geometry to calculate daily solar radiation for each pixel (direct, diffuse, and reflected).  You specify a day and overall latitude, and the algorithm computes the intensity and daylight length (including a terrain shadowing effect) to derive total daily radiation (in watts per m2 per day).  You can optionally include other variables or maps that allow for ground albedo, atmospheric turbidity, and cloudiness.  Here's an example for calculating direct-beam radiation for each pixel on June 20 (Julian day 171), using shadows but assuming clear-sky conditions:
> r.sun –s eleven-elev aspin=aspect slopein=slope day=171 lat=35.00 beam_rad=rad171

Currently, it is trickier to sum up radiation values for multiple days (e.g., a full month or differences in total annual radiation between pixels).  I paste an addendum shell script below that automates the r.sun routine for any list of days you specify.  You can also do this with the R interface, but for large rasters these calculations are much faster in GRASS than GRASS/R (several minutes in this case).

 

Extracting values from point samples in GRASS

            Outside of the R environment, it is often useful to extract values of certain raster maps for use in subsequent modeling.  In GRASS this is accomplished using the r.what command along with a text file in your working directory that specifies the coordinates of your locations of interest; you put a text file of coordinates in, and get a text file of coordinates plus raster values back.  It is a slightly confusing process but the following steps work for me:

1. In Excel, save your coordinates as 3 columns: X, Y, and (optionally) a site label.  For a 4th column, just paste a comma (",") in a a cell at the end of the line.  Save it as a tab-delimited text file with no header (no column labels).  When you open it in a text editor, it should look like the treedataxy.txt file on the course webpage. For example line 1 is:

275736    3942439   ATBN-01-0403,

2. Put this file in your GRASS working directory.

3. In GRASS:

r.what input=elev,tci null="NA" <treedataxy.txt >output.txt

Which will extract elevation and TCI values according to the coordinates in treedataxy.txt, and place them in a new text file output.txt.  Missing values will get a NA value.

4. The output.txt file now has new rows under each original row for elevation and TCI values.  To format this correctly (one row per site), open the file in a good text editor (my favorite is TextPad), and use the 'find and replace' function to replace the end of the line comma with nothing.  In TextPad: hit F8, make sure 'regular expression' is checked, find ",\n", replace with [blank]. This kills the commas and the line breaks, and it's now easy to examine in Excel (import text with | delimiter) or feed into R.

            Of course you often desire values from many rasters at once, often those whose names are almost identical (but may differ in number, such as days of radiation); for this you can use r.what along with the g.mlist command to specify a set of rasters that share part of their names.  Pay close attention to the syntax here for the input argument:
r.what input="`g.mlist pattern='radiation*' sep=,`" null="NA" <treedataxy.txt >output.txt

This grabs values for all rasters with "radiation" at the beginning of their name.

 

Exporting rasters out of GRASS

            Any new maps created in GRASS but needed elsewhere (like Arc) can be exported with various 'r.out…' commands specific to particular raster formats.  Exporting to Arc via a ASCII GRID file is accomplished via:
> r.out.arc input=tci output=tci.txt

which places the tci.txt file in your working directory.  In Arc Toolbox, use Conversion Tools -> To Raster -> ASCII to Raster to import the raster back into an Arc raster file.

 

An example shell script for automating GRASS functions using r.sun

            Save this as a text file in the grass data directory with a '.sh' suffix, and then execute in GRASS by typing its name at the GRASS prompt.

# Compute a sum of daily global radiation for a given period of the year

echo "Enter elevation map:"

read elev

g.copy rast=$elev,SOLelev

echo "Enter slope map:"

read sl

g.copy rast=$sl,SOLslope

echo "Enter aspect map:"

read asp

g.copy rast=$asp,SOLaspect

echo "Enter Latitude of given region:"

read lat

#loops over Julian days, from 'i' to 'lastday'

i=1

lastday=30

while [ $i -le $lastday ]

do

 # generate convenient map names

 DAY=`echo $i | awk '{printf "%03i", $1}'`

 echo "Computing radiation for day $DAY..."

 r.sun -s elevin=SOLelev aspin=SOLaspect slopein=SOLslope\

       lat="$lat" day="$i"\

       beam_rad=b_rad.$DAY diff_rad=d_rad.$DAY\

                            refl_rad=r_rad.$DAY

 r.mapcalc totalrad.$DAY = "b_rad.$DAY + d_rad.$DAY + r_rad.$DAY"

 r.timestamp totalrad.$DAY date="$i days"

 r.colors totalrad.$DAY col=gyr

 i=`expr $i + 1`

 g.remove rast=d_rad.$DAY,r_rad.$DAY,b_rad.$DAY

done

#sum all days for total radiation, output raster as "globalrad"

echo "Summing all daily rasters..."

r.series input="`g.mlist pattern='totalrad*' sep=,`" output=globalrad method=sum

     #note: if more than 200 or so rasters, have to break this up into pieces

#cleanup:

g.remove rast=SOLelev,SOLaspect,SOLslope

echo "Finished."