April 12: Presence-only modeling with Maxent


An important class of niche modeling techniques involves species distribution information based only on known presences (e.g., recorded occurrences, like herbarium or museum data).  This is a common predicament, especially across broad spatial scales, and several modeling methods have been developed for this purpose (reviewed in Chapter 8 of Mapping Species Distributions by Janet Franklin, 2009).  One method called Maximum Entropy or Maxent is becoming increasingly popular because it performs extremely well in predicting occurrences in relation to other common approaches (particularly compared to GARP, a genetic algorithm approach; see the Elith et al. 2006 Ecography paper), and it is also designed to integrate with GIS software such as Arc products, thus making data input and predicted (mapped) output easier to handle.  Maxent works by finding the largest spread (maximum entropy) in a geographic dataset of species presences in relation to a set of 'background' environmental variables; this is essentially the same as maximizing the log likelihood of the data associated with the presence sites minus a penalty term, conceptually similar to information criteria like AIC.  The original formulation for species distribution modeling is described in Phillips et al. (2006, Ecological Modelling 190: 231).  As is clear from the Phillips and Dudik (2008) paper, the penalty term is where the action is: each environmental variable (called a 'feature') gets weighted according to how much it adds complexity to the model; the sum of these weightings (including a regularization parameter, which is determined empirically) determines how much the likelihood should be penalized to prevent overfitting.  Maxent allows users to vary these regularization parameter settings, but this is not recommended unless you have something additional to contribute to Phillips and Dudik (2008).  The current Maxent regularization defaults are from the results of that study.

            Maxent is a very slick program, and it gives immediate gratification (esp. with GIS data) in that it can turn a relatively small geographic sample of occurrences into a highly parameterized landscape model of predicted occurrences.  It is similar to regression tree approaches (e.g., Random Forests) in that it specializes in predictive accuracy at the cost of biological interpretation, but the maximum entropy approach has been shown to work better than other similar approaches in many circumstances, and especially when the number of samples is small (this is partly because it is not a regression, but also because its optimization routine is guaranteed to the converge on the maximum entropy solution, at least according to Phillips et al. 2006).  It is thus a growing approach of choice for many end-users who have limited data and few resources to do a detailed biological study of any particular organism: they want a predictive approach that best exploits the data at hand, and with the fewest steps between a dataset and a mapped product.  Thus, in some ways an approach like Maxent represents the other extreme from the hierarchical Bayesian approach: Maxent prioritizes predictive ability and convenience at the expense of inference on particular parameter values, while HB delivers strong inference on particular parameter values via the posterior but downplays the importance of any particular outcome.


Installing Maxent

            Maxent runs in Java. If you don’t have it, a current version of the Java runtime environment (JRE) is here:  http://java.sun.com/javase/downloads/index.jsp.  If you don’t know if you have it, run ‘java –version’ from the command line (Windows: Run: cmd). If you get an error, you don’t have it. Once you have Java download the latest version of Maxent from here (after registering): http://www.cs.princeton.edu/~schapire/maxent/.  In addition to installation instructions in the readme.txt file, you get the Java executable (maxent.jar) and, for Windows, a batch file to trigger the executable (maxent.bat).  These are small files and can be copied and pasted to various directories as needed.  To start Maxent double-click maxtent.bat.  The help button in the lower right-hand corner opens a brief tutorial that explains everything that I cover here and more.


Data formatting

            Maxent is designed to integrate smoothly with GIS programs like Arc or GRASS GIS by using ESRI ASCII grid format files (text files representing raster data), but it will also take as input simple csv files that include species occurrences and environmental data.  You can upload an entire multi-species dataset (like our treedata.csv file) and then choose which species to model in Maxent; choosing multiple species means Maxent will create separate output for each unique species identifier in the “samples” (species occurrence) dataset.  Maxent insists on species names in the first column, followed by geocoordinates X and Y, and environmental variables, which I’ve reformatted as treedata2.csv (on webpage).  This will be uploaded into the ‘samples’ path and becomes the training dataset for Maxent.  I've also changed disturbance class to a set of numeric values ('distclass') as required by Maxent.    The ‘background’ dataset (describing pseudo-absences, that is, a random sample of non-occurrences from the region of interest) that encompasses the prediction locations can also be uploaded as a csv file to the ‘environmental layers’ path.  Maxent prefers 10,000 random locations for this (the default setting when using the raster input format), but we can illustrate how Maxent works with the same dataset above using all observations regardless of species.  Maxent won't allow you to use exactly the same file both training and background datasets, so I've simply made a copy of treedata2 called treedata3.csv. 


An initial run

            Enter the paths to the treedata2.csv and treedata3.csv files in the samples and environmental layers directories, resp. In the left window, you should get a checked list of species; hit ‘deselect all’ and then select Magnolia fraseri. In the right window you get a list of environmental variables to include in the occurrence model.  One of the advantages of Maxent is that the regularization protocol tends to protect against overfitting, so you can be fairly liberal with predictor variables: select beers, distclass, elev, streamdist, and tci, and be sure to change the class of distclass to ‘categorical’. You need to create a folder for the output and browse to the appropriate path in 'output directory'.  Click 'create response curves' and 'do jackknife…', and keep the default logistic output (where each predicted observation is a probability of occurrence for that location).  The output file type is for output GIS grids, and the bottom file directory is for an alternate set of environmental predictors beyond the study region that can be used for (optional) species projections.

            The final step is to choose how you'd like Maxent to treat each predictor variable, or feature.  Linear features use simple linear coefficients for each predictor; quadratic features use squared predictor values; product features describe pairwise interactions between predictors; threshold features involve simple step functions; and hinge features combine linear and step functions.  The 'auto features' option automates the task of choosing feature types using an empirical algorithm based on sample size.  For our initial run select 'threshold features' only and click Run.


What is Maxent doing?

            Maxent builds models of species occurrence starting with a uniform distribution of probability values over the entire grid (or here, all the observations in the background and samples files), and then conducts an optimization routine that iteratively improves model fit, measured as the gain.  Gain is basically a likelihood (deviance) statistic that maximizes the probability of the presences in relation to the background data, corrected for the case where the probabilities of all pixels are equal (uniform).  Gain increases to an asymptote, and the final probability distribution becomes the basis for fitted predictor variable coefficients.  Taking the exponent of the final gain gives the mean probability of the presence samples compared to random background pixels.


Model output

            When the run is finished, Maxent generates a slick html file that summarizes model performance, the importance and of each predictor variable and the shape of its influence, documentation of the options chosen, and links to raw data files of predictions and model coefficients that can be analyzed elsewhere (like R).

            Analysis of omission/commission shows two graphs and a table that evaluate model performance/bias as a function of predicted occurrence values, which can be helpful in deciding on a occurrence probability threshold to model ‘habitat’ vs. ‘non-habitat’.  The first graph flags potential sample bias: it shows the relationship between predicted values of occurrence probability (either from the training or test sets) and the proportion of occurrences selected.  Unless the predictions are biased, this should be a 1:1 relationship (setting a 5% cutoff should remove 5% of the occurrences).  The second graph shows how well the model performs in predicting occurrences compared to a random selection of points.  A perfect model will need only X points to capture all X occurrences, and thus would appear as a right angle; a random model will only capture occurrences proportionally to all points samples and appear as a 1:1 line.  Good models will thus increase more steeply than a straight line, and the area under the curve (AUC) will be high.  Here, our AUC value of 0.652 is fairly low.  Finally, the table lists different criteria for choosing logistic cutoff values to represent ‘habitat’ vs. ‘non-habitat’, along with tests of whether such cutoffs would be superior to a random selection of points.  For example, the ‘minimum training presence’ criterion is the logistic threshold that results in inclusion of all training presences (here, 0.122).

Response curves show the probability of species occurrence, given x values of each predictor variable and either 1) average values of all other variables (top set) or 2) all other variables are excluded from the model (bottom set).  These can be very different if variables are correlated.  Here we allowed Maxent to use only step functions (breakpoints) in each feature's model.  You can get a sense of the relationship of these figures with fitted model coefficients by opening the "Magnolia_fraseri.lambdas' file in the output folder.  Click on the upper graph of the tci response (within the full model) to maximize the graph, and examine the breakpoints in relationship to the text file lines that include tci as the coefficient: there should be 3 values corresponding to zero (line 7), the breakpoint at tci=13.2 (line 16), and tci=3 (line 25).  The second column of values are the Maxent coefficient values; translating these into coefficients for, say, a logistic regression is not straightforward (see 'format of the lambda file' in the Help file).  The final two values are the minimum and maximum values of the predictor.  Finally, note that values outside the range of the training values in the figures are 'clamped', that is, fixed at response values for the min and max predictor values.  This is to avoid extrapolation problems associated with the model being exponential, and thus unbounded for values outside the range of background variables.

            Analysis of variable contributions includes two methods for ranking the importance of predictor variables.  In its optimization routine, Maxent keeps track of how much the overall model gain is improved when small changes are made to each coefficient value associated with a particular feature.  At the end of the run, all these small gains associated with each feature are summed and taken as a proportion of all contributions.  Although this is helpful, note that if variables are highly correlated the percent contribution of those variables used later in the optimization will have underestimated importance.  More robust is the second plot (or set of plots) relating to 'jackknife' contributions, which accounts for dependencies between predictor variables by building two sorts of models: one involving a given feature by itself, and the other involving all features EXCEPT for the given feature.  The x-axis is a measure of model predictive ability, using either 1) training gain; 2) test gain; or 3) AUC on test data.  Dark blue bars indicate how well a model performs using only that feature compared to the maximal model (red bar), and light blue bars indicate how well as model performs excluding that feature.  Thus, important variables can either have 1) large dark blue bars, indicating strong (but perhaps non-unique) contribution to presences; 2) short light blue bars, indicating no other variable contains equivalent information; or 3) both, indicating the variable is independently predictive.  Comparing the three types of model performance can also be helpful, in that some variables that show up as important in the training runs may be causing overfitting or lack of generality in the test runs and should be omitted.  In fact, sometimes excluding a variable can yield an even better goodness of fit than a model containing all variables (light blue line beyond the red line).

            For our initial run of stepwise models of our 5 major predictor variables, we see that elevation is clearly the most important predictor, accounting for a substantial part of the gain by itself (dark blue bar), although not all of this gain can be attributed solely to elevation, as indicated by its light blue bar making up almost half of the total model gain.  Stream distance is of secondary importance.  The response curves show that magnolia abundance peaks at mid elevation and is strongly associated with streams. 

            Finally, the output data used to create the above summaries is available in the output folder and is linked to the html summary file, including a full set of predicted occurrence probabilities, Maxent model coefficients, omission data, predictions for the training samples, and a one-row csv file of the major results for further use (say, when comparing results of different Maxent models in R).  Predictive accuracy is listed in terms of AUC (area under the curve, from the second graph in the html file), which measures the probability that random presence site is ranked above a random absence site (here, AUC=0.652).  Values for model settings are also listed, along with nice add-on of how to execute the same model fit using the command line. 


Exploiting full Maxent potential with GIS data

            If the environmental layers data are given as raster data (ESRI ASCII grid format), then Maxent is able to produce predictive maps of species occurrences, both as image files and as raster output that can be further manipulated in a GIS.  In this case a grid of each predictor variable is put in one folder (e.g., workspace\ascii), with an '.asc' suffix.  ASCII grid files are text files of values of each grid cell, with -9999 coded for NODATA cells and 6 lines of header info (see the Help file for more information).  If you're generating these in Arc, frustrations can abound with environmental layers that have unmatching projections (incl. datum), spatial extent and gain, and mask (NODATA) cells.  In GRASS GIS, if these layers are already part of a single LOCATION, you only need to export the rasters as ASCII grids ('r.out.arc' command) and copy them to your Maxent workspace\ascii directory.  I’ve put grid files of four predictor variables (elevation, tci, log(stream distance), and total annual radiation) on the course webpage, which we’ll use here from a single directory (e.g., maxent/ascii).

            Change the ‘environmental layers’ path to the maxent/ascii directory (or whichever folder contains the above .asc files; they must have the .asc suffix).  Each grid layer pops up in the right-hand box; keep these selections (all continuous).  Check all the feature options (linear to hinge), and then check ‘auto features’.  Create a new output folder and insert this path in the output directory box, and click all 3 options on the right.  Then Run.  You’ll get one or more warnings about the samples dataset, which you can ignore.  The run will take several minutes as Maxent inputs the grids and fits a full model and separate models for each predictor.

            Examination of the output html file shows a slightly better model than before (AUC=0.714) but with different variable importance: moving from the coarse ‘beers’ descriptor of radiation to an actual calculation of annual direct radiation received above the canopy (totrad variable, calculated in GRASS) strongly increased radiation importance, and reduced the importance of stream proximity.  Using GIS-based input also allows for GIS output, where we now have a predicted occurrence raster for the park (‘Pictures of the model’).  The output raster is Magnolia_fraseri.asc, which can be easily imported into Arc or GRASS for further manipulation. 



Compare the predictive ability of Maxent and GAMs for Magnolia fraseri occurrence (logistic) probability, using maximal models that include elevation, tci, stream distance, beers, and disturbance class (i.e., using the treedata2 and 3 csv files for Maxent, and treedata.csv for mgcv in R).  Use the Pearson correlation coefficient or AUC to compare the observed and predicted values.