Geography 414/514:  Advanced Geographic Data Analysis
Spring 2008

Exercise 1:  Getting and using R
Finish by Tuesday, Apr. 8

1.  Introduction

The object of this exercise is to install and set up R, and to experiment with some basic procedures. R is actually a computer language (quite similar to the S language for data analysis and visualization developed at AT&T's Bell Labs), but it is best thought of as an "environment" for producing both numerical and graphical analyses of data.  R has several advantages for us here, because

  • it is "open-source" software (which for our purposes means that it can be freely downloaded);
  • it is available for a number of different operating systems, including Windows, Linux, and Macintosh;
  • it is fairly powerful by itself and is extensible (meaning that procedures for analyzing data that don't currently exist can be readily developed);
  • it has the capability for mapping data, an asset not generally available in other statistical software; and
  • it has several add-on "packages" specifically designed for the analysis of spatial data.

R has a fairly steep learning curve, which these exercises are designed to diminish.  The home page for the "R project" is at http://www.r-project.org 

Read through the following before beginning...

2. Getting R

R can be downloaded from one of the "CRAN" (Comprehensive R Archive Network) sites.  In the US, the main site is at http://cran.us.r-project.org/.  To download R,

  1. go to a CRAN website, and look in the "Download and Install R" area.  Click on the link for your operating system.
  2. On the "R for Windows" page, for example, click on the "base" link, which should take you to the "R-2.6.2 for Windows" page (http://cran.us.r-project.org/bin/windows/base).
  3. To begin the download, click on "R-2.6.2-win32.exe", and save that file to your hard disk when prompted.  Saving to the desktop is fine.  (The steps for the other operating systems are a little different, but self-explanatory.) 
  4. To begin the installation, double-click on the downloaded file. 
  5. Select the proposed (default) options in each part of the install dialog, except that when the "Select Components" screen appears, you might want to check the "PDF Reference Manual" box to install the .pdf versions of the R documentation.

There are three "FAQ" pages that contain additional information that may be useful for working out the kinks: the general R FAQ, the R for Windows FAQ, and the R for Mac OS X FAQ, all linked from the CRAN site.

3. Set Up (for Windows)

It will be useful while running R on Windows to create one (or more) "working folders" that R can use to store its internal workspace (which will appear in that folder as a file named .RData), and into which you can download or create data sets (e.g. in Excel or ArcGIS) or files containing R "source code" (e.g. using a text editor like TextPad).  Once that folder is created, you can create a shortcut (icon) on your desktop that starts R in that working folder.

To create a working folder,

  1. start Windows Explorer (right-click on the Start button, and click on "Explore")
  2. browse to or create a new folder that will contain the R data and files (e.g. create a new folder called "geog414" or something).
  3. open that folder by double-clicking on it, and
  4. create a new folder in the geog414 folder called, for example, "working" (File > New > Folder etc.),
  5. open that folder by double-clicking on it, and
  6. create another new folder in the \working folder called, for example, "class1".

To create the desktop shortcut,

  1. find the "R 2.6.2" shortcut (icon) in the Start Menu (Start > Programs > R)
  2. right-click on the icon, and copy the shortcut
  3. paste the shortcut onto the desktop
  4. right-click on the new shortcut, and click on "Properties"
  5. on the "Shortcut" tab, in the field called "Start in:", enter the full path to the folder you just created (e.g. \geog414\working\class1; you can copy this path from the Address bar of Windows Explorer while that folder is open), and
  6. on the "General" tab, change the name of the shortcut to the working folder name (e.g. "class1").

If the shortcut has been properly created, you can use it to start R, and R will automatically use the folder you created as its working folder.  You can create other shortcuts and working folders in the same way.

4. Starting R

To start the R "gui" (graphical user interface), just double-click on the shortcut you just created.  After a brief pause, you should see a message beginning "R version 2.6.2 (2008-02-08) Copyright (C) 2008 The R Foundation for Statistical Computing ..." appear in the "RConsole" window, followed by "[Previously saved workspace restored]" if a workspace has already been saved in that folder.  You can verify that R is looking at the correct folder by clicking on File > Change dir... on the RGui menu.  If the folder shown is the one you just created, fine; otherwise you can browse to it here.
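The working folder can also be checked and changed from the command line: getwd() reports it, and setwd() changes it. In the sketch below, tempdir() (R's temporary folder) merely stands in for your own working folder; the path in the comment is hypothetical.

```r
# Report the folder R is currently using as its working folder
getwd()

# Change the working folder. tempdir() stands in for your own folder here;
# for a real folder you would type something like (hypothetical path)
#   setwd("C:/geog414/working/class1")
setwd(tempdir())
getwd()
```

Inside an R character string a single backslash is an escape character, so Windows paths need forward slashes (as in the comment) or doubled backslashes ("C:\\geog414\\working\\class1").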

See pages 3-4 in Maindonald, Using R for Data Analysis... for a description of what the various menus and windows in the R GUI do.

The command window (or RConsole) is where you type commands and view text (as opposed to graphics) results.  The prompt is the character ">" (in red usually) at the bottom of the text in the R Console window.  If you've scrolled away from the prompt, typing anything in the window will bounce you back.

Most of the time when using R, you'll also want to use a text editor (e.g. TextPad, but Notepad or even Word will work), so you may want to start that first.  RGui has a built-in script editor too, which can also be used to edit files.
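Code written in a text editor is typically saved in the working folder as a plain-text file (conventionally with a .R extension) and run with source(), which evaluates each line as if it had been typed at the prompt. A minimal sketch, in which R itself writes the script to a temporary file with a hypothetical name:

```r
# Write a two-line script to a file; in practice you would create this
# file in your text editor and save it in the working folder
script <- file.path(tempdir(), "myscript.R")   # hypothetical file name
writeLines(c("x <- c(2, 4, 6)",
             "xbar <- mean(x)"), script)

# Run every line of the script, as if it had been typed at the prompt
source(script)
xbar
```

Adding echo = TRUE, as in source(script, echo = TRUE), also prints each command as it runs.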

5.  Installing Packages

R comes with a number of add-on packages that are installed when you install R.  Future exercises will use a number of "R packages" or libraries of functions, data sets, etc. that must be downloaded and installed from "CRAN" (you will need to be connected to the Internet to do this), and it would be handy to install them now. 

In the Windows R Gui, there is a menu choice "Packages" that assists in downloading and installing packages (see Packages > Install package(s) from CRAN), and there is a similar feature on the Mac.

You will get the following message:  --- Please select a CRAN mirror for use in this session --- and a scrolling list box should open.  It turns out that the closest repository to us is in Seattle and is the last one in the list, so scroll down and select it, and then click on "OK".  You can also use the Packages menu to choose the closest mirror.

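When the scrolling list box appears with package names in it, select the packages you want to install (hold down the Ctrl key to select more than one), and then click on "OK".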

For the next few exercises, you'll need to install the following packages:  sp, maptools, rgdal, maps, mapproj, mapdata, classInt, scatterplot3d, and RColorBrewer.

You can check to see if a package has been successfully downloaded and installed by attempting to load the package with the library() function, e.g.

library(maptools)

If an error message is produced (e.g. Error in library(maptools) : There is no package called 'maptools'), then the download and installation has failed.  If that's the case, packages may also be downloaded and installed using the command line in the R Gui, as follows:

options(repos = c(CRAN = "http://cran.us.r-project.org/")) # tell R where to look for packages
install.packages("maptools") # download and install the maptools package

On a Mac, the same commands should also work; the CRAN build of R for the Mac installs pre-built ("binary") versions of packages by default.

(You don't need to use the command-line approach if the menu works; each package only needs to be downloaded and installed once.)
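Once the packages have been installed (by either method), the whole list can be checked at once instead of calling library() on each package. A sketch using installed.packages(), which lists every package R can find:

```r
# Packages needed for the next few exercises (from the list above)
pkgs <- c("sp", "maptools", "rgdal", "maps", "mapproj", "mapdata",
          "classInt", "scatterplot3d", "RColorBrewer")

# TRUE for each package that is installed, FALSE for each that is not
installed <- pkgs %in% rownames(installed.packages())
names(installed) <- pkgs
installed

# Names of any packages that still need to be installed
missing.pkgs <- pkgs[!installed]
missing.pkgs
```

If missing.pkgs is not empty, install.packages(missing.pkgs) downloads and installs just those packages.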

Occasionally, it's a good idea to check if packages have been updated; this can be done by typing

update.packages()

or using the menu, Packages > Update packages from CRAN.

6.  Quitting R

There are several ways to quit R: clicking on the "close window" button, choosing File > Exit from the RGui menu, or typing quit() at the command prompt (or more simply q()).  (Note that you must type the parentheses.)  R will ask if you want to save the current workspace image.  In general, you'll want to do that, but there are cases when you might not want to (e.g. you've accidentally deleted some intermediate results).
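Two related commands: q(save = "no") quits without the save prompt, and save.image() saves the workspace at any time without quitting (with no file argument it writes .RData in the working folder). A sketch that saves to a temporary file with a hypothetical name, then restores from it with load():

```r
x <- c(1.5, 2.5, 3.5)                        # a small object to save

# Save the whole workspace without quitting
ws <- file.path(tempdir(), "backup.RData")   # hypothetical file name
save.image(file = ws)

rm(x)                                        # remove x from the workspace
exists("x")                                  # FALSE
load(ws)                                     # restore the saved objects
exists("x")                                  # TRUE
```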

7. Getting Help

The first thing to do in learning new software is figure out how to get help.  R has several approaches:

  • a quick way to get help on a particular function or command (for example, the quit function described above) is to type a question mark followed by the name of the function at the command line, e.g. "?quit"; you can also type help(quit).  (Note that typing "?quit" is one of the few times in which a function ("quit()") is typed without the parentheses.)
  • you can also get to a web page-based help system by typing help.start() at the command line or using the Help > Html help menu from the RGui.  The key links on the help page are:
    1. "An Introduction to R" (the built-in main manual)
    2. "Package" which lists the contents of the basic and added packages that R knows about.
    3. "Search Engine and Keywords" which allows you to search for function names and the keywords associated with each function, and for information on built-in data sets.
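Two command-line helpers complement ?name and help(): apropos() lists objects whose names match a pattern (useful when you only half-remember a function's name), and args() shows a function's arguments. Similarly, help.search("csv") searches the installed help pages for a keyword. For example:

```r
# List every object on the search path whose name starts with "read."
apropos("^read\\.")

# Show the arguments read.csv() accepts, without opening its help page
args(read.csv)
```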

8.  A Data Set

The Summit Cr. geomorphic data consists of 88 observations of 11 variables along a 0.8-km stretch of Summit Cr. in eastern Oregon. This data set was collected by Pat McDowell, Frank Magilligan and their students as part of their study of the effects of cattle "exclosures" on the morphology of stream channels. They divided this stretch of Summit Cr. into individual "hydrologic units" (HUs) that were either pools, shallow "riffles," or straight "glides." The overall study area is divided into three sections: an upstream reach (reach A) in which cattle are permitted to graze, a middle reach (reach B) from which cattle have been excluded, and a downstream reach (reach C), in which cattle were again permitted to graze.

The dataset contains the following information:

Column  Name      Measurement scale / R data class  Definition
 1      Location  alphanumeric / character          ID for a particular cross section
 2      Reach     nominal / factor                  reach (A = upstream reach (grazed); B = exclosure reach (no cattle); C = downstream reach (grazed))
 3      HU        nominal / factor                  hydrologic unit type (P = pool; R = riffle; G = "glide", or straight-water stretch)
 4      CumLen    ratio / numeric                   cumulative distance downstream from the upstream end of reach A (meters)
 5      Length    ratio / numeric                   length of a hydrologic unit (meters)
 6      DepthWS   ratio / numeric                   depth of the channel from the water surface to the bottom
 7      WidthWS   ratio / numeric                   width of the channel at the water surface (meters)
 8      WidthBF   ratio / numeric                   width of the channel at the bankfull stage (meters)
 9      HUAreaWS  ratio / numeric                   area covered by the hydrologic unit at the water surface (sq. meters)
10      HUAreaBF  ratio / numeric                   area covered by the hydrologic unit at the bankfull stage (sq. meters)
11      wsgrad    ratio / numeric                   water-surface gradient (meters/meters, i.e. dimensionless)

The above table is sometimes referred to as a "codebook" that provides an expanded definition for each variable.  (There is a tradeoff between shortish variable names, which are efficient to type, and longish variable names that are more self-explanatory.)

9. Importing a Data Set

Reading data

R can read data from a number of different sources, including text (ascii) data and the .csv (comma separated values) format of Excel spreadsheets, as well as from its own internal format, which is compact but not readable by humans.  R stores the data, names of variables, etc. in an efficient form in its workspace (.RData) that can be saved and reloaded.

At the time of this writing, the most efficient way to open and import a new data set is in .csv format, which can be downloaded from a web page, either the "data sets" page on the course web page, or from a link on one of the exercise pages like this one.

Importing a data set or shape file into R is a two-step procedure:  1) getting or downloading the data set from a server onto the computer you're using, and 2) reading it into R.

To download the Summit Cr. data set (Step 1),

  1. right-click on a link to a data set on a web page, like this one:  [sumcr.csv]
  2. then save the file (using Internet Explorer, click on "Save target as..."; for Netscape or Firefox, click on "Save link as..."),
  3. then browse to the working folder created above, and 
  4. save the file.

To read the Summit Cr. data set into R (Step 2), type the following:

sumcr <- read.csv("sumcr.csv")

NOTE:  Punctuation, spelling and case are important.  R is case sensitive; in other words, Sumcr is not the same thing as sumcr, and Read.csv is not the same as read.csv.

The read.csv() function creates a data frame "object" called "sumcr" that contains the data from the .csv file.  Note that the data frame object doesn't need to have the same name as the file, but by convention it usually does.  The "<-" arrow is called the "assignment operator", which, as it sounds, assigns whatever is on its right (here, the result of read.csv()) to the object named on its left, creating that object if it doesn't already exist.  In reading a line of code aloud, the operator is usually spoken as "gets", as in "the data frame sumcr gets the contents of the sumcr.csv file."  In newer versions of R, the equals sign (=) can also be used, but in most existing texts and .pdf files, the <- version is used.

The advantage of this approach is that you have an Excel-editable copy of the data set in your working folder.
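To see read.csv() and the assignment operator at work without the course file, the sketch below writes a tiny .csv file with made-up values (they are NOT the Summit Cr. data) to R's temporary folder and reads it back, once with <- and once with =:

```r
# A tiny made-up .csv file; the values are illustrative only
csvfile <- file.path(tempdir(), "tiny.csv")   # hypothetical file name
writeLines(c("Location,Reach,HU,WidthWS",
             "1,A,P,2.5",
             "2,A,R,1.8",
             "3,B,G,3.1"), csvfile)

tiny <- read.csv(csvfile)    # "tiny gets the contents of tiny.csv"
tiny2 = read.csv(csvfile)    # the = form does the same thing
identical(tiny, tiny2)       # TRUE
dim(tiny)                    # 3 rows, 4 columns
```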

An alternative approach is to use the file.choose() function to browse to a particular file:

sumcr <- read.csv(file.choose())

This will open a "Select file..." dialog box.

Looking at the data

The first thing to do is to check to see that R indeed has the Summit Cr. data frame in its workspace.  This can be done by typing ls() (the list function) at the command line, or clicking on Misc > List objects on the RGui menu. 
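At the command line, ls() returns the object names as a character vector, so it can also be used to test whether a particular object is present; rm() removes objects that are no longer needed. A small sketch with made-up objects a and b:

```r
a <- 1:3                # create two small objects
b <- "Summit Cr."
ls()                    # the workspace now includes "a" and "b"
"sumcr" %in% ls()       # TRUE only once sumcr has been read in
rm(b)                   # remove an object that is no longer needed
ls()
```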

The data frame can be examined a couple of different ways:

  • by simply typing the name of the data frame at the command line (e.g. sumcr), or
  • by editing the data set using the built-in editor.  The editor is started by typing fix(sumcr) at the command line, or by using Edit > Data editor from the RGui menu, and then typing the name of the data frame in the "Question" dialog box.

Use the close button on the editor window, or the File > Close menu to close the editor down and return to the RConsole window.

The names() function can be used to get a list of the variables in a data frame, e.g.:  names(sumcr)
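Besides names(), str() and head() give a quick first look at a data frame, and sapply(..., class) lists each variable's R data class for comparison with the codebook. A sketch on a small made-up data frame using a few of the Summit Cr. column names; with the real data you would type names(sumcr), str(sumcr), and so on:

```r
# A small made-up data frame; the values are illustrative only
mini <- data.frame(Reach = factor(c("A", "A", "B", "C")),
                   HU = factor(c("P", "R", "G", "P")),
                   WidthWS = c(2.5, 1.8, 3.1, 2.2))

names(mini)           # "Reach" "HU" "WidthWS"
str(mini)             # each variable, its class, and its first values
sapply(mini, class)   # the R data class of each variable
head(mini, 2)         # the first two rows
```

In R 4.0.0 and later, read.csv() keeps text columns as character unless stringsAsFactors = TRUE is given, so the factor columns in the codebook (Reach, HU) may need converting, e.g. sumcr$Reach <- factor(sumcr$Reach).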

The individual variables are referred to by a "compound" name consisting of the data frame name and the variable name, joined by a dollar sign ($), e.g. sumcr$WidthWS.  Note that variable names are case-sensitive too (e.g. the name sumcr$WidthWS is not the same as sumcr$widthws).  This manner of referring to variables can be made less cumbersome by using the attach() function.  For example, try typing each of the following lines (type only the commands, not the comments in parentheses):

sumcr$WidthWS     (works ok)
WidthWS           (produces the error message: Object "WidthWS" not found)
attach(sumcr)     (press Enter)
WidthWS           (now works ok)
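An alternative to attach() is with(), which evaluates a single expression using a data frame's columns and leaves nothing on the search path; after attach(), detach() removes the data frame again. A sketch with made-up values:

```r
# Made-up water-surface and bankfull widths (illustrative only)
mini <- data.frame(WidthWS = c(2.5, 1.8, 3.1),
                   WidthBF = c(3.0, 2.4, 3.9))

mini$WidthWS                    # the $ form always works
with(mini, WidthBF - WidthWS)   # use the columns in one expression

attach(mini)                    # put the columns on the search path
mean(WidthWS)                   # works without the mini$ prefix
detach(mini)                    # undo attach() when finished
```

Note that attach() works with a copy of the columns, so changes made to the data frame afterwards are not seen through the attached names until it is detached and re-attached.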

10.  What to hand in.

Use the summary() function to produce a quick summarization of the data set:

summary(sumcr)

To print the summary out, select the text, and click on the "print" icon, or use File > Print.
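If printing from the console is awkward, one option is to write the summary to a text file with capture.output() and print it from a text editor. A sketch on a small made-up data frame and a hypothetical file name; for the exercise itself you would use summary(sumcr):

```r
# A small made-up data frame (illustrative values only)
mini <- data.frame(Reach = factor(c("A", "B", "B", "C")),
                   WidthWS = c(2.5, 1.8, 3.1, 2.2))

summary(mini)   # counts for factors; min, quartiles, mean, max for numbers

# Save the printed summary to a text file that can be printed from an editor
outfile <- file.path(tempdir(), "summary.txt")   # hypothetical file name
capture.output(summary(mini), file = outfile)
readLines(outfile)
```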

 
