Geography 4/517:  Geographic Data Analysis
Spring 2011

Exercise 1:  Getting and using R
Finish by Monday, April 4

1.  Introduction

The object of this exercise is to install and set up R, and to experiment with some basic procedures. R is actually a computer language (that is quite similar to the S language for data analysis and visualization developed at AT&T's Bell Labs), but is best thought of as an "environment" for producing both numerical and graphical analyses of data.  R has several advantages for us here, because

  • it is "open-source" software (which for our purposes means that it can be freely downloaded);
  • it is available for a number of different operating systems, including Windows, Linux, and Macintosh;
  • it is fairly powerful by itself and is extensible (meaning that procedures for analyzing data that don't currently exist can be readily developed);
  • it has the capability for mapping data, an asset not generally available in other statistical software; and
  • it has several add-on "packages" specifically designed for the analysis of spatial data.

R has a fairly steep learning curve, which these exercises are designed to diminish.  The home page for the "R project" is at http://www.r-project.org 

Read through the following before beginning...

2. Getting R

R can be downloaded from one of the "CRAN" (Comprehensive R Archive Network) sites.  In the US, the main site is at http://cran.us.r-project.org/.  To download R,

  • go to a CRAN website, and look in the  "Download and Install R" area.  Click on the appropriate link. 

Windows (7 and Vista; XP is basically similar)

  1. On the "R for Windows" page, click on the "base" link, which should take you to the "R-2.12.2 for Windows" page (http://cran.us.r-project.org/bin/windows/base).
  2. To begin the download, click on "Download R-2.12.2 for Windows", and save that file to your hard disk when prompted.  Saving to the desktop is fine.  Here's an image of what the CRAN page should look like:  Win download
  3. To begin the installation, double-click on the downloaded file, or open it from a downloads window.  Don't be alarmed by "unknown publisher" warnings.  Windows UAC (User Account Control) will also worry about an unidentified program wanting access to your computer.  Click on "Allow".  Here's what the warning looks like:  warning
  4. Accept the proposed options in each part of the install dialog, with one exception:
  5. When the "Select Components" screen appears, you might want to check all of the Online PDF Manuals check boxes to install the .pdf versions of the R documentation.  Here's what it should look like: components.

Mac OS X

  1. On the "R for Mac OS X" page (http://cran.fhcrc.org/bin/macosx/) there are multiple packages that could be downloaded.  The one you probably want is the topmost one.  Here's what the CRAN page should look like:  Mac download
  2. To download the package, click on the "R-2.12.2.pkg (latest version)" link.
  3. After the package finishes downloading, right-click in the Downloads window, and click on "Show in Finder" (or just look in the Downloads folder).  This will open a new Finder window with the installer package, which will look something like this:  Mac package
  4. Then double-click on the installer package, and after a few screens, select a destination for the installation of the R framework (the program) and the R.app GUI.  That dialog will look something like this:  Mac install  Note that you will have to supply the Administrator's password.
  5. Two applications will appear in the Applications folder:  R.app (standard) and R64.app (which exploits the 64-bit operating system).  For most of the things we'll do in class, the standard R.app application will be fine.

There are three "FAQ" pages that contain additional information that may be useful for working out the kinks.  These include

3. Set Up

Windows (7 and Vista)

It will be useful while running R on Windows to create one (or more) "working folders" that R can use to store its internal workspace (which will appear in that folder as a file named .RData), and into which you can download or create data sets (e.g. in Excel or ArcGIS), or files containing R "source code" or scripts (e.g. using a text editor like the built-in script editor in R).  Once that folder is created, a shortcut (icon) can be created on your desktop that points to that working folder when starting up R.

On Windows, the possibility exists to create a number of working folders, wherein the data for specific projects can be conveniently stored.  For this class, one folder will probably do the job.

To create a working folder,

  1. start Windows Explorer (right-click on the Start button, and click on "Explore")
  2. browse to or create a new folder that will contain the R data and files (e.g. create a new folder called "geog417" or something).  Pick a sensible location for this folder; on Windows 7 (or Vista), probably in the c:\Users\xxxx\Documents\ folder (e.g. c:\Users\bartlein\Documents\)
  3. open that folder by clicking on it, and
  4. create a second new folder in the geog417 folder you just created called, for example, "work01" (File > New > Folder etc.).

The new folder will be empty, but should look something like this:  working-folder

To create the desktop shortcut,

  1. find the "R 2.12.2" shortcut (icon) in the Start Menu (Start > Programs > R) or on the desktop.
  2. right-click on the icon, and click on "Create Shortcut"
  3. paste the shortcut back onto the desktop
  4. right-click on the new shortcut, and click on "Properties"
  5. on the "Shortcut" tab, in the field called "Start in:", enter the full path to the folder you just created (e.g. c:\Users\bartlein\Documents\geog417\work01; Windows will help fill this in), and
  6. on the "General" tab, change the name of the shortcut to the working folder name (e.g. "work01").

The Shortcut tab should look something like this:  shortcut-tab.

If the shortcut has been properly created, you can click on  it to start R, and it will automatically assume that its working folder is the one you created.  Other shortcuts and working folders can be created.

Mac OS X

The Mac version of R has a built-in Workspace browser, which makes the maintenance of separate workspaces straightforward, so it is unnecessary to create a desktop shortcut.  To create a working folder, the procedure is similar to that on Windows:

  1. start Finder
  2. click on File > New Folder, create a new folder in your Documents folder, and name it geog417
  3. create another new folder within that folder named work01.

The last folder created should be named, for example, /Users/xxxx/Documents/geog417/work01, where "xxxx" is your user name.

The R.app GUI can be added to the Dock for convenience in starting up R.

4. Starting R

To start the R "gui" (graphical user interface), just click on the shortcut you just created (in Windows)  or on the R.app GUI (Mac) in the Applications folder. 

After a brief pause, you should see the message: 

R version 2.12.2 (2011-02-25)

Copyright (C) 2011 The R Foundation for Statistical Computing

[Previously saved workspace restored]

appear in the "RConsole" window.  In Windows, you can verify that R is looking at the correct folder by clicking on File > Change dir... on the RGui menu.  If you're in the folder you just created, fine, otherwise you could browse to it here.

See pages 3-4 in Maindonald, Using R for Data Analysis... for a description of what the various menus and windows in the Windows version of the R GUI do.

On the Mac, you will probably be in a default folder.  You can use the Misc > Change Working Directory… menu command to browse to the folder created above.
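
The working folder can also be checked and changed from the command line.  The following is a minimal sketch that assumes the example folder from the set-up section; substitute your own path.  Note that R expects forward slashes (or doubled backslashes) in paths, even on Windows.

```r
# Show the folder R is currently using as its working folder
getwd()

# Example path from the set-up section (an assumption; use your own folder)
work_dir <- "c:/Users/bartlein/Documents/geog417/work01"

# Switch to that folder only if it actually exists on this computer
if (file.exists(work_dir)) {
  setwd(work_dir)
} else {
  message("Folder not found: ", work_dir)
}
```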

The command window (or RConsole) is where you type commands and view text (as opposed to graphics) results.  The prompt is the character ">" (in red, usually) at the bottom of the text in the R Console window.  If you've scrolled away from the prompt, typing anything in the window will bounce you back.

Most of the time when using R, you'll also want to use a word processor (e.g. Word), and a text editor like Notepad (Windows) or TextEdit (Mac), so you may want to start them too.  RGui on Windows has a built-in script editor too, which can be used to edit files.

5.  Installing Packages

R comes with a number of add-on packages that are installed when you install R.  Future exercises will use a number of "R packages" or libraries of functions, data sets, etc. that must be downloaded and installed from "CRAN" (you will need to be connected to the Internet to do this), and it would be handy to install them now. 

Windows 7 and Vista -- Important!

Because Windows 7 and Vista object to the idea of programs installing files into the C:\Program Files folder, R will run into trouble when it attempts to install add-in packages there.  The most reliable work-around seems to be to create what's known as a personal library where the packages are stored.  R will generally offer to create one the first time you download a package, but Windows sometimes does not get the permissions entirely correct.  The best thing to do is to create the folder yourself before downloading a package for the first time, e.g., create the folder

C:\Users\bartlein\Documents\R\win-library\2.12\

(Later, when you update R to a new version (e.g. R 2.13.0) you can create another new personal library, move the old packages there, and use R to update them.)
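
The personal library can also be created, and R told about it, from the command line.  This is only a sketch: a temporary folder stands in for the real location so that it can be tried safely, and on your own machine you would use a path like the one shown above instead.

```r
# A temporary stand-in for C:/Users/xxxx/Documents/R/win-library/2.12
lib_dir <- file.path(tempdir(), "R", "win-library", "2.12")

# recursive = TRUE creates the intermediate folders as well
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the new folder at the front of the list of libraries R searches
.libPaths(c(lib_dir, .libPaths()))

# List the libraries R will now use, in order
.libPaths()
```

Packages installed afterwards go into the first library in that list by default.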

Windows

In the Windows RGui, there is a menu choice "Packages" that assists in downloading and installing packages (see Packages > Install package(s) from CRAN), and there is a similar feature on the Mac.

You will get the following message:  --- Please select a CRAN mirror for use in this session --- and a scrolling list box should open.  It turns out that the closest repository to us is in Seattle and is the last one in the list, so scroll down and select it, and then click on "OK".  You can also use the Packages menu to choose the closest mirror.

When the scrolling list box appears with package names in it, click on the package(s) you want to install.

Mac OS X

On the R.app GUI there is a menu choice called "Packages & Data".  The R Package Installer menu choice brings up a dialog that can be populated with packages by clicking "Get List".  Select the name of the package, and click "Install Selected".  When the package is installed, its version number will appear in the listing.

Packages used in the course

For the next few exercises, you'll need to install the following packages: 

sp, maptools, rgdal, maps, mapproj, mapdata, classInt, scatterplot3d, and RColorBrewer.

Note:  At the moment, the rgdal package for the Mac has to be compiled from source code (meaning that it’s not an automatically working download), so if you’re a Mac user, skip downloading it for now.
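
To see at a glance which of these packages still need to be installed, something like the following sketch can be used.  The names course_pkgs, missing_pkgs and do_install are made up for this example; Mac users may want to drop "rgdal" from the list, as noted above.

```r
# The packages named in the text
course_pkgs <- c("sp", "maptools", "rgdal", "maps", "mapproj", "mapdata",
                 "classInt", "scatterplot3d", "RColorBrewer")

# Names of all packages in the libraries R currently knows about
installed <- rownames(installed.packages())

# The course packages that are not installed yet
missing_pkgs <- setdiff(course_pkgs, installed)
missing_pkgs

# Change FALSE to TRUE to download them (needs an Internet connection)
do_install <- FALSE
if (do_install && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```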

You can check to see if a package has been successfully downloaded and installed by attempting to load the package with the library() function, e.g.

library(maptools)

If an error message is produced (e.g. Error in library(maptools) : there is no package called 'maptools'), then the download and installation has failed.  If that's the case, packages may also be downloaded and installed using the command line in the R GUI, as follows:

options(repos = c(CRAN = "http://cran.us.r-project.org/")) # tell R where to look for packages
install.packages("maptools") # download and install the maptools package

On a Mac, some documentation suggests using a separate function, install.binaries(), for this step.  The install.packages() function also works on the Mac, so the same two lines can be used there.

(You don't need to use the command line approach if you use the menu--just download the packages once.)

Because there are over 2000 packages at CRAN now, scrolling through the list can get tedious.  You can also install packages within R using the command line:

install.packages("sp")

If you haven’t selected a CRAN mirror in the current session, R will prompt you for one.

Occasionally, it's a good idea to check if packages have been updated; this can be done by typing

update.packages()

or, on Windows, using the menu, Packages > Update packages from CRAN, or on the Mac clicking "Update All" on the R Package Installer.

Windows 7 and Vista may complain (within R) when you try to update packages.  That's what is going on if you see the following error message after typing update.packages(), which indicates that the packages were not successfully updated.

C:\Users\bartlein\AppData\Local\Temp\RtmpBsLdAb\downloaded_packages
Warning in install.packages(update[instlib == l, "Package"], l, contriburl = contriburl, : 'lib = "C:/PROGRA~1/R/R-212~1.2/library"' is not writable
Error in install.packages(update[instlib == l, "Package"], l, contriburl = contriburl, : unable to install packages

The work-around is to start R as an “Administrator” (right-click on the R icon, and click on “Run as Administrator”) before updating packages.  Here's what the dialog should look like:  run-as-admin
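
Before updating, you can check whether this problem applies to you by asking R which of its library folders it is allowed to write to.  This is a sketch using base R functions only; the name writable is made up for the example.

```r
# file.access() returns 0 where the requested access is allowed;
# mode = 2 asks about write permission
writable <- file.access(.libPaths(), mode = 2) == 0

# One row per library folder: TRUE where packages can be installed or updated
data.frame(library = .libPaths(), writable = writable)
```

If the folder under C:\Program Files shows FALSE, packages stored there can only be updated when R is run as Administrator.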

6.  Quitting R

There are several ways to quit R -- clicking on the "close window" button, choosing File > Exit from the RGui menu on Windows, choosing R > Quit R from the R.app menu on the Mac (or clicking on the power switch), or typing quit() at the command prompt (or more simply q()).  (Note that you must type the parentheses.)  R will ask if you want to save the current workspace image.  In general, you'll want to do that, but there are cases when you might not want to (e.g. you've accidentally deleted some intermediate results).
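
Saving the workspace simply writes the objects R is holding to a file, from which they can be reloaded later.  The sketch below shows the idea with a temporary file, so that nothing in your working folder is overwritten; when you answer "yes" on quitting, R does the equivalent with save.image(), which writes a file named .RData in the working folder.

```r
x <- c(1, 2, 3)                        # an object worth keeping

# Save every object in the workspace to a (temporary) file
ws_file <- file.path(tempdir(), "example.RData")
save(list = ls(), file = ws_file)

rm(x)                                  # remove the object ...
load(ws_file)                          # ... and restore it from the file
x
```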

7. Getting Help

The first thing to do in learning new software is figure out how to get help.  R has several approaches:

  • a quick way to get help on a particular function or command, for example, the quit function described above, is to type a question mark plus the name of the function at the command line, e.g. "?quit"; you can also type help(quit).  (Note that typing "?quit" will be one of the few times in which a function ("quit()") is typed without the parentheses.)
  • you can also get to a web page-based help system by typing help.start() at the command line or using the Help > Html help menu from the RGui.  The key links on the help page are:
    1. "An Introduction to R" (the built-in main manual)
    2. "Packages" which lists the contents of the basic and added packages that R knows about.
    3. "Search Engine and Keywords" which allows you to search for function names and the keywords associated with each function, and for information on built-in data sets.
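
When you can remember only part of a function's name, a few other commands are handy.  The sketch below uses "csv" as an example search term; the commented-out lines open help pages, so they are best typed at the prompt.

```r
# Names of all available functions and objects that contain "csv"
csv_funs <- apropos("csv")
csv_funs

# help(read.csv)        # same as ?read.csv
# help.search("csv")    # search the titles and keywords of the help pages
# example(mean)         # run the examples from the help page for mean()
```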

8.  A Data Set

The Summit Cr. geomorphic data consists of 88 observations of 11 variables along a 0.8-km stretch of Summit Cr. in eastern Oregon. This data set was collected by Pat McDowell, Frank Magilligan and their students as part of their study of the effects of cattle "exclosures" on the morphology of stream channels. They divided this stretch of Summit Cr. into individual "hydrologic units" (HUs) that were either pools, shallow "riffles," or straight "glides." The overall study area is divided into three sections: an upstream reach (reach A) in which cattle are permitted to graze, a middle reach (reach B) from which cattle have been excluded, and a downstream reach (reach C), in which cattle were again permitted to graze.

The dataset contains the following information:

Column  Name      Measurement scale / R data class  Definition
------  --------  --------------------------------  ----------
1       Location  alphanumeric / character          ID for a particular cross section
2       Reach     nominal / factor                  reach (A=upstream reach (grazed); B=exclosure reach (no cattle); C=downstream reach (grazed))
3       HU        nominal / factor                  hydrologic unit type (P=pool; R=riffle; G="glide", or straightwater stretch)
4       CumLen    ratio / numeric                   cumulative distance downstream from the upstream end of reach A (meters)
5       Length    ratio / numeric                   length of a hydrologic unit (meters)
6       DepthWS   ratio / numeric                   depth of the channel from the water surface to the bottom
7       WidthWS   ratio / numeric                   width of the channel at the water surface (meters)
8       WidthBF   ratio / numeric                   width of the channel at the bankfull stage (meters)
9       HUAreaWS  ratio / numeric                   area covered by the hydrologic unit at the water surface (sq. meters)
10      HUAreaBF  ratio / numeric                   area covered by the hydrologic unit at the bankfull stage (sq. meters)
11      wsgrad    ratio / numeric                   water-surface gradient (meters/meters, i.e. dimensionless)

The above table is sometimes referred to as a "codebook" that provides an expanded definition for each variable.  (There is a tradeoff between shortish variable names, which are efficient to type, and longish variable names that are more self-explanatory.)

9. Importing a Data Set

Reading data

R can read data from a number of different sources, including text (ascii) data and the .csv (comma separated values) format of Excel spreadsheets, as well as from an internal format, which is text-based, but not easily readable by humans.  R stores the data, names of variables, etc. in an efficient form in its workspace (.RData) that can be saved and reloaded.

At the time of this writing, the most efficient way to open and import a new data set is in .csv format, which can be downloaded from a web page, either the "data sets" page on the course web page, or from a link on one of the exercise pages like this one.

Importing a data set or shape file into R is a two-step procedure:  1) getting or downloading the data set from a server onto the computer you're using, and 2) reading it into R.

To download the Summit Cr. data set (Step 1),

  1. right-click on a link to a data set on a web page, like this one:  [sumcr.csv]
  2. then save the file (using Internet Explorer, click on "Save target as..."; for Firefox, click on "Save link as..."; or using Safari on the Mac, click on "Download Linked Files As…"),
  3. then browse to the working folder created above, and 
  4. save the file.

To read the Summit Cr. data set into R (Step 2), type the following:

sumcr <- read.csv("sumcr.csv")

NOTE:  This will only work if the file was downloaded to the working folder.  If you saved it somewhere else, like your Downloads folder, you should move it into your working folder.  On the Mac, you may have to change the working folder using the Misc > Change Working Directory… menu command.  On either a PC or a Mac, you can verify that the file is in the right place by typing

dir()
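
A more targeted check is to ask R directly whether the file is there before trying to read it.  This is a small sketch using base R functions; the name csv_files is made up for the example.

```r
# TRUE if sumcr.csv is in the current working folder
file.exists("sumcr.csv")

# All of the .csv files in the working folder
csv_files <- list.files(pattern = "\\.csv$")
csv_files

# Read the file only if it is there; otherwise say where R was looking
if (file.exists("sumcr.csv")) {
  sumcr <- read.csv("sumcr.csv")
} else {
  message("sumcr.csv is not in ", getwd())
}
```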

NOTE:  Punctuation, spelling and case are important.  R is case sensitive; in other words, Sumcr is not the same thing as sumcr, and Read.csv is not the same as read.csv.

The read.csv() function creates a data frame "object" called "sumcr" that contains the data from the .csv file.  Note that the data frame object doesn't need to have the same name as the file, but by convention it usually does.  The "<-" arrow is called the "assignment operator", which, as it sounds, assigns whatever object is to its right to whatever object is to its left, sometimes creating a new object in the process.  In reading a line of text, the operator is usually spoken as "gets" as in "the dataframe sumcr gets the contents of the sumcr.csv file."  In newer versions of R, the equals (=) sign can be used, but in most existing texts and .pdf files, the <- version is used.
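
The assignment operator works the same way for any kind of object, not just data frames.  A short illustration, with x and y as throwaway names:

```r
x <- 5            # "x gets 5"
y = 5             # the equals sign has the same effect at the command line
x == y            # TRUE: both names now hold the value 5

x <- x + 1        # the right-hand side is evaluated first, then assigned
x
```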

The advantage of this approach is that you have an Excel-editable copy of the data set in your working folder.

An alternative approach is to use the file.choose() function to browse to a particular file:

sumcr <- read.csv(file.choose())

This will open a "Select file..." dialog box.

Looking at the data

The first thing to do is to check to see that R indeed has the Summit Cr. data frame in its workspace.  This can be done by typing ls() (the list function) at the command line, or clicking on Misc > List objects on the RGui menu. 

The data frame can be examined a couple of different ways:

  • by simply typing the name of the data frame at the command line (e.g. sumcr), or
  • by editing the data set using the built-in editor.  The editor is started up by typing fix(sumcr) at the command line, or by using Edit > Data editor from the RGui menu, and then typing in the name of the data frame in the "Question" dialog box. 

Use the close button on the editor window, or the File > Close menu to close the editor down and return to the RConsole window.

The names() function can be used to get a list of the variables in a data frame, e.g.:  names(sumcr)
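
A few other functions give a compact overview of a data frame without printing all of it.  The sketch below uses a small made-up data frame called demo as a stand-in (the values are invented, not the Summit Cr. measurements); with the real data, replace demo by sumcr.

```r
# Made-up stand-in with three of the variable names from the codebook
demo <- data.frame(
  Reach   = factor(c("A", "A", "B", "C")),
  HU      = factor(c("P", "R", "G", "P")),
  WidthWS = c(2.1, 1.4, 1.8, 2.5)
)

dim(demo)        # number of rows (observations) and columns (variables)
names(demo)      # variable names
str(demo)        # data class and first few values of each variable
head(demo, 3)    # the first three rows
```

For sumcr, dim() should report 88 rows and 11 columns if the file was read correctly.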

The individual variables are referred to by a "compound" name consisting of the data frame name and the variable name, joined by a dollar sign ($), e.g. sumcr$WidthWS.  Note that variable names are case-sensitive too (e.g. the name sumcr$WidthWS is not the same as sumcr$widthws).  This manner of referring to variables can be made less cumbersome by using the attach() function.  For example, try typing the following, one line at a time (the text following each # character is a comment, which R ignores and you don't need to type):

sumcr$WidthWS   # works ok
WidthWS         # produces the error message: object 'WidthWS' not found
attach(sumcr)
WidthWS         # works ok now
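
An attached data frame stays attached until it is detached, which can cause confusion if several data frames share variable names.  The sketch below, which again uses a made-up data frame rather than the real data, shows detach() and also with(), an alternative that looks up the variable names for a single expression only.

```r
# Made-up stand-in for sumcr (NOT the real measurements)
demo <- data.frame(WidthWS = c(2.1, 1.4, 1.8, 2.5))

m1 <- mean(demo$WidthWS)          # compound name
m2 <- with(demo, mean(WidthWS))   # short name, for this one expression

attach(demo)                      # short names work from here on ...
m3 <- mean(WidthWS)
detach(demo)                      # ... until the data frame is detached

c(m1, m2, m3)                     # all three values are the same
```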

10.  What to hand in.

Use the summary() function to produce a quick summarization of the data set:

summary(sumcr)

To print the summary out, select the text, and click on the "print" icon, or use File > Print.
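
If printing from the console is awkward, the summary can instead be written to a text file and printed from a text editor or word processor.  This is one possible approach, sketched with a made-up data frame and a temporary file; with the real data, use summary(sumcr) and a file name in your working folder.

```r
# Made-up stand-in for sumcr (NOT the real measurements)
demo <- data.frame(Reach   = factor(c("A", "A", "B", "C")),
                   WidthWS = c(2.1, 1.4, 1.8, 2.5))

# capture.output() collects the printed summary as lines of text
summary_lines <- capture.output(summary(demo))

# Write those lines to a text file
out_file <- file.path(tempdir(), "sumcr_summary.txt")
writeLines(summary_lines, out_file)
```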
