Geography 414/514: Advanced Geographic Data Analysis
Exercise 1: Getting and using R

1. Introduction

The object of this exercise is to install and set up R, and to experiment with some basic procedures. R is actually a computer language (that is quite similar to the S language for data analysis and visualization developed at AT&T's Bell Labs), but is best thought of as an "environment" for producing both numerical and graphical analyses of data. R has several advantages for us here, because

R has a fairly steep learning curve, which these exercises are designed to diminish. The home page for the "R project" is at http://www.r-project.org

Read through the following before beginning...

2. Getting R

R can be downloaded from one of the "CRAN" (Comprehensive R Archive Network) sites. In the US, the main site is at http://cran.us.r-project.org/

To download R,
There are three "FAQ" pages that contain additional information that may be useful for working out the kinks. These include
3. Set Up (for Windows)

It will be useful while running R on Windows to create one (or more) "working folders" that R can use to store its internal workspace (which will appear in that folder as a file named .RData), and into which you can download or create data sets (e.g. in Excel or ArcGIS) or files containing R "source code" (e.g. using a text editor like TextPad). Once that folder exists, you can create a shortcut (icon) on your desktop that starts R with that folder as its working folder. To create a working folder,
To create the desktop shortcut,
If the shortcut has been properly created, you can use it to start R, and it will automatically assume its working folder is the one you created. Other shortcuts and working folders can be created.

4. Starting R

To start the R "gui" (graphical user interface), just click on the shortcut you just created. After a brief pause, you should see the message:

R version 2.6.2 (2008-02-08)
Copyright (C) 2008 The R Foundation for Statistical Computing
...
[Previously saved workspace restored]

appear in the "R Console" window. You can verify that R is looking at the correct folder by clicking on File > Change dir... on the RGui menu. If you're in the folder you just created, fine; otherwise you can browse to it here. See pages 3-4 in Maindonald, Using R for Data Analysis... for a description of what the various menus and windows in the R GUI do.

The command window (or R Console) is where you type commands and view text (as opposed to graphics) results. The prompt is the character ">" (usually in red) at the bottom of the text in the R Console window. If you've scrolled away from the prompt, typing anything in the window will bounce you back.

Most of the time when using R, you'll also want to use a text editor (e.g. TextPad, but Notepad or even Word will work), so you may want to start that first. RGui has a built-in script editor too, which can also be used to edit files.

5. Installing Packages

R comes with a number of add-on packages that are installed when you install R. Future exercises will use a number of additional "R packages" (libraries of functions, data sets, etc.) that must be downloaded and installed from "CRAN" (you will need to be connected to the Internet to do this), and it would be handy to install them now. In the Windows R Gui, there is a menu choice "Packages" that assists in downloading and installing packages (see Packages > Install package(s) from CRAN), and there is a similar feature on the Mac.
You will get the following message:

--- Please select a CRAN mirror for use in this session ---

and a scrolling list box should open. It turns out that the closest repository to us is in Seattle and is the last one in the list, so scroll down and select it, and then click on "ok". You can also use the Packages menu to choose the closest mirror. When the scrolling list box appears with package names in it, select the packages you want to install and click on "ok".

For the next few exercises, you'll need to install the following packages: sp, maptools, rgdal, maps, mapproj, mapdata, classInt, scatterplot3d, and RColorBrewer. You can check whether a package has been successfully downloaded and installed by attempting to load it with the library() function, e.g. library(maptools).
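As a complement to trying library() on each package in turn, the whole list can be checked at once. The following is a minimal sketch, not part of the original exercise: the helper name missing_packages() is hypothetical, and it relies only on the base function installed.packages(). Note that some of the listed packages (maptools and rgdal, for instance) have since been archived on CRAN, so a non-empty result does not necessarily mean that a download went wrong.

```r
# Packages named in this exercise.
pkgs <- c("sp", "maptools", "rgdal", "maps", "mapproj", "mapdata",
          "classInt", "scatterplot3d", "RColorBrewer")

# Hypothetical helper: return the names in `pkgs` that are not installed
# in any of the library folders R currently knows about.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# An empty result (character(0)) means every package was found.
missing_packages(pkgs)
```

Any names this returns are the ones that still need to be installed, either from the Packages menu or from the command line.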
If an error message is produced (e.g. Error in library(maptools) : there is no package called 'maptools'), then the download and installation have failed. If that's the case, packages may also be downloaded and installed using the command line in the R Gui, as follows:

options(repos = c(CRAN = "http://cran.us.r-project.org/")) # tell R where to look for packages

On a Mac, the documentation suggests that this is done a little differently:

options(repos = c(CRAN = "http://cran.us.r-project.org/")) # tell R where to look for packages

Once the repository is set, each package is installed by name with install.packages(), e.g. install.packages("maptools"). (You don't need to use the command line approach if you use the menu; just download the packages once.)

Occasionally, it's a good idea to check whether packages have been updated; this can be done by typing update.packages() or by using the menu, Packages > Update packages from CRAN.

6. Quitting R

There are several ways to quit R: clicking on the "close window" button, choosing File > Exit from the RGui menu, or typing quit() at the command prompt (or more simply q()). (Note that you must type the parentheses.) R will ask if you want to save the current workspace image. In general, you'll want to do that, but there are cases when you might not want to (e.g. you've accidentally deleted some intermediate results).

7. Getting Help

The first thing to do in learning new software is figure out how to get help. R has several approaches:
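A few of the standard entry points to R's help system can be sketched as follows. This is an illustrative selection rather than a complete list, and read.csv and mean are used only as example topics.

```r
help("read.csv")     # show the documentation page for a function
?read.csv            # shorthand for the same thing

# List objects on the search path whose names start with "read."
read_hits <- apropos("^read\\.")
print(read_hits)

example(mean)        # run the examples from the help page for mean()

# The following are left commented out because they open a browser or
# search all installed documentation, which can take a while:
# help.start()                 # HTML version of the help system
# help.search("regression")    # search help pages by keyword
```

help() and ? need the exact name of a function; apropos() and help.search() are more useful when only part of a name or a topic is known.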
8. A Data Set

The Summit Cr. geomorphic data consist of 88 observations of 11 variables along a 0.8-km stretch of Summit Cr. in eastern Oregon. This data set was collected by Pat McDowell, Frank Magilligan and their students as part of their study of the effects of cattle "exclosures" on the morphology of stream channels. They divided this stretch of Summit Cr. into individual "hydrologic units" (HUs) that were either pools, shallow "riffles," or straight "glides." The overall study area is divided into three sections: an upstream reach (reach A) in which cattle are permitted to graze, a middle reach (reach B) from which cattle have been excluded, and a downstream reach (reach C), in which cattle were again permitted to graze. The dataset contains the following information:
The above table is sometimes referred to as a "codebook" that provides an expanded definition for each variable. (There is a tradeoff between shortish variable names, which are efficient to type, and longish variable names that are more self-explanatory.)

9. Importing a Data Set

Reading data

R can read data from a number of different sources, including text (ASCII) data and the .csv (comma-separated values) format of Excel spreadsheets, as well as from an internal format, which is text-based, but not easily readable by humans. R stores the data, names of variables, etc. in an efficient form in its workspace (.RData) that can be saved and reloaded. At the time of this writing, the most efficient way to open and import a new data set is in .csv format, which can be downloaded from a web page, either the "data sets" page on the course web page, or from a link on one of the exercise pages like this one.

Importing a data set or shape file into R is a two-step procedure: 1) getting or downloading the data set from a server onto the computer you're using, and 2) reading it into R. To download the Summit Cr. data set (Step 1),
To read the Summit Cr. data set into R (Step 2), type the following:
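A minimal sketch of this step, assuming that Step 1 saved the file under the name sumcr.csv in the current working folder (the file.exists() guard is an addition for illustration, so that the sketch reports a missing file instead of stopping with an error):

```r
# Assumption: Step 1 saved the file as "sumcr.csv" in the working folder.
csv_file <- "sumcr.csv"

if (file.exists(csv_file)) {
  # Read as: "the data frame sumcr gets the contents of sumcr.csv".
  sumcr <- read.csv(csv_file)
  print(dim(sumcr))   # number of rows (observations) and columns (variables)
} else {
  message("Cannot find ", csv_file, " in ", getwd())
}
```

If the "cannot find" message appears, check the working folder with getwd() and confirm that the file was saved there.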
NOTE: Punctuation, spelling and case are important. R is case sensitive; in other words, Sumcr is not the same thing as sumcr, and Read.csv is not the same as read.csv.

The read.csv() function creates a data frame "object" called "sumcr" that contains the data from the .csv file. Note that the data frame object doesn't need to have the same name as the file, but by convention it usually does. The "<-" arrow is called the "assignment operator", which, as it sounds, assigns whatever object is to its right to whatever object is to its left, sometimes creating a new object in the process. In reading a line of text, the operator is usually spoken as "gets", as in "the data frame sumcr gets the contents of the sumcr.csv file." In newer versions of R, the equals (=) sign can be used, but in most existing texts and .pdf files, the <- version is used.

The advantage of this approach is that you have an Excel-editable copy of the data set in your working folder. An alternative approach is to use the file.choose() function to browse to a particular file, e.g. sumcr <- read.csv(file.choose()).
This will open a "Select file..." dialog box.

Looking at the data

The first thing to do is to check that R indeed has the Summit Cr. data frame in its workspace. This can be done by typing ls() (the list function) at the command line, or clicking on Misc > List objects on the RGui menu. The data frame can be examined a couple of different ways:
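Some common ways of inspecting a data frame from the command line are sketched below. So that the sketch runs on its own, it uses a small stand-in data frame, sumcr_demo, whose three rows are made up and are not Summit Cr. measurements. Only the column name WidthWS comes from this exercise; the Reach column is hypothetical. With the real data, the same calls would be applied to sumcr.

```r
# Stand-in data: made-up values, NOT the real Summit Cr. measurements.
sumcr_demo <- data.frame(Reach   = c("A", "B", "C"),   # hypothetical column
                         WidthWS = c(1.0, 2.0, 3.0))

ls()               # names of the objects in the workspace
dim(sumcr_demo)    # number of rows and columns
names(sumcr_demo)  # variable names
head(sumcr_demo)   # first few rows
str(sumcr_demo)    # compact description of each variable
sumcr_demo         # typing the name prints the whole data frame
```

For a data frame with many rows, head() and str() are usually more convenient than printing the whole object.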
Use the close button on the editor window, or the File > Close menu, to close the editor and return to the R Console window.

The names() function can be used to get a list of the variables in a data frame, e.g.:

names(sumcr)

The individual variables are referred to by a "compound" name consisting of the data frame name and the variable name, joined by a dollar sign ($), e.g.

sumcr$WidthWS

Note that variable names are case-sensitive too (e.g. the name sumcr$WidthWS is not the same as sumcr$widthws). This manner of referring to variables can be made less cumbersome by using the attach() function. For example, try typing the following (don't type the material in parentheses, or the comments within a line, just the text in the Courier typeface):

sumcr$WidthWS (works ok)

10. What to hand in

Use the summary() function to produce a quick summary of the data set:
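With the real data, the call would presumably be summary(sumcr). The sketch below shows the same function on a small stand-in data frame so that it runs on its own. The values in sumcr_demo are made up and are not Summit Cr. measurements; only the column name WidthWS comes from this exercise.

```r
# Stand-in data: made-up values, NOT the real Summit Cr. measurements.
sumcr_demo <- data.frame(WidthWS = c(1.0, 2.0, 3.0))

summary(sumcr_demo)                      # summary of every variable in the data frame

width_summary <- summary(sumcr_demo$WidthWS)   # summary of a single variable
print(width_summary)
```

For numeric variables, summary() reports the minimum, quartiles, median, mean and maximum.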
To print the summary out, select the text, and click on the "print" icon, or use File > Print.