Geography 4/517: Geographic Data Analysis

Exercise 1: Getting and using R

1. Introduction

The object of this exercise is to install and set up R, and to experiment with some basic procedures. R is actually a computer language (quite similar to the S language for data analysis and visualization developed at AT&T's Bell Labs), but is best thought of as an "environment" for producing both numerical and graphical analyses of data. R has several advantages for us here, because
R has a fairly steep learning curve, which these exercises are designed to diminish. The home page for the "R project" is at http://www.r-project.org

Read through the following before beginning...

2. Getting R

R can be downloaded from one of the "CRAN" (Comprehensive R Archive Network) sites. In the US, the main site is at http://cran.us.r-project.org/

To download R,
Windows (7 and Vista, XP is basically similar)
Mac OS X
There are three "FAQ" pages that contain additional information that may be useful for working out the kinks. These include
3. Set Up

Windows (7 and Vista)

It will be useful while running R on Windows to create one (or more) "working folders" that R can use to store its internal workspace (which will appear in that folder as a file named .RData), and into which you can download or create data sets (e.g. in Excel or ArcGIS), or files containing R "source code" or scripts (e.g. using a text editor like the built-in script editor in R). Once that folder is created, a shortcut (icon) can be created on your desktop that points to that working folder when starting up R. On Windows, you can create any number of working folders, one per project, so that the data for each project are conveniently stored together. For this class, one folder will probably do the job.

To create a working folder,
The new folder will be empty, but should look something like this: [figure: working-folder]

To create the desktop shortcut,
The Shortcut tab should look something like this: [figure: shortcut-tab]

If the shortcut has been properly created, you can click on it to start R, and it will automatically assume that its working folder is the one you created. Other shortcuts and working folders can be created in the same way.

Mac OS X

The Mac version of R has a built-in Workspace browser, which makes the maintenance of separate workspaces straightforward, so it is unnecessary to create a desktop shortcut. To create a working folder, the procedure is similar to that on Windows:

1. start Finder
2. click on File > New Folder, create a new folder in your User/Documents folder, and name it geog417
3. create another new folder within that folder named work01.

The last folder created should be named, for example, /User/Documents/geog417/work01, where "User" is your user name. The R.app GUI can be added to the Dock for convenience in starting up R.

4. Starting R

To start the R "GUI" (graphical user interface), just click on the shortcut you just created (in Windows) or on the R.app GUI (Mac) in the Applications folder. After a brief pause, you should see the message:

    R version 2.12.2 (2011-02-25)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    …
    [Previously saved workspace restored]

appear in the "RConsole" window. In Windows, you can verify that R is looking at the correct folder by clicking on File > Change dir... on the RGui menu. If you're in the folder you just created, fine; otherwise you can browse to it here. See pages 3-4 in Maindonald, Using R for Data Analysis... for a description of what the various menus and windows in the Windows version of the R GUI do. On the Mac, you will probably be in a default folder. You can use the Misc > Change Working Directory… menu command to browse to the folder created above.

The command window (or RConsole) is where you type commands and view text (as opposed to graphics) results.
The prompt is the character ">" (in red, usually) at the bottom of the text in the R Console window. If you've scrolled away from the prompt, typing anything in the window will bounce you back. Most of the time when using R, you'll also want to use a word processor (e.g. Word), and a text editor like Notepad (Windows) or TextEdit (Mac), so you may want to start them too. RGui on Windows has a built-in script editor too, which can be used to edit files.

5. Installing Packages

R comes with a number of add-on packages that are installed when you install R. Future exercises will use a number of additional "R packages" or libraries of functions, data sets, etc. that must be downloaded and installed from "CRAN" (you will need to be connected to the Internet to do this), and it would be handy to install them now.

Windows 7 and Vista -- Important!

Because Windows 7 and Vista object to the idea of programs installing files into the C:\Program Files folder, R will run into trouble when it attempts to install add-in packages there. The most reliable work-around seems to be to create what's known as a personal library where the packages are stored. R will generally offer to create one the first time you download a package, but Windows sometimes does not get the permissions entirely correct. The best thing to do is to create the folder yourself before downloading a package the first time, e.g., create the folder C:\Users\bartlein\Documents\R\win-library\2.12\ (substituting your own user name for bartlein). (Later, when you update R to a new version (e.g. R 2.13.0) you can create another new personal library, move the old packages there, and use R to update them.)

Windows

In the Windows R Gui, there is a menu choice "Packages" that assists in downloading and installing packages (see Packages > Install package(s) from CRAN), and there is a similar feature on the Mac. You will get the following message:

    --- Please select a CRAN mirror for use in this session ---

and a scrolling list box should open.
It turns out that the closest repository to us is in Seattle and is the last one in the list, so scroll down and select it, and then click on "ok". You can also use the Packages menu to choose the closest mirror. When the scrolling list box appears with package names in it, click on the package(s) you want to install.

Mac OS X

On the R.app GUI there is a menu choice called "Packages & Data". The R Package Installer menu choice brings up a dialog that can be populated with packages by clicking "Get List". Select the name of the package, and click "Install Selected". When the package is installed, its version number will appear in the listing.

Packages used in the course

For the next few exercises, you'll need to install the following packages: sp, maptools, rgdal, maps, mapproj, mapdata, classInt, scatterplot3d, and RColorBrewer. Note: At the moment, the rgdal package for the Mac has to be compiled from source code (meaning that it's not an automatically working download), so if you're a Mac user, skip downloading it for now.

You can check to see if a package has been successfully downloaded and installed by attempting to load the package with the library() function, e.g.

    library(maptools)

If an error message is produced (e.g. Error in library(maptools) : There is no package called 'maptools') then the download and installation has failed. If that's the case, packages may also be downloaded and installed using the command line in the R Gui, as follows:

    options(repos = c(CRAN = "http://cran.us.r-project.org/")) # tell R where to look for packages

On a Mac, the documentation suggests that this is done a little differently:

    options(repos = c(CRAN = "http://cran.us.r-project.org/")) # tell R where to look for packages

(You don't need to use the command line approach if you use the menu--just download the packages once.) Because there are over 2000 packages at CRAN now, scrolling through the list can get tedious. You can also install packages within R using the command line:

    install.packages("sp")

If you haven't selected a CRAN mirror in the current session, R will prompt you for one.

Occasionally, it's a good idea to check if packages have been updated; this can be done by typing

    update.packages()

or, on Windows, using the menu, Packages > Update packages from CRAN, or on the Mac clicking "Update All" on the R Package Installer.

Windows 7 and Vista may complain (within R) when you try to update packages. That's what is going on if you see the following message after typing update.packages(), which indicates that the packages were not successfully updated:

    C:\Users\bartlein\AppData\Local\Temp\RtmpBsLdAb\downloaded_packages

The work-around is to start R as an “Administrator” (right-click on the R icon, and click on “Run as Administrator”) before updating packages. Here's what the dialog should look like: [figure: run-as-admin]

6. Quitting R

There are several ways to quit R: clicking on the "close window" button, choosing File > Exit from the RGui menu on Windows, choosing R > Quit R on the RGui menu on the Mac (or clicking on the power switch), or typing quit() at the command prompt (or more simply q()). (Note that you must type the parentheses.) R will ask if you want to save the current workspace image. In general, you'll want to do that, but there are cases when you might not want to (e.g. you've accidentally deleted some intermediate results).

7. Getting Help

The first thing to do in learning new software is figure out how to get help. R has several approaches:
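As a minimal sketch of the most common ones (all of these functions ship with base R; the search string "read" is only an illustration):

```r
# Open the help page for a function whose name you already know
help("read.csv")              # the same as typing ?read.csv at the prompt

# List the functions whose names contain a given string
read_funs <- apropos("read")
print(read_funs)

# Show a function's arguments without opening the full help page
print(args(read.csv))

# help.start() opens the browser-based manuals in an interactive session
```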
8. A Data Set

The Summit Cr. geomorphic data consists of 88 observations of 11 variables along a 0.8-km stretch of Summit Cr. in eastern Oregon. This data set was collected by Pat McDowell, Frank Magilligan and their students as part of their study of the effects of cattle "exclosures" on the morphology of stream channels. They divided this stretch of Summit Cr. into individual "hydrologic units" (HUs) that were either pools, shallow "riffles," or straight "glides." The overall study area is divided into three sections: an upstream reach (reach A) in which cattle are permitted to graze, a middle reach (reach B) from which cattle have been excluded, and a downstream reach (reach C), in which cattle were again permitted to graze. The dataset contains the following information:
The above table is sometimes referred to as a "codebook" that provides an expanded definition for each variable. (There is a tradeoff between shortish variable names, which are efficient to type, and longish variable names that are more self-explanatory.)

9. Importing a Data Set

Reading data

R can read data from a number of different sources, including text (ascii) data and the .csv (comma separated values) format of Excel spreadsheets, as well as from an internal format, which is text-based, but not easily readable by humans. R stores the data, names of variables, etc. in an efficient form in its workspace (.RData) that can be saved and reloaded. At the time of this writing, the most efficient way to open and import a new data set is in .csv format, which can be downloaded from a web page, either the "data sets" page on the course web page, or from a link on one of the exercise pages like this one.

Importing a data set or shape file into R is a two-step procedure: 1) getting or downloading the data set from a server onto the computer you're using, and 2) reading it into R.

To download the Summit Cr. data set (Step 1),
To read the Summit Cr. data set into R (Step 2), type the following:
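A minimal sketch of the call, assuming the downloaded file is named sumcr.csv (the file name mentioned in the notes that follow) and sits in the working folder; the file.exists() check is an addition here, not part of the exercise:

```r
csv_file <- "sumcr.csv"          # assumed file name, taken from the text

if (file.exists(csv_file)) {
  sumcr <- read.csv(csv_file)    # create the data frame object "sumcr"
} else {
  message("Cannot find ", csv_file, " in ", getwd())
}
```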
NOTE: This will only work if the file was downloaded to the working folder. If you saved it somewhere else, like your Downloads folder, you should move it into your working folder. On the Mac, you may have to change the working folder using the Misc > Change Working Directory… menu command. On either a PC or a Mac, you can verify that the file is in the right place by typing dir()

NOTE: Punctuation, spelling and case are important. R is case sensitive; in other words, Sumcr is not the same thing as sumcr, and Read.csv is not the same as read.csv.

The read.csv() function creates a data frame "object" called "sumcr" that contains the data from the .csv file. Note that the data frame object doesn't need to have the same name as the file, but by convention it usually does. The "<-" arrow is called the "assignment operator", which, as it sounds, assigns whatever object is to its right to the name on its left, sometimes creating a new object in the process. In reading a line of text, the operator is usually spoken as "gets" as in "the dataframe sumcr gets the contents of the sumcr.csv file." In newer versions of R, the equals (=) sign can be used, but in most existing texts and .pdf files, the <- version is used.

The advantage of this approach is that you have an Excel-editable copy of the data set in your working folder. An alternative approach is to use the file.choose() function to browse to a particular file:
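A sketch of that alternative, assuming the chosen file is again read with read.csv(); the interactive() guard is an addition here so that the dialog is only requested when someone is sitting at the console:

```r
if (interactive()) {
  # Browse to the .csv file, then read it into the data frame "sumcr"
  sumcr <- read.csv(file.choose())
}
```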
This will open a "Select file..." dialog box.

Looking at the data

The first thing to do is to check that R indeed has the Summit Cr. data frame in its workspace. This can be done by typing ls() (the list function) at the command line, or clicking on Misc > List objects on the RGui menu. The data frame can be examined a couple of different ways:
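A few common ways, sketched with a small stand-in data frame so that the lines run even before sumcr has been read in (the name demo, the Reach column, and all values are hypothetical; WidthWS is a variable name from the Summit Cr. data):

```r
# Hypothetical stand-in for sumcr; the values are made up for illustration
demo <- data.frame(Reach   = c("A", "B", "C"),
                   WidthWS = c(2.1, 3.4, 2.8))

print(demo)            # typing the name alone at the prompt does the same
print(head(demo, 2))   # first two rows only
str(demo)              # compact listing of the variables and their types
print(dim(demo))       # number of rows and columns

# In an interactive session, the spreadsheet-like data editor can be opened
# with edit(demo), or with fix(demo), which also saves changes back to demo.
```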
Use the close button on the editor window, or the File > Close menu to close the editor down and return to the RConsole window.

The names() function can be used to get a list of the variables in a data frame, e.g.:

    names(sumcr)

The individual variables are referred to by a "compound" name consisting of the data frame name and the variable name, joined by a dollar sign ($), e.g.

    sumcr$WidthWS

Note that variable names are case-sensitive too (e.g. the name sumcr$WidthWS is not the same as sumcr$widthws).

This manner of referring to variables can be made less cumbersome by using the attach() function. For example, try typing the following (don't type the material in parentheses, or the comments within a line, just the text in the Courier type face):
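A minimal sketch of the difference attach() makes, using a hypothetical stand-in data frame (demo and its values are made up; substitute sumcr once it has been read in):

```r
demo <- data.frame(WidthWS = c(2.1, 3.4, 2.8))   # made-up values

mean_long <- mean(demo$WidthWS)    # full "compound" name

attach(demo)                       # put demo's variables on the search path
mean_short <- mean(WidthWS)        # the variable name alone now works
detach(demo)                       # undo attach() when finished

print(c(mean_long, mean_short))
```

Calling detach() when done keeps variables from different data frames from masking one another later in the session.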
10. What to hand in

Use the summary() function to produce a quick summarization of the data set:
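The call is simply summary() applied to the data frame. A sketch that runs whether or not sumcr is already in the workspace (the fallback data frame and its values are hypothetical, and only there so the example executes):

```r
# Use sumcr if it has been read in; otherwise fall back to made-up values
dat <- if (exists("sumcr")) sumcr else data.frame(WidthWS = c(2.1, 3.4, 2.8))

dat_summary <- summary(dat)   # min, quartiles, mean, max for each variable
print(dat_summary)
```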
To print the summary out, select the text, and click on the "print" icon, or use File > Print.