Geography 414/514:  Advanced Geographic Data Analysis
Spring 2008

Exercise 4:  Multivariate Plots, Coplots, and Lattice Graphics
Finish by Thursday, 1 May

1.  Introduction 

Read through the exercise before attempting to complete it.

The purpose of this exercise is to explore some of the general multivariate plotting procedures available in R, including maps, mutivariate plots in particular, conditioning plots (coplots), and Trellis/Lattice grapics. 

This exercise uses a number of "R packages" or libraries of functions, data sets, etc. that must be downloaded and installed from "CRAN" (you will need to be connected to the Internet to do this).  In the Windows R Gui, there is a menu choice "Packages" that assists in downloading and installing packages, (see Packages > Install package(s) from CRAN), and there is a similar feature on the Mac.

For this exercise, you'll need to install the following packages:  sp, maptools, ClassInt, RColorBrewer and lattice.

You can check to see if a package has been successfully downloaded and installed by attempting to load the package with the library() function, e.g.

library(maptools)

If an error message is produced e.g. Error in library(maptools) : There is no package called 'maptools') then the download and installation has failed.  If that's the case, packages may also be downloaded and installed using the command line in the R Gui, as follows:

options(CRAN = "http://cran.us.r-project.org/") # tell R where to look for packages
install.packages("maptools") # download and install the maps package

On a Mac, the documentation suggests that this is done a little differently:

options(CRAN = "http://cran.us.r-project.org/") # tell R where to look for packages
install.binaries("maptool") # download and install the maps package

You will get the following message:  --- Please select a CRAN mirror for use in this session --- and a scrolling list box should open.  It turns out that the closest repository to us is in Seattle and is the last one in the list, so scroll down and select it, and then click on "ok".  You can also use the Packages menu to chooses the closest mirror.

(You don't need to use the command line approach if you use the menu--just download the packages once.)

Occasionally, it's a good idea to check if packages have been updated; this can be done by typing.

update.packages()

or using the menu, Packages > Update packages from CRAN.

After downloading and installing, it's ok to delete the installation files when prompted.

Further information can be found in the FAQs:

http://cran.us.r-project.org/bin/windows/base/rw-FAQ.html (Windows)
http://cran.us.r-project.org/doc/FAQ/R-FAQ.html (General)
http://cran.us.r-project.org/bin/macosx/RMacOSX-FAQ.html (Mac OSX)

2.  Simple maps

Begin by constructing some simple maps of the Oregon climate-station data [orstationc.csv].  (See Section 2 of Exercise 3 for directions on how to download and save these data if they are no longer in your working directory.)  Also, before starting, download and save in your working directory the following components of a shape file of Oregon county outlines:

[orotl.shp]     [orotl.dbf]     [orotl.shx]

You will find it most helpful to use a text editor for this exercise.  TextPad (recommended, it's on the Duckware CD), Notepad or even Word will do the job.

  • map a single variable--this is accomplished by executing three blocks of R code:

# Block 1:  set up -- do this once
library(maptools)
library(sp)
library(RColorBrewer)
libary(classInt)

# read and attach climate data
orstationc <- read.csv(file.choose()) # browse to the .csv file
attach(orstationc)
# read a county outline shape file
orotl.shp <- readShapeLines(file.choose(),
     proj4string=CRS("+proj=longlat"))

Plot the outlines to check

plot(orotl.shp) # plot the outline

Now the second block:

# Block 2: set variable and get colors
plotvar <- pjan # pick a variable to plot
plottitle <- "January Precipitation"
nclr <- 8 # number of colors
plotclr <- brewer.pal(nclr,"PuBu") # get nclr colors

# find equal-frequency intervals
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)
cutpts <- round(class$brks, digits=1)

Print plotvar, plotclr, colornum, brks and colorcode to get an idea of what the above block of code does.

# Block 3: plot the shape file and the selected variable
plot(orotl.shp)

# add points
points(lon, lat, pch=16, col=colcode, cex=2)
points(lon, lat, cex=2) # draws black line around each point

# add legend
legend(x=-118, y=43, legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

# add the title
title(plottitle)

The first block of code should be executed once.  The three library() function calls load the sp, maptools, RColorBrewer package .  The readShapeLines() function from maptools reads in the shape file.

The second block of code sets (a) the variable to plot by assigning, in this case, pann, to plotvar  (remember you can discover the names of the variables in the data frame orstationc using the names() function), and (b) the number of colors to use.  (To plot other variables, replace pjan in this example.)  The remainder of this second block figures out which particular color to assign to each observation.

The third block of code plots the shapefile, adds color-coded points, and a legend.  To generate a map of an individual variable, one would edit the second block of code, and then execute the second and third blocks, by cutting or pasting.

Construct maps for several of the climate variables (pjan and tjul in particular).

Q1:  Describe the basic patterns and gradients of winter precipitation and summer temperature across the state.

3.  The coplot

Coplots (or conditioning plots) are probably the most commonly used multipanel plot, and an easy-to-use function is available.  The following code creates a coplot for annual temperature, plotted as a function of elevation, given latitude and longitude:

library(lattice)
attach(orstationc)
coplot(tann ~ elev | lon * lat, number=5, overlap=.5,
    panel=function(x,y,...) {
        panel.smooth(x,y,span=.8,iter=5,...)
        abline(lm(y ~ x), col="blue")
    } )

Each panel on the diagram shows the relationship between annual average temperature and elevation for a geographical subset of data.  The "panel functions" in coplot() allow lowess curves and least-squares lines to be added to the plot to facilitate interpretation.  The individual panels, arranged as they are here, form a map of scatter diagrams.

Q2:  How do latitude and longitude influence the relationship between temperature and elevation?  Is the relationship "stationary" across the state (i.e. same relationship everywhere), or are there spatial variations in the strength or form of the relationship.

Experiment with the number of coplot shingles and degree of overlap of each (the number and overlap arguements in the coplot() function.

Q3:  How do these two parameters control the appearance of the resulting diagram?  Is there a particular choice that seems to work better?

4.  Lattice plots

This set of plots and analyses use a data set of glacial-cirque locations in Oregon collected by Deb Sea several years ago for a class project.  The data set may be found here:  [cirques.csv] and the data can be read into the R workspace after downloading as follows:

cirques <- read.csv(file.choose())
attach(cirques)
names(cirques)

The variables are an index number (Cirque), location (Lat, Lon), elevation in meters (Elev), a region indicator (which can be plotted to discover what the regions are; Region) and a 0/1 variable that indicates whether each cirque is occupied by a glacier (Glacier = 1) or not.

(Should you wish to also do nicer maps of these data, here are the links to the appropriate shapefile components:  [.dbf] [.shp] [.shx])

The aim of the analysis is to examine the spatial variations in the elevations of the cirque basins, in order to infer what might be controlling their distribution.  From theory, we might expect that the "glaciation threshold" or elevation at which glaciers may form, should be lower where it is cooler and moister, and higher where it is warm and dry.  The question that can be asked here is whether the Oregon cirque distributions conform to this idea, based on examining that distribution. 

  • Load the Lattice package and change the default colors:

library(lattice)
trellis.device(color=TRUE, theme = "col.whitebg") 

(A gray-background graphics window will open, but nothing else happens.) 

  • Produce a 3-D scatter plot of cirque elevation as a function of latitude and longitude, with the points colored by Glacier

cloud(Elev ~ Lon*Lat, pch=16, cex=1.25, col=1+as.integer(Glacier))

  • Create and plot some conditioning variables:

Lat2 <- equal.count(Lat,4,.5)
Lon2 <- equal.count(Lon,4,.5)
plot(Lat2)
plot(Lon2)

  • Using those conditioning variables, construct a Lattice-type coplot of elevation as a function of longitude, given (or condtioned by) latitude:

# Cirque elevation as a function of longitude, given latitude
plot2 <- xyplot(Elev ~ Lon | Lat2,
    layout = c(4, 2),
    panel = function(x, y) {
        panel.grid(v=2)
        panel.xyplot(x, y)
        panel.loess(x, y, span = .8, degree = 1, family="gaussian")
        panel.abline(lm(y~x))
    },
    xlab = "Longitude",
    ylab = "Elevation (m)")
print(plot2)

Note that the two lines/curves added to the plot (a lowess/loess curve and a straight-line plot) are meant to summarize the relationship within each scatter plot panel, if there is a relationship to be summarized.  In other words, the curves may not appear to fit the data very well at all, which is useful to know.  In other cases, the curves may help to reveal a relationship that would otherwise be difficult to see.  You can eliminate the curves by removing the panel functions
panel.xyplot(etc.) and panel.abline(etc.)

  • Edit the above code, to create a coplot for elevation as a function of  latitude, conditioned by longitude.  The most convenient way to do this is to copy and past the block of code into a text editor, make the changes, and the copy and paste the result into R.

Q4:  Describe the relationships.  (How do cirque elevations vary across the state?  Do the relationships conform to the conceptual model described above?)

  • Produce two other kinds of Trellis plots

# Lattice stripplot
plot4 <- stripplot(Glacier ~ Elev | Lat2*Lon2)
print(plot4)

# Lattice boxplot
plot5 <- bwplot(Glacier ~ Elev| Lat2*Lon2)
print(plot5)

Note that on these plots, the unglaciated cirque basins (Glacier = 0) in the data set will be labelled "1", while the glaciated basins (Glacier = 1) in the data set, will be labelled "2" (i.e. the first and second levels of the variable Glacier).

Q5:  Do the additional plots contribute anything to understanding the distribution of cirque elevations? 

Q6:  Describe the distribution of cirques across the state, and suggest what might influence that distribution.  It might make sense to take a look at some maps of the climate variables.