Visualizing high-resolution and high-dimension data sets

The techniques for the display and interactive analysis of high-resolution and high-dimension data sets are developing rapidly.  The approaches illustrated here make use of the iplots and rggobi packages.

1.  Visualizing high-resolution data sets using enhanced versions of standard displays

The display of high resolution data (in addition to the standard approach of simply mapping it) can be illustrated using a set of climate station data for the western United States, consisting of 3728 observations of 15 variables.  Although these data are not of extremely high resolution (or high dimension), they illustrate the general ideas.

Begin by loading the appropriate packages and reading the data, along with a shapefile that is used to quickly map the data.  Data: [wus_pratio.csv]  Shapefile components: [wus.shp] [wus.dbf] [wus.shx]

library(maptools) # loads sp library too
library(gpclib)
library(RColorBrewer) # creates nice color schemes
library(classInt) # finds class intervals for continuous variables

wus.shp <- readShapeLines(file.choose(), 
    proj4string=CRS("+proj=longlat"))   # read the shapefile
wus_pratio <- read.csv(file.choose()) # read wus_pratio.csv

attach(wus_pratio)

The following code produces a standard map with the stations represented by dots:

# map of precipitation stations
plot(wus.shp)
points(lon, lat, pch=16)

In what follows, we'll want to examine the large-scale patterns of the seasonality (summer-wet vs. winter-wet) of precipitation.  The data consist of monthly precipitation ratios, or the average precipitation for a particular month and station divided by the average annual total precipitation.  This has the effect of removing the very large imprint of elevation on precipitation totals.  The ratio of July to January precipitation provides a single overall description of precipitation seasonality.

# a second map with some colors
pjulpjan <- pjulpann/pjanpann  # July/January ratio; pann values cancel out
nclr <- 10
plotclr <- brewer.pal(nclr,"PRGn")
# break points listed in increasing order, as classIntervals() expects
class_int <- classIntervals(pjulpjan, nclr, style="fixed",
    fixedBreaks=c(0.0, .100, .200, .500, .800, 1.0, 1.25, 2.0, 5.0,
        10.0, 9999.0))
colcode <- findColours(class_int, plotclr)
plot(wus.shp)
points(lon, lat, pch=16, col=colcode)
title(main="Jul/Jan Precipitation Ratio") # points() does not take main=
legend(locator(1), legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")
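
The cancellation of the annual totals, and the assignment of each ratio to a fixed class, can be checked on a few made-up numbers.  The sketch below uses only base R, with cut() standing in for classIntervals(), so the class boundaries may be closed differently from those in the map code.  The station values are hypothetical and are not taken from wus_pratio.csv.

```r
# hypothetical monthly and annual precipitation (mm) for three stations
pjan <- c(120, 20, 40)
pjul <- c(10, 60, 40)
pann <- c(900, 300, 480)

# monthly ratios, in the form stored in the data set
pjanpann <- pjan / pann
pjulpann <- pjul / pann

# the annual totals cancel: (pjul/pann) / (pjan/pann) equals pjul/pjan
pjulpjan <- pjulpann / pjanpann
print(all.equal(pjulpjan, pjul / pjan))

# assign each ratio to a class using the same break points, in increasing order
brks <- c(0.0, 0.1, 0.2, 0.5, 0.8, 1.0, 1.25, 2.0, 5.0, 10.0, 9999.0)
ratio_class <- cut(pjulpjan, breaks = brks, include.lowest = TRUE)
print(data.frame(pjulpjan = round(pjulpjan, 3), ratio_class))
```

Ratios well above 1 mark summer-wet stations and ratios well below 1 mark winter-wet stations, which is why the break points are spaced roughly symmetrically around 1.0.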

A simple scatter plot showing the relationship between January and July precipitation ratios illustrates how the crowding of points makes interpretation difficult.  The crowding can be overcome by plotting transparent symbols specified using the "alpha channel" of the color for individual points.

# plot January vs. July precipitation ratios

# opaque symbols
plot(pjanpann, pjulpann, pch=16, cex=1.25, col=rgb(1,0,0))

# transparent symbols
plot(pjanpann, pjulpann, pch=16, cex=1.25, col=rgb(1,0,0, .2))

# transparent symbols using the pdf() device
pdf(file="plot01.pdf")
plot(pjanpann, pjulpann, pch=16, cex=1.25, col=rgb(1,0,0, .2))
dev.off()
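
The way the alpha channel is carried in a color specification can be seen by printing the value that rgb() returns.  The sketch below is a self-contained illustration on synthetic data (not the station data); the object names and the temporary output file are assumptions made for the example.

```r
# rgb() with a fourth argument returns a color string that carries the alpha value
opaque_red <- rgb(1, 0, 0)
faint_red  <- rgb(1, 0, 0, 0.2)   # alpha is stored as the last two hex digits
print(c(opaque_red, faint_red))

# adjustcolor() from grDevices adds transparency to a named color
faint_blue <- adjustcolor("blue", alpha.f = 0.2)
print(faint_blue)

# synthetic, heavily overplotted data
set.seed(1)
x <- rnorm(5000)
y <- x + rnorm(5000, sd = 0.5)

# write both versions side by side to a temporary PDF file
out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file, width = 8, height = 4)
par(mfrow = c(1, 2))
plot(x, y, pch = 16, col = opaque_red, main = "opaque")
plot(x, y, pch = 16, col = faint_red, main = "alpha = 0.2")
dev.off()
```

Not every graphics device supports transparency, which is one reason for writing the plot to a PDF file as in the listing above.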

It's easy to see how the transparency of the symbols provides a visual measure of the density of points in the various regions in the space represented by the scatter plot.  A set of stripcharts illustrates the same principle.

# stripcharts -- opaque symbols
stripchart(pjanpann, xlab="PJan/Pann", method="overplot", pch=15, col=rgb(0,0,0))
stripchart(pjanpann, xlab="PJan/Pann", method="stack", pch=15, col=rgb(0,0,0))

# stripcharts -- alpha-channel transparency
stripchart(pjanpann, xlab="PJan/Pann", method="overplot", pch=15, col=rgb(0,0,0,0.1))
stripchart(pjanpann, xlab="PJan/Pann", method="stack", pch=15, col=rgb(0,0,0,0.1))
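
How quickly overlapping transparent symbols darken can be worked out directly.  The sketch below assumes that the graphics device uses the usual "over" compositing rule, under which n coincident symbols, each with transparency alpha, have a combined opacity of 1 - (1 - alpha)^n.  The function name stacked_opacity() is made up for this example.

```r
# combined opacity of n overlapping symbols, each drawn with transparency alpha,
# assuming standard "over" compositing on the graphics device
stacked_opacity <- function(n, alpha) 1 - (1 - alpha)^n

n <- c(1, 2, 5, 10, 20, 50)
opacity <- data.frame(n = n,
                      alpha_0.1 = stacked_opacity(n, 0.1),
                      alpha_0.2 = stacked_opacity(n, 0.2))
print(round(opacity, 3))
```

Smaller alpha values saturate more slowly, so they leave more room to distinguish moderately dense regions from very dense ones, at the cost of making isolated points hard to see.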

Over the region as a whole, an interesting question is what role elevation plays in the seasonality of precipitation.

# seasonal precipitation vs. elevation
plot(elev, pjanpann, pch=16, col=rgb(0,0,1, 0.1))
plot(elev, pjulpann, pch=16, col=rgb(0,0,1, 0.1))
plot(elev, pjulpjan, pch=16, col=rgb(0,0,1, 0.1))

detach(wus_pratio)

The iplots package provides another way of producing "alpha-channel" plots (plus more).

[Note:  on the Mac, the preferred way to use the iplots package is to run the JGR GUI for R (see http://rosuda.org/JGR/down.shtml).  The .dmg installs a GUI that works like the one built into R, except that commands are entered in the bottom half of the main window instead of at the command prompt.]

# iplots version
library(rJava)
library(iplots)
library(grDevices) # load this additional library on the Mac
attach(wus_pratio)

iplot(elev,pjanpann) # use arrow keys to control transp. and symbol size

detach(wus_pratio)

2.  Linked interactive plots and "brushing"

Linked plots are those in which multiple plots are generated and viewed, and particular observations are flagged or "brushed" to call attention to them in each of the individual plots.  The iplots package provides an easy way to create a small number of linked plots.  The data set used here (precipitation ratios in the region around Yellowstone National Park) is a subset of the larger data set, and facilitates a more rapid demonstration.  The function ipcp() creates a parallel-coordinates plot, while the function iplot() in this instance creates a crude map.  After the plots are created, the mouse can be used to draw a rectangular selection region that "lights up" points in both plots.  [yell_pratio.csv]

# linked interactive plots -- brushing
# example with Yellowstone region pratios
yellpratio <- read.csv(file.choose()) # read yell_pratio.csv
attach(yellpratio)
summary(yellpratio)
ipcp(yellpratio) # use arrow keys to control transparency
iplot(Lon, Lat) # use arrow keys to control symbol size

detach(yellpratio)

# example with the higher-density western U.S. pratios
attach(wus_pratio)
ipcp(wus_pratio) # use arrow keys to control transparency
iplot(lon, lat) # use arrow keys to control symbol size

detach(wus_pratio)
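
The idea behind linking can be imitated with static plots: one logical vector records which observations are brushed, and the same vector sets the colors in every display.  The sketch below is a non-interactive illustration only, using base graphics and parcoord() from the MASS package in place of iplot() and ipcp().  The data frame stations and its values are synthetic, and the selection rectangle is arbitrary.

```r
library(MASS)  # parcoord(); MASS is a recommended package distributed with R

# synthetic stand-in for a station data set (hypothetical values)
set.seed(42)
n <- 300
stations <- data.frame(
  lon  = runif(n, -125, -100),
  lat  = runif(n, 30, 50),
  elev = runif(n, 0, 3000)
)
stations$pjanpann <- 0.05 + 0.10 * (stations$lon < -115) + runif(n, 0, 0.03)
stations$pjulpann <- 0.20 - stations$pjanpann + runif(n, 0, 0.03)

# the "brush": a rectangular selection in map coordinates
brushed <- with(stations, lon > -125 & lon < -118 & lat > 42 & lat < 50)
brush_col <- ifelse(brushed, "red", rgb(0, 0, 0, 0.2))

# the same color vector is used in both displays, which is what links them
pdf(file = tempfile(fileext = ".pdf"), width = 9, height = 4.5)
par(mfrow = c(1, 2))
plot(stations$lon, stations$lat, pch = 16, col = brush_col,
     xlab = "lon", ylab = "lat", main = "crude map")
rect(-125, 42, -118, 50, border = "red")
parcoord(stations[, c("elev", "pjanpann", "pjulpann")], col = brush_col,
         main = "parallel coordinates")
dev.off()
```

In the interactive versions the selection is redrawn as the mouse moves, but the underlying bookkeeping is the same: a selection made in one plot is applied to the rows of the data, not to the plot itself.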

Data sets that contain a mixture of categorical or factor variables and ordinal or ratio-scale "continuous" variables can also be visualized, as illustrated by the Summit Cr. data set.

# linked plots -- Summit Cr. data set
attach(sumcr) # assumes the Summit Cr. data frame sumcr has already been read in

# linked categorical plots
imosaic(data.frame(Reach,HU))
ibar(HU)
ibar(Reach)

# add a scatter plot
iplot(CumLen, WidthWS)
iplot.opt(col=unclass(Reach)+3)

# add a boxplot
ibox(WidthWS, Reach)

detach(sumcr)
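
For comparison, static counterparts of these displays can be produced with base graphics.  The sketch below builds a small synthetic data frame that borrows the variable names Reach, HU, CumLen and WidthWS; the factor levels and all of the values are hypothetical and do not come from the Summit Cr. data.

```r
# hypothetical stand-in that reuses the Summit Cr. variable names
set.seed(7)
n <- 90
demo <- data.frame(
  Reach   = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  HU      = factor(sample(c("P", "R", "O"), n, replace = TRUE)),
  CumLen  = sort(runif(n, 0, 5000)),
  WidthWS = rlnorm(n, meanlog = 0.5, sdlog = 0.4)
)

# cross-tabulation underlying the mosaic plot and the bar charts
reach_by_hu <- table(demo$Reach, demo$HU, dnn = c("Reach", "HU"))
print(reach_by_hu)

pdf(file = tempfile(fileext = ".pdf"), width = 8, height = 8)
par(mfrow = c(2, 2))
mosaicplot(reach_by_hu, main = "Reach by HU")                     # cf. imosaic()
barplot(table(demo$HU), main = "HU")                              # cf. ibar()
plot(demo$CumLen, demo$WidthWS, pch = 16,
     col = as.integer(demo$Reach) + 3, main = "WidthWS vs. CumLen")  # cf. iplot()
boxplot(WidthWS ~ Reach, data = demo, main = "WidthWS by Reach")  # cf. ibox()
dev.off()
```

The static versions show the same summaries, but without the linking: selecting a bar or a mosaic tile in the iplots displays highlights the corresponding observations in the scatter plot and boxplot.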

3.  Brushing and spinning

The rggobi package provides another approach for creating interactive linked plots, by connecting R with a stand-alone program called GGobi (http://www.ggobi.org/rggobi/).  Loading the rggobi library starts GGobi.

# brushing and spinning
library(rggobi)
ggobi(sumcr)

To illustrate the use of the rggobi package for brushing a set of linked plots, do the following once the initial display appears after invoking the ggobi() function:

  1. set the scatter plot variables as CumLen (X), and WidthWS (Y) to create a familiar plot
  2. create a new display: Scatterplot Matrix with variables CumLen, DepthWS, WidthWS, WidthBF, and wsgrad selected, which will create a scatterplot matrix
  3. create a new display: Parallel Coordinates Display, with the same variables, plus HU and Reach selected
  4. select an Interaction: brush

Brush the different plots with the mouse.
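
A static scatterplot matrix with a fixed "brush" gives a rough idea of what the linked display shows.  The sketch below uses pairs() from base graphics on synthetic data; the data frame demo reuses three of the variable names, but the values and the relationships among them are invented for the example.

```r
# synthetic stand-in with hypothetical values for three of the variables
set.seed(3)
n <- 120
demo <- data.frame(CumLen = sort(runif(n, 0, 5000)))
demo$WidthWS <- 2 + 0.0004 * demo$CumLen + rnorm(n, sd = 0.4)
demo$DepthWS <- 0.3 + 0.05 * demo$WidthWS + rnorm(n, sd = 0.05)

# "brush" the widest fifth of the observations
brushed <- demo$WidthWS > quantile(demo$WidthWS, 0.8)

# the brushed observations are highlighted in every panel of the matrix
pdf(file = tempfile(fileext = ".pdf"))
pairs(demo, pch = 16, col = ifelse(brushed, "red", rgb(0, 0, 0, 0.3)))
dev.off()
```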

The concept of "spinning" can be illustrated as follows:

  1. create a new view: Rotation, with the following variables selected:  CumLen, DepthWS, WidthWS
  2. use the mouse, or allow the automatic spinning to proceed. 
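
Spinning amounts to rotating a three-dimensional point cloud and redrawing its two-dimensional projection at each step.  The sketch below shows one simple version, a rotation about the vertical axis, drawn as four static frames; it is not GGobi's own algorithm.  The function rotate_z() and the synthetic point cloud are assumptions made for the example.

```r
# synthetic 3-D point cloud (hypothetical values, standardized)
set.seed(11)
xyz <- scale(matrix(rnorm(300), ncol = 3,
                    dimnames = list(NULL, c("CumLen", "DepthWS", "WidthWS"))))

# rotate the first two coordinates by theta (radians); the third is unchanged
rotate_z <- function(pts, theta) {
  rot <- matrix(c(cos(theta), -sin(theta), 0,
                  sin(theta),  cos(theta), 0,
                  0,           0,          1), nrow = 3, byrow = TRUE)
  pts %*% t(rot)
}

# draw a few frames of the spin, projecting onto the first and third axes
angles <- seq(0, pi / 2, length.out = 4)
pdf(file = tempfile(fileext = ".pdf"), width = 8, height = 8)
par(mfrow = c(2, 2))
for (theta in angles) {
  rotated <- rotate_z(xyz, theta)
  plot(rotated[, 1], rotated[, 3], pch = 16, col = rgb(0, 0, 1, 0.4),
       xlim = c(-4, 4), ylim = c(-4, 4), xlab = "rotated horizontal axis",
       ylab = "WidthWS", main = sprintf("theta = %.2f", theta))
}
dev.off()
```

Because a rotation preserves distances between points, structure that is hidden in one projection (clusters, outliers, curved surfaces) can become visible as the cloud turns.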

There are a number of different features of the visualizations that can be adjusted on the fly.