Maps in R – Examples  (Part 1)

 

This web page of examples has two parts.  This part describes some basic maps that make use of the sp class  (a framework for managing data sets that have explicit spatial coordinates) and the maptools) packages, while the second part describes more advanced procedures like map projection.

 

As in the exercises and other lectures, R code and output will be shown in this font and the prompt in R (“>”) will not be shown.

 

The examples here use several libraries, datasets, and shapefiles that should be downloaded and/or installed, and read in before proceeding, if you want to try to reproduce the examples below.  This will make the examples easier to follow, but they will fail if the data sets do not exist or have not been read in.  For one of the example data sets, Oregon climate-station data, the data are available in two forms--as a .csv file (orstation.csv), and included as part of the orstations shape file.  as a consequence, the individual variables will be referred to by their "full" names (e.g. orstationc$tann).  There are also several libraries that are required.

 

Required libraries:

 

maptools (and sp), rgdal, RColorBrewer, classInt, maps

 

Shapefiles:

 

base name

 

shapefile components

Oregon counties and census data

orcounty

 

[.dbf] [.sbn] [.sbx] [.shp] [.shx]

Oregon county outlines only

orotl

 

[.dbf] [.shp] [.shx]

Oregon climate station data

orstations

 

[.dbf] [.sbn] [.sbx] [.shp] [.shx]

 

When downloading the shape files to your working directory, you should download all of the individual components (i.e. .dbf, .sbn, etc.).

 

Data sets:

 

[orstationc.csv] -- Oregon climate station data (includes lat and lon)
[orcountyp.csv] -- Oregon county census data (includes lats and lons, but not outlines)
[cities.csv] -- Large cities data set

 

Load the libraries:

 

library(maptools)     # loads sp library too
library(RColorBrewer) # creates nice color schemes
library(classInt)     # finds class intervals for continuous variables

 

Read the shapefiles using the maptools function read.shape()

 

# outlines of Oregon counties (lines)
# browse to orotl.shp
orotl.shp <- readShapeLines(file.choose(),
     proj4string=CRS("+proj=longlat"))

 

The readShapeLines() function has two arguments, one identifying the file, and the other the particular projection that the lines, points and polygons in the file are in.  If they’re not projected (i.e. in they are just in latitude or longitude), then the projection label is just +proj=longlat.  Note the use of the file.choose() function here to browse to the file.  The other shapefiles are read the same way, but notice that different functions are used to read shapefiles that consist of points or polygons (readShapePoints(),readShapePoly()).

 

# Oregon climate station data (points)
# browse to orstations.shp
orstations.shp <- readShapePoints(file.choose(),
     proj4string=CRS("+proj=longlat"))

# Oregon county census data (polygons)
# browse to orcounty.shp
orcounty.shp <- readShapePoly(file.choose(),
     proj4string=CRS("+proj=longlat"))

 

Note that for each shapefile, you only need to read the .shp component—the others will be read in at the same time.

 

Read in the ordinary rectangular data sets:

 

orstationc <- read.csv("orstationc.csv")
orcountyp <- read.csv("orcountyp.csv")
cities <- read.csv("cities.csv")

 

It’s a good idea to examine the structure and contents of orcounty.shp shapefile:  Typing

 

summary(orcounty.shp)

 

produces a summary that indicates what kind of shapefile it is (polygons in this case), minimum and maximum values of the coordinates, and then some summary() function type summaries:

 

Object of class SpatialPolygonsDataFrame

Coordinates:

          min        max

r1 -124.55840 -116.46944

r2   41.98779   46.23626

Is projected: NA

proj4string : [+proj=longlat]

Data attributes:

        NAME     STATE_NAME STATE_FIPS   CNTY_FIPS       FIPS  

 Baker    : 1   Oregon:36   41:36      001    : 1   41001  : 1 

 Benton   : 1                          003    : 1   41003  : 1 

 Clackamas: 1                          005    : 1   41005  : 1 

 Clatsop  : 1                          007    : 1   41007  : 1 

 Columbia : 1                          009    : 1   41009  : 1 

 Coos     : 1                          011    : 1   41011  : 1 

 (Other)  :30                          (Other):30   (Other):30 

      AREA            POP1990          POP1997     

 Min.   :  481.4   Min.   :  1396   Min.   :  1691 

 1st Qu.:  960.3   1st Qu.: 14002   1st Qu.: 16640 

 Median : 1841.2   Median : 35429   Median : 39466 

 Mean   : 2696.4   Mean   : 78953   Mean   : 90151 

 3rd Qu.: 3098.1   3rd Qu.: 71848   3rd Qu.: 85201 

 Max.   :10246.0   Max.   :583887   Max.   :629165 

 

etc.

 

The attributes() function:

 

attributes(orcounty.shp)

 

provides even more information about things like the “bounding box” of the shape file (the maximum and minimum x- and y- dimensions), and it also lists out the polygons and their vertices:

 

$bbox

          min        max

r1 -124.55840 -116.46944

r2   41.98779   46.23626

 

$proj4string

CRS arguments: +proj=longlat

 

$polygons

$polygons[[1]]

An object of class “Polygons”

Slot "Polygons":

[[1]]

An object of class “Polygon”

Slot "labpt":

[1] -123.65409   45.97719

 

Slot "area":

[1] 0.2455414

 

Slot "hole":

[1] FALSE

 

Slot "ringDir":

[1] 1

 

Slot "coords":

           [,1]     [,2]

 [1,] -123.9754 45.77565

 [2,] -123.9550 45.87121

 [3,] -123.9953 45.94209

 

etc.

 

 

Another version of the attributes() function provides information about the attached data (or attributes in the ArcGIS sense).  Note the use of the “@” sign to indicate what part of the shape file we want to examine:

 

attributes(orcounty.shp@data)

 

$names

 [1] "NAME"       "STATE_NAME" "STATE_FIPS" "CNTY_FIPS"

 [5] "FIPS"       "AREA"       "POP1990"    "POP1997"  

 [9] "POP90_SQMI" "HOUSEHOLDS" "MALES"      "FEMALES"  

[13] "WHITE"      "BLACK"      "AMERI_ES"   "ASIAN_PI" 

[17] "OTHER"      "HISPANIC"   "AGE_UNDER5" "AGE_5_17" 

[21] "AGE_18_29"  "AGE_30_49"  "AGE_50_64"  "AGE_65_UP"

[25] "NEVERMARRY" "MARRIED"    "SEPARATED"  "WIDOWED"  

[29] "DIVORCED"   "HSEHLD_1_M" "HSEHLD_1_F" "MARHH_CHD"

[33] "MARHH_NO_C" "MHH_CHILD"  "FHH_CHILD"  "HSE_UNITS"

[37] "VACANT"     "OWNER_OCC"  "RENTER_OCC" "MEDIAN_VAL"

[41] "MEDIANRENT" "UNITS_1DET" "UNITS_1ATT" "UNITS2"   

[45] "UNITS3_9"   "UNITS10_49" "UNITS50_UP" "MOBILEHOME"

[49] "NO_FARMS87" "AVG_SIZE87" "CROP_ACR87" "AVG_SALE87"

 

$data_types

 [1] "C" "C" "C" "C" "C" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"

[16] "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"

[31] "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"

[46] "N" "N" "N" "N" "N" "N" "N"

 

$row.names

 [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

[13] "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23"

[25] "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35"

 

$class

[1] "data.frame"

Note that the “$data_types” part of the output indicates whether individual variables are character (“C” e.g. county names) or numerical (“N” e.g. county area).

1.  Some simple maps

R has the capability of plotting some simple maps using the maptools package, which can read and plot ESRI shapefiles.  Here are a couple of examples:

·         Oregon county census data -- attribute data in the orcounty.shp shape file

# equal-frequency class intervals
plotvar <- orcounty.shp@data$POP1990
nclr <- 8
plotclr <- brewer.pal(nclr,"BuPu")
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)

plot(orcounty.shp, xlim=c(-124.5, -115), ylim=c(42,47))
plot(orcounty.shp, col=colcode, add=T)
title(main="Population 1990",
    sub="Quantile (Equal-Frequency) Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

The first block of code does some set-up things, like indication what variable to plot (it’s handy to create a temporary variable (plotvar) so that in the rest of the code the variable can be more conveniently referred to), the number of class intervals to use, and what kind (quantiles here—equal frequency class intervals) using the classIntervals() function, and then some colors using the brewer.pal() function to first define the colors, and then findColours() (note the U.K. spelling) to assign them. 

 

The second block of code produces the map, with the first plot() function setting up the map with a specifica longitude and latitude window and plotting the polygons, and then second plotting them again filled with color. 

 

Here’s what the map looks like:

 

It’s a little clunky, and the placement of labels aren’t great, but it’s quick.  The legend() function at the bottom of the second block of code is what puts the legend on the maps, and the first two arguments indicate where (in the same coordinates as the map (latitude and longitude).

 

Here’s a map of the Oregon climate station data with the data coming from the orstationc.csv file, and the basemap from orotl.shp

# symbol plot -- equal-interval class intervals
plotvar <- orstationc$tann
nclr <- 8
plotclr <- brewer.pal(nclr,"PuOr")
plotclr <- plotclr[nclr:1] # reorder colors
class <- classIntervals(plotvar, nclr, style="equal")
colcode <- findColours(class, plotclr)

plot(orotl.shp, xlim=c(-124.5, -115), ylim=c(42,47))
points(orstationc$lon, orstationc$lat, pch=16, col=colcode, cex=2)
points(orstationc$lon, orstationc$lat, cex=2)
title("Oregon Climate Station Data -- Annual Temperature",
    sub="Equal-Interval Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

 

Here’s a third map with the Oregon climate station data locations and data coming from the shape file:

and the code:

# symbol plot -- equal-interval class intervals
plotvar <- orstations.shp@data$pann
nclr <- 5
plotclr <- brewer.pal(nclr,"BuPu")
class <- classIntervals(plotvar, nclr, style="fixed",
fixedBreaks=c(0,200,500,1000,2000,5000))
colcode <- findColours(class, plotclr)
orstations.pts <- orstations.shp@coords # get point data

plot(orotl.shp, xlim=c(-124.5, -115), ylim=c(42,47))
points(orstations.pts, pch=16, col=colcode, cex=2)
points(orstations.pts, cex=2)
title("Oregon Climate Station Data -- Annual Precipitation",
    sub="Fixed-Interval Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

 

Notice how the expression orstations.shp@data$pann refers to a specific variable (pann), contained in the data attribute of the shape file.  Some other things:  This map uses fixed (ahead of making the map) class intervals (fixedBreaks) and the two points() function “calls” first plot a colored dot (col=colcode), and then just a unfilled dot (in black) to put a nice line around each point to make the symbol more obvious.

 

2.  Variations in color scales and representation

This set of examples illustrates some more applications of the maptools package, and some variations in the contruction of class intervals for choropleth maps and symbolic representation of the Oregon county-level census data:

·         Oregon county census data -- equal-frequency class intervals (left-hand map below)

# equal-frequency class intervals
plotvar <- orcounty.shp@data$AREA
nclr <- 8
plotclr <- brewer.pal(nclr,"BuPu")
#plotclr <- plotclr[nclr:1] # reorder colors if appropriate
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)

plot(orcounty.shp, xlim=c(-124.5, -115), ylim=c(42,47))
plot(orcounty.shp, col=colcode, add=T)
title(main="Area: Equal-Frequency Class Intervals",
sub="Quantile (Equal-Frequency) Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

·         Oregon county census data -- equal-width class intervals (right-hand map)

 

The same data are being plotted here, the difference is the way in which class intervals are defined.  Equal-frequency class intervals create a general gradient of county sizes from the small ones in the northern Willamette Valley to the larger eastern Oregon ones, while the equal-width class intervals emphasize the large differences in the size of the counties.

 

We tend to be influenced more by the simple area of an object on the page or screen than by whatever is being plotted.  For example, here’s the 1990 county population, plotted as polygons.

 

#equal-width class intervals of 1990 population
plotvar <- orcounty.shp@data$POP1990
nclr <- 8
plotclr <- brewer.pal(nclr,"BuPu")
#plotclr <- plotclr[nclr:1] # reorder colors if appropriate
class <- classIntervals(plotvar, nclr, style="equal")
colcode <- findColours(class, plotclr)

plot(orcounty.shp, xlim=c(-124.5, -115), ylim=c(42,47))
plot(orcounty.shp, col=colcode, add=T)
title(main="Population 1990",
    sub=" Equal-Width Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

 

Notice that Multnomah county, the most populated one, despite being plotted in a dark color, gets kind of dominated by the larger, less populated counties that surround it.  The eastern Oregon counties, the least populated, occupy the greatest area on the map. 

 

An alternative is to plot the county outlines to indicate location, and a symbol to indicate, in this case, population.  This approach produces a “bubble plot”:

# bubble plot equal-frequency class intervals
plotvar <- orcounty.shp@data$POP1990
nclr <- 8
plotclr <- brewer.pal(nclr,"BuPu")
#plotclr <- plotclr[nclr:1] # reorder colors if appropriate
max.symbol.size=12
min.symbol.size=1
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)
symbol.size <- ((plotvar-min(plotvar))/
   (max(plotvar)-min(plotvar))*(max.symbol.size-min.symbol.size)
   +min.symbol.size)

plot(orcounty.shp, xlim=c(-124.5, -115), ylim=c(42,47))
orcounty.cntr <- coordinates(orcounty.shp)
points(orcounty.cntr, pch=16, col=colcode, cex=symbol.size)
points(orcounty.cntr, cex=symbol.size)
text(-120, 46.5, "Area: Equal-Frequency Class Intervals")
legend(locator(1), legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

Here the size of each symbol is calculated using the county population using the code at the end of the first block, and then the symbol is plotted at the centroid of each county, which is located using the coordinates() function.

 

Another way to plot data that applies to polygons (counties, states, countries, etc.), is the time-honored dot map:

 

# maptools dot-density maps
# warning: this can take a little while
plottitle <- "Population 1990, each dot=1000"
orpolys <- SpatialPolygonsDataFrame(orcounty.shp, data=as.data.frame(orcounty.shp))
plotvar <- orpolys@data$POP1990/1000.0
 
dots.rand <- dotsInPolys(orpolys, as.integer(plotvar), f="random")
plot(orpolys, xlim=c(-124.5, -115), ylim=c(42,47))
plot(dots.rand, add=T, pch=19, cex=0.5, col="magenta")
plot(orpolys, add=T)
title(plottitle)
 
dots.reg <- dotsInPolys(orpolys, as.integer(plotvar), f="regular")
plot(orpolys, xlim=c(-124.5, -115), ylim=c(42,47))
plot(dots.reg, add=T, pch=19, cex=0.5, col="purple")
plot(orpolys, add=T)
title(plottitle)

 

The map on the left randomly locates each dot, which looks nicer, but may be misleading in regions where the dots are sparse (i.e. they may look like they are identifying specific places), while the map on the right clearly shows a stylized depiction of population, but looks kind of strange.