GEOG 620
Data Analysis & Visualization

Data Analysis vs. "Statistics" and their roles in geography

  • Data analysis features continual iterations between the development of a conceptual model of reality (theory building or hypothesis generation) and the testing of that model (using formal or informal hypothesis testing).
  • Classical statistics has been more oriented toward assessment (hypothesis testing) than toward discovery of relationships within data sets.
  • Modern data analysis exploits recent developments in computing and "scientific visualization," but still uses traditional "statistical analysis" approaches when appropriate.
  • Roles in Geography:  "quantitative revolution", "GIS revolution", "Geographic Visualization".
  • Quantitative methods/Qualitative methods -- physical vs. human?

Statistics is not mathematics; data analysis is not statistics; but visualization is data analysis

Nature of Geographical Data

An implicit feature of most data sets examined by geographers is that individual "observations" have locational information attached to them.  Most software packages have no special way of handling that information.

The "Data Cube" -- attributes, locations, occasions. The cube is made up of individual cells (data values), each of which represents a single attribute or variable measured at a particular place and time (an observation or case).
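The relationship between the data cube and a rectangular data set can be sketched in a few lines. The example below is a minimal Python illustration (the course software is R); the attribute names, site labels, years, and cell values are all hypothetical placeholders, not data from the course.

```python
from itertools import product

# Hypothetical labels for the three axes of the cube (illustrative only).
attributes = ["temperature", "precipitation"]
locations = ["site_A", "site_B", "site_C"]
occasions = [2001, 2002]

# The cube: one cell per (attribute, location, occasion).
# The cell values are placeholders (0.0, 1.0, 2.0, ...), not measurements.
cube = {
    key: float(i)
    for i, key in enumerate(product(attributes, locations, occasions))
}


def to_rectangular(cube, attributes, locations, occasions):
    """Unfold the cube into rows (one per location-occasion pair)
    and columns (one per attribute)."""
    rows = []
    for loc, occ in product(locations, occasions):
        row = {"location": loc, "occasion": occ}
        for attr in attributes:
            row[attr] = cube[(attr, loc, occ)]
        rows.append(row)
    return rows


table = to_rectangular(cube, attributes, locations, occasions)
```

Each row of `table` is one observation (a place at a time) and each column is one variable, which is the layout most statistical software expects.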

The Rectangular Data Set -- Two Examples

Answerable Questions

What kinds of questions can be answered?

  • What are the basic characteristics of a variable or attribute? (descriptive plots and statistics that describe location, scale, distribution).
  • What are the relationships among two or more variables? (scatter plots and descriptive statistics)
  • Are two groups of observations different?  (analysis of variance)
  • How are they different?  (discriminant analysis)
  • How is a response variable related to one or more predictor or controlling variables? (regression analysis, generalized linear models, generalized additive models, tree-based models)
  • Are there common features (or underlying dimensions) among a group of variables? (principal components analysis, factor analysis, ordination methods)
  • Are there natural groups of observations? (cluster analysis).
  • How strong are the relationships between two groups of variables? (canonical correlation analysis, canonical correspondence analysis)
  • What are the characteristic time scales of variability in a time series? (spectral analysis)
  • What are the characteristic spatial scales of variability in a variable? (spatial autocorrelation, geostatistics)
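Three of the simpler questions in the list above (describing a single variable, measuring the relationship between two variables, and relating a response to a predictor) can be sketched as follows. This is a minimal Python illustration using only the standard library (the course software is R); the values of `x` and `y` are made up for the example and the function names are hypothetical.

```python
import math
import statistics

# Made-up values for a predictor x and a response y (not real measurements).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def describe(values):
    """Location (mean, median) and scale (sample standard deviation)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


def sums_of_squares(x, y):
    """Centered sums of squares and cross-products shared by r and the slope."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    _, _, sxx, syy, sxy = sums_of_squares(x, y)
    return sxy / math.sqrt(sxx * syy)


def least_squares(x, y):
    """Intercept and slope of the ordinary least-squares line y = a + b*x."""
    mx, my, sxx, _, sxy = sums_of_squares(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope


summary = describe(x)
r = pearson_r(x, y)
intercept, slope = least_squares(x, y)
```

The other methods in the list (discriminant analysis, principal components, cluster analysis, and so on) build on the same ingredients: means, variances, and cross-products computed from the rectangular data set.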

R:  Software for data analysis and visualization

Most statistical packages do not explicitly recognize the spatial attributes of observations (i.e., they treat locations as ordinary variables).  The principal exception is the software package R.
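One way to see why treating coordinates as ordinary variables can mislead: a one-degree difference in longitude is the same number everywhere, but it corresponds to very different ground distances at different latitudes. The Python sketch below (the course software is R) is only an illustration; it assumes a spherical Earth with a mean radius of 6371 km and uses the standard haversine formula, and the function name and test points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees
    (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two pairs of points, each separated by exactly one degree of longitude.
at_equator = great_circle_km(0.0, 0.0, 0.0, 1.0)
at_60_north = great_circle_km(60.0, 0.0, 60.0, 1.0)

# Treated as ordinary numbers, both pairs differ by "1";
# on the ground the second separation is roughly half the first.
ratio = at_60_north / at_equator
```

Software that recognizes locations as locations can make this kind of adjustment automatically; software that sees only two numeric columns cannot.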

R -- Back to the future?

R Example Sessions

Issues

  • The right tool depends on the job.

  • To explain variability, variables (predictors) must vary.

  • To explain where something occurs, knowing where it doesn't may be more important.

  • Control cases are important (no controls, no explanation).

  • Explanation doesn't necessarily have to be quantitative.
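The point above that predictors must vary can be made concrete with a small numerical sketch. In the Python example below (the course software is R), the data values and the function name `slope_or_none` are hypothetical; the function returns `None` when the predictor has no variance, because the least-squares slope is then undefined.

```python
def slope_or_none(x, y):
    """Least-squares slope of y on x, or None if x does not vary."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        # A constant predictor cannot account for any variation in y.
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


response = [1.0, 2.0, 3.0, 4.0]          # made-up response values
constant_predictor = [3.0, 3.0, 3.0, 3.0]  # the same value at every observation
varying_predictor = [1.0, 2.0, 3.0, 4.0]

no_fit = slope_or_none(constant_predictor, response)
fit = slope_or_none(varying_predictor, response)
```

The same logic applies to presence and absence: a sample that contains only the places where something occurs has no variation in the outcome to explain.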
