GEOG 620
Data Analysis & Visualization
Data Analysis vs. "Statistics" and their roles in geography
- Data analysis features continual iterations between the development of a conceptual
model of reality (theory building or hypothesis generation) and the testing of that model
(using formal or informal hypothesis testing).
- Classical statistics has been more oriented toward assessment (hypothesis
testing) than toward discovery of relationships within data sets.
- Modern data analysis exploits recent developments in computing, and "scientific
visualization," but still uses more traditional "statistical analysis"
approaches when appropriate.
- Roles in Geography: "quantitative revolution", "GIS
revolution", "Geographic Visualization".
- Quantitative methods/Qualitative methods -- physical vs. human?
Statistics is not mathematics; data analysis is not statistics; but
visualization is data analysis.
Nature of Geographical Data
An implicit feature of most data sets examined by geographers is that
individual "observations" have locational information attached to them.
This is an issue for most software packages, which do not handle that
locational information explicitly.
The "Data Cube" -- attributes, locations, occasions. The cube is made up of
individual cells, or datums, each of which represents a single attribute or
variable measured at a particular place and time (observations, cases).
The Rectangular Data Set -- Two Examples
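As a rough illustration of how a rectangular data set relates to the data cube, the sketch below stores a tiny cube as nested Python lists and fixes one occasion to obtain a table with locations as rows and attributes as columns. The attribute names, site labels, years, and values are all hypothetical, invented only for this example, and Python is used purely for illustration.

```python
# Hypothetical labels and values, invented only for illustration.
attributes = ["temperature", "precipitation"]
locations = ["site_A", "site_B", "site_C"]
occasions = [2001, 2002]

# cube[a][l][t] is one cell (datum): attribute a, measured at
# location l, on occasion t.
cube = [
    [[10.1, 10.4], [12.3, 12.0], [8.7, 9.1]],            # temperature
    [[800.0, 760.0], [450.0, 470.0], [1200.0, 1150.0]],  # precipitation
]


def rectangular_slice(cube, occasion_index):
    """Fix one occasion; return rows = locations, columns = attributes."""
    n_attributes = len(cube)
    n_locations = len(cube[0])
    return [
        [cube[a][l][occasion_index] for a in range(n_attributes)]
        for l in range(n_locations)
    ]


# One "slice" of the cube: every attribute at every location in 2001.
table_2001 = rectangular_slice(cube, occasions.index(2001))

for name, row in zip(locations, table_2001):
    print(name, row)
```

Fixing a location instead of an occasion would give a different rectangular data set (occasions by attributes), which is the form a multivariate time series takes.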
Answerable Questions
What kinds of questions can be answered?
- What are the basic characteristics of a variable or attribute?
(descriptive plots and statistics that describe location, scale,
distribution).
- What are the relationships between two or more variables? (scatter plots
and descriptive statistics)
- Are two groups of observations different? (analysis of variance)
- How are they different? (discriminant analysis)
- How is a response variable related to one or more predictor or
controlling variables? (regression analysis, generalized linear models,
generalized additive models, tree-based models)
- Are there common features (or underlying dimensions) among a group of
variables? (principal components analysis, factor analysis, ordination
methods)
- Are there natural groups of observations? (cluster analysis).
- How strong are the relationships between two groups of variables?
(canonical correlation analysis, canonical correspondence analysis)
- What are the characteristic time scales of variability in a time series?
(spectral analysis)
- What are the characteristic spatial scales of variability in a variable?
(spatial autocorrelation, geostatistics)
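Two of the questions above can be sketched in a few lines: the basic characteristics of a variable (location and scale), and spatial autocorrelation as measured by Moran's I. This is a minimal sketch using only the Python standard library. The four sites arranged along a line, their values, and the binary neighbor weights are assumptions made up for the example, and `morans_i` is a hypothetical helper name rather than a function from any package.

```python
from statistics import mean, median, stdev


def morans_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i**2,
    where d_i is the deviation of value i from the mean and W is the
    sum of all weights."""
    n = len(values)
    xbar = mean(values)
    dev = [v - xbar for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    sum_sq = sum(d * d for d in dev)
    return (n / total_weight) * (cross / sum_sq)


# Hypothetical example: four sites along a line; adjacent sites are
# neighbors (weight 1), all other pairs are not (weight 0).
neighbors = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

trend = [1.0, 2.0, 3.0, 4.0]        # smooth gradient across the sites
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbors always differ

# Descriptive statistics: location (mean, median) and scale (stdev).
print(mean(trend), median(trend), stdev(trend))

# Similar neighbors give positive I; dissimilar neighbors give negative I.
i_trend = morans_i(trend, neighbors)
i_alternating = morans_i(alternating, neighbors)
print(i_trend, i_alternating)
```

The descriptive statistics are identical no matter how the values are arranged in space, whereas Moran's I changes sign between the two arrangements. That dependence on location is what separates the spatial questions in the list from the non-spatial ones.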
R: Software for data analysis and visualization
Most statistical packages do not explicitly recognize the spatial attributes
of observations, i.e., they treat locations as ordinary variables. The
principal exception is the software package R.
R -- Back to the future?
R Example Sessions
Issues
- The right tool depends on the job.
- To explain variability, variables (predictors) must vary.
- To explain where something occurs, knowing where it doesn't may be more
important.
- Control cases are important (no controls, no explanation).
- Explanation doesn't necessarily have to be quantitative.
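The point that predictors must vary can be illustrated with a simple least-squares slope, which is the covariance of predictor and response divided by the variance of the predictor. If the predictor takes the same value in every case, that variance is zero and no slope can be estimated. The sketch below is only an illustration under made-up data; `ols_slope` is a hypothetical helper name.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x, or None if x does not vary."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Sum of squared deviations of the predictor (n - 1 times its variance).
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        # A constant predictor cannot explain any variability in y.
        return None
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


response = [2.0, 4.0, 6.0]            # hypothetical response values

varying = [1.0, 2.0, 3.0]             # predictor differs among cases
constant = [5.0, 5.0, 5.0]            # predictor identical in every case

slope_varying = ols_slope(varying, response)
slope_constant = ols_slope(constant, response)
print(slope_varying, slope_constant)
```

The same reasoning lies behind the remarks on absences and control cases: a sample containing only the places where something occurs has no variation in the outcome to explain.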