| GEOG 414/515: Advanced Geographic Data Analysis

Multivariate distances and cluster analysis

There is a broad group of multivariate analyses whose objective is to organize individual observations (objects, sites, individuals). These analyses are built on the concept of multivariate distances (expressed as either similarities or dissimilarities) among the objects. The organization generally takes two forms: arranging the objects in a lower-dimensional space (multidimensional scaling, below), and assigning the objects to "natural" groups (cluster analysis, below).
These analyses share many concepts and techniques (both numerical and practical) with other procedures such as principal components analysis, numerical taxonomy, and discriminant analysis. The analyses generally begin with the construction of an n × n matrix D of the distances between objects. For example, in a two-dimensional space, the elements d_ij of D could be the Euclidean distances between points,

$$d_{ij} = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2},$$

where x_{i1} and x_{i2} are the two coordinates of point i.
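As a minimal sketch of how such a distance matrix might be built, the following assumes Python and three made-up points. The function name `distance_matrix` and the coordinates are illustrative only and are not part of the course materials.

```python
import math

def distance_matrix(points):
    """Return the n x n matrix D of Euclidean distances between points.

    `points` is a list of equal-length coordinate tuples, one per object.
    """
    n = len(points)
    # D[i][j] holds the Euclidean distance between object i and object j.
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical coordinates for three objects in a two-dimensional space.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(points)
for row in D:
    print(row)
```

The matrix is symmetric with zeros on the diagonal, and the first two points are 5 units apart (a 3-4-5 triangle). Because `math.dist` accepts coordinate tuples of any common length, the same function works unchanged for objects described by more than two variables.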
The Euclidean distance and related measures are easily generalized to more than two dimensions.

1. Basic distances

2. Mahalanobis distances

The basic Euclidean distance treats each variable as equally important in calculating the distance. An alternative approach is to scale the contribution of each variable to the distance according to its variability. This approach is illustrated by the Mahalanobis distance, which measures the distance between each observation in a multidimensional cloud of points and the centroid of the cloud. The Mahalanobis distance D^2 is given by

$$D^2 = (\mathbf{x} - \mathbf{m})^{T} \mathbf{V}^{-1} (\mathbf{x} - \mathbf{m}),$$
where x is the vector of values for a particular observation, m is the vector of means of the variables, and V is the variance-covariance matrix.

3. Multidimensional scaling (MDS)

The objective of MDS is to portray the relationships among objects in a multidimensional space in a lower-dimensional space (usually 2-D), in such a way that the relative distances among the objects in the multidimensional space are preserved in the lower-dimensional space. The classic illustrative example is the analysis of geographically arrayed data, which can be done with the Oregon climate-station data.

4. Cluster analysis

In a cluster analysis, the objective is to use similarities or dissimilarities among objects (expressed as multivariate distances) to assign the individual observations to "natural" groups. Cathy Whitlock's surface-sample data from Yellowstone National Park describe the spatial variations in pollen data for that region, and each site was subjectively assigned to one of five vegetation zones.

Readings

Crawley (Statistical Computing...): ch. 40; Manly (Multivariate Statistical Methods...): ch. 9 |