GEOG 414/515: Advanced Geographic Data Analysis
Principal Components Analysis / Factor Analysis
Principal Components Analysis (PCA)
Principal components analysis (PCA) is a widely used multivariate
analysis method, the general aim of which is to reveal systematic
covariations among a group of variables. The analysis can be
motivated in a number of different ways, including (in geographical
contexts) finding groups of variables that measure the same underlying
dimensions of a data set, describing the basic anomaly patterns that
appear in spatial data sets, or producing a general index of the common
variation of a set of variables.
Properties of principal components
Because the components are derived by solving a particular optimization
problem, they naturally have some "built-in" properties that are
desirable in practice (e.g. maximum variability). In addition, there
are a number of other properties of the components that can be derived:
- the variance of each component, and the proportion of the
total variance of the original variables that it accounts for, are
given by the eigenvalues;
- component scores, which give the value of each component at
each observation, may be calculated;
- component loadings that describe the correlation between
each component and each variable may also be obtained;
- the correlations among the original variables can be
reproduced by the p components, as can that part of the correlations
"explained" by the first q components;
- the original data can be reproduced by the p components, as
can those parts of the original data "explained" by the
first q components;
- the components can be "rotated" to increase the
interpretability of the components.
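As a rough illustration of several of the properties listed above, the following Python sketch works through a two-variable case, where the eigenvalues and eigenvectors of the correlation matrix are known in closed form. The data values and the helper names (`standardize`, `cov`, `corr`, `reproduced`) are invented for this example and are not part of the course materials.

```python
import math

# Hypothetical data: two variables observed at six locations.
x1 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]

def cov(a, b):
    """Sample covariance (n - 1 denominator)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def corr(a, b):
    """Pearson correlation."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

def standardize(v):
    """Center v and scale it to unit variance."""
    mean = sum(v) / len(v)
    sd = math.sqrt(cov(v, v))
    return [(a - mean) / sd for a in v]

z1, z2 = standardize(x1), standardize(x2)
r = corr(x1, x2)

# For the 2 x 2 correlation matrix [[1, r], [r, 1]] with r > 0, the
# eigenvalues are 1 + r and 1 - r, and the eigenvectors are
# (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
assert r > 0  # guard: keeps the first component the larger one
eigenvalues = [1 + r, 1 - r]
s = 1 / math.sqrt(2)
eigenvectors = [[s, s], [s, -s]]  # eigenvectors[j] belongs to component j

# Component scores: one value per component per observation.
scores = [[e[0] * a + e[1] * b for a, b in zip(z1, z2)]
          for e in eigenvectors]

# The variance of each component equals its eigenvalue.
score_var = [cov(sc, sc) for sc in scores]
prop_var = [lam / sum(eigenvalues) for lam in eigenvalues]

# Loadings (eigenvector element times sqrt(eigenvalue)) are the
# correlations between variables and components; loadings[j][i] is
# the loading of variable i on component j.
loadings = [[e[i] * math.sqrt(lam) for i in range(2)]
            for e, lam in zip(eigenvectors, eigenvalues)]

def reproduced(q):
    """Correlation matrix reproduced by the first q components."""
    return [[sum(loadings[j][i] * loadings[j][k] for j in range(q))
             for k in range(2)] for i in range(2)]

# The standardized data are recovered from all p = 2 components.
z1_back = [sum(eigenvectors[j][0] * scores[j][k] for j in range(2))
           for k in range(len(z1))]
```

Here `reproduced(2)` returns the full correlation matrix, while `reproduced(1)` returns only the part "explained" by the first component.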
Examples
Principal components and factor analysis
Differing underlying models
- PCA: maximum variance, maximum simultaneous resemblance
motivations
- Factor Analysis: variables are assembled from two major
components
  - common "factors"
  - unique factors
X = m + Lf + u
where
X is a matrix of data
m is the (vector) mean of the variables
L is a p x k matrix of factor loadings
f and u are random vectors representing
the underlying common and unique factors
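One consequence of this model can be written out explicitly. The sketch below relies on assumptions that are conventional for factor analysis but are not stated in these notes: X is read as the vector of p variables for a single observation, f has mean zero and identity covariance, u has mean zero and a diagonal covariance matrix (written here as \Psi, a symbol introduced only for this illustration), and f and u are uncorrelated.

```latex
\operatorname{Cov}(X)
  = L \,\operatorname{Cov}(f)\, L^{\mathsf{T}} + \operatorname{Cov}(u)
  = L L^{\mathsf{T}} + \Psi,
\qquad
\operatorname{Var}(X_i) = \sum_{j=1}^{k} l_{ij}^{2} + \psi_i
```

Under those assumptions, the variance of variable i splits into a part shared through the k common factors (the sum of its squared loadings l_ij) and a part psi_i belonging to its unique factor.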
The practical difference now lies mainly in the decision whether to
rotate the principal components to emphasize the "simple
structure" of the component loadings:
- easier interpretation
- in geographical data: regionalization
Examples of factor analysis
Practical differences in R (and S-Plus) from other conventions
| Property | R PCA | R Factor Analysis | Usual Convention |
|---|---|---|---|
| Importance of components/factors | std. dev. units (square root of eigenvalue) | variance units (sum of squares of loadings) | variance units (eigenvalue) |
| Loadings (1 per variable) | eigenvector elements (aij's) | correlations between variables and factors | correlations between variables and factors or components |
| Scores (1 per observation) | not standardized | standardized | standardized |
| Orthogonality of components/factors | yes | not necessarily | usually yes |
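The "R PCA" column can be converted to the usual convention with a little arithmetic, sketched here in Python. The function name `to_usual_convention` and all input values are hypothetical. The argument names `sdev` and `rotation` are chosen to resemble the output of R's `prcomp`, but nothing here calls R. Reading the rescaled loadings as correlations assumes the analysis was run on standardized variables (a correlation matrix).

```python
import math

def to_usual_convention(sdev, rotation, scores):
    """Convert eigenvector-style PCA output to the usual convention.

    sdev     : component standard deviations (length q)
    rotation : p x q nested list of eigenvector elements a_ij
               (rows = variables, columns = components)
    scores   : n x q nested list of unstandardized component scores
    """
    # Importance: variance units are the squared standard deviations.
    eigenvalues = [s ** 2 for s in sdev]
    # Loadings: eigenvector element times the component's std. dev.
    loadings = [[a * s for a, s in zip(row, sdev)] for row in rotation]
    # Scores: divide each component's scores by its std. dev.
    std_scores = [[c / s for c, s in zip(row, sdev)] for row in scores]
    return eigenvalues, loadings, std_scores

# Hypothetical two-variable case with correlation r = 0.6.
r = 0.6
sdev = [math.sqrt(1 + r), math.sqrt(1 - r)]
a = 1 / math.sqrt(2)
rotation = [[a, a], [a, -a]]
# Arbitrary illustrative scores, not computed from real data.
scores = [[1.2, -0.3], [-0.8, 0.5], [-0.4, -0.2]]

eigenvalues, loadings, std_scores = to_usual_convention(
    sdev, rotation, scores)
```

In this sketch the squared loadings in each column sum to that component's eigenvalue, which matches the "sum of squares of loadings" measure of importance in the table.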
Readings:
Rogerson (Statistical Methods): Ch. 10; Maindonald (Using R...): Ch. 6