GEOG 414/515:  Advanced Geographic Data Analysis
Principal Components Analysis / Factor Analysis

Principal Components Analysis (PCA)

Principal components analysis (PCA) is a widely used multivariate analysis method, the general aim of which is to reveal systematic covariations among a group of variables.   The analysis can be motivated in a number of different ways, including (in geographical contexts) finding groups of variables that measure the same underlying dimensions of a data set, describing the basic anomaly patterns that appear in spatial data sets, or producing a general index of the common variation of a set of variables.

Properties of principal components

Because the components are derived by solving a particular optimization problem, they naturally have some "built-in" properties that are desirable in practice (e.g. maximum variability).  In addition, there are a number of other properties of the components that can be derived:

  • the variance of each component, and the proportion of the total variance of the original variables that it accounts for, are given by the eigenvalues;
  • component scores, which give the value of each component at each observation, may be calculated;
  • component loadings that describe the correlation between each component and each variable may also be obtained;
  • the correlations among the original variables can be reproduced by the p components, as can that part of the correlations "explained" by the first q components;
  • the original data can be reproduced by the p components, as can those parts of the original data "explained" by the first q components;
  • the components can be "rotated" to increase their interpretability.
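
The properties listed above can be checked numerically. The following is a minimal sketch in Python with NumPy, not a reference implementation: the data are simulated, the analysis is assumed to be of the correlation matrix (standardized variables), and all variable names (X, Z, R, q, and so on) are illustrative choices rather than anything defined in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 200 observations of p = 4 correlated variables.
n, p = 200, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

# Standardize the variables, so the PCA is of the correlation matrix R.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Eigen-decomposition of R, reordered so the largest eigenvalue is first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvalues give the component variances and proportions of total variance.
prop_var = eigvals / eigvals.sum()

# Component scores: the value of each component at each observation.
scores = Z @ eigvecs

# Component loadings: correlations between each variable and each component.
loadings = eigvecs * np.sqrt(eigvals)

# All p components reproduce R exactly; the first q give the "explained" part.
q = 2
R_full = loadings @ loadings.T
R_q = loadings[:, :q] @ loadings[:, :q].T

# Likewise for the (standardized) data themselves.
Z_full = scores @ eigvecs.T
Z_q = scores[:, :q] @ eigvecs[:, :q].T

print("proportion of variance:", np.round(prop_var, 3))
print("largest error in R using q components:", np.abs(R - R_q).max())
```

Here the sample variance of each column of scores equals the corresponding eigenvalue, and R_q and Z_q approach R and Z as q increases toward p.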

Examples

Principal components and factor analysis

Differing underlying models

  • PCA:  maximum variance, maximum simultaneous resemblance motivations
  • Factor Analysis:  variables are assembled from two major components
    • common "factors"
    • unique factors

    X = m + Lf + u

    where

    X is a matrix of data
    m is the (vector) mean of the variables
    L is a p x k matrix of factor loadings
    f and u are random vectors representing the underlying common and unique factors
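
To make the model concrete, the sketch below simulates data from X = m + Lf + u and compares the sample covariance matrix with the one the model implies. It rests on assumptions that are not stated in these notes but are standard for the orthogonal factor model: the common factors f are uncorrelated with unit variance, the unique factors u are uncorrelated with each other and with f, and the unique variances (called psi here) are chosen so that each variable has unit variance. The particular values of m and L are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: p variables, k common factors, n observations.
p, k, n = 5, 2, 50_000

m = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # mean vector (invented)
L = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.7, 0.3],
              [0.1, 0.8],
              [0.0, 0.9]])                       # p x k factor loadings (invented)

# Assumed unique variances, chosen so each variable has total variance 1.
psi = 1.0 - (L**2).sum(axis=1)

f = rng.normal(size=(n, k))                      # common factors
u = rng.normal(size=(n, p)) * np.sqrt(psi)       # unique factors

# Each row of X is one observation generated as m + L f + u.
X = m + f @ L.T + u

# Under the stated assumptions the model implies Cov(X) = L L' + diag(psi).
implied = L @ L.T + np.diag(psi)
sample = np.cov(X, rowvar=False)

print("largest difference, sample vs. implied covariance:",
      np.abs(sample - implied).max())
```

The off-diagonal elements of the implied covariance matrix come entirely from the common factors (L L'), while the unique factors contribute only to the diagonal.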

The practical difference between PCA and factor analysis now lies mainly in the decision of whether to rotate the principal components to emphasize the "simple structure" of the component loadings:

  • easier interpretation
  • in geographical data:  regionalization
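
As an illustration of rotation toward simple structure, here is a minimal sketch of varimax rotation, one common orthogonal rotation (these notes do not say which rotation method is intended, so the choice is an assumption). It uses the widely circulated SVD-based iteration on the raw loadings, without the Kaiser row normalization that R's varimax() applies by default, so its results need not match R's exactly. The loading matrix is invented for the example.

```python
import numpy as np

def varimax_criterion(loadings):
    """Sum over columns of the variance of the squared loadings."""
    return (loadings**2).var(axis=0).sum()

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, orthogonal rotation matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated**2).sum(axis=0)
        # Matrix whose cross-product with the loadings drives the update.
        target = rotated**3 - rotated * col_ss / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        current = s.sum()
        # Stop once the sum of singular values no longer increases.
        if current < previous * (1.0 + tol):
            break
        previous = current
    return loadings @ rotation, rotation

# Invented loadings with a clear simple structure ...
simple = np.array([[0.9, 0.1], [0.8, 0.0], [0.7, 0.1],
                   [0.1, 0.9], [0.0, 0.8], [0.1, 0.7]])

# ... deliberately obscured by a 30-degree rotation of the axes.
theta = np.pi / 6
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)

print("criterion before:", round(varimax_criterion(unrotated), 4))
print("criterion after: ", round(varimax_criterion(rotated), 4))
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, the communalities (row sums of squared loadings) are unchanged; only the way the explained variance is shared among the components changes.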

Examples of factor analysis

Practical differences in R (and S-Plus) from other conventions

Property                             R PCA                     R Factor Analysis            Usual Convention

Importance of components/factors     std. dev. units           variance units               variance units
                                     (square root of           (sum of squares of           (eigenvalue)
                                     eigenvalue)               loadings)

Loadings (1 per variable)            eigenvector elements      correlations between         correlations between
                                     (aij's)                   variables and factors        variables and factors
                                                                                            or components

Scores (1 per observation)           not standardized          standardized                 standardized

Orthogonality of components/factors  yes                       not necessarily              usually yes
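
The conversions between the "R PCA" column and the "Usual Convention" column are simple rescalings. The sketch below, in Python with NumPy, builds PCA output in the R style from simulated data and then converts it. The names sdev, rotation and x are chosen to mirror the components returned by R's prcomp(); the data and all other names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 150 observations of 3 correlated variables, standardized.
X = rng.normal(size=(150, 3)) @ rng.normal(size=(3, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]

# "R PCA" style output.
sdev = np.sqrt(eigvals[order])     # importance in std. dev. units
rotation = eigvecs[:, order]       # loadings as eigenvector elements
x = Z @ rotation                   # scores, not standardized

# Conversion to the usual convention.
eigenvalues = sdev**2              # importance in variance units
loadings = rotation * sdev         # variable-component correlations
std_scores = x / sdev              # scores with unit variance

# Column sums of squared loadings recover the eigenvalues.
ss_loadings = (loadings**2).sum(axis=0)

print("eigenvalues:   ", np.round(eigenvalues, 3))
print("SS of loadings:", np.round(ss_loadings, 3))
```

The last two printed lines agree, which is why the sum of squares of the loadings and the eigenvalue both serve as measures of importance in variance units.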

Readings:

Rogerson (Statistical Methods):  Ch. 10; Maindonald (Using R...):  Ch. 6