Geog. 414/514 --  Advanced Geographic Data Analysis
Spring 2008 -- Due Friday, June 13th (originally Wednesday, June 11th)

Answer the following questions in clear, complete, and grammatically correct sentences. You may, however, illustrate any specific answer with a table or figure, accompanied by text. Be brief but informative. Make sure you answer all parts of each question. Each question can probably be answered within a single page; do not exceed two double-spaced pages (in a 10- or 12-point font, with normal margins) per question (figures may be attached as additional pages).

Because it is likely that the opportunity to discuss the questions with others will arise, you may do so, but work out and write down the answers yourself.

  1. Many data-analytical procedures share one version or another of the same underlying conceptual model: 

data = predictable component + unpredictable component; or
data = signal + noise; or
data = common variation + unique variation

For regression analysis and analysis of variance, describe the particular version of that common conceptual model that applies, and why that conceptual model makes sense given the goals of the analysis.
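The generic decomposition above can be made concrete with a small numerical sketch. The Python below is only an illustration under stated assumptions: the data are simulated, the helper names `fit_line` and `decompose` are hypothetical, and a straight-line least-squares fit stands in for the "predictable component". It shows only that each observation splits exactly into a fitted part and a leftover part.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def decompose(x, y):
    """Split each observation into a fitted value and a residual."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals

# Simulated data (an assumption for illustration): a linear signal plus Gaussian noise.
random.seed(1)
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]

fitted, residuals = decompose(x, y)

# data = fitted + residual holds for every observation.
assert all(abs(yi - (fi + ri)) < 1e-9 for yi, fi, ri in zip(y, fitted, residuals))
```

Other procedures would replace `fit_line` with a different rule for the systematic part (group means, for instance) while keeping the same additive split.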

  2. Describe the general context in which regression analysis is applicable. (What is it used for? Are there any assumptions that underlie its use? How is it implemented in practice?)

  3. What criterion (i.e., what particular value) is minimized when fitting a regression equation or line to some data? Describe the relationship between this criterion and a) the squared multiple correlation coefficient (R^2), and b) the F-statistic in the regression analysis of variance.

  4. Describe how "nonparametric" regression works, as typified by a loess/lowess curve added to a bivariate scatter plot. How is the curve constructed? Are particular quantities optimized (as in standard regression analysis, where the sum of squares of the residuals is minimized)? What controls the smoothness of the fitted curve? How does one tell whether a loess curve does a good job of representing the relationship between the variables?