GEOG 414/515: Advanced Geographic Data Analysis

Descriptive statistics

There are a number of descriptive statistics that, like descriptive plots, provide basic information on the nature of a particular variable or set of variables. A statistic is simply a number that summarizes or represents a set of observations of a particular variable. Before describing the statistics, it will be helpful to look at the summation operator, Σ (sigma).

The Summation Operator

The appearance of the summation operator, symbolized by the upper-case Greek letter sigma (Σ), is what convinces most casual readers that data analysis or statistics is mathematically intensive. An alternative way to look at the summation operator is as a component of an operation that is performed (almost always by computer) on a set of numbers: the sum of x_i for i = 1 to n is simply x_1 + x_2 + ... + x_n, the total of the n observations.
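In R, the summation operator corresponds to the sum() function. The short sketch below, using a hypothetical vector x of five observations, writes the operation out as an explicit loop, checks it against sum(), and then shows summation acting as one component of other statistics (the mean and the sum of squared deviations used in the variance).

```r
# Hypothetical example data: n = 5 observations of a variable x
x <- c(3, 7, 2, 9, 4)
n <- length(x)

# The summation operator written out as an explicit loop:
# add each observation x[i], i = 1, ..., n, to a running total
total <- 0
for (i in seq_len(n)) {
  total <- total + x[i]
}
print(total)    # 25

# sum() performs the same operation in a single call
print(sum(x))   # 25

# Summation as a component of other statistics
xbar <- sum(x) / n            # mean: (1/n) times the sum of x
ss   <- sum((x - xbar)^2)     # sum of squared deviations from the mean
print(c(mean = xbar, ss = ss, variance = ss / (n - 1)))
```

The last line reproduces what var(x) returns, which illustrates the point that the summation is usually just one step inside a larger calculation.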
Univariate descriptive statistics

In general, descriptive statistics (like the univariate descriptive plots) can be classified into three groups: those that measure (1) central tendency, or the location of a set of numbers; (2) variability, or dispersion; and (3) the shape of the distribution. The univariate descriptive statistics can be thought of as companions to the univariate descriptive plots. The best way to develop an idea of what the statistics are summarizing is to always produce a descriptive plot first.
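As a sketch of the three groups, the code below uses a hypothetical right-skewed sample x. It first draws a text stem-and-leaf plot with stem() (so no graphics device is needed), then computes one or two statistics from each group. Base R has no built-in skewness function, so the skewness coefficient here is computed by hand using one common moment-based definition; other definitions (for example, those with small-sample corrections) give slightly different values.

```r
# Hypothetical sample of eight values with a long right tail
x <- c(2, 3, 3, 4, 5, 6, 9, 12)

# Look at the data first: a text-based stem-and-leaf plot
stem(x)

# 1) Central tendency (location)
x_mean   <- mean(x)
x_median <- median(x)

# 2) Variability (dispersion)
x_sd  <- sd(x)     # standard deviation (n - 1 denominator)
x_iqr <- IQR(x)    # interquartile range

# 3) Shape: a moment-based skewness coefficient, computed by hand because
#    base R has no skewness() function; this is one of several definitions
d      <- x - x_mean
x_skew <- mean(d^3) / mean(d^2)^1.5

print(round(c(mean = x_mean, median = x_median, sd = x_sd,
              iqr = x_iqr, skewness = x_skew), 3))
```

Here the mean (5.5) exceeds the median (4.5) and the skewness coefficient is positive, both consistent with the long right tail visible in the stem-and-leaf plot.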
Descriptive statistics in R

Descriptive statistics are most easily obtained in R using the summary() function. summary() is a generic function: its argument can be almost any kind of object, and what it reports depends on the class of that object. If the argument is a data frame, summary() returns descriptive statistics for each variable, whereas if the argument is a single variable, summary() returns the descriptive statistics for that variable only. For a numeric variable, these are the minimum, first quartile, median, mean, third quartile, and maximum.
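A minimal sketch of both uses of summary(), assuming a small hypothetical data frame df with two numeric variables (temp, precip) and one factor (region); the names and values are illustrative only.

```r
# Hypothetical data frame: two numeric variables and one factor
df <- data.frame(
  temp   = c(12.1, 14.3, 9.8, 16.0, 11.5, 13.2),
  precip = c(80, 65, 110, 40, 95, 70),
  region = factor(c("coast", "coast", "inland", "inland", "coast", "inland"))
)

# Data frame as the argument: a summary for every variable (column)
print(summary(df))

# Single variable as the argument: only that variable's summary
s <- summary(df$precip)
print(s)
```

Because summary() dispatches on the class of each column, the factor region is summarized by counts per level rather than by quantiles.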
Individual descriptive statistics can be obtained using self-explanatory functions such as mean(), median(), var(), sd(), min(), max(), range(), quantile(), and IQR().
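The sketch below applies a selection of these base R functions to a hypothetical vector x. It also shows one practical detail: with missing values present, these functions return NA unless the na.rm = TRUE argument is supplied.

```r
x <- c(1, 2, 4, 7, 11)   # hypothetical sample

print(mean(x))       # arithmetic mean
print(median(x))     # median
print(var(x))        # variance (n - 1 denominator)
print(sd(x))         # standard deviation, sqrt(var(x))
print(range(x))      # c(min(x), max(x))
print(quantile(x))   # 0%, 25%, 50%, 75%, 100% quantiles
print(IQR(x))        # interquartile range (75% minus 25% quantile)
print(length(x))     # number of observations

# Missing values: the result is NA unless they are explicitly dropped
x_na <- c(x, NA)
print(mean(x_na))                # NA
print(mean(x_na, na.rm = TRUE))  # 5, the same as mean(x)
```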
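Descriptive statistics for individual groups of observations can be obtained with the tapply() function, which splits a variable by the levels of a grouping factor and applies a function to each group. For example, if x is a numeric variable and g is a factor, tapply(x, g, mean) returns the mean of x for each level of g.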
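A small worked sketch of tapply(), using hypothetical July temperatures (temp) at six stations grouped by a hypothetical region factor. The call form is tapply(X, INDEX, FUN): X is split by the levels of INDEX and FUN is applied to each piece.

```r
# Hypothetical data: July temperatures (deg C) at stations in two regions
temp   <- c(18, 21, 18, 25, 24, 26)
region <- factor(c("coast", "coast", "coast", "inland", "inland", "inland"))

# Split temp by region and apply a function to each group
grp_mean <- tapply(temp, region, mean)
grp_sd   <- tapply(temp, region, sd)
print(grp_mean)
print(grp_sd)

# Any summarizing function can be used, including summary() itself
print(tapply(temp, region, summary))
```

The result of each call is labelled by the factor levels, so group statistics can be compared directly.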
Bivariate Descriptive Statistics

A frequent goal in data analysis is to efficiently describe or measure the strength of relationships between variables, or to detect associations between the factors used to set up a cross tabulation. A related goal may be to determine which variables are related in a predictive sense to a particular response variable, or, put another way, to learn how best to predict future values of a response variable. Correlation and regression analysis, along with measures of association constructed from tables, provide the means for describing and displaying such relationships. Bivariate descriptive statistics allow the strength of the relationship displayed in a scatter plot to be efficiently summarized, in much the same way that the univariate descriptive statistics provide efficient summaries of the information evident in univariate plots. The form of the relationship and possible external influences, however, are best detected using descriptive plots or specific analyses such as regression.
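As a sketch of a bivariate summary, the code below computes the Pearson correlation between two hypothetical variables, elevation (elev) and mean annual temperature (temp), both with cor() and directly as the covariance divided by the product of the two standard deviations. It also computes Spearman's rank correlation via cor()'s method argument, which measures the strength of a monotonic (not necessarily linear) relationship.

```r
# Hypothetical paired observations at six stations
elev <- c(100, 250, 400, 800, 1200, 1500)   # elevation (m)
temp <- c(15.2, 14.1, 13.5, 10.9, 8.0, 6.3) # mean annual temperature (deg C)

# Pearson correlation: covariance scaled by the two standard deviations
r_manual <- cov(elev, temp) / (sd(elev) * sd(temp))
r        <- cor(elev, temp)   # method = "pearson" is the default

# Spearman rank correlation: Pearson r computed on the ranks
rho <- cor(elev, temp, method = "spearman")

print(c(pearson = r, manual = r_manual, spearman = rho))
```

Because the hypothetical temperatures fall steadily with elevation, Spearman's coefficient is -1, while the Pearson r is close to, but not exactly, -1, since the decline is not perfectly linear. As the text notes, a scatter plot is still the best way to see the form of the relationship.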