Analysis of Variance

"Analysis of variance" (or ANOVA) is designed to test hypotheses about the equality of two or more group means. It gets its name from the idea of judging the apparent differences among the group means relative to the variance within the individual groups.

The basic idea is to compare the variability of the observations among groups to that within groups:

  • if the variability among groups is small (relative to the variability within groups), this lends support to a null hypothesis that the means of the different groups are identical; but
  • if the variability among groups is large (relative to the variability within groups), this discredits the null hypothesis, and lends support to the alternative hypothesis that at least one group mean differs from the others.
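
The comparison can be made concrete with a short sketch in R. The data are simulated: the three group means (10, 12 and 15), the common standard deviation of 3, and the object names (y, g, ss.among, ss.within) are illustrative assumptions, not part of the course data set. The code computes the among-groups and within-groups mean squares by hand; their ratio should agree with the F statistic that aov() reports.

```r
# simulated data: 3 groups of 30 observations (illustrative values only)
set.seed(1)
g <- factor(rep(c("A", "B", "C"), each = 30))
y <- rnorm(90, mean = rep(c(10, 12, 15), each = 30), sd = 3)

k <- nlevels(g)                    # number of groups
n <- length(y)                     # total number of observations
grand.mean  <- mean(y)
group.means <- tapply(y, g, mean)
group.n     <- tapply(y, g, length)

# variability among groups: group means about the grand mean
ss.among <- sum(group.n * (group.means - grand.mean)^2)
# variability within groups: observations about their own group mean
ss.within <- sum((y - ave(y, g))^2)

ms.among  <- ss.among / (k - 1)
ms.within <- ss.within / (n - k)
f.ratio   <- ms.among / ms.within
f.ratio

# the same F statistic from aov()
f.aov <- summary(aov(y ~ g))[[1]][["F value"]][1]
f.aov
```

A large ratio means the group means are spread out more than the scatter within groups would lead one to expect if the null hypothesis were true.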

Several assumptions underlie the application of analysis of variance, and violating them adds uncertainty to the results. The assumptions are:

  • there are two or more independent groups of data
  • within each group, the values of the response variable are normally distributed
  • the variances of the groups are equal
  • dependent or response variables are measured on an interval or ratio scale
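
As a rough sketch of how some of these assumptions might be checked in R, the following uses simulated data (the group means, standard deviation, and object names are illustrative assumptions). The Shapiro-Wilk test, shapiro.test(), is one of several possible normality checks and is a choice made here for illustration; bartlett.test() is the homogeneity-of-variance test used in the examples in this section.

```r
# simulated data: 3 groups of 30 observations (illustrative values only)
set.seed(2)
g <- factor(rep(c("A", "B", "C"), each = 30))
y <- rnorm(90, mean = rep(c(10, 12, 15), each = 30), sd = 3)

# number of observations in each group
table(g)

# normality within groups: Shapiro-Wilk test applied to each group
norm.p <- tapply(y, g, function(v) shapiro.test(v)$p.value)
norm.p

# equality of variances: Bartlett's test
hov <- bartlett.test(y ~ g)
hov$p.value

# graphical check: normal probability plot of the residuals
res <- residuals(aov(y ~ g))
qqnorm(res)
qqline(res)
```

Small p-values from either test suggest that the corresponding assumption is doubtful for the data at hand.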

Analysis of variance for testing the equality of k mean values is a special case of a set of techniques known as linear modeling, which also includes regression analysis, a future topic.
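
The link to linear modeling can be illustrated with a minimal sketch on simulated data (all values and object names are assumptions made for the illustration). Fitting the same formula with lm() and summarizing it with anova() should give the same F statistic as aov().

```r
# simulated data: 3 groups of 30 observations (illustrative values only)
set.seed(3)
g <- factor(rep(c("A", "B", "C"), each = 30))
y <- rnorm(90, mean = rep(c(10, 12, 15), each = 30), sd = 3)

# one-way analysis of variance
aov.fit <- aov(y ~ g)
f.aov <- summary(aov.fit)[[1]][["F value"]][1]

# the same model fit as a linear model with a categorical predictor
lm.fit <- lm(y ~ g)
f.lm <- anova(lm.fit)[["F value"]][1]

c(aov = f.aov, lm = f.lm)   # the two F statistics should agree

# with R's default coding, the lm() coefficients are the first group's
# mean and the differences of the other group means from it
coef(lm.fit)
tapply(y, g, mean)
```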

  • One-way analysis of variance, k-groups of observations

The basic analysis of variance involves one nominal or ordinal scale variable that can be used to place each observation into two or more groups, and a single response variable. The analysis can be viewed as determining whether knowing the group that a particular observation falls in gives a better idea of the expected value of the response variable than would be available without that knowledge.
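
One way to see this "knowledge of the group" view is to compare prediction errors with and without the grouping variable. The sketch below uses simulated data (values and object names are illustrative assumptions): without group information the natural prediction is the overall mean, and with it the prediction is the mean of the observation's own group.

```r
# simulated data: 3 groups of 30 observations (illustrative values only)
set.seed(4)
g <- factor(rep(c("A", "B", "C"), each = 30))
y <- rnorm(90, mean = rep(c(10, 12, 15), each = 30), sd = 3)

# without knowledge of the group: predict with the overall mean
pred.null <- rep(mean(y), length(y))
# with knowledge of the group: predict with the group mean
pred.group <- ave(y, g)

# sums of squared prediction errors under each approach
sse.null  <- sum((y - pred.null)^2)
sse.group <- sum((y - pred.group)^2)
c(null = sse.null, group = sse.group)

# proportion of the variability accounted for by group membership
prop.explained <- 1 - sse.group / sse.null
prop.explained
```

The group-mean predictions are the fitted values of the one-way model, so the reduction in squared error is what the F test evaluates.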

The appropriate reference distribution in the case of analysis of variance is the F distribution. The F distribution has two parameters, the between-groups degrees of freedom, k-1, and the residual degrees of freedom, N-k, where k is the number of groups and N is the total number of observations:

# for the following examples
k <- 2         # number of groups
n <- 750       # number of observations
x <- seq(0,10,by=0.1)
df1 <- k-1
df2 <- n-k
pdf.f <- df(x,df1,df2)
plot(pdf.f ~ x, type="l")
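
Beyond plotting the density, the same degrees of freedom can be used to look up a critical value and a p-value. This is a minimal sketch: the 5% significance level and the observed F ratio of 4.2 are hypothetical values chosen for illustration, while k and n repeat the values used for the plot.

```r
k <- 2         # number of groups
n <- 750       # number of observations
df1 <- k - 1   # between-groups degrees of freedom
df2 <- n - k   # residual degrees of freedom

# critical value: the F ratio exceeded only 5% of the time
# when the null hypothesis is true
f.crit <- qf(0.95, df1, df2)
f.crit

# p-value for a hypothetical observed F ratio (illustrative value)
f.obs <- 4.2
p.value <- pf(f.obs, df1, df2, lower.tail = FALSE)
p.value
```

An observed F ratio larger than f.crit corresponds to a p-value smaller than 0.05.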

# analysis of variance
attach(anovadat)   # assumes the data frame anovadat is already loaded

# example 1 -- sig. diff. in means, variances not sig. diff.
boxplot(Data1 ~ Group1, ylim=c(-10,50), main="ANOVA Example 1")
tapply(Data1, Group1, mean)
tapply(Data1, Group1, sd)
aov1 <- aov(Data1 ~ Group1)
aov1
summary(aov1)
hov1 <- bartlett.test(Data1 ~ Group1)
hov1
#plot(aov1)

# example 2 -- means not sig. diff, variances not sig. diff.
boxplot(Data2 ~ Group2, ylim=c(-10,50), main="ANOVA Example 2")
tapply(Data2, Group2, mean)
tapply(Data2, Group2, sd)
aov2 <- aov(Data2 ~ Group2)
aov2
summary(aov2)
hov2 <- bartlett.test(Data2 ~ Group2)
hov2
#plot(aov2)

# example 3 -- sig. diff. in means, variances not sig. diff.
boxplot(Data3 ~ Group3, ylim=c(-10,50), main="ANOVA Example 3")
tapply(Data3, Group3, mean)
tapply(Data3, Group3, sd)
aov3 <- aov(Data3 ~ Group3)
aov3
summary(aov3)
hov3 <- bartlett.test(Data3 ~ Group3)
hov3
#plot(aov3)

# example 4 -- similar means, but larger group variances
boxplot(Data4 ~ Group4, ylim=c(-10,50), main="ANOVA Example 4")
tapply(Data4, Group4, mean)
tapply(Data4, Group4, sd)
aov4 <- aov(Data4 ~ Group4)
aov4
summary(aov4)
hov4 <- bartlett.test(Data4 ~ Group4)
hov4
#plot(aov4)

# example 5 -- means not sig. diff, but variances are
boxplot(Data5 ~ Group5, ylim=c(-10,50), main="ANOVA Example 5")
tapply(Data5, Group5, mean)
tapply(Data5, Group5, sd)
aov5 <- aov(Data5 ~ Group5)
aov5
summary(aov5)
hov5 <- bartlett.test(Data5 ~ Group5)
hov5
#plot(aov5)

 
