Analysis of Variance
"Analysis of variance" (or ANOVA) is designed
to test hypotheses about the equality of two or more group means, and gets
its name from the idea of judging the apparent differences among the means
of the groups of observations relative to the variance of the
individual groups.
The basic underlying idea is to compare the variability
of the observations among groups to that within groups:
- if the variability among groups is small (relative to
the variability within groups), this lends support to the null
hypothesis that the means of the different groups are identical; but
- if the variability among groups is large (relative
to the variability within groups), this discredits the null
hypothesis and lends support to the alternative hypothesis that
at least one group mean differs from the others.
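To make the among-groups versus within-groups comparison concrete, the following R sketch computes the two sums of squares, the corresponding mean squares, and their ratio (the F statistic) by hand, and prints the table from R's `aov()` for comparison. The data are simulated for illustration only; the group labels `a`, `b`, `c`, the group means, and the sample sizes are assumptions, not values from the course data.

```r
# Hypothetical simulated data: three groups of 20 observations each
set.seed(1)
g <- factor(rep(c("a", "b", "c"), each = 20))
y <- rnorm(60, mean = c(10, 12, 15)[as.integer(g)], sd = 3)

k <- nlevels(g)       # number of groups
N <- length(y)        # total number of observations
group_means <- tapply(y, g, mean)
group_n <- tapply(y, g, length)

# Among-group sum of squares: spread of the group means around the grand mean
ss_among <- sum(group_n * (group_means - mean(y))^2)
# Within-group sum of squares: spread of each observation around its own group mean
ss_within <- sum((y - ave(y, g))^2)

# Mean squares and their ratio, the F statistic
ms_among <- ss_among / (k - 1)
ms_within <- ss_within / (N - k)
F_stat <- ms_among / ms_within
# Upper-tail probability of the F distribution with (k - 1, N - k) df
p_value <- pf(F_stat, k - 1, N - k, lower.tail = FALSE)

# The same quantities from R's built-in one-way ANOVA
aov_table <- summary(aov(y ~ g))[[1]]
print(aov_table)
print(c(F_by_hand = F_stat, p_by_hand = p_value))
```

The hand-computed F and p-value should match the `F value` and `Pr(>F)` entries of the `aov()` table, and the two sums of squares add up to the total sum of squares around the grand mean.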
Several assumptions underlie the application
of analysis of variance; if they are violated, the results become
less reliable. The assumptions are:
- there are two or more independent groups
of data
- within each group, the values of the response variable are
normally distributed
- the variances of the groups are equal
- the dependent or response variable is measured on
an interval or ratio scale
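Two of these assumptions can be checked informally from the data. A minimal sketch, assuming a numeric response `y` and a grouping factor `g` (both simulated here for illustration): normality within each group is checked with the Shapiro-Wilk test (`shapiro.test()`, a choice made for this sketch rather than one prescribed by these notes), and equality of variances with Bartlett's test (`bartlett.test()`). Independence and the measurement scale cannot be tested this way; they follow from how the data were collected.

```r
# Hypothetical simulated data standing in for a response y and grouping factor g
set.seed(2)
g <- factor(rep(c("a", "b", "c"), each = 25))
y <- rnorm(75, mean = 20, sd = 4)

# Normality within each group: Shapiro-Wilk test applied group by group
normality_p <- tapply(y, g, function(v) shapiro.test(v)$p.value)
print(normality_p)

# Homogeneity of variance across groups: Bartlett's test
hov <- bartlett.test(y ~ g)
print(hov$p.value)

# Group sizes and standard deviations as a quick descriptive check
print(tapply(y, g, length))
print(tapply(y, g, sd))
```

Small p-values from either test would cast doubt on the corresponding assumption; large p-values do not prove it holds, they only fail to contradict it.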
Analysis of variance for testing the equality of k
means is a special case of a broader set of techniques known as linear
models, which also includes regression analysis (a later topic).
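The connection to linear models can be seen by fitting the same one-way model both ways. In this sketch (simulated data; the four groups and their means are assumptions for illustration), `aov()` and `lm()` followed by `anova()` produce the same F statistic, because `aov()` fits a linear model in which the grouping factor is coded as indicator (dummy) variables.

```r
# Hypothetical simulated data: four groups of 15 observations each
set.seed(3)
g <- factor(rep(c("a", "b", "c", "d"), each = 15))
y <- rnorm(60, mean = c(5, 5, 6, 8)[as.integer(g)], sd = 2)

# One-way ANOVA fitted with aov()
aov_table <- summary(aov(y ~ g))[[1]]

# The same model as a linear model: an intercept plus k - 1 indicator
# variables for the factor g (R's default treatment contrasts)
lm_fit <- lm(y ~ g)
lm_table <- anova(lm_fit)

print(aov_table)
print(lm_table)
print(coef(lm_fit))
```

Both tables report identical degrees of freedom, sums of squares, and F statistic; the `lm()` coefficients are the first group's mean followed by the differences of the other group means from it.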
One-way analysis of variance, k groups of observations
The basic analysis of variance involves one nominal or ordinal scale
variable that can be used to place each observation into two or more
groups, and a single response variable. The analysis can be viewed
as determining whether knowing which group an observation
falls in gives a better estimate of the expected value of its response
variable than would be possible without that knowledge.
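One way to see what a "better estimate of the expected value" means: without group information, the natural estimate for every observation is the grand mean; with it, the estimate is the observation's own group mean. The sketch below uses simulated data with three hypothetical groups (`low`, `mid`, `high`), compares the residual sums of squares under the two choices, and reports the proportion of variation accounted for by group membership, often called eta squared.

```r
# Hypothetical simulated data; levels are set explicitly so that the
# means 8, 10, 13 go to low, mid, high respectively
set.seed(4)
g <- factor(rep(c("low", "mid", "high"), each = 30),
            levels = c("low", "mid", "high"))
y <- rnorm(90, mean = c(8, 10, 13)[as.integer(g)], sd = 2.5)

# Without knowing the group, the best single estimate is the grand mean
ss_total <- sum((y - mean(y))^2)

# Knowing the group, the estimate for each observation is its group mean
fit <- aov(y ~ g)
ss_resid <- sum(residuals(fit)^2)

# The fitted values of the one-way model are exactly the group means
print(all.equal(unname(fitted(fit)), ave(y, g)))

# Proportion of total variation accounted for by group membership (eta squared)
eta_sq <- 1 - ss_resid / ss_total
print(eta_sq)
```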
The appropriate reference distribution in the case of analysis of variance is
the F distribution. The F distribution has two parameters,
the between-groups degrees of freedom, k - 1, and the residual (within-groups)
degrees of freedom, N - k, where N is the total number of observations:
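# settings for the following examples
k <- 2        # number of groups
n <- 750      # number of observations
x <- seq(0, 10, by = 0.1)   # F values at which to evaluate the density
df1 <- k - 1  # between-groups degrees of freedom
df2 <- n - k  # residual (within-groups) degrees of freedom
pdf.f <- df(x, df1, df2)    # F probability density function
plot(pdf.f ~ x, type = "l")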
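Given its two degrees-of-freedom parameters, the F distribution also supplies the critical value and the p-value for an observed F statistic: `qf()` returns the quantile and `pf(..., lower.tail = FALSE)` the upper-tail probability. A short sketch using the same `k` and `n` as above; the observed value `F_obs = 5.2` is purely hypothetical.

```r
k <- 2          # number of groups (as above)
n <- 750        # number of observations (as above)
df1 <- k - 1    # between-groups degrees of freedom
df2 <- n - k    # residual (within-groups) degrees of freedom

# Critical value of F at the 0.05 significance level
F_crit <- qf(0.95, df1, df2)

# Hypothetical observed F statistic, for illustration only
F_obs <- 5.2
# Probability of an F at least this large if the null hypothesis is true
p_value <- pf(F_obs, df1, df2, lower.tail = FALSE)
print(c(F_crit = F_crit, p_value = p_value))

# Mark the critical value on the density curve
x <- seq(0, 10, by = 0.1)
plot(df(x, df1, df2) ~ x, type = "l", xlab = "F", ylab = "density")
abline(v = F_crit, lty = 2)
```

Only the upper tail is used because large values of F correspond to large among-group variability relative to within-group variability.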
# analysis of variance
attach(anovadat)

# example 1 -- sig. diff. in means, variances not sig. diff.
boxplot(Data1 ~ Group1, ylim=c(-10,50), main="ANOVA Example 1")
tapply(Data1, Group1, mean)
tapply(Data1, Group1, sd)
aov1 <- aov(Data1 ~ Group1)
aov1
summary(aov1)
hov1 <- bartlett.test(Data1 ~ Group1)
hov1
#plot(aov1)

# example 2 -- means not sig. diff, variances not sig. diff.
boxplot(Data2 ~ Group2, ylim=c(-10,50), main="ANOVA Example 2")
tapply(Data2, Group2, mean)
tapply(Data2, Group2, sd)
aov2 <- aov(Data2 ~ Group2)
aov2
summary(aov2)
hov2 <- bartlett.test(Data2 ~ Group2)
hov2
#plot(aov2)

# example 3 -- sig. diff. in means, variances not sig. diff.
boxplot(Data3 ~ Group3, ylim=c(-10,50), main="ANOVA Example 3")
tapply(Data3, Group3, mean)
tapply(Data3, Group3, sd)
aov3 <- aov(Data3 ~ Group3)
aov3
summary(aov3)
hov3 <- bartlett.test(Data3 ~ Group3)
hov3
#plot(aov3)

# example 4 -- similar means, but larger group variances
boxplot(Data4 ~ Group4, ylim=c(-10,50), main="ANOVA Example 4")
tapply(Data4, Group4, mean)
tapply(Data4, Group4, sd)
aov4 <- aov(Data4 ~ Group4)
aov4
summary(aov4)
hov4 <- bartlett.test(Data4 ~ Group4)
hov4
#plot(aov4)

# example 5 -- means not sig. diff, but variances are
boxplot(Data5 ~ Group5, ylim=c(-10,50), main="ANOVA Example 5")
tapply(Data5, Group5, mean)
tapply(Data5, Group5, sd)
aov5 <- aov(Data5 ~ Group5)
aov5
summary(aov5)
hov5 <- bartlett.test(Data5 ~ Group5)
hov5
#plot(aov5)
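The data frame `anovadat` used in these examples is not included here. As a stand-in, the sketch below wraps the steps of each example in a hypothetical helper, `run_anova_example()`, and applies it to simulated data with equal group means but unequal group variances. The four groups and the standard deviations 2, 4, 6, 8 are assumptions chosen to resemble the situation described for example 5, not the actual course data.

```r
# Hypothetical helper wrapping the steps used in each example above
run_anova_example <- function(y, g, title) {
  boxplot(y ~ g, ylim = c(-10, 50), main = title)
  print(tapply(y, g, mean))     # group means
  print(tapply(y, g, sd))       # group standard deviations
  fit <- aov(y ~ g)
  print(summary(fit))           # ANOVA table
  hov <- bartlett.test(y ~ g)   # test for equality of group variances
  print(hov)
  invisible(list(aov = fit, bartlett = hov))
}

# Simulated stand-in data (not anovadat): equal means, unequal spreads
set.seed(5)
g <- factor(rep(c("A", "B", "C", "D"), each = 40))
y <- rnorm(160, mean = 20, sd = c(2, 4, 6, 8)[as.integer(g)])

res <- run_anova_example(y, g, "Simulated example")
```

With data like these, the ANOVA F test would typically not reject equal means, while Bartlett's test flags the unequal variances, which is the assumption violation this kind of example is meant to illustrate.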