
One-way analysis of variance (ANOVA)

2021-03-22


Definition of ANOVA

Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed aggregate variability found inside a data set into two parts: systematic factors and random factors. The systematic factors have a statistical influence on the given data set, while the random factors do not. Analysts use the ANOVA test to determine the influence that independent variables have on the dependent variable in a regression study. (Cited from Will Kenton, https://www.investopedia.com/terms/a/anova.asp)
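For a one-way design with $k$ groups, $n_i$ observations $y_{ij}$ in group $i$, group means $\bar{y}_i$, grand mean $\bar{y}$ and $N = \sum_i n_i$ observations in total, the split described in the quote is usually written as below (standard textbook notation, not taken from the cited article). The systematic part is the between-group sum of squares, the random part is the within-group sum of squares, and the F statistic compares their mean squares:

$$
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2}_{SS_{\text{total}}}
= \underbrace{\sum_{i=1}^{k} n_i(\bar{y}_i-\bar{y})^2}_{SS_{\text{between}}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2}_{SS_{\text{within}}}
$$

$$
F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
$$

Under the null hypothesis that all group means are equal, $F$ follows an F distribution with $k-1$ and $N-k$ degrees of freedom. For the three fruits with six measurements each ($k = 3$, $N = 18$) that is 2 and 15, the `Df` column reported by `summary(fit)`.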

Load data

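library(multcomp)  # glht() and cld() for the Tukey letter display
library(dplyr)     # group_by() and summarise() for the summary table
setwd("C:/blog/Dataset")  # folder containing fruits_Vc.csv; adjust to your machine
data <- read.csv("fruits_Vc.csv")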
head(data)
##   Number Fruit Repeat Vitamin
## 1      1 Apple     A1     4.6
## 2      2 Apple     A2     3.9
## 3      3 Apple     A3     5.2
## 4      4 Apple     A4     6.9
## 5      5 Apple     A5     4.8
## 6      6 Apple     A6     3.3

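To compare the vitamin C content of the different fruits, convert Fruit to a factor, look at the mean and standard deviation of each group, then fit a one-way ANOVA with aov() and follow up with Tukey's HSD test.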

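data$Fruit <- as.factor(data$Fruit)  # treat fruit type as a grouping factor
VitaminC <- data$Vitamin             # response: vitamin C content
Fruits <- data$Fruit                 # explanatory factor with 3 levels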

aggregate(VitaminC, by = list(Fruits), FUN = mean)  # mean vitamin C per fruit
##      Group.1         x
## 1      Apple  4.783333
## 2     Banana 10.266667
## 3 Watermelon  8.683333
aggregate(VitaminC, by = list(Fruits), FUN = sd)    # standard deviation per fruit
##      Group.1         x
## 1      Apple 1.2384130
## 2     Banana 1.5807171
## 3 Watermelon 0.9847165
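As a quick check of what aggregate() computes for each group, the sketch below rebuilds only the six Apple rows printed by head(data) above and summarises them the same way (using the formula interface of aggregate()). The Banana and Watermelon measurements are not shown in the post, so they are deliberately left out rather than guessed.

```r
# The six Apple rows exactly as printed by head(data); the other fruits
# are omitted because their raw values are not shown in the post.
apple <- data.frame(
  Fruit   = factor(rep("Apple", 6)),
  Vitamin = c(4.6, 3.9, 5.2, 6.9, 4.8, 3.3)
)

# aggregate() splits Vitamin by Fruit and applies FUN to each piece
apple_mean <- aggregate(Vitamin ~ Fruit, data = apple, FUN = mean)
apple_sd   <- aggregate(Vitamin ~ Fruit, data = apple, FUN = sd)

print(apple_mean)
print(apple_sd)
```

Both values agree with the Apple rows of the full output above (mean 4.783333, sd 1.238413).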
fit <- aov(VitaminC ~ Fruits)  # one-way ANOVA: does mean vitamin C differ by fruit?

summary(fit)                   # ANOVA table with the F test
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Fruits       2  95.57   47.78   28.66 7.52e-06 ***
## Residuals   15  25.01    1.67                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
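To see where the numbers in this table come from, the sketch below recomputes the sums of squares, F value and p-value using only the group sizes, means and standard deviations printed by aggregate() above (not the raw data). The between-group sum of squares is the "Fruits" row and the pooled within-group sum of squares is the "Residuals" row; because the inputs are rounded, agreement is only up to rounding.

```r
# Group sizes, means and standard deviations copied from the
# aggregate() output above
n <- c(Apple = 6, Banana = 6, Watermelon = 6)
m <- c(Apple = 4.783333, Banana = 10.266667, Watermelon = 8.683333)
s <- c(Apple = 1.2384130, Banana = 1.5807171, Watermelon = 0.9847165)

k <- length(n)                 # number of groups
N <- sum(n)                    # total number of observations
grand_mean <- sum(n * m) / N

ss_between <- sum(n * (m - grand_mean)^2)  # systematic part ("Fruits" row)
ss_within  <- sum((n - 1) * s^2)           # random part ("Residuals" row)

df_between <- k - 1
df_within  <- N - k
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

F_value <- ms_between / ms_within
# Upper-tail probability of the F distribution with (k - 1, N - k) df
p_value <- pf(F_value, df_between, df_within, lower.tail = FALSE)

round(c(SS_between = ss_between, SS_within = ss_within, F = F_value), 2)
signif(p_value, 3)
```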
TukeyHSD(fit)  # all pairwise differences with family-wise 95% confidence intervals
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = VitaminC ~ Fruits)
## 
## $Fruits
##                        diff       lwr       upr     p adj
## Banana-Apple       5.483333  3.546906 7.4197605 0.0000068
## Watermelon-Apple   3.900000  1.963573 5.8364272 0.0002819
## Watermelon-Banana -1.583333 -3.519761 0.3530938 0.1184352
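Each Tukey interval is the difference in means plus or minus a critical value from the studentized range distribution times a standard error built from the residual mean square. The sketch below reproduces the table from the rounded summary statistics above with qtukey() and ptukey(); it assumes the same formula TukeyHSD() applies to a one-way aov fit, and the row order mimics TukeyHSD() ("later level minus earlier level").

```r
# Summary statistics from the aggregate() output above
n <- c(Apple = 6, Banana = 6, Watermelon = 6)
m <- c(Apple = 4.783333, Banana = 10.266667, Watermelon = 8.683333)
s <- c(Apple = 1.2384130, Banana = 1.5807171, Watermelon = 0.9847165)

k      <- length(n)
df_res <- sum(n) - k
mse    <- sum((n - 1) * s^2) / df_res   # residual mean square of the ANOVA

# All pairs of levels; each column is (earlier level, later level)
pairs   <- combn(names(m), 2)
lo_name <- pairs[1, ]
hi_name <- pairs[2, ]

d  <- m[hi_name] - m[lo_name]                             # difference in means
se <- sqrt(mse / 2 * (1 / n[hi_name] + 1 / n[lo_name]))   # standard error
q_crit <- qtukey(0.95, nmeans = k, df = df_res)           # studentized range quantile

tukey <- data.frame(
  diff  = unname(d),
  lwr   = unname(d - q_crit * se),
  upr   = unname(d + q_crit * se),
  p_adj = unname(ptukey(abs(d) / se, nmeans = k, df = df_res,
                        lower.tail = FALSE)),
  row.names = paste(hi_name, lo_name, sep = "-")
)
print(tukey, digits = 6)
```

Only the Watermelon-Banana interval contains 0, which is why it is the one comparison with an adjusted p-value above 0.05.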
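par(mar = c(5, 4, 6, 2))  # enlarge the top margin so the letters above the boxes fit
# Tukey all-pairwise comparisons on the Fruits factor via multcomp
tuk <- glht(fit, linfct = mcp(Fruits = "Tukey"))

# Compact letter display: fruits sharing a letter do not differ at the 0.05 level
plot(cld(tuk, level = 0.05), col = "lightgrey")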

To collect the group summary statistics (count, mean and standard deviation) in one table

fruit_summary <- data %>%
  group_by(Fruit) %>%
  summarise(
    count = n(),
    mean = mean(Vitamin, na.rm = TRUE),
    sd = sd(Vitamin, na.rm = TRUE),
    .groups = "drop"
  )

View(fruit_summary)  # opens a spreadsheet-style viewer (RStudio)

print(fruit_summary)
## # A tibble: 3 x 4
##   Fruit      count  mean    sd
##   <fct>      <int> <dbl> <dbl>
## 1 Apple          6  4.78 1.24 
## 2 Banana         6 10.3  1.58 
## 3 Watermelon     6  8.68 0.985
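The summary table only lives in the R session. One way to keep it, together with the ANOVA table itself, is base R's write.csv(). Because fruits_Vc.csv is not available here, the sketch below swaps in R's built-in PlantGrowth data (plant weights in three groups: ctrl, trt1, trt2) as a stand-in, and the output file names are hypothetical, written to tempdir().

```r
# PlantGrowth ships with base R (datasets package); it stands in for fruits_Vc.csv
fit_pg <- aov(weight ~ group, data = PlantGrowth)

# summary() of an aov fit is a list whose first element is the ANOVA table
anova_table <- summary(fit_pg)[[1]]
rownames(anova_table) <- trimws(rownames(anova_table))  # drop padding spaces

# Group-level descriptive statistics with base R
group_stats <- data.frame(
  count = tapply(PlantGrowth$weight, PlantGrowth$group, length),
  mean  = tapply(PlantGrowth$weight, PlantGrowth$group, mean),
  sd    = tapply(PlantGrowth$weight, PlantGrowth$group, sd)
)

# Hypothetical output files in a temporary folder; swap in your own paths
anova_file <- file.path(tempdir(), "anova_table.csv")
stats_file <- file.path(tempdir(), "group_stats.csv")
write.csv(anova_table, anova_file)
write.csv(group_stats, stats_file)

# Read the ANOVA table back to confirm it was written
read.csv(anova_file, row.names = 1)
```

The same two write.csv() calls work on the fruit data by passing summary(fit)[[1]] and the dplyr summary table instead.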