Blog

Practice makes perfect.

Normal distribution

2021-01-22


The data of most food experiments will be analyzed for significance. In the writing method section of many articles, we will mention what method is used for significance analysis. If P is less than 0.05, the difference between variables is considered to be significant. However, before the significance analysis, some preprocessing shoule carried out to make sure Normal distribution, otherwise the result is inaccurate. The following summarizes the data preprocessing that can be performed in R to meet the requirements of saliency analysis.

Before statistical analysis of data such as significance analysis, the data should meet normally distributed. ## load dataset

setwd("C:/blog/Dataset")

data <- read.csv("fruits_Vc.csv")

head(data)
##   Number Fruit Repeat Vitamin
## 1      1 Apple     A1     4.6
## 2      2 Apple     A2     3.9
## 3      3 Apple     A3     5.2
## 4      4 Apple     A4     6.9
## 5      5 Apple     A5     4.8
## 6      6 Apple     A6     3.3

Plot

Apple <- data[1:6,4]

shapiro.test(Apple)
## 
##  Shapiro-Wilk normality test
## 
## data:  Apple
## W = 0.94874, p-value = 0.7301
library(ggpubr)

ggdensity(Apple, 
          main = "Density plot of apple",
          xlab = "Apple")

ggqqplot(Apple)

Shapiro-Wilk normality test

dataset: data$Vitamin

W = 0.96026, p-value = 0.6066 When the P value here is greater than 0.05, it represents a normal distribution.

You can also observe the normal distribution graph:

library("ggpubr")
ggdensity(data$Vitamin, 
          main = "Density plot of Vitamin",
          xlab = "Vitamin")

ggqqplot(data$Vitamin)