ALY6010 Probability Theory and Introductory Statistics

Module 4 – R Practice Assignment

Introduction:

The main topic of Module 4 is comparison tests, which look for differences between group means. They can be used to test the effect of a categorical variable on the mean of some other characteristic. A t-test compares the means of two groups. This assignment has two parts: Part 1 runs a two-sample t-test with unequal variances, and Part 2 runs a paired t-test on before/after data and reads the result at two confidence levels.

Data Preparation:

 Loading the required package.

 

The cats data set has 144 observations (rows) and 3 variables (columns): Sex, Bwt (body weight) and Hwt (heart weight). We will only use the first two columns.

 

The data look fine and contain no NA values, so no data cleaning is necessary.
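The loading and checking steps described above can be sketched in base R as follows. This is a minimal sketch, assuming the MASS package (which ships the cats data) is installed; the name n_missing is only illustrative.

```r
library(MASS)   # provides the cats data set

dim(cats)       # number of rows (observations) and columns (variables)
str(cats)       # Sex is a factor; Bwt and Hwt are numeric

# Count missing values per column; all zeros means no cleaning is needed
n_missing <- colSums(is.na(cats))
n_missing
```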

 

 

Two-sample t-test:

Part 1:

Question: Do male and female cats have the same mean body weight?

Before the test, we take a look at the body weight data for male and female cats.

The male sample is larger than the female sample, and both the mean and the standard deviation of body weight are higher for males than for females.
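The group comparison above can be reproduced without extra packages. The sketch below is one possible way to do it in base R; the names bwt_by_sex and group_stats are illustrative and do not come from the appendix.

```r
library(MASS)

# Split body weight (Bwt) into one vector per sex
bwt_by_sex <- split(cats$Bwt, cats$Sex)

# Sample size, mean and standard deviation for each group
group_stats <- data.frame(
  n    = sapply(bwt_by_sex, length),
  mean = sapply(bwt_by_sex, mean),
  sd   = sapply(bwt_by_sex, sd)
)
group_stats
```

The rows follow the factor level order of Sex, so the female group ("F") is listed before the male group ("M").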

 

 

 

 

The boxplot makes the difference between the two groups easier to see: male body weights sit higher and are more spread out.
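For readers without ggpubr, a similar boxplot can be drawn with base R graphics. This is only a sketch of an alternative, not the plotting code used for the figure in this report; the axis labels are our own wording.

```r
library(MASS)

# Draw the boxplot and keep the statistics that boxplot() returns invisibly
bp <- boxplot(Bwt ~ Sex, data = cats,
              xlab = "Sex", ylab = "Body weight (kg)")

bp$n      # number of observations in each group
bp$stats  # per group: lower whisker, lower hinge, median, upper hinge, upper whisker
```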

 

 

The distribution of the male body weights looks approximately normal.

 

 

 

 

 

 

The distribution of the female body weights does not look normal.

We use the Shapiro-Wilk test to check whether the data are normally distributed (at the 95% confidence level, α = 0.05).

- Null hypothesis (H0): the data are normally distributed

- Alternative hypothesis (H1): the data are not normally distributed

 

 

 

 

 

 

 

 

 

 

The p-value for males is 0.119, so we fail to reject normality for the male data. The p-value for females is 0.0003754, far below 0.05, so the female data are not normally distributed.
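The normality check can be written compactly by running shapiro.test() once per group. The following is a minimal sketch in base R; the names alpha, sw and p_values are illustrative.

```r
library(MASS)

alpha <- 0.05

# Shapiro-Wilk test on body weight, run separately for each sex
sw <- lapply(split(cats$Bwt, cats$Sex), shapiro.test)

# Extract the p-value of each test
p_values <- sapply(sw, function(res) res$p.value)
p_values

# TRUE means normality is rejected at the chosen significance level
p_values < alpha
```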

To test whether the two populations have the same variance, we use an F-test for homogeneity of variances.

 

The p-value of the F-test is p = 0.0001157, which is less than the significance level α = 0.05. In conclusion, there is a significant difference between the variances of the two groups, so we use the two-sample t-test with unequal variances (Welch's t-test).
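To make clear what the F-test compares, the sketch below runs var.test() and then recomputes the variance ratio by hand. The names f_res and ratio_by_hand are illustrative. With the formula interface the first factor level (F) is the numerator.

```r
library(MASS)

# F-test for equal variances of body weight between the sexes
f_res <- var.test(Bwt ~ Sex, data = cats)
f_res$statistic   # ratio of sample variances, female over male
f_res$p.value

# The same ratio computed directly from the two sample variances
ratio_by_hand <- var(cats$Bwt[cats$Sex == "F"]) / var(cats$Bwt[cats$Sex == "M"])
ratio_by_hand
```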

Because we want to test whether male and female cats have the same mean body weight, the null hypothesis is μ1 = μ2, in other words μ1 - μ2 = 0, where μ1 and μ2 are the two population means.

Hypothesis:

H0: μ1 - μ2 = 0

H1: μ1 - μ2 ≠ 0

 

t = -8.7095, df = 136.84, p-value = 8.831e-15
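For reference, the statistic and the non-integer degrees of freedom reported above come from the unequal-variance (Welch) form of the t-test. The formulas below use our own notation: group 1 is taken to be the female sample and group 2 the male sample (an assumption based on the factor level order in R, which also explains the negative sign of t), with sample means \bar{x}_i, sample variances s_i^2 and sample sizes n_i.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\mathrm{df} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
                   {\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1}
                    + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
```

Because the degrees of freedom are estimated from the two sample variances, they are usually not a whole number.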

Conclusion:

The t-test gives t = -8.7095 with 136.84 degrees of freedom. The p-value is 8.831e-15, which is less than 0.05, so we reject the null hypothesis. There is enough evidence to conclude that male and female cats do not have the same mean body weight.
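As a check on the result, the unequal-variance t statistic can be computed by hand and compared with t.test(). This is a sketch under the assumption that the first group is the female sample (the first factor level); the names x, y, vx, vy, t_stat, df_welch and welch are illustrative.

```r
library(MASS)

x <- cats$Bwt[cats$Sex == "F"]   # group 1: female
y <- cats$Bwt[cats$Sex == "M"]   # group 2: male

# Squared standard error contributed by each group
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)
c(t = t_stat, df = df_welch, p = p_value)

# Cross-check against the built-in test
welch <- t.test(Bwt ~ Sex, data = cats, var.equal = FALSE)
welch
```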

Part 2:

The researchers claimed that meditation improves sleep quality, so we will test whether the data support this claim.

 

I stored the sleep time before the workshop in sleep1 and the sleep time after the workshop in sleep2.

 

The sample sizes are the same, but the sleep time before the workshop has a higher standard deviation and the sleep time after the workshop has a higher mean.
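The summary statistics described above can be checked directly from the two vectors. This is a minimal base R sketch; the data are the values listed in the appendix, and sleep_stats is an illustrative name.

```r
# Sleep time before (sleep1) and after (sleep2) the workshop
sleep1 <- c(4.6, 7.8, 9.1, 5.6, 6.9, 8.5, 5.3, 7.1, 3.2, 4.4)
sleep2 <- c(6.6, 7.7, 9.0, 6.2, 7.8, 8.3, 5.9, 6.5, 5.8, 4.9)

# Sample size, mean and standard deviation of each sample
sleep_stats <- data.frame(
  group = c("before", "after"),
  n     = c(length(sleep1), length(sleep2)),
  mean  = c(mean(sleep1), mean(sleep2)),
  sd    = c(sd(sleep1), sd(sleep2))
)
sleep_stats
```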

 

 

 

 

 

There is no large difference between the two means, but the "after" group has a narrower spread.

 

 

 

 

 

 

 

 

The shape of the "before" data looks approximately normal.

 

 

 

 

 

 

 

      

The shape of the "after" data also looks approximately normal.

 

 

 

 

 

 

 

 

 

 

We use the Shapiro-Wilk test to check whether the data are normally distributed (at the 95% confidence level, α = 0.05).

- Null hypothesis (H0): the data are normally distributed

- Alternative hypothesis (H1): the data are not normally distributed

 

Both p-values are greater than α = 0.05, so we fail to reject normality: both samples are consistent with a normal distribution.

 

The p-value of the F-test is p = 0.2379, which is greater than the significance level α = 0.05. In conclusion, there is no significant difference between the variances of the two sets of data.
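The two assumption checks for the sleep data can be collected in one place, as in the sketch below. It also runs the Shapiro-Wilk test on the paired differences, which is our own addition: for paired data the t-test is computed on the differences, so their normality is the assumption that matters most. All variable names other than sleep1 and sleep2 are illustrative.

```r
sleep1 <- c(4.6, 7.8, 9.1, 5.6, 6.9, 8.5, 5.3, 7.1, 3.2, 4.4)  # before
sleep2 <- c(6.6, 7.7, 9.0, 6.2, 7.8, 8.3, 5.9, 6.5, 5.8, 4.9)  # after

# Shapiro-Wilk test on each sample, as in the report
p_before <- shapiro.test(sleep1)$p.value
p_after  <- shapiro.test(sleep2)$p.value

# Shapiro-Wilk test on the paired differences (addition, see note above)
p_diff <- shapiro.test(sleep2 - sleep1)$p.value

# F-test for equal variances (after over before)
p_ftest <- var.test(sleep2, sleep1)$p.value

c(before = p_before, after = p_after, diff = p_diff, ftest = p_ftest)
```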

The researchers claimed that meditation improves sleep quality, so we expect the mean sleep time after the workshop to be greater than the mean sleep time before the workshop.

Because each participant is measured before (pre) and after (post) the workshop, the data are matched pairs. We test whether the mean of the paired differences (after minus before), written μd, is different from 0.

Hypothesis:

H0: μd = 0

H1: μd ≠ 0

 

 

α = 0.05
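The paired test reduces to a one-sample t-test on the differences. In our own notation, d_i is the after-minus-before difference for participant i, \bar{d} is the mean of the differences, s_d their standard deviation and n the number of pairs. The numbers are computed from the sleep1 and sleep2 values in the appendix and rounded.

```latex
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad \mathrm{df} = n - 1

\bar{d} = 0.62, \quad s_d \approx 1.006, \quad n = 10
\;\Longrightarrow\;
t \approx \frac{0.62}{1.006 / \sqrt{10}} \approx 1.95, \qquad \mathrm{df} = 9
```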

 

 

 

 

 

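Since the p-value = 0.08322 is greater than α = 0.05, we fail to reject the null hypothesis: at the 95% confidence level there is not enough evidence that meditation changes sleep time.

At α = 0.10, however, the p-value is below the significance level, so we reject the null hypothesis. At the 90% confidence level there is evidence that sleep time differs after the workshop, and because the mean sleep time is higher after it, this supports the claim that meditation improves sleep time.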
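One point worth illustrating is that the conf.level argument of t.test() only changes the width of the confidence interval. The test statistic and the p-value stay the same, and the decision changes because the p-value is compared with a different α. A minimal sketch, with res95 and res90 as illustrative names:

```r
sleep1 <- c(4.6, 7.8, 9.1, 5.6, 6.9, 8.5, 5.3, 7.1, 3.2, 4.4)  # before
sleep2 <- c(6.6, 7.7, 9.0, 6.2, 7.8, 8.3, 5.9, 6.5, 5.8, 4.9)  # after

# Paired t-test on after minus before at two confidence levels
res95 <- t.test(sleep2, sleep1, paired = TRUE, conf.level = 0.95)
res90 <- t.test(sleep2, sleep1, paired = TRUE, conf.level = 0.90)

# The p-value is identical in both runs
c(p95 = res95$p.value, p90 = res90$p.value)

# Only the confidence interval for the mean difference changes
res95$conf.int   # contains 0: do not reject H0 at alpha = 0.05
res90$conf.int   # excludes 0: reject H0 at alpha = 0.10
```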

Conclusion:

Before running a two-sample t-test, we need to examine the data to decide which t-test is appropriate, since that choice determines whether the final result can be trusted. We also need to choose the confidence level before testing, because it sets the balance between Type I and Type II errors: here the same data lead to different decisions at α = 0.05 and α = 0.10. The data need to be rigorous, and so do we.

References:

http://www.sthda.com/english/wiki/unpaired-two-samples-wilcoxon-test-in-r

http://www.sthda.com/english/wiki/unpaired-two-samples-t-test-in-r

https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/

https://www.math.pku.edu.cn/teachers/lidf/docs/Rbook/html/_Rbook/ggplotvis.html

Appendix:

# Part 1

# Run once if the packages are missing (note: the package name "MASS" is case-sensitive)
# install.packages(c("MASS", "dplyr", "ggpubr", "ggplot2"))
library(MASS)      # provides the cats data set
library(dplyr)     # group_by(), summarise(), %>%
library(ggplot2)   # ggplot(), geom_density()
library(ggpubr)    # ggboxplot()

# Inspect the data
cats
dim(cats)
summary(cats)

# Split the data by sex
male <- subset(cats, Sex == "M")
male
female <- subset(cats, Sex == "F")
female
summary(male)
str(male)

# Difference in mean body weight (male minus female)
mean(male$Bwt) - mean(female$Bwt)

# Sample size, mean and standard deviation by sex
group_by(cats, Sex) %>%
  summarise(
    count = n(),
    mean = mean(Bwt, na.rm = TRUE),
    sd = sd(Bwt, na.rm = TRUE)
  )

# Boxplot of body weight by sex
ggboxplot(cats, x = "Sex", y = "Bwt",
          color = "Sex", palette = c("#00AFBB", "#E7B800"),
          ylab = "Bwt", xlab = "Sex")

# Shapiro-Wilk normality test for each sex
with(cats, shapiro.test(Bwt[Sex == "M"]))
with(cats, shapiro.test(Bwt[Sex == "F"]))

# Density plots of body weight: female, then male
p <- ggplot(data = female, mapping = aes(x = Bwt))
p + geom_density()
p1 <- ggplot(data = male, mapping = aes(x = Bwt))
p1 + geom_density()

# F-test for equality of variances
res.ftest <- var.test(Bwt ~ Sex, data = cats)
res.ftest

# Two-sample t-test with unequal variances (Welch)
res <- t.test(Bwt ~ Sex, data = cats, alternative = "two.sided", var.equal = FALSE)
res

# Part 2
# Uses dplyr, ggplot2 and ggpubr, which are loaded in Part 1

sleep1 <- c(4.6, 7.8, 9.1, 5.6, 6.9, 8.5, 5.3, 7.1, 3.2, 4.4)  # before the workshop
sleep2 <- c(6.6, 7.7, 9.0, 6.2, 7.8, 8.3, 5.9, 6.5, 5.8, 4.9)  # after the workshop

# Long format: one row per measurement
sleep_data <- data.frame(
  group = rep(c("before", "after"), each = 10),
  time = c(sleep1, sleep2)
)
sleep_data

before <- subset(sleep_data, group == "before")
before
after <- subset(sleep_data, group == "after")

# Density plots of sleep time: before, then after
ggplot(data = before, mapping = aes(x = time)) + geom_density()
ggplot(data = after, mapping = aes(x = time)) + geom_density()

# Sample size, mean and standard deviation by group
group_by(sleep_data, group) %>%
  summarise(
    count = n(),
    mean = mean(time, na.rm = TRUE),
    sd = sd(time, na.rm = TRUE)
  )

# Boxplot of sleep time by group
ggboxplot(sleep_data, x = "group", y = "time",
          color = "group", palette = c("#00AFBB", "#E7B800"),
          ylab = "time", xlab = "group")

# Shapiro-Wilk normality test for each group
with(sleep_data, shapiro.test(time[group == "before"]))
with(sleep_data, shapiro.test(time[group == "after"]))

# F-test for equality of variances
res.ftest <- var.test(time ~ group, data = sleep_data)
res.ftest

# Paired t-test (after minus before) at the 95% and 90% confidence levels
t.test(sleep2, sleep1, paired = TRUE, conf.level = 0.95)
t.test(sleep2, sleep1, paired = TRUE, conf.level = 0.90)