MATH253 Week 8 Tutorial

R Tutorial

This tutorial sheet is related to material covered in chapters 10 and 11 (how to do tests/calculations/plots from these chapters in R).

Solutions will be available on Canvas after the R drop-in session on Tuesday.

Part A

For each of five different types of pasta (A, B, C, D, E), seven 100g servings were prepared, and the amount of salt absorbed was recorded for each of the 35 servings. The aim is to determine whether there are significant differences in mean salt absorption for the five types of pasta.

The data can be found on Canvas – file Tutorial8 salt.xlsx.

• Download the file Tutorial8 salt.xlsx to your computer into a folder dedicated to R. Make sure that this folder is set as your working directory in RStudio.

• In RStudio, open a new R script.

• Load the file Tutorial8 salt.xlsx using the readxl package, creating a variable called salt. (See Tutorial 2 for details on how to load data using the readxl package.)

• Make sure you save your R script in the folder dedicated to R. It is a good idea to save it again after each task you complete.
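The loading step above can be sketched as follows. This is only a sketch: the file name is copied as printed on this sheet and may need adjusting to match the file you downloaded, and salt_path is a name made up for this example. The code does nothing harmful if the file or the readxl package is missing.

```r
# File name as printed on this sheet (assumption: adjust it if your download is named differently).
salt_path <- "Tutorial8 salt.xlsx"

# Check which folder R is currently working in.
getwd()

if (requireNamespace("readxl", quietly = TRUE) && file.exists(salt_path)) {
  # read_excel() returns a data frame-like object (a tibble), one column per pasta type.
  salt <- readxl::read_excel(salt_path)
  str(salt)
} else {
  message("readxl is not installed or the file is not in the working directory.")
}
```

If the message appears, check the working directory with getwd() and the spelling of the file name before going further.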

1. Perform the ANOVA F-test and report your conclusions.

• If the data are in separate columns, as in our case, we have to combine them into one column while preserving the groups. This can be done by using the command cbind (which ensures that responses are assigned to their groups A, B, C, D, E), loading the result into a data frame, and then using the command stack (which creates two columns of data: one called values, which consists of all responses, and one called ind, which says to which group each response belongs). Run the following code to do this:

A <- salt$A

B <- salt$B

C <- salt$C

D <- salt$D

E <- salt$E

combined <- data.frame(cbind(A, B, C, D, E))

stacked <- stack(combined)

stacked
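To see what stack does, it can help to try it on a very small data frame first. The numbers below are made up purely for illustration and are not the salt data; toy and toy_stacked are names invented for this example.

```r
# Made-up numbers, NOT the salt data: three groups with two responses each.
toy <- data.frame(A = c(1.2, 1.5), B = c(2.1, 2.4), C = c(0.9, 1.1))

# stack() turns the wide layout (one column per group) into a long layout.
toy_stacked <- stack(toy)

toy_stacked        # column 'values' holds the responses, column 'ind' the group labels
str(toy_stacked)   # 'ind' is a factor with one level per original column
```

Each original column name becomes a level of ind, which is exactly what aov needs to know which response belongs to which group.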

• Now we perform the ANOVA test using the command aov. To print the results, we use the command summary. So we run the following code:

anovaresults <- aov(values ~ ind, data = stacked)

summary(anovaresults)

Note that in aov we first define what the responses and groups are, which in our case are values and ind, separated by the tilde symbol ~. The expression values ~ ind tells R that the responses in values depend on the groups in ind. The order is important here, so take care to put the column of responses on the left-hand side of ~ and the column of groups on the right-hand side.

Using data = stacked, we tell R which data set to work with, which in our case is stacked.

R produces an ANOVA table, including the p-value in the column Pr(>F).
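If you want to pick individual numbers out of the ANOVA table, the following sketch shows one way. It uses simulated data with the same layout as stacked (5 groups of 7), not the real salt data, and the names toy, toy_fit and toy_table are invented for this example.

```r
# Simulated data, NOT the salt data: 5 groups (A-E) with 7 responses each.
set.seed(253)
toy <- data.frame(
  values = rnorm(35, mean = rep(c(10, 11, 12, 13, 14), each = 7), sd = 1),
  ind    = factor(rep(c("A", "B", "C", "D", "E"), each = 7))
)

toy_fit <- aov(values ~ ind, data = toy)

# summary() returns a list; its first element is the ANOVA table as a data frame.
toy_table <- summary(toy_fit)[[1]]
toy_table

# First row = groups (ind), second row = Residuals.
f_value <- toy_table[["F value"]][1]
p_value <- toy_table[["Pr(>F)"]][1]
c(F = f_value, p = p_value)
```

Rows are selected by position here because the row names printed by summary are padded with spaces, so selecting by name is error-prone.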

Give the value of the estimate of the error variance σ².
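The usual estimate of the error variance in one-way ANOVA is the residual mean square, that is, the residual sum of squares divided by the residual degrees of freedom. The sketch below checks on simulated data (not the salt data) that the Mean Sq entry in the Residuals row of the table agrees with this hand calculation. All object names are invented for this example.

```r
# Simulated data, NOT the salt data: 5 groups (A-E) with 7 responses each.
set.seed(253)
toy <- data.frame(
  values = rnorm(35, mean = rep(c(10, 11, 12, 13, 14), each = 7), sd = 1),
  ind    = factor(rep(c("A", "B", "C", "D", "E"), each = 7))
)
toy_fit   <- aov(values ~ ind, data = toy)
toy_table <- summary(toy_fit)[[1]]

# Residual mean square read from the second (Residuals) row of the table.
mse_from_table <- toy_table[["Mean Sq"]][2]

# The same quantity by hand: residual sum of squares / residual degrees of freedom.
mse_by_hand <- sum(residuals(toy_fit)^2) / df.residual(toy_fit)

all.equal(mse_from_table, mse_by_hand)
```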

2. Using the normality test for residuals, the histogram of residuals, and the normal probability plot of residuals, decide whether the assumption of a normal distribution is reasonable here.

• First we need to find the residuals by running the command residuals(anovaresults).

• Now use these residuals to perform the normality test and to construct the histogram and the normal probability plot, using the commands discussed in Tutorial 6.
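The two steps above might look like the sketch below. This assumes that the normality test from Tutorial 6 is the Shapiro-Wilk test (shapiro.test) and that the plots are made with hist, qqnorm and qqline; if Tutorial 6 used different commands, use those instead. The data are simulated, not the salt data.

```r
# Simulated data, NOT the salt data: 5 groups (A-E) with 7 responses each.
set.seed(253)
toy <- data.frame(
  values = rnorm(35, mean = rep(c(10, 11, 12, 13, 14), each = 7), sd = 1),
  ind    = factor(rep(c("A", "B", "C", "D", "E"), each = 7))
)
toy_fit <- aov(values ~ ind, data = toy)

# Residuals = each response minus its own group mean.
toy_res <- residuals(toy_fit)

# Normality test (assumption: Shapiro-Wilk is the test meant in Tutorial 6).
sw <- shapiro.test(toy_res)
sw

# Histogram and normal probability (Q-Q) plot of the residuals.
hist(toy_res, main = "Histogram of residuals", xlab = "Residual")
qqnorm(toy_res)
qqline(toy_res)
```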

3. Is the assumption of equal variances justified for these data? Explain your answer, using appropriate tests and boxplots.

• We perform Bartlett’s test (based on the normal distribution) by running the command:

bartlett.test(values ~ ind, data = stacked)

• To perform Levene’s test, we first have to install the package car (see Tutorial 2 for details on how to install a package), and then run the commands:

library(car)

leveneTest(values ~ ind, data = stacked)
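To see what Levene’s test is doing, it can be reproduced in base R: it is a one-way ANOVA on the absolute deviations of each response from the centre of its group. As far as we are aware, leveneTest in car uses the group medians by default (center = median), so the hand computation below should agree with its output. Treat it as a cross-check under that assumption, not a replacement for the commands above. The data are simulated, not the salt data, and abs_dev is a column name invented for this example.

```r
# Simulated data, NOT the salt data: 5 groups (A-E) with 7 responses each.
set.seed(253)
toy <- data.frame(
  values = rnorm(35, mean = rep(c(10, 11, 12, 13, 14), each = 7), sd = 1),
  ind    = factor(rep(c("A", "B", "C", "D", "E"), each = 7))
)

# Bartlett's test, exactly as in the command above but on the toy data.
bart <- bartlett.test(values ~ ind, data = toy)
bart

# Levene-type test by hand: absolute deviations from the group medians ...
toy$abs_dev <- abs(toy$values - ave(toy$values, toy$ind, FUN = median))

# ... followed by an ordinary one-way ANOVA on those deviations.
levene_by_hand <- anova(lm(abs_dev ~ ind, data = toy))
levene_by_hand
```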

Without carrying out any formal tests, what is a rough rule for deciding whether it is OK to assume equal variances? For the Salt Absorption data, does this rough rule suggest that assuming equal variances is OK?
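Whatever rough rule you use, it will be based on the sample standard deviations (or variances) of the groups, so it is useful to compute these. The sketch below does so for simulated data (not the salt data) and also draws the boxplots. One commonly quoted rule of thumb compares the largest and smallest group standard deviations; check the lecture notes for the exact rule used in this module, since the threshold stated there may differ.

```r
# Simulated data, NOT the salt data: 5 groups (A-E) with 7 responses each.
set.seed(253)
toy <- data.frame(
  values = rnorm(35, mean = rep(c(10, 11, 12, 13, 14), each = 7), sd = 1),
  ind    = factor(rep(c("A", "B", "C", "D", "E"), each = 7))
)

# Sample standard deviation within each group.
group_sd <- tapply(toy$values, toy$ind, sd)
group_sd

# Ratio of the largest to the smallest group standard deviation.
sd_ratio <- max(group_sd) / min(group_sd)
sd_ratio

# Side-by-side boxplots give a visual impression of the spread in each group.
boxplot(values ~ ind, data = toy, xlab = "Group", ylab = "Response")
```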

4. Which groups have significantly different means? Perform post-hoc tests to answer this.

• We perform Tukey’s HSD test by running the following command:

TukeyHSD(anovaresults)

The output lists, for every pair of groups, the estimated difference in means, a confidence interval, and an adjusted p-value (column p adj).
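With five groups there are ten pairs, so it can be convenient to filter the Tukey output. The sketch below does this on simulated data (not the salt data). The 0.05 cut-off is an assumed significance level, and tukey_pairs is a name invented for this example.

```r
# Simulated data, NOT the salt data: 5 groups (A-E) with 7 responses each.
set.seed(253)
toy <- data.frame(
  values = rnorm(35, mean = rep(c(10, 11, 12, 13, 14), each = 7), sd = 1),
  ind    = factor(rep(c("A", "B", "C", "D", "E"), each = 7))
)
toy_fit <- aov(values ~ ind, data = toy)

tukey <- TukeyHSD(toy_fit)

# The result is a list with one matrix per factor; ours is stored under the name 'ind'.
tukey_pairs <- as.data.frame(tukey$ind)
tukey_pairs

# Keep only the pairs whose adjusted p-value is below the assumed 5% level.
tukey_pairs[tukey_pairs[["p adj"]] < 0.05, ]
```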

• To perform Fisher’s LSD test, we first have to install the package PMCMRplus (see Tutorial 2 for details on how to install a package), and then run the commands:

library(PMCMRplus)

summary(lsdTest(anovaresults))

The output gives test statistics and p-values for the differences between all pairs of groups.
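Fisher’s LSD comparisons are pairwise t-tests that use the pooled standard deviation from the ANOVA and apply no multiplicity adjustment. On that understanding, base R’s pairwise.t.test with pool.sd = TRUE and p.adjust.method = "none" should give p-values that correspond to those from lsdTest. The sketch below is meant as a cross-check under that assumption, using simulated data (not the salt data).

```r
# Simulated data, NOT the salt data: 5 groups (A-E) with 7 responses each.
set.seed(253)
toy <- data.frame(
  values = rnorm(35, mean = rep(c(10, 11, 12, 13, 14), each = 7), sd = 1),
  ind    = factor(rep(c("A", "B", "C", "D", "E"), each = 7))
)

# Unadjusted pairwise t-tests with the pooled standard deviation.
lsd_like <- pairwise.t.test(toy$values, toy$ind,
                            p.adjust.method = "none", pool.sd = TRUE)

# Lower-triangular matrix of p-values, one entry per pair of groups.
lsd_like$p.value
```

Because no adjustment is applied, these p-values are never larger than the Tukey-adjusted ones for the same pairs, which is worth keeping in mind when the two methods disagree.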

Finally, save your work.

Part B

Diabetic retinopathy is a disease of the retina (the back of the eye, which is important for our vision). A clinician identifies three stages of diabetic retinopathy: No Diabetic Retinopathy (Group 1), Early Diabetic Retinopathy (Group 2) and Late Diabetic Retinopathy (Group 3). The clinician wants to find out whether the visual acuity of patients differs between the stages of diabetic retinopathy. The clinician collected data for 60 randomly chosen patients. Visual acuity (VA) was measured as the number of letters correctly read from a standardised vision chart, so the larger the VA number, the better the vision. You may assume that the groups are independent. The data set is available on Canvas – file Tutorial8 vision.xlsx.

1. Perform Analysis of Variance for these data. State conclusions clearly, and provide practical interpretation.

2. Decide whether the assumptions of normality and equal variances are justiﬁed here. Explain your answer carefully, using appropriate tests.

3. Use post-hoc tests to determine which disease groups differ in visual acuity. Report your conclusions clearly.

4. Using the results of the post-hoc tests and a box plot, discuss whether we can use visual acuity to differentiate between the early and late stages of diabetic retinopathy.