MATH253 Week 10 Tutorial

R Tutorial

This tutorial sheet is related to material covered in chapter 12 (how to do tests/calculations/plots from this chapter in R).

Solutions will be available on Canvas after the R drop-in session on Tuesday.

Part A

In an experiment to investigate the performance of a multi-user computer system, the following data were collected, consisting of observations on the average time taken (y, in seconds) for each terminal to complete a particular task when the same task was submitted simultaneously to x terminals.

x	40	50	60	45	40	10	30	20	50	30	65	40	65	65
y	9.9	17.8	18.4	16.5	11.9	5.5	11.0	8.1	15.1	13.3	21.8	13.8	18.6	19.8

The data can be found on Canvas – file Tutorial10 timings.xlsx.

• Download the file Tutorial10 timings.xlsx to your computer into a folder dedicated to R. Make sure that this folder is set as your working directory in RStudio.

• In RStudio, open a new R script.

• Load the file Tutorial10 timings using the readxl package, storing the data in a variable called timeDF. (See Tutorial 2 for details on how to load data using the readxl package.)

• Make sure you save your R script in the folder dedicated to R. It is a good idea to save it again after each task you complete.
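The loading step above might look like the following minimal sketch. It assumes the workbook sits in the working directory under the name printed on this sheet and has columns called Terminals and Time (the names used in question 1). The fallback branch, which types in the 14 observations from the table, is an addition for illustration so the code runs even without the file.

```r
# Assumption: file name as printed on this sheet; adjust if yours differs
path <- "Tutorial10 timings.xlsx"

if (file.exists(path) && requireNamespace("readxl", quietly = TRUE)) {
  # Read the first sheet of the workbook into a plain data frame
  timeDF <- as.data.frame(readxl::read_excel(path))
} else {
  # Fallback (illustration only): the 14 observations from the table in Part A
  timeDF <- data.frame(
    Terminals = c(40, 50, 60, 45, 40, 10, 30, 20, 50, 30, 65, 40, 65, 65),
    Time = c(9.9, 17.8, 18.4, 16.5, 11.9, 5.5, 11.0, 8.1, 15.1, 13.3,
             21.8, 13.8, 18.6, 19.8)
  )
}

str(timeDF)   # check the column names and types
head(timeDF)  # inspect the first few rows
```

Checking the result with str and head before going further helps catch a wrong working directory or unexpected column names early.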

1. Plot the data.

• Create new variables for the columns in timeDF as follows:

term <- timeDF$Terminals

time <- timeDF$Time

• Use the command plot to create a scatterplot with term on the horizontal axis and time on the vertical axis.
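One possible way to draw the scatterplot is sketched below. The data frame is typed in from the table in Part A so the snippet runs on its own; the axis labels and title are illustrative choices, not part of the task.

```r
# Data typed in from the table in Part A (same values as the Excel file)
timeDF <- data.frame(
  Terminals = c(40, 50, 60, 45, 40, 10, 30, 20, 50, 30, 65, 40, 65, 65),
  Time = c(9.9, 17.8, 18.4, 16.5, 11.9, 5.5, 11.0, 8.1, 15.1, 13.3,
           21.8, 13.8, 18.6, 19.8)
)
term <- timeDF$Terminals
time <- timeDF$Time

# Explanatory variable on the horizontal axis, response on the vertical axis
plot(term, time,
     xlab = "Number of terminals",
     ylab = "Average time (seconds)",
     main = "Task time against number of terminals")
```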

From your scatterplot, would you say that a straight line provides a reasonable model for the given data?

2. Carry out Simple Linear Regression analysis for these data as follows.

• We fit the model with the command lm and print the results with the command summary. So, we run the following code:

lin <- lm(time ~ term, data=timeDF)

lin

summary(lin)

• Note that in lm we first specify the response variable y, which in our case is time. This is followed by the tilde symbol ~, and then the explanatory variable x, which in our case is term. The expression time ~ term tells R that our responses in time depend on term. The order is important here, so take care to put the response variable y on the left-hand side of ~ and the explanatory variable x on the right-hand side of ~.

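• Using data = timeDF, we tell R which data set to work with, in our case timeDF. Because term and time are not column names of timeDF, R takes them from the variables created in question 1.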

• Note that calling lin gives only the coefficients of the linear regression. The code summary(lin) provides more information, such as the test statistics and p-values for the two-sided tests for the slope and intercept, R², the test statistic for the ANOVA F-test, etc.

Write down the fitted regression equation.

Denoting the intercept and slope parameters by β₀ and β₁ respectively, R carries out tests of the hypotheses H₀: β₀ = 0 versus H₁: β₀ ≠ 0 and H₀: β₁ = 0 versus H₁: β₁ ≠ 0. The results appear in the rows labelled (Intercept) (for β₀) and term (for β₁) in the R output.

Report the conclusions from the two hypothesis tests, of H₀: β₀ = 0 versus H₁: β₀ ≠ 0 and H₀: β₁ = 0 versus H₁: β₁ ≠ 0.

Give the estimated value of the error variance σ². Note: R outputs the estimated value of σ, which is called Residual standard error in the output.

Report and interpret the R² value. Note: We use the value Multiple R-squared. The value of Adjusted R-squared is not covered in this module; it will be covered in later years of your studies.
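The quantities asked for above can also be pulled out of the fitted model directly rather than read off the printed summary. The following is a sketch under the assumption that the data are as in the table in Part A; the names s, sigma_hat, sigma2_hat and r2 are introduced here for illustration only.

```r
# Data typed in from the table in Part A
timeDF <- data.frame(
  Terminals = c(40, 50, 60, 45, 40, 10, 30, 20, 50, 30, 65, 40, 65, 65),
  Time = c(9.9, 17.8, 18.4, 16.5, 11.9, 5.5, 11.0, 8.1, 15.1, 13.3,
           21.8, 13.8, 18.6, 19.8)
)
term <- timeDF$Terminals
time <- timeDF$Time

lin <- lm(time ~ term, data = timeDF)
s <- summary(lin)

coef(lin)  # estimated intercept and slope for the fitted equation
coef(s)    # estimates, standard errors, t statistics, two-sided p-values

sigma_hat  <- s$sigma      # "Residual standard error" in the printed output
sigma2_hat <- sigma_hat^2  # estimate of the error variance
r2         <- s$r.squared  # "Multiple R-squared" in the printed output

sigma2_hat
r2
```

Squaring the residual standard error is the step that is easy to forget: the printed output reports the estimate of σ, not of σ².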

3. Using the normality test for residuals, the histogram of residuals, and the normal probability plot of residuals, decide whether the assumption of normally distributed errors appears to be justified here.

• First we need to find the residuals by running the command residuals(lin).

• Now use these residuals to perform the normality test and to construct the histogram and the normal probability plot, using the commands discussed in Tutorials 6 and 8.
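The residual checks could be sketched as follows. The commands from Tutorials 6 and 8 are not reproduced on this sheet, so the choice of the Shapiro-Wilk test (shapiro.test) and of qqnorm/qqline for the normal probability plot is an assumption; substitute whichever commands those tutorials used.

```r
# Data typed in from the table in Part A
timeDF <- data.frame(
  Terminals = c(40, 50, 60, 45, 40, 10, 30, 20, 50, 30, 65, 40, 65, 65),
  Time = c(9.9, 17.8, 18.4, 16.5, 11.9, 5.5, 11.0, 8.1, 15.1, 13.3,
           21.8, 13.8, 18.6, 19.8)
)
term <- timeDF$Terminals
time <- timeDF$Time
lin <- lm(time ~ term, data = timeDF)

res <- residuals(lin)  # observed minus fitted values

# Assumption: Shapiro-Wilk as the normality test
sw <- shapiro.test(res)
sw

# Histogram of the residuals
hist(res, xlab = "Residual", main = "Histogram of residuals")

# Normal probability (Q-Q) plot with a reference line
qqnorm(res)
qqline(res)
```

With only 14 observations the histogram is coarse, so the Q-Q plot and the test p-value usually carry more weight in the discussion.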

4. Plot the fitted line.

• First we use the command plot to plot the points, and then abline with reference to the linear regression model:

plot(term, time, col="blue", pch=19)

abline(lin, col="red")

5. Plot the residuals against the fitted values.

• First we use the command fitted.values with reference to the linear regression model, which calculates the fitted values for all observed x-values:

fit <- fitted.values(lin)

• Now use the command plot to create the plot with the fitted values fit on the horizontal axis and the residuals (found earlier) on the vertical axis.
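A minimal sketch of the residual plot described above is given below. The dashed horizontal line at zero and the axis labels are additions for readability, not something the task requires.

```r
# Data typed in from the table in Part A
timeDF <- data.frame(
  Terminals = c(40, 50, 60, 45, 40, 10, 30, 20, 50, 30, 65, 40, 65, 65),
  Time = c(9.9, 17.8, 18.4, 16.5, 11.9, 5.5, 11.0, 8.1, 15.1, 13.3,
           21.8, 13.8, 18.6, 19.8)
)
term <- timeDF$Terminals
time <- timeDF$Time
lin <- lm(time ~ term, data = timeDF)

fit <- fitted.values(lin)  # fitted value for each observed x
res <- residuals(lin)      # corresponding residuals

# Residuals against fitted values; look for curvature or changing spread
plot(fit, res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals versus fitted values")
abline(h = 0, lty = 2)  # reference line at zero
```

A patternless band of points around the zero line supports the linear model; a curve or a funnel shape would suggest otherwise.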

Discuss whether the simple linear regression model is appropriate here.

6. Compute 95% prediction and confidence intervals when the task is submitted to 50 and 70 terminals. Compute also 90% prediction intervals when the task is submitted to 50 and 70 terminals.

• First we create a data frame with the x₀-values, in our case 50 and 70, and call it pr (for example):

pr <- data.frame(term=c(50, 70))

• Now we use the command predict to construct the prediction and confidence intervals in the following way:

predict(lin, pr, interval = c("prediction"), level = 0.95)

predict(lin, pr, interval = c("confidence"), level = 0.95)

The first parameter in the command predict tells R which model to use, which in our case we defined earlier as lin. The second parameter tells R which x₀-values to use for prediction; we defined them as pr. We then choose between prediction and confidence intervals by using interval = c("prediction") or interval = c("confidence"). Finally, we set the confidence level using the parameter level.

R outputs the column fit for the fitted values, and the columns lwr and upr for the lower and upper endpoints of the intervals.

Report the R results, and explain the different interpretations of the prediction and confidence intervals. Discuss the factors affecting the widths of the various intervals you have computed.
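To see where the interval widths come from, the 90% prediction intervals can be computed with predict and then rebuilt by hand from the standard simple linear regression formula, ŷ₀ ± t × s × sqrt(1 + 1/n + (x₀ − x̄)²/Sxx). This is a sketch assuming the data from the table in Part A; the names pi90, Sxx, tcrit and half are introduced here for illustration.

```r
# Data typed in from the table in Part A
timeDF <- data.frame(
  Terminals = c(40, 50, 60, 45, 40, 10, 30, 20, 50, 30, 65, 40, 65, 65),
  Time = c(9.9, 17.8, 18.4, 16.5, 11.9, 5.5, 11.0, 8.1, 15.1, 13.3,
           21.8, 13.8, 18.6, 19.8)
)
term <- timeDF$Terminals
time <- timeDF$Time
lin <- lm(time ~ term, data = timeDF)
pr <- data.frame(term = c(50, 70))

# 90% prediction intervals from predict
pi90 <- predict(lin, pr, interval = "prediction", level = 0.90)
pi90

# The same intervals by hand
n     <- length(term)
s     <- summary(lin)$sigma           # residual standard error
Sxx   <- sum((term - mean(term))^2)   # spread of the x-values
x0    <- pr$term
y0    <- unname(coef(lin)[1] + coef(lin)[2] * x0)  # fitted values at x0
tcrit <- qt(0.95, df = n - 2)         # 5% in each tail for a 90% interval
half  <- tcrit * s * sqrt(1 + 1 / n + (x0 - mean(term))^2 / Sxx)

cbind(fit = y0, lwr = y0 - half, upr = y0 + half)
```

Dropping the leading 1 inside the square root gives the confidence interval for the mean response, which is why confidence intervals are always narrower than prediction intervals at the same level. The (x₀ − x̄)² term makes both intervals wider as x₀ moves away from the mean of the observed x-values; note that 70 terminals lies outside the observed range of 10 to 65.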

Part B

Eye melanoma is a type of cancer occurring in the eye. A clinician wants to find out if the size of a tumour depends on the age of the patient. He collected a random sample of 40 patients and recorded the total volume of the tumour (in mm³) and the age of the patient (in years). The data set is available on Canvas – file Tutorial10 tumour.xlsx.

1. Find the fitted regression line to predict the volume of the tumour from the age of the patient.

2. Find the 90% confidence interval for the mean tumour volume for an 80-year-old patient. Find also the 90% prediction interval for the tumour volume of an 80-year-old patient. Give a practical interpretation of these intervals.

3. Does the tumour volume depend on the age of the patient? Use a formal statistical test to answer this question.

4. State all assumptions about the errors in simple linear regression. Decide whether the errors follow a normal distribution.

5. Decide whether the simple linear regression model seems appropriate here.