
STAT2008/4038/6038 Regression Modelling Semester 1 - End of Semester, 2018

Published: 2022-05-24



Instructions to Students:

1.) This examination paper comprises a total of twenty-two (22) pages and there is a separate handout of R Output which also has a total of twenty (20) pages. During the reading time preceding the exam, please check that both documents have the correct number of pages.

2.) All answers are to be written on this exam paper, which is to be handed in at the end of the exam. You may make notes on scribble paper (or on the R Output) during the reading time, but do NOT write on this exam paper until after the start of the writing time. If you need additional space, use the rear of the previous page and clearly indicate the part of the question that your answer refers to. The R Output and any scribble paper will be collected at the end of the examination and destroyed; they will not be marked.

3.) There are four questions worth a total of 135 marks for STAT4038/6038 students, and three questions worth a total of 120 marks for STAT2008 students. The marks of each question are of unequal value, with the marks indicated for each part. You should attempt to answer all parts of Q1, Q2, Q3 and Q4 (Q4 only for STAT4038/6038).

● Q1: 54 marks

● Q2: 51 marks

● Q3: 15 marks

● Q4 (Only for STAT4038/6038): 15 marks

4.) Please write your student number in the space provided at the top of the previous page.

5.) Include a clear statement of the formulae you use to answer each question.

6.) Statistical tables (generated using R) are provided on pages 17 to 20 at the end of the handout of R Output.

7.) Unless otherwise indicated, use a significance level (α) of 5%.

Total Marks = 120 (for STAT2008)    or     135 (for STAT4038/6038)

This exam will be worth 60% of the final assessment.

Question 1 [54 marks]: An investigation was conducted in the 1920s to determine the relationship between speed (mph) and the stopping distance (ft) of n = 50 cars.

a.   Consider the simple linear regression model of the following form:

$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \epsilon_i \sim \mathrm{normal}(0, \sigma^2),$

where y = stopping distance and x = speed.  Based on the summary statistics of the data, which are on page 1 of the R  Output, we are interested in the 6 missing values in the following regression summary table.

> mod.q1.a <- lm(dist ~ speed, data=cars)
> sumary(mod.q1.a)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  ???????    6.75844 ???????  0.01232
speed        ???????    ??????? ??????? 1.49e-12

n = 50, p = 2, Residual SE = 15.37959, R-Squared = ????

[18 marks] Compute the missing values.
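The missing entries follow from the usual least-squares formulas applied to the summary statistics. The sketch below is illustrative only: it assumes the data are R's built-in `cars` data set (the R Output handout is not reproduced here), and the names `Sxx`, `Sxy`, `Syy`, `b0`, `b1` and so on are hypothetical helpers, not names taken from the exam.

```r
# Sketch: recover the regression summary entries from summary statistics.
# Assumption: the exam data are R's built-in `cars` (columns `speed`, `dist`).
x <- cars$speed
y <- cars$dist
n <- length(x)

# Corrected sums of squares and cross-products
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Syy <- sum((y - mean(y))^2)

b1 <- Sxy / Sxx                  # slope estimate
b0 <- mean(y) - b1 * mean(x)     # intercept estimate

rss   <- Syy - b1 * Sxy          # residual sum of squares
sigma <- sqrt(rss / (n - 2))     # residual standard error
se_b1 <- sigma / sqrt(Sxx)
se_b0 <- sigma * sqrt(1 / n + mean(x)^2 / Sxx)

t_b0 <- b0 / se_b0               # t value for the intercept
t_b1 <- b1 / se_b1               # t value for the slope
r2   <- 1 - rss / Syy            # R-squared

# Cross-check the hand computation against lm()
fit <- lm(dist ~ speed, data = cars)
all.equal(unname(coef(fit)), c(b0, b1))
```

In an exam setting only the formulas and the summary statistics are available, so the `lm()` call serves purely as a check on the arithmetic.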


b.   Residual plots for the model in part (a) are shown on pages 2 and 3 of the R  Output.  We are interested in whether these plots suggest any problems with the underlying assumptions.

[3 marks] Are there any problem(s) indicated on the Residuals vs Fitted plot on page 2? If so describe the problem(s):

[3 marks] Are there any problem(s) indicated on the Normal Q-Q plot on page 3? If so describe the problem(s):

[3 marks] Are there any problem(s) indicated on the Cook's distance plot on page 3? If so describe the problem(s):

[3 marks] What is your overall assessment? (Select just ONE of the following options.)

□ Residuals are not independent (obvious pattern)

□ Residuals do not have constant variance (heteroscedasticity)

□ Residuals are not normally distributed

□ There are possible outliers and/or influential observations

□ More than one of the above problems

□ No obvious problems
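For readers who want to reproduce diagnostic plots of this kind, the following is a minimal sketch. It assumes the built-in `cars` data and the base R `plot()` method for `lm` objects; the plots in the actual R Output handout may have been produced differently.

```r
# Sketch: standard residual diagnostics for a simple linear regression.
# Assumption: the model is fitted to R's built-in `cars` data.
fit <- lm(dist ~ speed, data = cars)

plot(fit, which = 1)   # Residuals vs Fitted: look for curvature or fanning
plot(fit, which = 2)   # Normal Q-Q: look for departures from the line
plot(fit, which = 4)   # Cook's distance: look for influential observations

# Numerical companion to the Cook's distance plot:
# the three observations with the largest values
cd <- cooks.distance(fit)
head(sort(cd, decreasing = TRUE), 3)
```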

c.   The response Y  was transformed by taking the square-root.   A linear regression model was then fit  (mod.q1.c).   The regression table is presented on page 4 of the R  Output. Additionally, on pages 4 and 5, residual diagnostic plots are presented.

[3 marks] Provide an interpretation of the relationship between speed and $\mathrm{dist}^{1/2}$ based on the linear regression model.

[3 marks] Using the min, median, and max values for speed on page 1 of the R Output, estimate the relationship between speed and distance on the original scale for distance.

[6 marks] Using the min, median, and max values for speed on page 1 of the R Output, estimate the 95% prediction intervals for distance at a given speed (on the original scale). Discuss the interval.
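One way to obtain intervals on the original scale is to compute them on the square-root scale and then square the endpoints. The sketch below assumes that `mod.q1.c` corresponds to `lm(sqrt(dist) ~ speed, data = cars)` on the built-in `cars` data; the names `fit_sqrt`, `new`, `pi_sqrt` and `pi_dist` are illustrative.

```r
# Sketch: prediction intervals back-transformed from the square-root scale.
# Assumption: the transformed model is lm(sqrt(dist) ~ speed) on `cars`.
fit_sqrt <- lm(sqrt(dist) ~ speed, data = cars)

# Speeds of interest: minimum, median and maximum observed speed
new <- data.frame(speed = c(min(cars$speed),
                            median(cars$speed),
                            max(cars$speed)))

# 95% prediction intervals for sqrt(dist)
pi_sqrt <- predict(fit_sqrt, newdata = new,
                   interval = "prediction", level = 0.95)

# Squaring preserves the ordering only for non-negative values,
# so truncate any negative lower limit at zero before squaring
pi_sqrt[, "lwr"] <- pmax(pi_sqrt[, "lwr"], 0)
pi_dist <- pi_sqrt^2

cbind(new, pi_dist)
```

Because squaring is a monotone transformation on non-negative values, the interval endpoints carry over directly, whereas the squared fitted value is better read as an estimate of the median stopping distance than of the mean.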

[3 marks] Provide a 95% confidence interval for β1 (the regression coefficient for speed).
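The interval has the form estimate ± t critical value × standard error, with n − 2 = 48 degrees of freedom. A minimal sketch, again assuming the transformed model is `lm(sqrt(dist) ~ speed, data = cars)`; `ci_manual` is a hypothetical name.

```r
# Sketch: 95% confidence interval for the slope, by hand and via confint().
# Assumption: the transformed model is lm(sqrt(dist) ~ speed) on `cars`.
fit_sqrt <- lm(sqrt(dist) ~ speed, data = cars)
tab <- coef(summary(fit_sqrt))

est   <- tab["speed", "Estimate"]
se    <- tab["speed", "Std. Error"]
tcrit <- qt(0.975, df = df.residual(fit_sqrt))   # two-sided 5% critical value

ci_manual <- est + c(-1, 1) * tcrit * se
ci_manual

# Built-in equivalent
confint(fit_sqrt, "speed", level = 0.95)
```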

[3 marks] Conduct the following hypothesis test for the intercept:

H0 : β0 = 0 vs. H1 : β0 ≠ 0.
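The test statistic is the intercept estimate divided by its standard error, compared with a t distribution on n − 2 degrees of freedom. The sketch below assumes the test refers to the square-root model of part (c), fitted as `lm(sqrt(dist) ~ speed, data = cars)`; `t_stat`, `p_val` and `reject` are illustrative names.

```r
# Sketch: two-sided t test of H0: beta0 = 0 against H1: beta0 != 0.
# Assumption: the test refers to lm(sqrt(dist) ~ speed) on `cars`.
fit_sqrt <- lm(sqrt(dist) ~ speed, data = cars)
tab <- coef(summary(fit_sqrt))

t_stat <- tab["(Intercept)", "Estimate"] / tab["(Intercept)", "Std. Error"]
df     <- df.residual(fit_sqrt)

# Two-sided p-value and the decision at the 5% significance level
p_val  <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
reject <- abs(t_stat) > qt(0.975, df = df)

c(t = t_stat, df = df, p = p_val)
reject
```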