STAT2008/4038/6038 Regression Modelling Semester 1 - End of Semester, 2018

Published: 2023-05-19

Research School of Finance, Actuarial Studies and Statistics

Examination

Question 1 [54 marks]: An investigation was conducted in the 1920s to determine the relationship between speed (mph) and the stopping distance (ft) of n = 50 cars.

a.   Consider the simple linear regression model of the following form:

$Y_i = \beta_0 + \beta_1 x_i + e_i, \qquad e_i \sim \text{Normal}(0, \sigma^2),$

where Y is the stopping distance and x is the speed. Based on the summary statistics of the data, which are on page 1 of the R Output, we are interested in the 6 missing values in the following regression summary table.

> mod.q1.a <- lm(dist ~ speed, data=cars)
> sumary(mod.q1.a)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  ???????    6.75844 ???????  0.01232
speed        ???????    ??????? ??????? 1.49e-12

n = 50, p = 2, Residual SE = 15.37959, R-Squared = ????

[18 marks] Compute the missing values.
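The identities that link the entries of this table can be sketched in R. This is only an illustration: the exam expects the values to be derived by hand from the page 1 summary statistics, which are not reproduced here, so the sketch recomputes the sums of squares from R's built-in `cars` data (the data set named in the `lm()` call). The names `Sxx`, `Sxy`, `Syy` and the other intermediate variables are notation introduced for this sketch, not taken from the exam.

```r
# Sketch only: hand formulas for simple linear regression, checked against lm().
x <- cars$speed
y <- cars$dist
n <- length(x)

Sxx <- sum((x - mean(x))^2)               # corrected sum of squares of x
Sxy <- sum((x - mean(x)) * (y - mean(y))) # corrected cross-product
Syy <- sum((y - mean(y))^2)               # total sum of squares of y

b1  <- Sxy / Sxx                          # slope estimate
b0  <- mean(y) - b1 * mean(x)             # intercept estimate
sse <- Syy - b1 * Sxy                     # residual sum of squares
s   <- sqrt(sse / (n - 2))                # residual standard error

se_b1 <- s / sqrt(Sxx)                    # standard error of the slope
se_b0 <- s * sqrt(1 / n + mean(x)^2 / Sxx) # standard error of the intercept
t_b0  <- b0 / se_b0                       # t value = estimate / standard error
t_b1  <- b1 / se_b1
r2    <- 1 - sse / Syy                    # R-squared

# Reference fit for comparison with the hand-computed values.
fit <- lm(dist ~ speed, data = cars)
tab <- coef(summary(fit))
```

Comparing `c(b0, b1)`, `c(se_b0, se_b1)`, `c(t_b0, t_b1)` and `r2` with `tab` and `summary(fit)$r.squared` shows whether the hand formulas were applied correctly.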

b.   Residual plots for the model in part (a) are shown on pages 2 and 3 of the R Output. We are interested in whether these plots suggest any problems with the underlying assumptions.

[3 marks] Are there any problem(s) indicated on the Residuals vs Fitted plot on page 2? If so, describe the problem(s):


[3 marks] Are there any problem(s) indicated on the Normal Q-Q plot on page 3? If so, describe the problem(s):


[3 marks] Are there any problem(s) indicated on the Cook's distance plot on page 3? If so, describe the problem(s):

[3 marks] What is your overall assessment? (Select just ONE of the following options.)

□ Residuals are not independent (obvious pattern)

□ Residuals do not have constant variance (heteroscedasticity)

□ Residuals are not normally distributed

□ There are possible outliers and/or influential observations

□ More than one of the above problems

□ No obvious problems

c.   The response Y was transformed by taking the square root. A linear regression model was then fit (mod.q1.c). The regression table is presented on page 4 of the R Output. Additionally, on pages 4 and 5, residual diagnostic plots are presented.

[3 marks] Provide an interpretation of the relationship between speed and dist^(1/2) (the square root of dist) based on the linear regression model.


[3 marks] Using the min, median, and max values for speed on page 1 of the R Output, estimate the relationship between speed and distance on the original scale for distance.
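One way to picture the back-transformation is sketched below. It assumes that mod.q1.c corresponds to `lm(sqrt(dist) ~ speed, data = cars)`; the exam does not show the call, so the formula, and the names `fit_sqrt`, `speeds`, `fit_root` and `fit_dist`, are assumptions made for illustration.

```r
# Sketch only: fitted values on the square-root scale, squared back to feet.
fit_sqrt <- lm(sqrt(dist) ~ speed, data = cars)  # assumed form of mod.q1.c

# Min, median, and max speed, taken here from the data rather than page 1.
speeds <- data.frame(speed = c(min(cars$speed),
                               median(cars$speed),
                               max(cars$speed)))

fit_root <- predict(fit_sqrt, newdata = speeds)  # estimated sqrt(dist)
fit_dist <- fit_root^2                           # back on the original scale

cbind(speeds, fit_root, fit_dist)
```

Because squaring is a nonlinear map, `fit_dist` is better read as an estimate of a typical (median) stopping distance at each speed than as an estimate of the mean.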


[6 marks] Using the min, median, and max values for speed on page 1 of the R Output, estimate the 95% prediction intervals for distance at a given speed (on the original scale). Discuss the interval.
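A minimal sketch of the interval calculation follows, again assuming that mod.q1.c is `lm(sqrt(dist) ~ speed, data = cars)` (an assumption, since the exam does not show the call). The intervals are formed on the square-root scale and their endpoints are then squared; the `pmax()` guard against a negative lower endpoint is a choice made for this sketch.

```r
# Sketch only: 95% prediction intervals, back-transformed to the original scale.
fit_sqrt <- lm(sqrt(dist) ~ speed, data = cars)  # assumed form of mod.q1.c

speeds <- data.frame(speed = c(min(cars$speed),
                               median(cars$speed),
                               max(cars$speed)))

# Columns are fit, lwr, upr on the sqrt(dist) scale.
pi_root <- predict(fit_sqrt, newdata = speeds,
                   interval = "prediction", level = 0.95)

# Squaring is monotone only for non-negative values, so clamp at zero first.
pi_dist <- pmax(pi_root, 0)^2

cbind(speeds, pi_dist)
```

An interval of constant width on the square-root scale becomes wider on the original scale as the fitted value grows, which is worth keeping in mind when discussing the result.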


[3 marks] Provide a 95% confidence interval for $\beta_1$ (the regression coefficient for speed).
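The interval has the usual form estimate ± t critical value × standard error, with n − 2 = 48 residual degrees of freedom. The sketch below assumes the question refers to the square-root model of part (c), fitted as `lm(sqrt(dist) ~ speed, data = cars)`; both the model choice and the variable names are assumptions for illustration.

```r
# Sketch only: t-based confidence interval for the slope, by hand and via confint().
fit_sqrt <- lm(sqrt(dist) ~ speed, data = cars)  # assumed form of mod.q1.c

tab   <- coef(summary(fit_sqrt))
est   <- tab["speed", "Estimate"]
se    <- tab["speed", "Std. Error"]
tcrit <- qt(0.975, df = df.residual(fit_sqrt))   # upper 2.5% point of t with 48 df

ci_hand <- c(est - tcrit * se, est + tcrit * se) # hand-computed interval
ci_r    <- confint(fit_sqrt, "speed", level = 0.95)
```

In the exam setting, `est` and `se` would be read from the regression table on page 4 and `tcrit` from a t table.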


[3 marks] Conduct the following hypothesis test for the intercept:

$H_0: \beta_0 = 0$ vs. $H_1: \beta_0 \neq 0$.


[3 marks] Conduct the following hypothesis test:

$H_0: \beta_1 = 0.35$ vs. $H_1: \beta_1 < 0.35$.
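Because the null value is not zero, the t value printed in the regression table cannot be used directly; the statistic is (estimate − 0.35) / standard error, and the p-value comes from the lower tail only. A sketch, assuming the test concerns the square-root model of part (c) fitted as `lm(sqrt(dist) ~ speed, data = cars)` (an assumption, as are the variable names):

```r
# Sketch only: one-sided t test of H0: beta1 = 0.35 against H1: beta1 < 0.35.
fit_sqrt <- lm(sqrt(dist) ~ speed, data = cars)  # assumed form of mod.q1.c

tab <- coef(summary(fit_sqrt))
est <- tab["speed", "Estimate"]
se  <- tab["speed", "Std. Error"]
df  <- df.residual(fit_sqrt)

beta_null <- 0.35
t_stat  <- (est - beta_null) / se    # shifted t statistic
p_lower <- pt(t_stat, df = df)       # P(T <= t_stat), matching H1: beta1 < 0.35
t_crit  <- qt(0.05, df = df)         # reject at the 5% level if t_stat < t_crit

c(t_stat = t_stat, p_lower = p_lower, t_crit = t_crit)
```

The 5% significance level used for `t_crit` is a conventional choice; the exam does not state one in this excerpt.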


[3 marks] Based on the regression tables and diagnostic plots, clearly outline which model you prefer: the model in part (a) or the model in part (c).

Question 2 [51 marks]: Data were collected on a random sample of 30 players from the 2010 World Cup. We are interested in modelling the response Time (the time a player played, in minutes, over the World Cup) based on a few covariates: Shots (the number of shots attempted), Passes (the number of passes made), and Tackles (the number of tackles made). Some summary statistics and scatter plots can be found on pages 6 and 7 of the R Output.

a.   An initial multiple linear regression model was fit with the covariates Shots and Passes. The regression summary can be found on page 8 of the R Output. The model is labeled mod.q.2.a. Based on the regression summary, fill in the following ANOVA table.

            Df   Sum Sq   Mean Sq   F value   Pr(>F)
Passes                                        2.431e-13
Shots
Residuals


[33 marks] Compute the values in the table. Note: As rounding errors will accumulate as you derive entries in this table from other values shown in the R output, be careful about rounding intermediate values.
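The links between a regression summary and a sequential ANOVA table can be sketched as follows. The World Cup data are not available in this excerpt, so the sketch uses simulated data: every number in the data frame `d` is hypothetical, and only the identities carry over. It assumes the table is sequential, with Passes entered before Shots, as the row order suggests.

```r
# Sketch only: rebuild a sequential ANOVA table from summary() quantities.
# The data below are SIMULATED (hypothetical), not the World Cup sample.
set.seed(1)
n <- 30
d <- data.frame(Passes = rpois(n, 150), Shots = rpois(n, 5))
d$Time <- 100 + 1.2 * d$Passes + 8 * d$Shots + rnorm(n, sd = 40)

fit <- lm(Time ~ Passes + Shots, data = d)
sm  <- summary(fit)

df_res <- sm$df[2]                        # residual df: n - 3 = 27
mse    <- sm$sigma^2                      # residual mean square = (Residual SE)^2
ss_res <- df_res * mse                    # residual sum of squares

f_all  <- sm$fstatistic[["value"]]        # overall F statistic
ss_reg <- sm$fstatistic[["numdf"]] * f_all * mse  # regression SS = 2 * F * MSE

t_shots   <- coef(sm)["Shots", "t value"]
f_shots   <- t_shots^2                    # last term entered: F = t^2
ss_shots  <- f_shots * mse
ss_passes <- ss_reg - ss_shots            # remainder goes to the first term
f_passes  <- ss_passes / mse

p_shots  <- coef(sm)["Shots", "Pr(>|t|)"] # same p-value as the t test
p_passes <- pf(f_passes, 1, df_res, lower.tail = FALSE)

tab <- anova(fit)                         # reference table for comparison
```

In the exam, `p_passes` cannot be obtained by hand from an F table, which is presumably why that entry is already supplied.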

b.   A second multiple linear regression model was fit with the covariates Shots, Passes, and Tackles. The regression summary can be found on page 9 of the R Output. The model is labeled mod.q.2.b.

[3 marks] Provide an interpretation of the relationship between Time and Shots, based on the multiple linear regression.


[3 marks] Based on the maximum values for