STAT2008/4038/6038 Regression Modelling Semester 1 - End of Semester, 2018

Published: 2023-06-06

Research School of Finance, Actuarial Studies and Statistics

Examination

Semester 1 - End of Semester, 2018

STAT2008/4038/6038 Regression Modelling

Question 1 [54 marks]: An investigation was conducted in the 1920s to determine the relationship between speed (mph) and the stopping distance (ft) of n = 50 cars.

a.   Consider the simple linear regression model of the following form:

$$Y_i = \beta_0 + \beta_1 x_i + e_i, \qquad e_i \sim \text{normal}(0, \sigma^2),$$

where Y = stopping distance and x = speed. Using the summary statistics of the data on page 1 of the R Output, we are interested in the 6 missing values in the following regression summary table.

> mod.q1.a <- lm(dist ~ speed, data=cars)
> sumary(mod.q1.a)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  ???????    6.75844 ???????  0.01232
speed        ???????    ??????? ??????? 1.49e-12

n = 50, p = 2, Residual SE = 15.37959, R-Squared = ????

[18 marks] Compute the missing values.
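The missing entries follow from standard simple-regression identities. The sketch below is one way to check such a calculation, assuming the summary statistics on page 1 include the sample means, standard deviations, and correlation of speed and dist (that page is not reproduced here, so this is an assumption). It uses R's built-in `cars` data, the data set named in the `lm()` call above; the variable names are illustrative only.

```r
# Sketch: rebuild simple-regression summary entries from summary statistics.
# Assumes means, SDs, and the correlation are the available inputs.
x <- cars$speed
y <- cars$dist
n <- length(x)

r   <- cor(x, y)                 # sample correlation
b1  <- r * sd(y) / sd(x)         # slope estimate
b0  <- mean(y) - b1 * mean(x)    # intercept estimate
r2  <- r^2                       # R-squared in simple linear regression

sxx <- (n - 1) * var(x)          # corrected sum of squares of x
syy <- (n - 1) * var(y)          # corrected sum of squares of y
sigma_hat <- sqrt((1 - r2) * syy / (n - 2))   # residual standard error

se_b1 <- sigma_hat / sqrt(sxx)
se_b0 <- sigma_hat * sqrt(1 / n + mean(x)^2 / sxx)

t_b0 <- b0 / se_b0               # t value for the intercept
t_b1 <- b1 / se_b1               # t value for the slope
```

In an exam setting the same steps would be carried out by hand; a given standard error and p-value in the table can be used to cross-check intermediate results.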

b.   Residual plots for the model in part (a) are shown on pages 2 and 3 of the R  Output.  We are interested in whether these plots suggest any problems with the underlying assumptions.

[3 marks] Are there any problem(s) indicated on the Residuals vs Fitted plot on page 2? If so, describe the problem(s):

[3 marks] Are there any problem(s) indicated on the Normal Q-Q plot on page 3? If so, describe the problem(s):

[3 marks] Are there any problem(s) indicated on the Cook's distance plot on page 3? If so, describe the problem(s):
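For readers without the R Output pages, the three diagnostic plots named in part (b) can be regenerated roughly as follows. This is a sketch that assumes the plots came from base R's `plot()` method for `lm` objects; the exam does not show the plotting code, so the layout and options here are guesses.

```r
# Sketch: the three diagnostic plots for the part (a) model.
mod <- lm(dist ~ speed, data = cars)

op <- par(mfrow = c(1, 3))   # three panels side by side
plot(mod, which = 1)         # Residuals vs Fitted
plot(mod, which = 2)         # Normal Q-Q
plot(mod, which = 4)         # Cook's distance
par(op)                      # restore the previous graphics settings

# Numeric values behind the third plot, one per observation.
cd <- cooks.distance(mod)
```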

[3 marks] What is your overall assessment? (Select just ONE of the following options.)

Residuals are not independent (obvious pattern)

Residuals do not have constant variance (heteroscedasticity)

Residuals are not normally distributed

There are possible outliers and/or influential observations

More than one of the above problems

No obvious problems

c.   The response Y was transformed by taking the square root. A linear regression model was then fitted (mod.q1.c). The regression table is presented on page 4 of the R Output. Additionally, residual diagnostic plots are presented on pages 4 and 5.

[3 marks] Provide an interpretation of the relationship between speed and dist^(1/2) (the square root of dist) based on the linear regression model.

[3 marks] Using the min, median, and max values for speed on page 1 of the R Output, estimate the relationship between speed and distance on the original scale for distance.

[6 marks] Using the min, median, and max values for speed on page 1 of the R Output, estimate the 95% prediction intervals for distance at a given speed (on the original scale). Discuss the interval.
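Back-transforming from the square-root scale amounts to squaring fitted values and interval endpoints. The sketch below illustrates this, assuming mod.q1.c was fitted as `lm(sqrt(dist) ~ speed, data = cars)` (the exam does not show the call, so this is an assumption). Squaring preserves the ordering of the endpoints only when they are non-negative, so negative lower limits are truncated at zero first.

```r
# Sketch: predictions and 95% prediction intervals on the original scale.
mod.q1.c <- lm(sqrt(dist) ~ speed, data = cars)   # assumed form of the model

new_speeds <- data.frame(
  speed = c(min(cars$speed), median(cars$speed), max(cars$speed))
)

# Fit and prediction limits on the square-root scale.
pi_sqrt <- predict(mod.q1.c, newdata = new_speeds,
                   interval = "prediction", level = 0.95)

# Square to return to feet; truncate at 0 so squaring stays monotone.
pi_dist <- pmax(pi_sqrt, 0)^2

result <- cbind(new_speeds, pi_dist)
```

Because squaring is monotone on non-negative values, the back-transformed limits remain a valid 95% prediction interval, while the squared fitted value estimates the median rather than the mean stopping distance.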

[3 marks] Provide a 95% confidence interval for β1 (the regression coefficient for speed).

[3 marks] Conduct the following hypothesis test for the intercept:

$$H_0: \beta_0 = 0 \quad \text{vs.} \quad H_1: \beta_0 \neq 0.$$

[3 marks] Conduct the following hypothesis test:

$$H_0: \beta_1 = 0.35 \quad \text{vs.} \quad H_1: \beta_1 < 0.35.$$
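The confidence interval and the two tests above all use the same ingredients: an estimate, its standard error, and a t distribution with n - 2 degrees of freedom. The following is a minimal sketch, again assuming mod.q1.c is `lm(sqrt(dist) ~ speed, data = cars)`; the object names are illustrative.

```r
# Sketch: CI for beta1, two-sided test of beta0 = 0, one-sided test of beta1.
mod.q1.c <- lm(sqrt(dist) ~ speed, data = cars)   # assumed form of the model
ct <- coef(summary(mod.q1.c))
df <- df.residual(mod.q1.c)                        # n - 2 = 48

# 95% confidence interval for the slope.
est1 <- ct["speed", "Estimate"]
se1  <- ct["speed", "Std. Error"]
ci_b1 <- est1 + c(-1, 1) * qt(0.975, df) * se1

# Two-sided test of H0: beta0 = 0.
t_b0 <- ct["(Intercept)", "Estimate"] / ct["(Intercept)", "Std. Error"]
p_b0 <- 2 * pt(-abs(t_b0), df)

# One-sided test of H0: beta1 = 0.35 against H1: beta1 < 0.35.
t_b1 <- (est1 - 0.35) / se1
p_b1 <- pt(t_b1, df)          # lower-tail probability
```

Note that the t value printed in a regression table tests a coefficient against zero, so the test against 0.35 needs its own statistic.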

[3 marks] Based on the regression tables and diagnostic plots, clearly outline which model you prefer: the model in part (a) or the model in part (c).

Question 2 [51 marks]:  Data were collected on a random sample of 30 players from the 2010 World Cup.  We are interested in modelling the response  Time (time a player played in minutes over the World Cup) based on a few covariates:  Shots  (the number of shots attempted), Passes (the number of passes made), and Tackles (the number of tackles made). Some summary statistics and scatter plots can be found on pages 6 and 7 of the R  Output.

a.   An initial multiple linear regression model was fit with the covariates Shots and Passes. The regression summary can be found on page 8 of the R Output. The model is labeled mod.q.2.a. Based on the regression summary, fill in the following ANOVA table.

            Df   Sum Sq   Mean Sq   F value   Pr(>F)
Passes                                        2.431e-13
Shots
Residuals

[33 marks] Compute the values in the table. Note: As rounding errors will accumulate as you derive entries in this table from other values shown in the R output, be careful about rounding intermediate values.
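One possible route from a regression summary to a sequential ANOVA table is sketched below. The World Cup data are not reproduced here, so the sketch uses simulated, purely hypothetical data, and it assumes the table is the sequential (Type I) table that `anova()` produces with Passes entered first. It relies on three identities: the residual mean square is the squared residual SE, the overall F statistic equals the regression mean square divided by the residual mean square, and the F value for the last term entered equals its squared t value.

```r
# Sketch with simulated (hypothetical) data, not the exam's data.
set.seed(1)
n <- 30
Passes <- rpois(n, 100)
Shots  <- rpois(n, 5)
Time   <- 50 + 2 * Passes + 5 * Shots + rnorm(n, sd = 40)

fit <- lm(Time ~ Passes + Shots)
s   <- summary(fit)

mse    <- s$sigma^2                     # Residuals Mean Sq
df_res <- fit$df.residual               # Residuals Df = n - 3
ss_res <- mse * df_res                  # Residuals Sum Sq

# Overall F = (SS_regression / numdf) / MSE.
ss_reg <- unname(s$fstatistic["value"] * s$fstatistic["numdf"]) * mse

# Last term entered: sequential F equals the squared t value.
f_shots   <- s$coefficients["Shots", "t value"]^2
ss_shots  <- f_shots * mse              # Shots has 1 Df
ss_passes <- ss_reg - ss_shots          # remainder belongs to Passes

f_passes <- ss_passes / mse
p_passes <- pf(f_passes, 1, df_res, lower.tail = FALSE)

tab <- anova(fit)                       # reference table for comparison
```

Comparing the hand-derived values with `tab` shows how closely the identities reproduce the table; with rounded inputs from printed output, small discrepancies are to be expected.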

b.   A second multiple linear regression model was fit with the covariates Shots, Passes, and Tackles. The regression summary can be found on page 9 of the R Output. The model is labeled mod.q.2.b.

[3 marks] Provide an interpretation of the relationship between Time and Shots, based on the multiple linear regression.

[3 marks] Based on the maximum values for the covariates,