STAT2008/4038/6038 Regression Modelling Semester 1 - End of Semester, 2018
Published: 2023-06-06
Research School of Finance, Actuarial Studies and Statistics
Examination
Semester 1 - End of Semester, 2018
STAT2008/4038/6038 Regression Modelling
Question 1 [54 marks]: An investigation was conducted in the 1920s to determine the relationship between speed (mph) and the stopping distance (ft) of n = 50 cars.
a. Consider the simple linear regression model of the following form:
$Y_i = \beta_0 + \beta_1 x_i + e_i, \quad e_i \sim \text{normal}(0, \sigma^2),$
where y = stopping distance and x = speed. Based on the summary statistics of the data, which are on page 1 of the R Output, we are interested in the 6 missing values in the following regression summary table.
> mod.q1.a <- lm(dist ~ speed, data=cars)
> sumary(mod.q1.a)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  ???????    6.75844 ???????  0.01232
speed        ???????    ??????? ??????? 1.49e-12
n = 50, p = 2, Residual SE = 15.37959, R-Squared = ????
[18 marks] Compute the missing values.
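The missing entries are linked by the standard identities for simple linear regression. What follows is a sketch, assuming that page 1 of the R Output supplies the sample means and sums of squares. The symbols $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$ and $S_{yy}$ are notation introduced here, not taken from the exam.

```latex
\begin{align*}
\hat\beta_1 &= \frac{S_{xy}}{S_{xx}}, &
\hat\beta_0 &= \bar{y} - \hat\beta_1 \bar{x}, \\
\mathrm{SE}(\hat\beta_1) &= \frac{\hat\sigma}{\sqrt{S_{xx}}}, &
\mathrm{SE}(\hat\beta_0) &= \hat\sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}, \\
t_j &= \frac{\hat\beta_j}{\mathrm{SE}(\hat\beta_j)}, &
R^2 &= 1 - \frac{(n - p)\,\hat\sigma^2}{S_{yy}}.
\end{align*}
```

Here $\hat\sigma = 15.37959$ is the Residual SE from the table, with $n = 50$ and $p = 2$, so each t value is referred to a t distribution on $n - p = 48$ degrees of freedom.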
b. Residual plots for the model in part (a) are shown on pages 2 and 3 of the R Output. We are interested in whether these plots suggest any problems with the underlying assumptions.
[3 marks] Are there any problem(s) indicated on the “Residuals vs Fitted” plot on page 2? If so, describe the problem(s):
[3 marks] Are there any problem(s) indicated on the “Normal Q-Q” plot on page 3? If so, describe the problem(s):
[3 marks] Are there any problem(s) indicated on the “Cook’s distance” plot on page 3? If so, describe the problem(s):
[3 marks] What is your overall assessment? (Select just ONE of the following options.)
☐ Residuals are not independent (obvious pattern)
☐ Residuals do not have constant variance (heteroscedasticity)
☐ Residuals are not normally distributed
☐ There are possible outliers and/or influential observations
☐ More than one of the above problems
☐ No obvious problems
c. The response Y was transformed by taking the square root. A linear regression model was then fit (mod.q1.c). The regression table is presented on page 4 of the R Output. Additionally, residual diagnostic plots are presented on pages 4 and 5.
[3 marks] Provide an interpretation of the relationship between speed and dist^(1/2) based on the linear regression model.
[3 marks] Using the min, median, and max values for speed on page 1 of the R Output, estimate the relationship between speed and distance on the original scale for distance.
[6 marks] Using the min, median, and max values for speed on page 1 of the R Output, estimate the 95% prediction intervals for distance at a given speed (on the original scale). Discuss the intervals.
[3 marks] Provide a 95% confidence interval for $\beta_1$ (the regression coefficient for speed).
[3 marks] Conduct the following hypothesis test for the intercept:
$H_0\colon \beta_0 = 0$ vs. $H_1\colon \beta_0 \neq 0$.
[3 marks] Conduct the following hypothesis test:
$H_0\colon \beta_1 = 0.35$ vs. $H_1\colon \beta_1 < 0.35$.
[3 marks] Based on the regression tables and diagnostic plots, clearly outline which model you prefer: the model in part (a) or the model in part (c).
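For the interval and test items in part (c), the usual t-based formulas apply on the square-root scale. What follows is a sketch, assuming that the page 4 regression table gives $\hat\beta_j$, $\mathrm{SE}(\hat\beta_j)$ and the residual SE $\hat\sigma$ for the part (c) model. The symbols $x_0$, $\bar{x}$, $S_{xx}$, $L$, $U$ and the level $\alpha$ are notation introduced here.

```latex
\begin{align*}
\text{95\% CI for } \beta_1 &: \quad
  \hat\beta_1 \pm t_{0.975,\,48}\,\mathrm{SE}(\hat\beta_1), \\
\text{test of } H_0\colon \beta_1 = 0.35 &: \quad
  t = \frac{\hat\beta_1 - 0.35}{\mathrm{SE}(\hat\beta_1)}, \quad
  \text{reject } H_0 \text{ at level } \alpha \text{ if } t < -t_{1-\alpha,\,48}, \\
\text{95\% PI for } \sqrt{Y} \text{ at } x_0 &: \quad
  (L,\, U) = \hat\beta_0 + \hat\beta_1 x_0 \pm t_{0.975,\,48}\,\hat\sigma
  \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}, \\
\text{95\% PI for } Y \text{ at } x_0 &: \quad
  (L^2,\, U^2) \quad \text{provided } L \ge 0.
\end{align*}
```

Squaring preserves order on non-negative values, so the interval endpoints carry over directly to the original scale. Under the model's normality assumption, the squared fitted value estimates the median of distance at $x_0$ rather than its mean.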
Question 2 [51 marks]: Data were collected on a random sample of 30 players from the 2010 World Cup. We are interested in modelling the response Time (time a player played in minutes over the World Cup) based on a few covariates: Shots (the number of shots attempted), Passes (the number of passes made), and Tackles (the number of tackles made). Some summary statistics and scatter plots can be found on pages 6 and 7 of the R Output.
a. An initial multiple linear regression model was fit with the covariates Shots and Passes. The regression summary can be found on page 8 of the R Output. The model is labeled mod.q.2.a. Based on the regression summary, fill in the following ANOVA table.
          | Df | Sum Sq | Mean Sq | F value | Pr(>F)    |
Passes    |    |        |         |         | 2.431e-13 |
Shots     |    |        |         |         |           |
Residuals |    |        |         |         |           |
[33 marks] Compute the values in the table. Note: As rounding errors will accumulate as you derive entries in this table from other values shown in the R output, be careful about rounding intermediate values.
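The table entries can be recovered from a regression summary through a handful of identities. What follows is a sketch, assuming a sequential (Type I) ANOVA in which Passes enters before Shots, as the row order suggests, and assuming that the page 8 summary reports the residual SE $\hat\sigma$, $R^2$, the overall F statistic and the t value for Shots. The subscripted symbols are notation introduced here.

```latex
\begin{align*}
\text{Df} &: \quad 1 \ (\text{Passes}), \quad 1 \ (\text{Shots}), \quad
  n - 3 = 27 \ (\text{Residuals}), \\
\mathrm{SS}_{\text{Res}} &= 27\,\hat\sigma^2, \qquad
  \mathrm{MS}_{\text{Res}} = \hat\sigma^2, \\
F_{\text{Shots}} &= t_{\text{Shots}}^2, \qquad
  \mathrm{SS}_{\text{Shots}} = F_{\text{Shots}}\,\hat\sigma^2, \\
\mathrm{SS}_{\text{Reg}} &= \frac{R^2}{1 - R^2}\,\mathrm{SS}_{\text{Res}}
  = 2\,F_{\text{overall}}\,\hat\sigma^2, \\
\mathrm{SS}_{\text{Passes}} &= \mathrm{SS}_{\text{Reg}} - \mathrm{SS}_{\text{Shots}}, \qquad
  F_{\text{Passes}} = \frac{\mathrm{SS}_{\text{Passes}}}{\hat\sigma^2}.
\end{align*}
```

Each covariate has one degree of freedom, so its Mean Sq equals its Sum Sq. Because Shots is the last term entered, its Pr(>F) matches the two-sided p-value for Shots in the regression summary.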
b. A second multiple linear regression model was fit with the covariates Shots, Passes, and Tackles. The regression summary can be found on page 9 of the R Output. The model is labeled mod.q.2.b.
[3 marks] Provide an interpretation of the relationship between Time and Shots, based on the multiple linear regression.
[3 marks] Based on the maximum values for the covariates,