STAT 4052 Homework 2
Submission policy
The homework has to be submitted electronically through Canvas before the due date indicated above.
Format
Your solution should be provided as a .pdf file. If you prepare your solutions in Microsoft Word or a similar document-processing tool, you are strongly encouraged to convert your document into a .pdf file before submission. Hard copies will not be accepted.
R code
When answering questions that require R coding, DO NOT include your code in your answers. Report only the relevant output and your answers to the questions. All your code should be well organized and included as an Appendix at the end of the document you submit.
Consider the data in the Auto2.txt file available on Canvas. The dataset contains gas mileage, horsepower, and other information for 392 vehicles. Here are the first few lines of the dataset (the weight and acceleration values did not survive in this copy and are marked …):
mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin  Type
18   8          307           130         …       …             70    1       coupe
15   8          350           165         …       …             70    1       sedan
18   8          318           150         …       …             70    1       sedan
16   8          304           150         …       …             70    1       sedan
17   8          302           140         …       …             70    1       vagon
15   8          429           198         …       …             70    1       coupe
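Every question below starts by loading this file. In R the usual call would be read.table("Auto2.txt", header = TRUE); the sketch below shows the same idea in Python for illustration. It assumes, without confirmation from this handout, that the file is whitespace-separated with a single header row and no missing values. The function name parse_whitespace_table is hypothetical.

```python
def parse_whitespace_table(text):
    """Parse a header row plus whitespace-separated data rows into columns.

    Assumption: one header line, no missing fields. Numeric fields become
    floats; anything else (e.g. the Type labels) stays a string.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    columns = {name: [] for name in header}
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(row)}: {row}")
        for name, field in zip(header, row):
            try:
                columns[name].append(float(field))
            except ValueError:
                columns[name].append(field)  # non-numeric, e.g. "coupe"
    return columns


# Two of the rows shown above (weight and acceleration left out: values not shown).
sample = """mpg cylinders displacement horsepower year origin Type
18 8 307 130 70 1 coupe
15 8 350 165 70 1 sedan
"""
cols = parse_whitespace_table(sample)
print(cols["mpg"])   # → [18.0, 15.0]
print(cols["Type"])  # → ['coupe', 'sedan']
```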
Q1 - Assessing multicollinearity.
(a) Produce a scatter plot matrix which includes the variables cylinders, displacement, horsepower, weight, acceleration, year.
(b) Compute the matrix of correlations between the variables using the R function cor.
(c) If we were to fit a model for the gas mileage mpg (your outcome), including cylinders, displacement, horsepower, weight, acceleration, year as covariates, would the results of (a) and (b) provide evidence of potential multicollinearity?
(d) What risks could we encounter when assessing the fit if we include highly correlated covariates in our model?
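Part (b) asks for the output of R's cor. As a language-neutral illustration of what that matrix contains, here is a minimal Python sketch of the Pearson correlation matrix, applied to the six rows shown above. The 0.8 cut-off used to flag pairs is a hypothetical rule of thumb, not part of the assignment. In these six rows cylinders and year are constant, so their correlations are undefined; the sketch returns nan, much as R's cor returns NA with a warning.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; nan when either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """Full matrix of pairwise correlations, in the column order of `data`."""
    names = list(data)
    return [[pearson(data[a], data[b]) for b in names] for a in names]


def highly_correlated_pairs(data, threshold=0.8):
    """Covariate pairs with |r| >= threshold (hypothetical cut-off); nan pairs are skipped."""
    return [(a, b) for a, b in combinations(list(data), 2)
            if abs(pearson(data[a], data[b])) >= threshold]


# Covariates from the six rows shown above (weight and acceleration not shown).
covariates = {
    "cylinders": [8, 8, 8, 8, 8, 8],
    "displacement": [307, 350, 318, 304, 302, 429],
    "horsepower": [130, 165, 150, 150, 140, 198],
    "year": [70, 70, 70, 70, 70, 70],
}
for name, row in zip(covariates, correlation_matrix(covariates)):
    print(f"{name:>12}", " ".join(f"{r:6.3f}" for r in row))
print(highly_correlated_pairs(covariates))  # → [('displacement', 'horsepower')]
```

Six rows are far too few to judge multicollinearity; the point is only the mechanics. With all 392 vehicles the same computation is what cor performs on the covariate columns.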
Q2 - Consider a linear multiple regression model where the outcome is mpg and all the other variables in the dataset Auto2 are used as covariates. What does the design matrix for this model look like? (You can just draw it and take a picture with your phone.) Note: for the sake of this HW assignment, you may assume that the variable origin is a quantitative variable.
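To make the structure of a design matrix concrete, the sketch below builds one from the six rows shown above: a column of ones for the intercept, one column per quantitative covariate (origin included, as the note allows), and 0/1 dummy columns for the categorical Type. It assumes treatment coding with the alphabetically first level as baseline, which is R's default for an unordered factor; the column names imitate the ones model.matrix produces (e.g. Typesedan). The helper design_matrix is hypothetical, and weight and acceleration are left out only because their values are not shown above. For the full model X would have 392 rows.

```python
def design_matrix(rows, numeric, factor):
    """Intercept + numeric columns + treatment-coded dummies for one factor.

    Assumption: levels are sorted and the first one is the baseline,
    so it gets no column of its own (it is absorbed by the intercept).
    """
    levels = sorted({row[factor] for row in rows})
    others = levels[1:]  # levels[0] is the baseline
    header = ["(Intercept)"] + numeric + [f"{factor}{lvl}" for lvl in others]
    matrix = [
        [1.0]
        + [float(row[col]) for col in numeric]
        + [1.0 if row[factor] == lvl else 0.0 for lvl in others]
        for row in rows
    ]
    return header, matrix


cols = ["cylinders", "displacement", "horsepower", "year", "origin", "Type"]
shown = [
    (8, 307, 130, 70, 1, "coupe"),
    (8, 350, 165, 70, 1, "sedan"),
    (8, 318, 150, 70, 1, "sedan"),
    (8, 304, 150, 70, 1, "sedan"),
    (8, 302, 140, 70, 1, "vagon"),
    (8, 429, 198, 70, 1, "coupe"),
]
rows = [dict(zip(cols, values)) for values in shown]
header, X = design_matrix(rows, cols[:-1], "Type")
print(header)
for x_row in X:
    print(x_row)
```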
Q3 - Using lm, fit a linear multiple regression model where the outcome is mpg and all the other variables in the dataset Auto2 are used as covariates. Obtain the following diagnostic plots:
Plot 1 - Scatter plot of the fitted values for the outcome versus the standardized residuals;
Plot 2 - QQ-plot of the standardized residuals;
Plot 3 - Contour plot of Cook’s distance as a function of leverage and standardized residuals.
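All three plots are built from a few per-observation quantities, which in R come from fitted(fit), rstandard(fit), hatvalues(fit) and cooks.distance(fit). The Python sketch below computes them from scratch for a small made-up dataset so the formulas are visible: leverage is the diagonal of the hat matrix, the standardized residual is e_i / (s * sqrt(1 - h_ii)), and Cook's distance is r_i^2 / p * h_ii / (1 - h_ii). It solves the normal equations directly, which is simpler but numerically less stable than the QR decomposition lm uses; the function names are hypothetical.

```python
import math


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for a small X^T X)."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: some columns of X are exactly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_diagnostics(X, y):
    """Least-squares fit plus the quantities behind Plots 1-3.

    X is a list of rows that already contains the intercept column of ones.
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))  # residual standard error
    # Leverage h_ii = x_i^T (X^T X)^{-1} x_i, the diagonal of the hat matrix.
    lev = [sum(row[i] * inv[i][j] * row[j] for i in range(p) for j in range(p)) for row in X]
    # Standardized residuals e_i / (s * sqrt(1 - h_ii)).
    std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]
    # Cook's distance D_i = r_i^2 / p * h_ii / (1 - h_ii).
    cooks = [r * r / p * h / (1 - h) for r, h in zip(std_resid, lev)]
    return beta, fitted, lev, std_resid, cooks


# Made-up toy data: one covariate plus an intercept.
X = [[1.0, x] for x in [1, 2, 3, 4, 5]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, fitted, lev, std_resid, cooks = ols_diagnostics(X, y)
print([round(b, 4) for b in beta])  # → [0.05, 1.99]
print([round(h, 2) for h in lev])   # → [0.6, 0.3, 0.2, 0.3, 0.6]
```

Plot 1 is then std_resid against fitted, and Plot 3 places each point at (lev, std_resid).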
Looking at Plots 1-3:
(a) List all the assumptions on the errors required to fit a linear regression model via maximum likelihood.
(b) Do the assumptions listed above appear to be satisfied? Justify your answer.
(c) Does Plot 3 reveal the presence of influential observations? Justify your answer.
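Parts (b) and (c) rest on reading Plot 2 and Plot 3. The sketch below shows, under stated assumptions, the coordinates behind each. For the QQ-plot, sorted standardized residuals are paired with N(0,1) quantiles at plotting positions (i - a) / (n + 1 - 2a), with a = 3/8 for n <= 10 and 1/2 otherwise, which is our understanding of R's ppoints. For Plot 3, solving D = r^2 / p * h / (1 - h) for r gives the contour curve at a chosen Cook's level; 0.5 and 1 are the levels R's plot(fit, which = 5) draws by default. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_quantiles(n):
    """Theoretical N(0,1) quantiles for a QQ-plot of n points (ppoints-style positions)."""
    a = 3 / 8 if n <= 10 else 0.5
    return [NormalDist().inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]


def qq_pairs(std_resid):
    """(theoretical, sample) pairs; points near the 45-degree line are consistent with normality."""
    return list(zip(normal_quantiles(len(std_resid)), sorted(std_resid)))


def cook_contour(h, p, level):
    """|standardized residual| at which Cook's distance equals `level` for leverage h."""
    return math.sqrt(level * p * (1 - h) / h)


def influential(std_resid, lev, p, level=0.5):
    """Indices whose Cook's distance r^2 / p * h / (1 - h) reaches `level`."""
    return [i for i, (r, h) in enumerate(zip(std_resid, lev))
            if r * r / p * h / (1 - h) >= level]


print(round(cook_contour(0.5, 2, 1.0), 3))          # → 1.414
print(influential([0.1, 3.0], [0.2, 0.5], p=2))    # → [1]
```

Points lying outside a contour in Plot 3 are the ones worth inspecting; the chosen level is a convention, not a hard test.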
Q4 - Does the model fitted in Q3 appear to provide a good fit for the data? Justify your answer.
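Besides the diagnostic plots from Q3, numeric summaries are commonly used when judging fit; in R they are available as summary(fit)$r.squared, summary(fit)$adj.r.squared and sigma(fit). The Python sketch below shows the formulas for a model with p coefficients (intercept included), applied to the least-squares fitted values of a small made-up example. The function name is hypothetical, and these numbers complement rather than replace the residual checks.

```python
import math


def fit_summary(y, fitted, p):
    """R^2, adjusted R^2 and residual standard error for an OLS fit with p coefficients."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)  # penalizes extra coefficients
    rse = math.sqrt(sse / (n - p))
    return r2, adj_r2, rse


# Made-up data and its least-squares fitted values (intercept 0.05, slope 1.99).
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [2.04, 4.03, 6.02, 8.01, 10.00]
r2, adj_r2, rse = fit_summary(y, fitted, p=2)
print(round(r2, 4), round(adj_r2, 4), round(rse, 4))
```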
2022-09-25