STAT 4052 Homework 3
Submission policy
The homework has to be submitted electronically through Canvas before the due date indicated above.
Format
Your solution should be provided in the form of a .pdf file. Therefore, if you prepare your
solutions in Microsoft Word or a similar document-processing tool, you are strongly encouraged
to convert your document into a .pdf file before submission. Hard copies will not be accepted.
R code
When answering questions that require R coding, DO NOT include your code in your answers. Report only the relevant output and your answers to the questions. All your code should be well organized and included as an Appendix at the end of the document you submit.
Reading: Read pages 203-214 of the textbook James G. et al., 2013, An introduction to statistical
learning (you can find the link to it on the syllabus).
From now on we will work with the Auto2.txt file available on Canvas, which we also used
in Homework 2. The dataset contains gas mileage, horsepower, and other information for 392 vehicles. Here are the first few lines of the dataset:
mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin  Type
18   8          307           130         ...     ...           70    1       coupe
15   8          350           165         ...     ...           70    1       sedan
18   8          318           150         ...     ...           70    1       sedan
16   8          304           150         ...     ...           70    1       sedan
17   8          302           140         ...     ...           70    1       vagon
15   8          429           198         ...     ...           70    1       coupe
Q1 - Fit a multiple linear regression model where the outcome is mpg and all the other variables
in the dataset Auto2, apart from Type, are used as covariates. Then fit a second model that also includes Type.
(a) Looking at the output of both models (with and without Type), which model
fits the data better? Justify your answer.
(b) Compare the two models with an F-test. Should Type be included in the model?
Justify your answer.
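As a reminder of the arithmetic behind an F-test for two nested linear models, the statistic can be sketched as follows. The course uses R, so this Python snippet is only an illustration of the calculation; the function name nested_f_statistic and the input numbers are hypothetical and are not values from Auto2.

```python
def nested_f_statistic(rss_reduced, rss_full, df_reduced, df_full):
    """F statistic comparing a reduced model nested inside a full model.

    rss_*: residual sums of squares of each model.
    df_*:  residual degrees of freedom (n minus the number of
           estimated coefficients, intercept included).
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more residual df")
    if rss_full <= 0:
        raise ValueError("the full model must have a positive RSS")
    # Drop in RSS per extra coefficient, relative to the full model's
    # residual variance estimate.
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    denominator = rss_full / df_full
    return numerator / denominator


# Hypothetical inputs, chosen only to show the calculation.
f_stat = nested_f_statistic(rss_reduced=120.0, rss_full=100.0,
                            df_reduced=50, df_full=48)
print(round(f_stat, 3))
```

Under the null hypothesis that the extra coefficients are all zero (and with the usual normal-error assumptions), this statistic is compared with an F distribution with (df_reduced - df_full, df_full) degrees of freedom. In R, calling anova() on the two fitted lm objects reports the same statistic together with its p-value.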
Q2 - For the model with all the variables (including Type), implement the Box-Cox method to
assess if a transformation of the outcome could lead to a better fit.
(a) How would you specify the model suggested by the Box-Cox method? (Write down the formula for it in terms of Y and X.)
(b) What problem could you encounter when using this model in practice?
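For reference, the Box-Cox family of transformations that the method searches over can be sketched as follows. This is a minimal Python illustration, not the R workflow used in the course; the function name box_cox and the example values are hypothetical.

```python
import math


def box_cox(y, lam):
    """Apply the Box-Cox transformation with parameter lam to each value in y.

    (y^lam - 1) / lam  when lam != 0
    log(y)             when lam == 0
    """
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive responses")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


# Hypothetical values, chosen only to show the calculation.
sqrt_like = box_cox([1.0, 4.0], 0.5)
log_case = box_cox([1.0, math.e], 0)
print(sqrt_like)
print(log_case)
```

In R, the boxcox() function in the MASS package plots the profile log-likelihood over a grid of lambda values for a fitted lm object; the value of lambda near the maximum suggests which transformation of the outcome to use.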
Q3 - On pages 16 and 17 of Handout 3 you can find the steps necessary to implement the Best
Subset Selection and Stepwise Forward Selection algorithms to perform variable selection. These algorithms allow you to select k covariates out of an original set of p covariates. Here you are asked to write the steps necessary to implement the Stepwise Backward Selection algorithm to select k covariates out of the p covariates available.
(Hint: The starting model is M_p, i.e., the model which includes all the covariates available in the dataset.)
Q4 - The goal here is to select an adequate set of covariates to predict mpg. We are considering
all the remaining variables in the dataset Auto2 as potential covariates.
(a) What is the value of p here? (Recall that p is the total number of potential covariates from which we are selecting the k covariates to be included in our final model).
(b) Perform variable selection via Best Subset Selection. When choosing the adjusted R^2 as selection criterion, what is the final model selected by this algorithm?
(c) Perform variable selection via Stepwise Forward Selection. When choosing the adjusted R^2 as selection criterion, what is the final model selected by this algorithm?
(d) Are there any differences/similarities between the models selected in (b) and (c)? Are those differences/similarities surprising? Justify your answer.
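As a reminder of the selection criterion used in (b) and (c), the adjusted R^2 of a linear model with an intercept can be computed as sketched below. This Python snippet only illustrates the formula; the function name adjusted_r2 and the input numbers are hypothetical and are not values from Auto2.

```python
def adjusted_r2(rss, tss, n, k):
    """Adjusted R^2 for a model with an intercept and k estimated slopes.

    rss: residual sum of squares of the fitted model
    tss: total sum of squares of the outcome around its mean
    n:   number of observations
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than coefficients")
    if tss <= 0:
        raise ValueError("the outcome must not be constant")
    # Each sum of squares is divided by its own degrees of freedom,
    # so adding a useless covariate can lower the criterion.
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))


# Hypothetical inputs, chosen only to show the calculation.
value = adjusted_r2(rss=20.0, tss=100.0, n=21, k=4)
print(value)
```

Unlike the plain R^2, this criterion does not necessarily increase when a covariate is added, which is why it can be used to compare models of different sizes.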
2022-10-03