STAT 4052 Homework 4
Submission policy
The homework has to be submitted electronically through Canvas before the due date indicated above.
Format
Your solution should be submitted as a .pdf file. If you prepare your solutions in
Microsoft Word or a similar document-processing tool, you are strongly encouraged to
convert your document into a .pdf file before submission. Hard copies will not be accepted.
R code
When answering questions that require R coding, DO NOT include your code in your answers. Report only the relevant output and your answers to the questions. All of your code should be well organized and included as an Appendix at the end of the document you submit.
Reading: Read pages 214-236 of the textbook by James, G. et al. (2013), An Introduction to Statistical
Learning (the link is on the syllabus).
Once again, let's focus on the Auto2.txt file, available on Canvas, which we also used
in Homeworks 2 and 3. The dataset contains gas mileage, horsepower, and other information for 392 vehicles. Here are the first few lines of the dataset:
mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin  Type
18   8          307           130         …       …             70    1       coupe
15   8          350           165         …       …             70    1       sedan
18   8          318           150         …       …             70    1       sedan
16   8          304           150         …       …             70    1       sedan
17   8          302           140         …       …             70    1       vagon
15   8          429           198         …       …             70    1       coupe
Q1 - As an alternative to model selection, we can use regularization methods, in which all the available
covariates enter the model.
(a) Fit a Ridge Regression model where the tuning parameter λ is chosen via K-fold cross-validation with K = 10. To perform your cross-validation procedure, use the function cv.glmnet from the glmnet package. Before running cv.glmnet, make sure you set the seed using set.seed(1234). What are the estimates obtained for the regression coefficients?
(b) Fit a Lasso Regression model where the tuning parameter λ is chosen via K-fold cross-validation with K = 10. To perform your cross-validation procedure, use the function cv.glmnet from the glmnet package. Before running cv.glmnet, make sure you set the seed using set.seed(1234). What are the estimates obtained for the regression coefficients?
(c) Repeat (a) and (b), setting λ = 2 for both Ridge and Lasso. Compare these estimates of the regression coefficients with those obtained in (a) and (b). What are the major differences you observe? Are these differences surprising? Justify your answer.
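The Q1 workflow can be sketched in R roughly as follows. This is a minimal sketch, not a reference solution: the data frame `toy` and its coefficients are hypothetical simulated stand-ins for Auto2.txt, and `coefs` is an illustrative name. The calls `cv.glmnet()`, `glmnet()` and `coef()` are from the glmnet package, where `alpha = 0` selects the ridge penalty and `alpha = 1` the lasso penalty; by default glmnet standardizes the predictors internally and reports coefficients on the original scale.

```r
# Sketch of Q1 with glmnet.
# ASSUMPTION: `toy` is simulated stand-in data, not Auto2. Replace it with your
# own data frame (e.g. read.table("Auto2.txt", header = TRUE), if the file
# is whitespace-delimited with a header row).
library(glmnet)

set.seed(42)  # seeds only the fake data; the CV seeds below follow the assignment
n <- 100
toy <- data.frame(
  cylinders  = sample(c(4, 6, 8), n, replace = TRUE),
  horsepower = rnorm(n, mean = 100, sd = 30),
  weight     = rnorm(n, mean = 3000, sd = 500),
  Type       = factor(sample(c("coupe", "sedan", "vagon"), n, replace = TRUE))
)
toy$mpg <- 45 - 0.05 * toy$horsepower - 0.004 * toy$weight + rnorm(n, sd = 2)

# Design matrix: factors become dummy columns; [, -1] drops the intercept column
x <- model.matrix(mpg ~ ., data = toy)[, -1]
y <- toy$mpg

# (a) Ridge (alpha = 0), lambda chosen by 10-fold CV
set.seed(1234)
cv_ridge <- cv.glmnet(x, y, alpha = 0, nfolds = 10)

# (b) Lasso (alpha = 1), lambda chosen by 10-fold CV
set.seed(1234)
cv_lasso <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

# (c) Same two penalties with lambda fixed at 2
fit_ridge_2 <- glmnet(x, y, alpha = 0, lambda = 2)
fit_lasso_2 <- glmnet(x, y, alpha = 1, lambda = 2)

# Collect the four coefficient vectors (original scale) side by side
coefs <- cbind(
  as.matrix(coef(cv_ridge, s = "lambda.min")),
  as.matrix(coef(cv_lasso, s = "lambda.min")),
  as.matrix(coef(fit_ridge_2)),
  as.matrix(coef(fit_lasso_2))
)
colnames(coefs) <- c("ridge_cv", "lasso_cv", "ridge_lambda2", "lasso_lambda2")
print(round(coefs, 4))
cat("CV-selected lambda (ridge, lasso):", cv_ridge$lambda.min, cv_lasso$lambda.min, "\n")
```

The glmnet documentation recommends fitting a decreasing sequence of λ values rather than a single one; fitting the full path and calling `coef(fit, s = 2)`, which interpolates along the path, is an alternative way to obtain the λ = 2 estimates.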
Q2 - Finally, we can implement Principal Components Regression, which reduces the dimension of the
covariate space by regressing the response on linear combinations of the covariates.
(a) Fit a Principal Components Regression model where the number of components is
chosen via Leave-One-Out cross validation. For the sake of this exercise, include all
the variables in the model (also the categorical ones, if any). How many components
should be selected using this approach? Justify your answer.
(Hint: use the validation = "LOO" option in the pcr function from the pls package.)
(b) Do you think the model obtained in (a) is reasonable? Justify your answer.
(c) What else could you do to decide which model among those fitted in Q1(a)-(c) and Q2(a)
provides the best fit for our Auto2 dataset?
(d) Provide an intuitive explanation of why, in PCA and PCR, the continuous variables considered should be standardized.
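A minimal R sketch of the Q2 workflow, plus a small illustration related to (d), is given below. It is an illustration under stated assumptions, not a reference solution: `toy` is hypothetical simulated data standing in for Auto2.txt, and `n_comp`, `rmsep_cv` and `var_share` are illustrative names. The functions `pcr()` and `RMSEP()` come from the pls package (`scale = TRUE` standardizes each predictor before the principal components are computed), and `prcomp()` is base R's PCA.

```r
# Sketch of Q2 with pls.
# ASSUMPTION: `toy` is simulated stand-in data, not Auto2.
library(pls)

set.seed(42)
n <- 100
toy <- data.frame(
  cylinders  = sample(c(4, 6, 8), n, replace = TRUE),
  horsepower = rnorm(n, mean = 100, sd = 30),
  weight     = rnorm(n, mean = 3000, sd = 500),
  Type       = factor(sample(c("coupe", "sedan", "vagon"), n, replace = TRUE))
)
toy$mpg <- 45 - 0.05 * toy$horsepower - 0.004 * toy$weight + rnorm(n, sd = 2)

# (a) PCR on standardized predictors; all variables enter, including the factor
pcr_fit <- pcr(mpg ~ ., data = toy, scale = TRUE, validation = "LOO")
summary(pcr_fit)

# LOO root mean squared error of prediction for 0, 1, 2, ... components
rmsep_cv <- drop(RMSEP(pcr_fit, estimate = "CV")$val)
n_comp <- which.min(rmsep_cv) - 1  # first entry is the intercept-only model
print(round(rmsep_cv, 3))
cat("Number of components minimizing LOO RMSEP:", n_comp, "\n")

# (d) Unstandardized PCA is dominated by the variable with the largest variance
num <- toy[, c("cylinders", "horsepower", "weight")]
pc_raw    <- prcomp(num, scale. = FALSE)
pc_scaled <- prcomp(num, scale. = TRUE)
var_share <- function(p) p$sdev^2 / sum(p$sdev^2)  # proportion of variance per PC
print(round(pc_raw$rotation[, 1], 3))     # PC1 loadings on the raw scale
print(round(pc_scaled$rotation[, 1], 3))  # PC1 loadings after standardizing
print(round(rbind(raw = var_share(pc_raw), scaled = var_share(pc_scaled)), 3))
```

In this simulated example, weight (measured in pounds, with a standard deviation in the hundreds) almost entirely determines the first raw principal component simply because of its units, whereas after standardization each variable starts with unit variance.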
2022-10-08