
DS 303 Homework 5

Fall 2022

Problem 1: Bias-Variance Tradeoff

We have learned that the expected test MSE can be decomposed into 3 fundamental quantities. For a given $x_0$,

$$E\left[(y_0 - \hat{f}(x_0))^2\right] = \mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\epsilon).$$

The notation $\hat{f}(x_0)$ represents our predicted value at $x_0$. Suppose we know $Y = f(x) + \epsilon$. Here $f(x) = 4 + x + x^2 + x^3 + x^4 + x^5$ and $\epsilon$ follows a normal distribution with $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 = 12$.
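For reference, the two reducible terms in the decomposition can be written out using the standard textbook definitions. The expectations here are taken over repeated training sets, which is an assumption about how the notation is being used in this course:

$$\mathrm{Bias}(\hat{f}(x_0)) = E[\hat{f}(x_0)] - f(x_0), \qquad \mathrm{Var}(\hat{f}(x_0)) = E\left[\left(\hat{f}(x_0) - E[\hat{f}(x_0)]\right)^2\right].$$

The third term, $\mathrm{Var}(\epsilon)$, does not involve $\hat{f}$ at all, so it is the same for every model.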

Run the following code first:

set.seed(1) # so we all get the same randomly generated data

n = 100

x = runif(n, min = 0, max = 2)

a. Wherever you see a ‘?’, fill in the code to generate 100 observations from the true regression model.

n = 100

error = ?

y = ?

train_set = data.frame(x,y)

b. We want to fit five models of increasing complexity and analyze their bias-variance trade-off. In each model, Y is your response and x is your predictor. Use all the data you generated in part (a) to train the following models:

M1 : simple linear regression model

M2 : polynomial regression model with degree 2

M3 : polynomial regression model with degree 3

M4 : polynomial regression model with degree 5

M5 : polynomial regression model with degree 11

You do not need to show any output here but make sure your code shows you have trained each model.
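As a rough illustration of fitting polynomial regressions of increasing degree, here is a minimal sketch on toy data. The data frame `toy`, its mean function, and its noise level are assumptions made up for this example and are not the model from this problem; `poly()` is used as one common way to build the polynomial terms.

```r
# Toy data: the mean function and noise level are illustrative assumptions only.
set.seed(42)
toy <- data.frame(x = runif(50, min = 0, max = 2))
toy$y <- 1 + 2 * toy$x - toy$x^2 + rnorm(50, mean = 0, sd = 0.5)

# Fit polynomials of increasing degree; poly() builds orthogonal polynomial terms.
degrees <- c(1, 2, 3)
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d), data = toy))

# Predict at a single new point from each fitted model.
new_point <- data.frame(x = 0.9)
preds <- sapply(fits, function(fit) predict(fit, newdata = new_point))
names(preds) <- paste0("degree_", degrees)
print(preds)
```

Writing `lm(y ~ x + I(x^2))` by hand would give the same fitted values as `poly(x, 2)`; only the parameterization of the coefficients differs.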

c. What is the true value of E(Y) when x = 0.9?

d. Ideally, to compute the expected test MSE we would have an infinite number of training sets. We can’t do that, so let’s just simulate 1000 training sets. Use the true population regression line to simulate n = 100 new Y values. There is no need to generate new x’s. If you’ve set things up correctly, this will just be repeating code from part (a) 1000 times. For each of these 1000 training sets, train M1 - M5 and store the predicted values of Y when x = 0.9. Report the first 5 predicted values for each model here.

e. Create a test set of 1000 observations: (x0 , y0 ). For each observation, let x0 = 0.9. Generate y0 using the true regression line when x0 = 0.9. Copy/paste the code you used here.

f. Use your results from above to obtain the (expected) test MSE for each of the five models. Report the five test MSEs here. Which model has the smallest test MSE? How does the test MSE behave as model flexibility increases? Explain why the test MSE behaves this way.

What is the irreducible error for each of the 5 models? Report a real number.
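The simulation idea in this problem can be sketched on a deliberately different toy setup. Everything named `*_toy` below (the linear mean function, the noise standard deviation, the sample sizes) is an assumption chosen for illustration, not the model from this problem. The sketch estimates the variance and squared bias of a prediction at one point and compares their sum, plus the noise variance, with a simulated test MSE.

```r
set.seed(123)
f_toy <- function(x) 1 + 2 * x   # assumed toy mean function
sigma_toy <- 1                   # assumed noise standard deviation
x <- runif(50, min = 0, max = 2) # predictor values, held fixed across simulations
x0 <- 0.9
n_sims <- 500

# Each replication: simulate new responses, refit, predict at x0.
preds <- replicate(n_sims, {
  y <- f_toy(x) + rnorm(length(x), sd = sigma_toy)
  fit <- lm(y ~ x)
  predict(fit, newdata = data.frame(x = x0))
})

variance <- mean((preds - mean(preds))^2)   # spread of predictions across training sets
bias_sq <- (mean(preds) - f_toy(x0))^2      # squared gap between average prediction and truth

# Independent test responses at x0, one per simulated fit.
y0 <- f_toy(x0) + rnorm(n_sims, sd = sigma_toy)
test_mse <- mean((y0 - preds)^2)

c(test_mse = test_mse, sum_of_parts = variance + bias_sq + sigma_toy^2)
```

The two reported numbers should be close but not identical, because both are Monte Carlo estimates based on a finite number of simulations.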

Problem 2: Concept Review

a. Subset selection will produce a collection of p + 1 models $M_0, M_1, M_2, \ldots, M_p$. These represent the ‘best’ model of each size (where ‘best’ here is defined as the model with the smallest RSS). Is it true that the model identified as $M_{k+1}$ must contain all of the predictors found in $M_k$? In other words, is it true that if $M_1$: Y ~ $X_1$, then $M_2$ must also contain $X_1$? And if $M_2$ contains $X_1$ and $X_2$, then $M_3$ must also contain $X_1$ and $X_2$? Explain your answer.

b. Same question as part (a) but instead of subset selection, we now carry out forward stepwise selection.

c. Suppose we perform subset, forward stepwise, and backward stepwise selection on a single data set. For each approach, again we can obtain p + 1 models containing 0, 1, 2, . . . , p predictors. As we know, best subset will give us a best model with k predictors. Call this $M_{k,\text{subset}}$. Forward stepwise selection will give us a best model with k predictors. Call this $M_{k,\text{forward}}$. Backward stepwise selection will give us a best model with k predictors. Call this $M_{k,\text{backward}}$. Which of these three models has the smallest training MSE? Explain your answer. Hint: Consider the case for k = 0 and k = p first. Then the case for k = 1. Then the case for k = 2, . . . , p - 1.

d. Same setup as part (c). Which of these three models has the smallest test MSE? Explain your answer.

e. What advantages are there to using AIC/BIC as our model selection criteria instead of using the test MSE? Explain.

Problem 3: Forward and backward selection

We will use the College data set in the ISLR2 library to predict the number of applications (Apps) each university received. Randomly split the data set so that 90% of the data belong to the training set and the remaining 10% belong to the test set. Implement forward and backward selection on the training set only. For each approach, report the best model based on AIC. From these 2 models, pick a final model based on their performance on the test set. Report both models’ test MSEs and summarize your final model.
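One possible shape for this workflow is sketched below using base R's `step()` function, which selects on AIC by default. The built-in `mtcars` data and the response `mpg` stand in for the College data purely for illustration; the seed and variable names are assumptions, and this is not the required solution.

```r
# mtcars ships with base R; it is a stand-in data set for illustration only.
set.seed(1)
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.9 * n))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

null_model <- lm(mpg ~ 1, data = train)  # intercept only
full_model <- lm(mpg ~ ., data = train)  # all predictors

# Forward selection starts from the null model and may add terms up to the full model.
fwd <- step(null_model,
            scope = list(lower = formula(null_model), upper = formula(full_model)),
            direction = "forward", trace = 0)

# Backward selection starts from the full model and drops terms.
bwd <- step(full_model, direction = "backward", trace = 0)

# Compare the two selected models on the held-out rows.
test_mse <- function(fit) mean((test$mpg - predict(fit, newdata = test))^2)
c(forward = test_mse(fwd), backward = test_mse(bwd))
```

Setting `trace = 0` only suppresses the step-by-step printout; leaving it at the default shows the AIC at each step, which can help when reporting how the final model was reached.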

Problem 4: A Puzzling Problem

When fitting a linear regression model on a data set, you encounter the following R output. You notice there is something strange about the results. Point out what is strange in this output and explain clearly how this could happen.

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max
-6.3700 -1.6364 -0.1208  1.4261  5.2558

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.2936     0.5217   4.396 2.82e-05 ***
x1            1.2600     2.3006   0.548    0.585
x2            1.8968     2.5509   0.744    0.459
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.376 on 97 degrees of freedom
Multiple R-squared:  0.09896,  Adjusted R-squared:  0.08038
F-statistic: 5.326 on 2 and 97 DF,  p-value: 0.006385

Problem 5: Interaction Terms

We will use the Credit dataset for this problem. It is part of the library ISLR2.

a. This data set contains a few categorical predictors. As we already discussed in lecture, these predictors should be stored as factors so that R can handle them properly. Using the str function, check that all the qualitative predictors in our dataset are stored correctly in R as factors. Copy and paste your output.
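As a small illustration of what to look for in `str` output, the sketch below uses a made-up three-row data frame. The names `toy`, `income`, and `student` are hypothetical and are not the Credit data.

```r
# Hypothetical toy data frame with a qualitative column stored as character.
toy <- data.frame(
  income = c(25.1, 48.3, 61.7),
  student = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)
str(toy)  # student is listed as chr

# Convert the qualitative column to a factor and check again.
toy$student <- factor(toy$student)
str(toy)  # student is now listed as a Factor with 2 levels
levels(toy$student)
```

A column reported as `chr` or `int` when it is really a category is the usual sign that a conversion with `factor()` is needed.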

b. Fit a model with the response (Y) as credit card balance and $X_1$ = Income and $X_2$ = Student as the predictors. Call this model fit. Summarize your output.

c. Based on our results from part (b), write out the fitted model for students and write out the fitted model for non-students.

d. Interpret the regression coefficient related to Income for both models.

e. Notice that our model says that regardless of student status, the effect of Income on average Balance is the same. Do you think this is a reasonable constraint of our model? Construct some plots to back up your answer.

f. One way we could relax this assumption is by incorporating interaction terms into our model. Specifically:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \epsilon_i,$$

where $X_1$ = Income, $X_2$ = Student, and $X_3$ = Income × Student. Fit a model with an interaction term using the following code:

lm(Balance ~ Income + Student + Income:Student, data=Credit)
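To see how R's formula syntax handles an interaction term, here is a minimal sketch on simulated data. The data frame `toy`, its column names, and the coefficients used to simulate it are all assumptions for illustration and are unrelated to the Credit data.

```r
set.seed(7)
toy <- data.frame(
  income = runif(200, min = 10, max = 100),
  student = factor(sample(c("No", "Yes"), 200, replace = TRUE))
)
is_student <- as.numeric(toy$student == "Yes")

# Simulated response: the income slope is 5 for non-students and 5 - 2 = 3 for students.
toy$balance <- 200 + 5 * toy$income + 300 * is_student -
  2 * toy$income * is_student + rnorm(200, sd = 50)

# a + b + a:b spells out main effects plus interaction; a * b is shorthand for the same model.
fit_long <- lm(balance ~ income + student + income:student, data = toy)
fit_short <- lm(balance ~ income * student, data = toy)
all.equal(coef(fit_long), coef(fit_short))

# The interaction coefficient is the difference in income slope between the two groups.
b <- coef(fit_long)
slope_no <- b[["income"]]
slope_yes <- b[["income"]] + b[["income:studentYes"]]
c(non_student = slope_no, student = slope_yes)
```

The estimated slopes should land near the simulated values of 5 and 3, which shows how the interaction term lets each group have its own slope.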

Based on this model, write out the fitted model for students and write out the fitted model for non-students.

g. Interpret the regression coefficient related to Income for the fitted models obtained in part (f).

h. The model from part (f) has a significant F-test statistic, which tells us the overall model is jointly significant and at least one of the regression coefficients is significantly different from zero. However, the $R^2$ is quite low. Are these results contradictory? Explain.

2022-11-03
