DS 303 Homework 7

Fall 2022

Problem 1: Concept Review

1. Explain in plain language (using limited statistics terminology) why lasso can set some of the regression coefficients to exactly 0, while ridge regression cannot. You may include a figure if that is helpful.

2. Suppose we estimate the regression coefficients in a linear regression model by minimizing

$$\sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

for a particular value of $\lambda$. For parts (a) through (e), indicate which of i. through v. is correct. Justify your answer.

a. As we increase $\lambda$ from 0, the training MSE will:

i. Increase initially, and then eventually start decreasing in an inverted U shape.

ii. Decrease initially, and then eventually start increasing in a U shape.

iii. Steadily increase.

iv. Steadily decrease.

v. Remain constant.

b. Repeat (a) for test MSE.

c. Repeat (a) for variance.

d. Repeat (a) for (squared) bias.

e. Repeat (a) for irreducible error.
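To make the criterion in question 2 concrete, here is a minimal sketch in R that evaluates it for a given set of coefficients. The helper name `ridge_objective` and the tiny data set are made up for illustration; they are not part of the assignment. Note that the intercept is not included in the penalty.

```r
# Hypothetical helper: value of the penalized criterion for given coefficients.
# The first term is the residual sum of squares; the second is the ridge penalty.
ridge_objective <- function(beta0, beta, x, y, lambda) {
  residuals <- y - beta0 - as.vector(x %*% beta)
  sum(residuals^2) + lambda * sum(beta^2)
}

# Made-up data (4 observations, 2 predictors), for illustration only.
x <- matrix(c(1, 2, 3, 4,
              0, 1, 0, 1), ncol = 2)
y <- c(1, 3, 3, 5)
beta0 <- 0
beta <- c(1, 1)

# With lambda = 0 only the residual sum of squares remains.
rss_only <- ridge_objective(beta0, beta, x, y, lambda = 0)
# With lambda > 0 the same coefficients also pay a penalty for their size.
penalized <- ridge_objective(beta0, beta, x, y, lambda = 2)
```

Comparing `rss_only` and `penalized` for the same coefficients shows how the second term alone grows with $\lambda$.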

Problem 2: Regularized Regression Models

For this problem, we will continue with the Hitters example from lecture. Our aim is to predict the salary of baseball players based on their career statistics.

a. We will start with a little data cleaning. We’ll also split the data into a training and test set. So that we all get the same results, please use the following code:

library(ISLR2)
Hitters = na.omit(Hitters)
n = nrow(Hitters) # there are 263 observations
x = model.matrix(Salary ~ ., data = Hitters)[, -1] # 19 predictors
Y = Hitters$Salary
set.seed(1)
train = sample(1:nrow(x), nrow(x)/2)
test = (-train)
Y.test = Y[test]

b. Fit a ridge regression model. Replicate the example we had in class to obtain the optimal $\lambda$ using 10-fold CV. Present a plot of the cross-validation error as a function of $\lambda$. Report that value here and call it $\lambda_{\min}^{\text{ridge}}$.

c. Naturally, if we had taken a different training/test set or a different set of folds to carry out cross-validation, our optimal $\lambda$ and therefore our test error would change. An alternative is to select $\lambda$ using the one-standard-error rule. The idea is that, instead of picking the $\lambda$ that produces the smallest CV error, we pick the most heavily regularized model whose CV error is within one standard error of the lowest point on the curve you produced in part (b). The intention is to produce a more parsimonious model. The glmnet package does all of this hard work for you, and we can extract the $\lambda$ based on this rule using the following code: cv.out$lambda.1se (assuming your cv.glmnet object is named cv.out). Report that $\lambda$ here and call it $\lambda_{\text{1se}}^{\text{ridge}}$.

d. Fit a lasso regression model. Replicate the example we had in class to obtain the optimal $\lambda$ using 10-fold CV. Report that value here and call it $\lambda_{\min}^{\text{lasso}}$. Also report the optimal $\lambda$ using the one-standard-error rule and call it $\lambda_{\text{1se}}^{\text{lasso}}$.

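e. You now have 4 values for the tuning parameter:

$\lambda_{\min}^{\text{ridge}}$, $\lambda_{\text{1se}}^{\text{ridge}}$, $\lambda_{\min}^{\text{lasso}}$, $\lambda_{\text{1se}}^{\text{lasso}}$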

Now evaluate the ridge regression models on your test set using $\lambda = \lambda_{\min}^{\text{ridge}}$ and $\lambda = \lambda_{\text{1se}}^{\text{ridge}}$. Evaluate the lasso models on your test set using $\lambda_{\min}^{\text{lasso}}$ and $\lambda_{\text{1se}}^{\text{lasso}}$. Report the four test errors here and compare them. Which model performs the best in terms of prediction? Do you have any intuition as to why?

f. Report the coefficient estimates coming from ridge using $\lambda_{\min}^{\text{ridge}}$ and $\lambda_{\text{1se}}^{\text{ridge}}$, and likewise for the lasso models. How do the ridge regression estimates compare to those from the lasso? How do the coefficient estimates from using $\lambda_{\min}$ compare to those from the one-standard-error rule?

g. If you were to make a recommendation to an upcoming baseball player who wants to make it big in the major leagues, what handful of features would you tell this player to focus on?
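The one-standard-error rule described in part (c) can be sketched by hand in base R. This is only an illustration under stated assumptions: the helper `one_se_lambda` and the cross-validation numbers below are invented, and in practice a cv.glmnet object already reports the result as lambda.1se.

```r
# Hypothetical helper mirroring the one-standard-error rule:
# return the largest lambda whose mean CV error is within one
# standard error of the minimum mean CV error.
one_se_lambda <- function(lambda, cv_mean, cv_se) {
  best <- which.min(cv_mean)
  threshold <- cv_mean[best] + cv_se[best]
  max(lambda[cv_mean <= threshold])
}

# Made-up CV results for five candidate values of lambda.
lambda  <- c(0.01, 0.1, 1, 10, 100)
cv_mean <- c(120, 110, 100, 104, 150)  # mean CV error per lambda
cv_se   <- c(6, 5, 5, 6, 9)            # standard error of each mean

lambda_min <- lambda[which.min(cv_mean)]             # smallest CV error
lambda_1se <- one_se_lambda(lambda, cv_mean, cv_se)  # more regularized choice
```

In this toy example the rule moves from the minimizing value to a larger $\lambda$, which is why it tends to give a simpler model.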

Problem 3: Build a predictive model

We will work with the Boston housing data set; it is part of library(ISLR2). Your goal here is to build a model that predicts the per capita crime rate. Split your data into a training set and a test set such that 90% of the observations go into the training set and the remaining 10% go into the test set. Your model building should include the following components:

1. A least-squares model with the predictors chosen using a model selection technique of your choice. Explain and justify what technique you have chosen. Call the model from this step Model1.

– Check whether or not the linearity and constant variance assumptions hold. Are any transformations needed for your model?

2. A ridge regression with the optimal $\lambda$ chosen using 10-fold cross-validation. Compare your models using the $\lambda$ that gives the smallest CV error and the $\lambda$ based on the one-standard-error rule. Call these models Model2a and Model2b, respectively. Report both models.

3. A lasso regression with the optimal $\lambda$ chosen using 10-fold cross-validation. Compare your models using the $\lambda$ that gives the smallest CV error and the $\lambda$ based on the one-standard-error rule. Call these models Model3a and Model3b, respectively. Report both models.

4. Propose a model (or set of models) that seems to perform well on this dataset. Make sure you are evaluating your model performance using the test set and not using the training error. Report your chosen model(s) here. Does your chosen model involve all of the features in the data set? Why or why not?

Problem 4: Bootstrap

We will continue working with the Boston housing data set.

a. Based on this data set, provide an estimate for the population mean of medv. Call this estimate $\hat{\mu}$.

b. Provide an estimate of the standard error of $\hat{\mu}$ using an analytical formula. Interpret this result.

c. Now estimate the standard error of $\hat{\mu}$ using the bootstrap. How does this compare to your answer from (b)?

d. Using the bootstrap, provide a 95% confidence interval for the mean of medv. Compare it to the results using analytical formulas.

e. Based on this data set, provide an estimate $\hat{\mu}_{\text{med}}$ for the median value of medv.

f. We would like to estimate the standard error of $\hat{\mu}_{\text{med}}$. Since there is no simple formula for computing the standard error of the median, use the bootstrap. Comment on your findings.

g. Based on this data set, provide an estimate $\hat{\mu}_{0.1}$ for the 10th percentile of medv.

h. Use the bootstrap to estimate the standard error of $\hat{\mu}_{0.1}$. Comment on your findings.
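As a rough illustration of the bootstrap idea used throughout this problem, the sketch below estimates a standard error by resampling with replacement. It runs on simulated stand-in data rather than the Boston data, and the helper name `boot_se` and the choice of B = 2000 resamples are assumptions made for this example.

```r
set.seed(1)
# Simulated stand-in data (not the Boston data), for illustration only.
values <- rnorm(200, mean = 22, sd = 9)

# Analytical standard error of the sample mean.
se_formula <- sd(values) / sqrt(length(values))

# Hypothetical helper: resample the data with replacement B times,
# recompute the statistic on each resample, and take the standard
# deviation of those B estimates.
boot_se <- function(data, statistic, B = 2000) {
  estimates <- replicate(B, statistic(sample(data, replace = TRUE)))
  sd(estimates)
}

se_boot_mean   <- boot_se(values, mean)    # comparable to se_formula
se_boot_median <- boot_se(values, median)  # no simple formula to compare with
```

The same helper accepts any statistic, for example `function(v) quantile(v, 0.1)` for a 10th percentile.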

Problem 5: Properties of Bootstrap

a. What is the probability that the first bootstrap observation is the jth observation from the original sample? Justify your answer.

b. What is the probability that the first bootstrap observation is not the jth observation from the original sample? Justify your answer.

c. What is the probability that the jth observation from the original sample is not in the bootstrap sample?

d. When n = 5, what is the probability that the jth observation is in the bootstrap sample?

e. When n = 100, what is the probability that the jth observation is in the bootstrap sample?

f. When n = 10,000, what is the probability that the jth observation is in the bootstrap sample?

g. Create a plot (in R) that displays, for each integer value of n from 1 to 100,000, the probability that the jth observation is in the bootstrap sample. Comment on what you observe.

h. Investigate numerically the probability that a bootstrap sample of size n = 100 contains the jth observation. Here j = 5. We repeatedly create bootstrap samples, and each time we record whether or not the fifth observation is contained in the bootstrap sample. You may use the following code:

results <- rep(NA, 10000)
for (i in 1:10000) {
  results[i] <- sum(sample(1:100, replace = TRUE) == 5) > 0
}
mean(results)

Comment on your findings.

2022-11-03
