1. ISLR 7.9.1.
2. ISLR 7.9.5.
3. ISLR 7.9.11.

4. We will use the 100K MovieLens data from the previous homework in this question with popularity as the only predictor and ratings as the response. For the ith rating yi, define xi to be the popularity of the movie rated.

(a) Split the data set into a training set and a test set (50-50 split).
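
One possible way to set up the split in R is sketched below. The data frame `ratings` and its columns `popularity` and `rating` are simulated placeholders, not the actual MovieLens objects from the previous homework.

```r
# Simulated stand-in for the ratings data (hypothetical names and values).
set.seed(1)
n <- 1000
ratings <- data.frame(popularity = runif(n, 1, 500))
ratings$rating <- pmin(5, pmax(1, round(3 + sin(ratings$popularity / 80) + rnorm(n))))

# Randomly assign half of the row indices to the training set;
# the remaining rows form the test set.
train_idx <- sample(n, size = n / 2)
train <- ratings[train_idx, ]
test  <- ratings[-train_idx, ]
```

Fixing the seed makes the split reproducible, so that every later part works with the same training and test rows.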

(b) Use the smooth.spline() function to fit a smoothing spline regression model and select the smoothness by cross-validation. Explain the smoothing spline regression model in this context. Briefly explain how the smoothness is estimated.
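
A minimal sketch of a cross-validated smoothing spline fit, on simulated data rather than the MovieLens ratings (the vectors `x` and `y` are placeholders). In smooth.spline(), cv = TRUE requests leave-one-out cross-validation and cv = FALSE requests generalized cross-validation; the latter may be the safer choice when x has many tied values, as popularity is likely to.

```r
# Simulated stand-in data (hypothetical): x plays the role of popularity.
set.seed(1)
n <- 300
x <- runif(n, 1, 500)
y <- 3 + sin(x / 80) + rnorm(n, sd = 0.5)

# Fit a smoothing spline; the penalty is chosen by leave-one-out CV.
fit_cv <- smooth.spline(x, y, cv = TRUE)

fit_cv$lambda   # selected smoothing parameter
fit_cv$df       # effective degrees of freedom implied by that parameter

# Predictions at new popularity values.
pred <- predict(fit_cv, x = c(50, 250))$y
```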

(c) Use the poly() function to fit a polynomial regression to predict y using x for degrees of freedom (df) 20 and 5. Report the regression output, and plot the polynomial fits. Contrast both fits with the fit in part (b).
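
As a rough illustration, polynomial fits of degree 5 and 20 can be obtained with lm() and poly(). The data below are simulated and the object names are placeholders.

```r
# Simulated stand-in for the training data (hypothetical).
set.seed(1)
n <- 300
train <- data.frame(x = runif(n, 1, 500))
train$y <- 3 + sin(train$x / 80) + rnorm(n, sd = 0.5)

# poly() builds an orthogonal polynomial basis of the given degree.
fit_p5  <- lm(y ~ poly(x, 5),  data = train)
fit_p20 <- lm(y ~ poly(x, 20), data = train)
summary(fit_p5)   # regression output for the degree-5 fit

# Plot both fits on a grid covering the observed range of x.
grid <- data.frame(x = seq(min(train$x), max(train$x), length.out = 200))
plot(train$x, train$y, col = "grey", xlab = "popularity", ylab = "rating")
lines(grid$x, predict(fit_p5,  newdata = grid), col = "blue")
lines(grid$x, predict(fit_p20, newdata = grid), col = "red")
```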

(d) Use the bs() function to fit a regression spline to predict y using x for df’s 20 and 5. Report the output for the fit, and plot the resulting fit. Contrast both fits with the fit in part (b).
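
A sketch of the regression spline fit using bs() from the splines package, again on simulated placeholder data. With the default cubic degree and no intercept column, bs(x, df = 5) places 5 - 3 = 2 interior knots at quantiles of x.

```r
library(splines)

# Simulated stand-in for the training data (hypothetical).
set.seed(1)
n <- 300
train <- data.frame(x = runif(n, 1, 500))
train$y <- 3 + sin(train$x / 80) + rnorm(n, sd = 0.5)

# Cubic B-spline bases with 5 and 20 degrees of freedom.
fit_bs5  <- lm(y ~ bs(x, df = 5),  data = train)
fit_bs20 <- lm(y ~ bs(x, df = 20), data = train)

# Interior knots chosen automatically for the df = 5 basis.
knots5 <- attr(bs(train$x, df = 5), "knots")
summary(fit_bs5)
```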

(e) Use the smooth.spline() function to fit a smoothing spline to predict y using x for df’s 20 and 5. Report the output for the fit, and plot the resulting fit. Contrast both fits with the fit in part (b).
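
For fixed degrees of freedom, smooth.spline() accepts a df argument in place of cross-validation. The sketch below uses simulated placeholder data.

```r
# Simulated stand-in data (hypothetical).
set.seed(1)
n <- 300
x <- runif(n, 1, 500)
y <- 3 + sin(x / 80) + rnorm(n, sd = 0.5)

# The smoothing parameter is tuned so that the effective df matches the target.
fit_ss5  <- smooth.spline(x, y, df = 5)
fit_ss20 <- smooth.spline(x, y, df = 20)
fit_ss5          # printed output for the fit

# Plot both fitted curves over the data.
plot(x, y, col = "grey", xlab = "popularity", ylab = "rating")
lines(fit_ss5,  col = "blue")
lines(fit_ss20, col = "red")
```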

(f) Use the ns() function to fit a natural spline to predict y using x for df’s 20 and 5. Report the output for the fit, and plot the resulting fit. Contrast both fits with the fit in part (b).
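
A natural spline fit can be sketched the same way with ns() from the splines package. The data are simulated placeholders. Unlike bs(), the ns() basis is constrained to be linear beyond the boundary knots, so df = 5 corresponds to 4 interior knots.

```r
library(splines)

# Simulated stand-in for the training data (hypothetical).
set.seed(1)
n <- 300
train <- data.frame(x = runif(n, 1, 500))
train$y <- 3 + sin(train$x / 80) + rnorm(n, sd = 0.5)

# Natural cubic spline bases with 5 and 20 degrees of freedom.
fit_ns5  <- lm(y ~ ns(x, df = 5),  data = train)
fit_ns20 <- lm(y ~ ns(x, df = 20), data = train)

# Interior knots chosen automatically for the df = 5 basis.
knots_ns5 <- attr(ns(train$x, df = 5), "knots")
summary(fit_ns5)
```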

(g) Compare the methods in parts (c)–(f) on test data using (b) as the benchmark method.
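
One way to organize the comparison is to compute the test mean squared error for each fit. The sketch below does this for df = 5 only, on simulated data; the helper `mse` and all object names are placeholders, and the boundary knots are fixed at the simulated range of x so that test points do not fall outside the spline bases.

```r
library(splines)

# Simulated stand-in data with a 50-50 split (hypothetical).
set.seed(1)
n <- 600
dat <- data.frame(x = runif(n, 1, 500))
dat$y <- 3 + sin(dat$x / 80) + rnorm(n, sd = 0.5)
idx <- sample(n, n / 2)
train <- dat[idx, ]
test  <- dat[-idx, ]

# Test mean squared error for a vector of predictions.
mse <- function(pred) mean((test$y - pred)^2)

fits <- list(
  poly5 = lm(y ~ poly(x, 5), data = train),
  bs5   = lm(y ~ bs(x, df = 5, Boundary.knots = c(1, 500)), data = train),
  ns5   = lm(y ~ ns(x, df = 5, Boundary.knots = c(1, 500)), data = train)
)
test_mse <- sapply(fits, function(f) mse(predict(f, newdata = test)))

# Benchmark: smoothing spline with cross-validated smoothness.
ss_cv <- smooth.spline(train$x, train$y, cv = TRUE)
test_mse["smooth_cv"] <- mse(predict(ss_cv, x = test$x)$y)

sort(test_mse)   # smaller is better
```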

(h) Recode the response y into two categories, 0 and 1: if yi > 3, define y~i = 1; otherwise define y~i = 0. Fit a logistic regression model for predicting y~ using x based on the
i. poly() function with df 10;
ii. bs() function with df 10;
iii. smooth.spline() function with df 10;
iv. ns() function with df 10; and
v. choose the best method from i–iv using the test data and explain your conclusion.
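
The binary recoding and the basis-expansion logistic fits can be sketched with glm(), using simulated placeholder data. Items i, ii and iv are covered; smooth.spline() has no family argument, so item iii needs a different approach and is left out of this sketch. Misclassification rate at a 0.5 threshold is only one possible test criterion.

```r
library(splines)

# Simulated stand-in data (hypothetical): ratings on a 1-5 scale.
set.seed(1)
n <- 600
x <- runif(n, 1, 500)
rating <- pmin(5, pmax(1, round(3 + sin(x / 80) + rnorm(n))))
dat <- data.frame(x = x, y_bin = as.integer(rating > 3))  # 1 if rating > 3, else 0
idx <- sample(n, n / 2)
train <- dat[idx, ]
test  <- dat[-idx, ]

# Logistic regressions on three different bases, each with df 10.
fits <- list(
  poly = glm(y_bin ~ poly(x, 10), family = binomial, data = train),
  bs   = glm(y_bin ~ bs(x, df = 10, Boundary.knots = c(1, 500)),
             family = binomial, data = train),
  ns   = glm(y_bin ~ ns(x, df = 10, Boundary.knots = c(1, 500)),
             family = binomial, data = train)
)

# Test misclassification rate, classifying as 1 when the fitted probability exceeds 0.5.
err <- sapply(fits, function(f) {
  p <- predict(f, newdata = test, type = "response")
  mean((p > 0.5) != test$y_bin)
})
err
```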