Applied Machine Learning in Economics

Question 1 (concept)[8p (=4+2+2)]

Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Level (1 for College and 0 for High School), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and Level. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get β̂0 = 50, β̂1 = 20, β̂2 = 0.07, β̂3 = 35, β̂4 = 0.01, β̂5 = -10.
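For reference, the fitted model described above can be written out as a single equation. This is only a restatement of the coefficients given in the question, with the two interaction predictors written as products; the second line is the difference in predicted salary between Level = 1 and Level = 0 at the same GPA and IQ, obtained by subtracting the two fitted values.

```latex
\hat{y} = 50 + 20\,\mathrm{GPA} + 0.07\,\mathrm{IQ} + 35\,\mathrm{Level}
          + 0.01\,(\mathrm{GPA}\times\mathrm{IQ}) - 10\,(\mathrm{GPA}\times\mathrm{Level})

\hat{y}\big|_{\mathrm{Level}=1} - \hat{y}\big|_{\mathrm{Level}=0}
          = \hat\beta_3 + \hat\beta_5\,\mathrm{GPA} = 35 - 10\,\mathrm{GPA}
```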

(a) Which answer is correct, and why?

i. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates.

ii. For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates.

iii. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough.

iv. For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates provided that the GPA is high enough.

(b) Predict the salary of a college graduate with IQ of 110 and a GPA of 4.0.

(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.

Question 2 (applied)[16p (=0+0+1+0+1+1+3+5+5)]

In this exercise you will create some simulated data and will fit simple linear regression models to it. Make sure to use the command set.seed(1) prior to starting part (a) to ensure consistent and reproducible results.

(a) Using the rnorm() function, create a vector, x, containing 100 observations drawn from a N(0, 1) distribution. This represents a feature, X.

(b) Using the rnorm() function, create a vector, eps, containing 100 observations drawn from a N(0, 0.25) distribution, that is, a normal distribution with mean zero and variance 0.25. (To ensure consistency, issue ?rnorm to check the syntax of the command.)
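The hint about checking ?rnorm matters because rnorm() is parameterised by the standard deviation (its sd argument), not the variance. A minimal sketch of the setup for parts (a) and (b), assuming the variance of 0.25 is converted to a standard deviation with sqrt():

```r
# Fix the random seed so the draws are reproducible, as the exercise asks.
set.seed(1)

# (a) 100 draws from N(0, 1); mean = 0 and sd = 1 are rnorm()'s defaults.
x <- rnorm(100)

# (b) 100 draws with variance 0.25. rnorm() expects a standard deviation,
#     so pass sqrt(0.25) = 0.5 rather than 0.25.
eps <- rnorm(100, mean = 0, sd = sqrt(0.25))

length(x)
length(eps)
```

Passing sd = 0.25 instead would give a variance of 0.0625, which is a different noise level from the one the question specifies.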

(c) Using x and eps, generate a vector y according to the model

Y = -1 + 0.5X + ε

What is the length of the vector y? What are the values of β0 and β1 in this linear model?

(d) Create a scatterplot displaying the relationship between x and y. Comment on what you observe.

(e) Fit a least squares linear model to predict y using x. Comment on the model obtained. How do β̂0 and β̂1 compare to β0 and β1?

(f) Display the least squares line on the scatterplot obtained in (d). Draw the population regression line on the plot, in a different color. Use the legend() command to create an appropriate legend.
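As an illustration of the plotting calls involved (not a solution to this part), the sketch below overlays two lines on a scatterplot and labels them with legend(). It uses R's built-in cars data as a stand-in, and the intercept and slope of the second line (-20 and 4) are arbitrary values assumed purely for the example.

```r
# Stand-in data: the built-in 'cars' data set (speed, dist).
fit <- lm(dist ~ speed, data = cars)

plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")

# abline() accepts a fitted lm object and draws its least squares line.
abline(fit, col = "red", lwd = 2)

# abline(a, b) draws a line with a given intercept and slope.
# The values -20 and 4 are assumed for illustration only.
abline(a = -20, b = 4, col = "blue", lwd = 2)

# The legend entries must be listed in the same order as the colours.
legend("topleft",
       legend = c("Least squares line", "Reference line (assumed)"),
       col = c("red", "blue"), lwd = 2)
```

In the exercise itself, the second line would use the known β0 and β1 of the simulated model in place of the assumed values.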

(g) Now fit a polynomial regression model that predicts y using x and x^2. Is there evidence that the quadratic term improves the model fit? Explain your answer. (Hint: in R, given a predictor X we can create the squared term by entering I(X^2).)
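The I() wrapper in the hint is needed because ^ has a special meaning inside an R model formula. A short sketch on the built-in cars data (a stand-in, not the simulated data of this question) shows the difference:

```r
# Without I(), 'speed^2' is read as a formula operator and collapses to
# 'speed', so no quadratic term enters the model.
fit_no_I <- lm(dist ~ speed + speed^2, data = cars)

# With I(), the expression is evaluated arithmetically and the squared
# predictor is added as its own column.
fit_quad <- lm(dist ~ speed + I(speed^2), data = cars)

names(coef(fit_no_I))   # intercept and speed only
names(coef(fit_quad))   # intercept, speed, and I(speed^2)

# One common way to look at the added term: the coefficient table
# reports a t-test for each coefficient, including I(speed^2).
summary(fit_quad)$coefficients
```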

(h) (concept question - no computation is required) Assume a data set generated from a simple linear model, for instance, according to the model in (c). Consider the training residual sum of squares (RSS) from fitting the linear regression Y = β0 + β1X + ε, and also the training RSS from fitting the quadratic regression Y = β0 + β1X + β2X^2 + ε. Would we expect one to be lower than the other (if yes, which one will be lower?), would we expect them to be the same, or is there not enough information to tell? Justify your answer.

(i) (concept question - no computation is required) Answer (h) using test rather than training RSS.

Question 3 (applied)[26p (=0+3+3+5+5+3+5+2)]

This question should be answered using the Carseats data set. (Hint: the Carseats data set is used in the Lab of Chapter 3, Section 3.6.6, of the ISLR text.)

The Carseats data set is in the package ISLR2, which includes all data sets provided by the textbook. Before starting your data analysis in R, make sure that you attach the ISLR2 package with the command library(). You need to issue this command every time you invoke R and want to analyze one of the data sets included in the ISLR2 package.

> library(ISLR2)

The first time you want to use the library, R will complain that the package has not been installed. RStudio will ask you to install this by clicking on a button. Alternatively, you can install the package manually by issuing the command

> install.packages("ISLR2")

(a) Look at the data using the View() function. Notice that there are some qualitative variables with two or more levels.

(b) Fit a multiple regression model to predict Sales using US, ShelveLoc, Price, CompPrice, and the interaction term Price×CompPrice.

In particular, for the predictor ShelveLoc, which indicates the quality of the shelving location, use the command

> contrasts(ShelveLoc)

to see the coding that R uses for the dummy variables, and report which category is the baseline category (Hint: see text p. 120).
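Note that the bare call contrasts(ShelveLoc) only works if the data frame has been attached with attach(Carseats); otherwise the column has to be referenced as Carseats$ShelveLoc. The sketch below shows how to read the contrasts() output and how an interaction term is written in a formula. It uses the built-in mtcars data as a stand-in so that it runs without ISLR2; the variable names (cyl, wt, hp) belong to mtcars, not to Carseats.

```r
# Stand-in data: turn the number of cylinders into a factor with levels 4, 6, 8.
cars_df <- transform(mtcars, cyl = factor(cyl))

# Rows are factor levels, columns are the dummy variables R creates.
coding <- contrasts(cars_df$cyl)
print(coding)

# The baseline level is the row made up entirely of zeros.
baseline <- rownames(coding)[rowSums(coding != 0) == 0]
baseline

# In a formula, a:b adds only the interaction between a and b;
# the main effects are listed separately here.
fit <- lm(mpg ~ cyl + wt + hp + wt:hp, data = cars_df)
names(coef(fit))
```

The coefficient names show one dummy per non-baseline level, plus a single wt:hp term for the interaction between the two quantitative predictors.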

(d) Provide an interpretation of each coefficient in the model. Be careful: some of the variables in the model are qualitative!

(e) For which of the predictors can you reject the null hypothesis H0: βj = 0?

(f) On the basis of your response to the previous question, fit a smaller model that