Applied Machine Learning in Economics
Question 1 (concept)[8p (=4+2+2)]
Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Level (1 for College and 0 for High School), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and Level. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get β̂0 = 50, β̂1 = 20, β̂2 = 0.07, β̂3 = 35, β̂4 = 0.01, β̂5 = -10.
(a) Which answer is correct, and why?
i. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates.
ii. For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates.
iii. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough.
iv. For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates provided that the GPA is high enough.
(b) Predict the salary of a college graduate with IQ of 110 and a GPA of 4.0.
(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.
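The fitted model in Question 1 can be written as a small R function, which may help when checking parts (a) and (b) by hand. This is only a sketch: the names b, predict_salary and college_gap are hypothetical helpers, not part of the assignment, and the coefficients are simply those stated in the question.

```r
# Fitted coefficients as stated in Question 1.
b <- c(b0 = 50, b1 = 20, b2 = 0.07, b3 = 35, b4 = 0.01, b5 = -10)

# Hypothetical helper: predicted starting salary (in thousands of dollars)
# for a given GPA, IQ and Level (1 = College, 0 = High School).
predict_salary <- function(gpa, iq, level) {
  unname(b["b0"] + b["b1"] * gpa + b["b2"] * iq + b["b3"] * level +
    b["b4"] * gpa * iq + b["b5"] * gpa * level)
}

# Hypothetical helper: college minus high school prediction at the same
# GPA and IQ. Algebraically this is b3 + b5 * GPA = 35 - 10 * GPA.
college_gap <- function(gpa, iq) {
  predict_salary(gpa, iq, 1) - predict_salary(gpa, iq, 0)
}

# Part (b): college graduate with IQ 110 and GPA 4.0.
predict_salary(gpa = 4.0, iq = 110, level = 1)

# Part (a): how the college/high school gap changes with GPA.
college_gap(gpa = c(3.0, 3.5, 4.0), iq = 110)
```

Note that the size of a coefficient depends on the units of its predictors, so part (c) cannot be settled from the point estimate alone; a standard error or p-value would be needed.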
Question 2 (applied)[16p (=0+0+1+0+1+1+3+5+5)]
In this exercise you will create some simulated data and will fit simple linear regression models to it. Make sure to use the command set.seed(1) prior to starting part (a) to ensure consistent and reproducible results.
(a) Using the rnorm() function, create a vector, x, containing 100 observations drawn from a N(0, 1) distribution. This represents a feature, X.
(b) Using the rnorm() function, create a vector, eps, containing 100 observations drawn from a N(0, 0.25) distribution, i.e. a normal distribution with mean zero and variance 0.25. (To ensure consistency, run ?rnorm to check the syntax of the command.)
(c) Using x and eps, generate a vector y according to the model
Y = -1 + 0.5X + ε
What is the length of the vector y? What are the values of β0 and β1 in this linear model?
(d) Create a scatterplot displaying the relationship between x and y. Comment on what you observe.
(e) Fit a least squares linear model to predict y using x. Comment on the model obtained. How do β̂0 and β̂1 compare to β0 and β1?
(f) Display the least squares line on the scatterplot obtained in (d). Draw the population regression line on the plot, in a different color. Use the legend() command to create an appropriate legend.
(g) Now fit a polynomial regression model that predicts y using x and x^2. Is there evidence that the quadratic term improves the model fit? Explain your answer. (Hint: in R, given a predictor X we can create the squared term by entering I(X^2).)
(h) (concept question - no computation is required) Assume a data set generated from a simple linear model, for instance, according to the model in (c). Consider the training residual sum of squares (RSS) from fitting the linear regression Y = β0 + β1X + ε, and also the training RSS from fitting the quadratic regression Y = β0 + β1X + β2X^2 + ε. Would we expect one to be lower than the other (if yes, which one will be lower?), would we expect them to be the same, or is there not enough information to tell? Justify your answer.
(i) (concept question - no computation is required) Answer (h) using test rather than training RSS.
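The workflow of parts (a) to (g) could be sketched in R roughly as follows. This is an illustrative outline rather than a model answer: the object names fit_lin, fit_quad, rss_lin and rss_quad are assumptions, as are the colours and legend position. The one point worth checking with ?rnorm is that rnorm() takes a standard deviation, not a variance, so a variance of 0.25 corresponds to sd = sqrt(0.25) = 0.5.

```r
set.seed(1)

# (a)-(c): simulate the feature, the noise, and the response.
x   <- rnorm(100)                                # N(0, 1)
eps <- rnorm(100, mean = 0, sd = sqrt(0.25))     # variance 0.25, so sd 0.5
y   <- -1 + 0.5 * x + eps

# (d)-(f): scatterplot with the fitted and the population regression lines.
fit_lin <- lm(y ~ x)
plot(x, y)
abline(fit_lin, col = "red")                     # least squares line
abline(a = -1, b = 0.5, col = "blue")            # population line
legend("topleft",
       legend = c("Least squares", "Population"),
       col = c("red", "blue"), lty = 1)

# (g): add a quadratic term; I() protects ^ inside a formula.
fit_quad <- lm(y ~ x + I(x^2))
summary(fit_quad)
anova(fit_lin, fit_quad)                         # F-test for the extra term

# Training RSS of both fits, relevant to the discussion in (h).
rss_lin  <- sum(resid(fit_lin)^2)
rss_quad <- sum(resid(fit_quad)^2)
c(linear = rss_lin, quadratic = rss_quad)
```

Because the linear model is nested in the quadratic one, the training RSS of the quadratic fit cannot exceed that of the linear fit; what happens on test data is a separate question, which is what (i) asks about.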
Question 3 (applied)[26p (=0+3+3+5+5+3+5+2)]
This question should be answered using the Carseats data set. (Hint: the Carseats data set is used in the Lab of Chapter 3, Section 3.6.6 of the ISLR text.)
The Carseats data set is in the package ISLR2, which includes all data sets provided by the textbook. Before starting your data analysis in R, make sure that you attach the ISLR2 package with the command library(). You need to issue this command every time you invoke R and want to analyze one of the data sets included in the ISLR2 package.
> library(ISLR2)
The first time you want to use the library, R will complain that the package has not been installed. RStudio will ask you to install this by clicking on a button. Alternatively, you can install the package manually by issuing the command
> install.packages("ISLR2")
(a) Look at the data using the View() function. Notice that there are some qualitative variables with two or more levels.
(b) Fit a multiple regression model to predict Sales using US, ShelveLoc, Price, CompPrice, and the interaction term Price×CompPrice.
(c) Write out the model in equation form, being careful to handle the qualitative variables properly.
In particular, for the predictor ShelveLoc, which indicates the quality of the shelving location, use the command
> contrasts(Carseats$ShelveLoc)
to see the coding that R uses for the dummy variables and report which category is the baseline category (Hint: see text p. 120).
(d) Provide an interpretation of each coefficient in the model. Be careful—some of the variables in the model are qualitative!
(e) For which of the predictors can you reject the null hypothesis H0 : βj = 0?
(f) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.
(g) How well do the models in (b) and (f) fit the data? Explain using, primarily, the RSE and R^2 statistics.
(h) Using the model from (f), obtain 95% confidence intervals for the coefficient(s).
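The R calls involved in parts (b), (c), (f), (g) and (h) might look like the sketch below. It assumes the ISLR2 package is already installed; the names fit_full and fit_small are hypothetical, and the term dropped in fit_small is chosen purely to illustrate the update() syntax, not as a statement about which predictors are significant. That choice should follow from your own answer to (e).

```r
library(ISLR2)  # provides the Carseats data set

# (b) Price * CompPrice expands to Price + CompPrice + Price:CompPrice.
fit_full <- lm(Sales ~ US + ShelveLoc + Price * CompPrice, data = Carseats)
summary(fit_full)   # coefficient table with t-statistics and p-values for (e)

# (c) Dummy coding for ShelveLoc: the level whose row is all zeros
#     is the baseline category.
contrasts(Carseats$ShelveLoc)

# (f) ILLUSTRATIVE ONLY: drop one term to show the syntax. Replace "US"
#     with whatever terms your answer to (e) says lack evidence.
fit_small <- update(fit_full, . ~ . - US)

# (g) Residual standard error (RSE) and R^2 for both models.
c(rse_full  = summary(fit_full)$sigma,
  rse_small = summary(fit_small)$sigma)
c(r2_full  = summary(fit_full)$r.squared,
  r2_small = summary(fit_small)$r.squared)

# (h) 95% confidence intervals for the coefficients of the smaller model.
confint(fit_small, level = 0.95)
```

Passing data = Carseats and writing Carseats$ShelveLoc avoids the need to attach() the data frame.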
2022-09-22