Statistical Modeling and Decision Making Individual Assignment 2
Instructions
Show your work on all questions to receive full credit. Please upload a PDF file containing your answers on Canvas. You may hand-write your assignment or type it (e.g., LaTeX, Word) and may use R Markdown. Note that points will be deducted if your assignment is unintelligible.
Problem 1: Analyzing credit card spending
Let’s get some practice with linear regression by using a regression model to predict how much people spend on their credit cards per month as a function of income and other background information. The R package AER provides data on 1,319 credit card applicants; you can load the data by calling data("CreditCard") with the AER package loaded. Call ?CreditCard to see more information on what each column in the data means.
#load in any packages here
• Question 1 (10 points): Read in the data and remove observations with zero expenditure. Plot a histogram of expenditure and a histogram of log-transformed expenditure.
#put code for Q1 here
• Question 2 (15 points): Estimate a linear regression model of expenditure on all other columns in the data, except share and card. Report the coefficient estimates and their associated standard errors. Which covariates are significantly related to expenditure? Explain how to interpret the coefficient on income.
#put code for Q2 here
• Question 3 (15 points): Make a scatterplot with the predicted (“fitted”) values $\hat{y} = \hat{\beta}'\vec{X}$ on the x-axis and the residuals from the regression on the y-axis. Additionally, plot a histogram of the residuals of the regression. Visually, does it look like the assumptions of the classic linear model (normality of errors, homoskedasticity, and linearity) are satisfied?
Hint: You can access the fitted values and residuals from a linear model object in R by using $ subsetting (e.g. model$fitted and model$residuals).
#put code for Q3 here
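As an illustration of the hint above, the following sketch extracts fitted values and residuals from a linear model object and draws the two diagnostic plots. It deliberately uses R's built-in mtcars data with an arbitrary pair of predictors rather than the CreditCard data, so the model and variable names here are assumptions for demonstration only.

```r
# Illustrative only: built-in mtcars data, not the assignment's CreditCard data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# $ subsetting on the model object; fitted(fit) and residuals(fit) return the same values.
fitted_vals <- fit$fitted.values
resids <- fit$residuals

# Residuals vs. fitted values: look for curvature (non-linearity)
# or a funnel shape (heteroskedasticity).
plot(fitted_vals, resids,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Histogram of residuals: a rough visual check of normality.
hist(resids, breaks = 15,
     main = "Histogram of residuals", xlab = "Residual")
```

The hint's shorter form model$fitted also works, because $ on a list partially matches fitted.values.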
• Question 4 (20 points): Repeat Questions 2 and 3, but log-transform both expenditure and income. Does the transformation seem to help with violations of the model assumptions? Explain how the transformation changes the interpretation of the income coefficient.
#put code for Q4 here
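To see what a log-log specification does in general, here is a small simulation sketch. All data are simulated, and the variable names and the value 0.5 are assumptions chosen for illustration, not features of the CreditCard data. When both the outcome and a predictor enter in logs, the slope can be read as an approximate elasticity.

```r
# Simulated data: the true slope on log(income) is set to 0.5 by construction.
set.seed(42)
n <- 5000
income <- rlnorm(n, meanlog = 1, sdlog = 0.5)
true_elasticity <- 0.5
spend <- exp(1 + true_elasticity * log(income) + rnorm(n, sd = 0.3))

# Log-log regression: both the outcome and the predictor are log-transformed.
fit_loglog <- lm(log(spend) ~ log(income))
b1 <- unname(coef(fit_loglog)["log(income)"])

# Implied percentage change in spend for a 10% increase in income.
pct_change <- (1.10^b1 - 1) * 100
print(c(slope = b1, pct_change_for_10pct = pct_change))
```

For small changes, a 1% increase in the predictor is associated with roughly a b1% change in the outcome, whereas in the untransformed model the slope is a change in outcome units per unit of the predictor.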
Problem 2: Personality traits
Let’s get some practice with principal components analysis by analyzing the PersonalityData.csv dataset containing a set of responses from a personality questionnaire. Individuals were asked how they rank themselves along a set of personality traits (e.g., thoroughness). The dataset is available on Canvas.
#load in any packages here
• Question 1 (5 points): Using the eigenvalue (of the covariance matrix) criterion, how many factors should be retained?
#put code for Q1 here
• Question 2 (10 points): All remaining questions will use the correlation matrix. Using the eigenvalue criterion, how many factors should be retained?
Note that eigenvalue = sum of squared loadings.
#put code for Q2 here
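The note that an eigenvalue equals the sum of squared loadings can be checked numerically. The sketch below uses R's built-in USArrests data, not PersonalityData.csv, and it assumes that "eigenvalue criterion" means the common rule of retaining components whose correlation-matrix eigenvalue exceeds 1; confirm the definition used in the course before relying on it.

```r
# Illustrative only: built-in USArrests data (4 numeric columns).
X <- USArrests
R <- cor(X)

# Eigen-decomposition of the correlation matrix.
eig <- eigen(R)
eigenvalues <- eig$values

# Loadings: each eigenvector scaled by the square root of its eigenvalue.
loadings <- eig$vectors %*% diag(sqrt(eigenvalues))

# The column sums of squared loadings reproduce the eigenvalues.
ssl <- colSums(loadings^2)
print(rbind(eigenvalues, ssl))

# Assumed rule: retain components with eigenvalue > 1.
n_retained <- sum(eigenvalues > 1)

# Cumulative share of variance explained by the first k components.
cum_var <- cumsum(eigenvalues) / sum(eigenvalues)
print(n_retained)
print(round(cum_var, 3))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, a threshold of 1 corresponds to a component that explains as much variance as one standardized variable.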
• Question 3 (10 points): What is the cumulative variance explained if you retain 2 factors? How about 4 factors?
#put code for Q3 here
• Question 4 (5 points): What is the eigenvalue associated with PC3?
#put code for Q4 here
• Question 5 (10 points): Run factor analysis for n=3 factors with a Varimax rotation. Examine the loadings. How do you interpret each factor? Based on your interpretation, describe subject 1.
#put code for Q5 here
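One way to run a three-factor analysis with a Varimax rotation is factanal() from base R's stats package; whether the course intends this function or another (for example, one from the psych package) is an assumption. The sketch below uses simulated data with a known three-factor structure, and the item names are hypothetical.

```r
# Simulated data: 9 items, 3 latent factors, 3 items per factor.
set.seed(1)
n <- 500
f <- matrix(rnorm(n * 3), n, 3)           # latent factor scores
L <- matrix(0, 9, 3)                      # true loading pattern
L[1:3, 1] <- 0.8
L[4:6, 2] <- 0.8
L[7:9, 3] <- 0.8
x <- f %*% t(L) + matrix(rnorm(n * 9, sd = 0.6), n, 9)
colnames(x) <- paste0("item", 1:9)        # hypothetical item names

# Maximum-likelihood factor analysis with a Varimax rotation;
# scores = "regression" also returns estimated factor scores per subject.
fa <- factanal(x, factors = 3, rotation = "varimax", scores = "regression")

# Suppress small loadings to make the pattern easier to read.
print(fa$loadings, cutoff = 0.3)

# Estimated factor scores for the first subject.
print(fa$scores[1, ])
```

Each factor is usually interpreted by the items that load most heavily on it, and a subject's scores then indicate where that subject sits on each interpreted factor relative to the sample.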
2023-09-02