BU.450.760 Assignment 1 – Prediction and model selection in R
Consider the dataset “D1.2 Credit card defaults.csv” (described in C1.2). This dataset contains information about credit card consumers, in particular their default behavior. Correspondingly, the key variable in the dataset is “defaultpaymentnextmonth” (call this variable “y”), a dichotomous variable that indicates whether a customer defaulted on their debt. There are 23 other variables that can be used to predict this outcome. For simplicity, we will refer to the set containing all these variables as “X”.
Using this data, perform the following tasks:
1. [3 points] Generate a random training/validation index that implements a 70/30 split.
• Use a random seed of your choice.
2. [7 points] Estimate two logistic specifications that allow you to generate out-of-sample predictions of y. Take the following points into account:
• You choose the variables X that enter each model specification. These variables X can be continuous or categorical. Make sure continuous and categorical variables are entered appropriately into the models.
• Specify model 1 as the simplest of the two. This model must include at least 5 explanatory variables.
• Specify model 2 as the richer/more flexible of the two. Control flexibility through the set of X variables used. Include at least one variable interaction. [An interaction of two variables, x1 and x2, would be x3 = x1*x2.]
3. [5 points] Do any of your models exhibit signs of overfitting? Explain.
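One way the three tasks above could fit together is sketched below. This is only an illustration under stated assumptions: the data are simulated so the script runs on its own, the column names (limit_bal, age, sex, education, pay_delay, bill_amt) are hypothetical stand-ins rather than the names in the course file, and the seed (760) and the choice of log loss as the comparison metric are arbitrary, since the assignment does not prescribe them.

```r
# Illustrative sketch on simulated data; column names are hypothetical.
# With the course file you would instead start from something like:
#   credit <- read.csv("D1.2 Credit card defaults.csv")
set.seed(760)  # arbitrary seed, any value works
n <- 2000      # hypothetical sample size
credit <- data.frame(
  limit_bal = runif(n, 10, 500),
  age       = sample(21:70, n, replace = TRUE),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE)),  # categorical
  education = factor(sample(1:4, n, replace = TRUE)),          # categorical
  pay_delay = sample(0:3, n, replace = TRUE),
  bill_amt  = runif(n, 0, 200)
)
# Simulated binary outcome standing in for "defaultpaymentnextmonth"
credit$y <- rbinom(n, 1, plogis(-1.5 + 0.6 * credit$pay_delay -
                                  0.003 * credit$limit_bal))

# Task 1: random 70/30 training/validation index
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- credit[train_idx, ]
valid <- credit[-train_idx, ]

# Task 2: two logistic specifications, both estimated on the training rows
# Model 1: simpler, five explanatory variables
m1 <- glm(y ~ limit_bal + age + sex + education + pay_delay,
          data = train, family = binomial)
# Model 2: more flexible; limit_bal * pay_delay expands to both main
# effects plus their interaction, and I(age^2) adds a squared term
m2 <- glm(y ~ limit_bal * pay_delay + age + I(age^2) + sex + education +
            bill_amt,
          data = train, family = binomial)

# Task 3: compare in-sample and out-of-sample fit for each model
log_loss <- function(y, p) {
  p <- pmin(pmax(p, 1e-15), 1 - 1e-15)  # guard against log(0)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}
perf <- function(model) {
  c(train = log_loss(train$y, predict(model, newdata = train, type = "response")),
    valid = log_loss(valid$y, predict(model, newdata = valid, type = "response")))
}
results <- rbind(model1 = perf(m1), model2 = perf(m2))
results <- cbind(results, gap = results[, "valid"] - results[, "train"])
print(round(results, 4))
```

In a table like this, a validation loss that sits clearly above the training loss, especially for the more flexible model, is the usual sign of overfitting. Other metrics, such as the misclassification rate, could be compared in the same way.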
Submission guidelines
• Submit via Canvas by 8 AM EST on the day of class 2
▪ Late submissions will be penalized
▪ Late corrections will not be accepted
• Note that assignments are automatically checked for similarity. It is OK to discuss with other students; it is not OK to copy.
• Submit two files (one submission per individual):
1. Slide deck (MS PowerPoint or PDF)
In the slide deck, I expect you to present results in an executive way – you need to clearly describe:
• what is the goal (question/problem at hand)
• what you did to achieve the goal (analysis procedures)
• why you did it (rationales behind key steps)
• what you obtained (results)
Use as many slides as you need.
The title page must include your name.
If you have worked/discussed with someone else, please also include their name(s) in a separate line.
2. R script file containing the code that you used for your analysis.
▪ Include comments in the script to help the TA follow your procedures.
▪ The script file should be understood as a companion: you are encouraged to include screenshots of the command lines (with command line #) in your slide deck to demonstrate your key steps. This way the TAs can easily go back and double-check that the answers in your slide deck are well supported.
2023-03-09