FINA5270 Assignment 2
Part 1: Loan Default Data
(10 pts)Question 1: Load and preprocess the data. Use the MinMax scaler to
rescale the features to the range 0 to 1, and transform string categorical
features into numerical or dummy features.
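A minimal sketch of this preprocessing step, assuming a toy DataFrame. The column names (`income`, `loan_amount`, `home_ownership`, `default`) and values are hypothetical stand-ins, not the assignment's actual data.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data; the real loan default file will have other columns.
df = pd.DataFrame({
    "income": [30000.0, 52000.0, 75000.0, 41000.0],
    "loan_amount": [5000.0, 12000.0, 20000.0, 8000.0],
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"],
    "default": [1, 0, 0, 1],
})

# Turn the string categorical column into 0/1 dummy columns.
encoded = pd.get_dummies(df, columns=["home_ownership"], dtype=int)

# Keep the target apart so that only the features are rescaled.
y = encoded["default"]
X = encoded.drop(columns=["default"])

# Rescale every feature column to the range [0, 1].
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)
```

`pd.get_dummies` is one option for the categorical step; integer category codes would also satisfy the "numerical" wording of the question.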
In [ ]:
(10 pts)Question 2: Create the training and test sets using the train_test_split
function, with 80% of the data for training and 20% for testing. Fix the random seed.
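The split can be sketched as follows, assuming placeholder arrays `X` and `y` generated at random (they are not the assignment's data). The seed value 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: 100 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = rng.integers(0, 2, size=100)

# test_size=0.2 leaves 80% for training; random_state fixes the shuffle
# so that the same split is produced on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```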
In [ ]:
(20 pts)Question 3: Create and train four models -
i) Logistic regression
ii) Random Forest classifier
iii) Extreme Gradient Boosting (XGB) classifier
iv) Deep Neural Network with 3 hidden layers (With L2 regularizer and dropout layer)
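A partial sketch of the training loop, covering only the two scikit-learn models and using synthetic data from `make_classification` in place of the loan data. The hyperparameters shown are assumptions. The XGB classifier and the neural network are left out here; they could be trained on the same split in a similar way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed loan default data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Keep the models in a dictionary so they can be trained and compared uniformly.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)

# Test-set predictions, ready for a confusion matrix or F1 score.
predictions = {name: model.predict(X_test) for name, model in models.items()}
```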
In [ ]:
(10 pts)Question 4: Create a confusion matrix for each model and compare them.
Which one has the best performance based on the F1 score?
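The comparison can be illustrated with hand-written labels. The model names and predictions below are hypothetical and exist only to show the calls; real predictions would come from the fitted models.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical true labels and predictions, written by hand for illustration.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_by_model = {
    "model_a": [0, 0, 0, 1, 1, 1, 1, 0],
    "model_b": [0, 1, 1, 1, 1, 1, 0, 0],
}

scores = {}
for name, y_pred in y_pred_by_model.items():
    # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
    print(name)
    print(confusion_matrix(y_true, y_pred))
    scores[name] = f1_score(y_true, y_pred)

# The model with the highest F1 score on the positive class.
best_model = max(scores, key=scores.get)
```

Here `model_a` has 3 true positives, 1 false positive and 1 false negative, so its precision and recall are both 0.75 and its F1 score is 0.75, which is higher than that of `model_b`.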
In [ ]:
Part 2: The Housing Price in California
The data describe the houses in a given California district, along with some summary statistics.
Question 1: Data preprocessing
i) Convert non-numerical data to numerical data (categories)
ii) Standardize each feature (zero mean, unit variance), except for the target feature ('medianhousevalue')
iii) Separate the data into training (70%) and test (30%) sets
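The three steps can be sketched on a toy DataFrame. Only the target name 'medianhousevalue' comes from the assignment; the other column names and all values are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy rows; the real housing file has more columns and rows.
df = pd.DataFrame({
    "housing_age": [41, 21, 52, 52, 36, 15, 28, 44, 9, 30],
    "rooms": [880, 7099, 1467, 1274, 1627, 919, 2535, 3104, 696, 2202],
    "proximity": ["BAY", "BAY", "INLAND", "OCEAN", "INLAND",
                  "OCEAN", "BAY", "INLAND", "OCEAN", "BAY"],
    "medianhousevalue": [452600, 358500, 352100, 341300, 342200,
                         269700, 299200, 241400, 226700, 261100],
})

# i) Convert the string column to integer category codes.
df["proximity"] = df["proximity"].astype("category").cat.codes

# ii) Standardize every feature except the target.
feature_cols = df.columns.drop("medianhousevalue")
X = pd.DataFrame(
    StandardScaler().fit_transform(df[feature_cols]),
    columns=feature_cols,
    index=df.index,
)
y = df["medianhousevalue"]

# iii) 70% training set, 30% test set, with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```

This follows the order given in the question. Fitting the scaler on the training rows only, after splitting, is a common alternative that keeps test-set statistics out of the preprocessing.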
In [ ]:
In [ ]:
(10 pts)Question 2: Create a feature that combines "longitude" and "latitude"
(Hint: Turn them into strings, combine the strings and convert the result into
categories)
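A minimal sketch of the hint, assuming a toy DataFrame with made-up coordinates. The new column names `location` and `location_code` and the `_` separator are arbitrary choices.

```python
import pandas as pd

# Hypothetical coordinates; rows 0 and 2 share the same location.
df = pd.DataFrame({
    "longitude": [-122.23, -122.22, -122.23, -118.30],
    "latitude": [37.88, 37.86, 37.88, 34.05],
})

# Turn both coordinates into strings and join them with a separator.
combined = df["longitude"].astype(str) + "_" + df["latitude"].astype(str)

# Convert the combined strings into a categorical feature.
df["location"] = combined.astype("category")

# Integer codes are convenient when a model needs numerical input.
df["location_code"] = df["location"].cat.codes
```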
In [ ]:
(20 pts)Question 3: Create and train four models -
i) Linear Regressor
ii) Random Forest regressor
iii) Extreme Gradient Boosting (XGB) regressor
iv) Deep Neural Network with 3 hidden layers (With L2 regularizer and dropout layer)
Report each model's MSE on the test set and compare their performance. Which model performs best in this case?
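A partial sketch of the MSE comparison, limited to the two scikit-learn regressors and run on synthetic data from `make_regression` rather than the housing data. Hyperparameters are assumptions, and the XGB regressor and neural network are omitted; their test MSE could be added to the same dictionary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed housing data.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))

# The lowest test MSE indicates the best model under this metric.
best_model = min(test_mse, key=test_mse.get)
print(test_mse, best_model)
```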
In [ ]:
(10 pts)Question 4: Plot the training and validation MSE across the epochs in a
graph (set the validation set to be 20% of the training set)
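One way to produce such a graph, sketched with Keras and matplotlib on synthetic data. The layer sizes, L2 strength, dropout rate, epoch count and batch size are all assumptions, and the random arrays stand in for the standardized housing features.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Synthetic stand-in for the training features and target.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 8)).astype("float32")
weights = rng.normal(size=8)
y_train = (X_train @ weights + rng.normal(scale=0.1, size=200)).astype("float32")


def hidden(units):
    # Dense hidden layer with an L2 penalty on its weights.
    return layers.Dense(units, activation="relu",
                        kernel_regularizer=regularizers.l2(1e-4))


# Three hidden layers, with dropout after the first two.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    hidden(64), layers.Dropout(0.2),
    hidden(32), layers.Dropout(0.2),
    hidden(16),
    layers.Dense(1),
])

# Track MSE as a separate metric: the loss also includes the L2 penalty.
model.compile(optimizer="adam", loss="mse",
              metrics=[keras.metrics.MeanSquaredError(name="mse")])

# validation_split=0.2 holds out the last 20% of the training samples.
history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=5, batch_size=32, verbose=0)

plt.plot(history.history["mse"], label="training MSE")
plt.plot(history.history["val_mse"], label="validation MSE")
plt.xlabel("epoch")
plt.ylabel("MSE")
plt.legend()
plt.show()
```

The MSE is tracked as an explicit metric because the `loss` entry in the history would include the L2 regularization term and therefore overstate the error.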
In [ ]:
2022-08-27