CDS3008 Assignment 2 (Decision Tree)
1. Start SAS Enterprise Miner and open your project called "assignment" (If you don't have this project, please create a new one).
2. Create a new diagram called "DT".
3. Make sure that the "mydata" library is already imported. If not, import it. Make sure that the "Donations" dataset is already imported. If not, import it.
4. Description of the data set: The "Donations" dataset consists of the donation history of a group of individuals. The level of analysis is an individual, so each row includes information about one person. The description of each variable is provided in Table 1. For the purposes of this assignment, we will focus on determining whether an individual donated or not (i.e., the TARGET_BINARY variable).
5. Set the correct role and level of each variable as shown in Table 1. To do this, right-click the data source and select "Edit Variables". In the window that opens, set the role and level of each variable by selecting the appropriate values from the drop-down menus.
6. Build at least two decision tree models (possibly more) and one alternative model using a different data mining method (e.g., logistic regression, MBR or neural networks). Compare the models and minimize the misclassification rate, using the appropriate data samples (training, validation, and test).
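Step 6 depends on two ideas: splitting the data into separate samples and measuring the misclassification rate on each. In Enterprise Miner you do this with nodes in the diagram. The following Python sketch only illustrates the underlying arithmetic. The function names and the 40/30/30 split are assumptions made for illustration; the assignment does not prescribe specific fractions.

```python
import random


def partition(rows, fractions=(0.4, 0.3, 0.3), seed=42):
    """Shuffle rows and split them into training, validation, and test sets.

    The default 40/30/30 split is an assumption for illustration only.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test


def misclassification_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)
```

A model is fitted on the training set, tuned or selected on the validation set, and given a final check on the test set, so the rate should be reported separately for each sample.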
Write-up:
To complete this assignment, draw upon what you have learned in the associated exercises and create a write-up. Your write-up should include the following sections.
A. Description of the best decision tree model you identified:
1) Provide the list of all nodes from start to finish you used for the best model (i.e., list the nodes that concern only the best model.)
2) Report the misclassification rate value of the model (for the training, validation, and test sets). Also compare the misclassification rate to the "baseline" value (i.e., the naïve rule).
B. Description of the second best decision tree model:
3) Provide the list of all nodes from start to finish you used for the second best model (i.e., list the nodes that concern only the second best model.)
4) Report the misclassification rate value of the model (for the training, validation, and test sets). Also compare the misclassification rate to the "baseline" value (i.e., the naïve rule).
C. Description of an alternative model:
5) Report the misclassification rate value of the model (for the training, validation, and test sets). Also compare the misclassification rate to the "baseline" value (i.e., the naïve rule).
D. APPENDIX:
6) The screenshot of the final diagram.
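Items 2), 4), and 5) ask for a comparison against the "baseline" (the naïve rule). Under the naïve rule, every case is assigned to the most frequent class, so the baseline misclassification rate equals the share of cases in the other class(es). Below is a minimal Python sketch of that calculation. The function name and the example values are hypothetical and are not taken from the Donations data.

```python
from collections import Counter


def naive_rule_baseline(target_values):
    """Return (majority_class, baseline_misclassification_rate).

    The naive rule predicts the majority class for every case, so it is
    wrong exactly on the cases that belong to any other class.
    """
    if not target_values:
        raise ValueError("at least one target value is required")
    counts = Counter(target_values)
    majority_class, majority_count = counts.most_common(1)[0]
    n = len(target_values)
    return majority_class, (n - majority_count) / n


# Made-up example: 75 non-donors (0) and 25 donors (1).
example_targets = [0] * 75 + [1] * 25
majority, baseline = naive_rule_baseline(example_targets)
print(majority, baseline)
```

A model is only useful if its misclassification rate is below this baseline on the same sample.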
Tips: Strategies for building a good model:
• Use your previous knowledge of the dataset to decide whether to include/exclude variables
• Perform data cleaning if needed (e.g., filter outliers or transform variables), but try not to transform the target variable
• Use principal components (if needed)
• Use model comparison to evaluate multiple models.
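The last tip can be illustrated with a short Python sketch. It assumes one common convention: pick the model with the lowest misclassification rate on the validation sample. The function name, model names, and rates below are placeholders for illustration and are not results from the Donations data.

```python
def pick_best_model(results, partition="valid"):
    """Return the name of the model with the lowest misclassification rate
    on the chosen partition (validation by default)."""
    if not results:
        raise ValueError("results must not be empty")
    return min(results, key=lambda name: results[name][partition])


# Placeholder rates for illustration only; these are NOT real results.
# Replace them with the values reported by your own models.
example_results = {
    "tree_a": {"train": 0.20, "valid": 0.24, "test": 0.25},
    "tree_b": {"train": 0.15, "valid": 0.26, "test": 0.27},
    "regression": {"train": 0.23, "valid": 0.25, "test": 0.25},
}

best = pick_best_model(example_results)
print(best)
```

In this made-up example, tree_b has the lowest training rate but a higher validation rate than tree_a, which is the typical sign of overfitting. That is why the sketch selects on the validation sample rather than the training sample.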
Table 1. Description of the Variables

| Variable | Description | Role | Level |
|---|---|---|---|
| AGE | Age of the person | Input | Interval |
| CLUSTER | The cluster that captures the person's socioeconomic status (there are 53 clusters) | Input | Nominal |
| GENDER | M = Male, F = Female, U = Unknown | Input | Nominal |
| HOMEOWNER | H = Yes, U = Unknown | Input | Nominal |
| HOMEVALUE | Median home value (in $1000s) in the person's neighbourhood | Input | Interval |
| HOUSEHOLD_INCOME | Median household income (in $1000s) in the person's neighbourhood | Input | Interval |
| INCOME | The income group that captures the person's income (there are 7 income groups) | Input | Ordinal |
| LAST_GIFT_AMT | The amount of the most recent donation made by this person to the organization | Input | Interval |
| LIFETIME_GIFT_AMT | Total amount (in dollars) donated by this person to the organization | Input | Interval |
| LIFETIME_GIFT_COUNT | Total number of donations made by this person to the organization | Input | Interval |
| LIFETIME_PROM | Total number of promotions sent to this person by the organization | Input | Interval |
| MONTHS_SINCE_LAST_GIFT | The number of months since the person's last donation to the organization | Input | Interval |
| OTHER_PROM | Total number of times this person donated to other organizations | Input | Interval |
| PHONE | Whether the person's phone is listed in a public directory (1 = yes, 0 = no) | Input | Binary |
| URBAN_CITY | U = Urban, C = City, S = Suburban, T = Town, R = Rural | Input | Nominal |
| TARGET_BINARY | Whether the individual donated to the most recent solicitation (1 = donated, 0 = did not donate) | Target | Binary |
| TARGET_AMOUNT | How much the individual donated to the most recent solicitation (in $) | Rejected | Interval |
2022-04-13