
CDS3008 Assignment 2 (Decision Tree)

1. Start SAS Enterprise Miner and open your project called "assignment" (if you don't have this project, create a new one).

2. Create a new diagram called "DT".

3. Make sure that the "mydata" library and the "Donations" dataset are already imported. If either is missing, import it.

4. Description of the data set: The "Donations" dataset consists of the donation history of a group of individuals. The level of analysis is an individual, so each row includes information about one person. The description of each variable is provided in Table 1. For the purposes of this assignment, we will focus on determining whether an individual donated or not (i.e., the TARGET_BINARY variable).

5. Set the correct role and level of each variable as shown in Table 1. To do this, right-click on the data source and select "Edit Variables". In the window that opens, choose the role and level of each variable from the drop-down menus.

6. Build at least two decision tree models (possibly more) and one alternative model using a different data mining method (e.g., logistic regression, MBR or neural networks). Compare and minimize the misclassification rates using the right data samples.
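The comparison asked for in step 6 and in the write-up rests on two numbers: the misclassification rate of a model and the rate of the naïve rule. The sketch below shows how both can be computed by hand, assuming the naïve rule means "predict the majority class of the training set for every case". The function names and the toy labels are hypothetical and do not come from the Donations data or from SAS Enterprise Miner.

```python
from collections import Counter


def misclassification_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def naive_rule_rate(train_actual, eval_actual):
    """Baseline: predict the training set's majority class for every case."""
    majority_class = Counter(train_actual).most_common(1)[0][0]
    return misclassification_rate(eval_actual, [majority_class] * len(eval_actual))


# Hypothetical toy labels (1 = donated, 0 = did not donate).
train_y = [0, 0, 0, 1, 0, 1, 0, 0]
valid_y = [0, 1, 0, 0, 1]
valid_pred = [0, 1, 0, 1, 1]  # made-up model predictions for the validation set

model_rate = misclassification_rate(valid_y, valid_pred)
baseline_rate = naive_rule_rate(train_y, valid_y)
print(f"model: {model_rate:.2f}, naive rule: {baseline_rate:.2f}")
```

A model is only useful if its rate on the validation and test sets is below the baseline; a low rate on the training set alone may simply reflect overfitting.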

Write-up:

To complete this assignment, draw upon what you have learned in the associated exercises and create a write-up. Your write-up should include the following sections.

A. Description of the best decision tree model you identified:

1) Provide the list of all nodes, from start to finish, that you used for the best model (i.e., list only the nodes that concern the best model).

2) Report the misclassification rate of the model (for the training, validation, and test sets). Also compare the misclassification rate to the "baseline" value (i.e., the naïve rule).

B. Description of the second-best decision tree model:

3) Provide the list of all nodes, from start to finish, that you used for the second-best model (i.e., list only the nodes that concern the second-best model).

4) Report the misclassification rate of the model (for the training, validation, and test sets). Also compare the misclassification rate to the "baseline" value (i.e., the naïve rule).

C. Description of an alternative model:

5) Report the misclassification rate of the model (for the training, validation, and test sets). Also compare the misclassification rate to the "baseline" value (i.e., the naïve rule).

D. APPENDIX:

6) A screenshot of the final diagram.

 

Tips: Strategies for building a good model:

•     Use your previous knowledge of the dataset to decide whether to include/exclude variables

•     Perform data cleaning if needed (for example, filter outliers or transform variables). Try not to transform the target variable.

•     Use principal components (if needed)

•     Use model comparison to evaluate multiple models.
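As an illustration of the "filter outliers" tip, the sketch below drops values that lie more than a chosen number of standard deviations from the mean. The cutoff of three standard deviations, the function name, and the sample amounts are all assumptions made for this example; the sketch is not a description of how any SAS Enterprise Miner node works internally.

```python
from statistics import mean, stdev


def filter_outliers(values, max_sd=3.0):
    """Keep values within max_sd sample standard deviations of the mean."""
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - centre) <= max_sd * spread]


# Hypothetical gift amounts in dollars, with one extreme value.
amounts = [10, 12, 15, 11, 14, 13, 12, 10, 16, 11, 13, 12, 500]
kept = filter_outliers(amounts)
print(len(amounts) - len(kept), "value(s) removed")
```

Because an extreme value inflates both the mean and the standard deviation, this rule can miss outliers in very small samples; inspect the distribution of each interval variable before settling on a cutoff.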


Table 1. Description of the Variables

| Variable | Description | Role | Level |
|---|---|---|---|
| AGE | Age of the person | Input | Interval |
| CLUSTER | The cluster that captures the person's socioeconomic status (there are 53 clusters) | Input | Nominal |
| GENDER | M=Male, F=Female, U=Unknown | Input | Nominal |
| HOMEOWNER | H=Yes, U=Unknown | Input | Nominal |
| HOMEVALUE | Median home value (in $1000's) in the person's neighbourhood | Input | Interval |
| HOUSEHOLD_INCOME | Median household income (in $1000's) in the person's neighbourhood | Input | Interval |
| INCOME | The income group that captures the person's income (there are 7 income groups) | Input | Ordinal |
| LAST_GIFT_AMT | The amount of the most recent donation made by this person to the organization | Input | Interval |
| LIFETIME_GIFT_AMT | Total amount (in dollars) donated by this person to the organization | Input | Interval |
| LIFETIME_GIFT_COUNT | Total number of donations made by this person to the organization | Input | Interval |
| LIFETIME_PROM | Total number of promotions sent to this person by the organization | Input | Interval |
| MONTHS_SINCE_LAST_GIFT | The number of months since the person last donated to the organization | Input | Interval |
| OTHER_PROM | Total number of times this person donated to other organizations | Input | Interval |
| PHONE | Whether the person's phone is listed in a public directory (1=yes, 0=no) | Input | Binary |
| URBAN_CITY | U=Urban, C=City, S=Suburban, T=Town, R=Rural | Input | Nominal |
| TARGET_BINARY | Whether the individual donated to the most recent solicitation (1=donated, 0=did not donate) | Target | Binary |
| TARGET_AMOUNT | How much the individual donated to the most recent solicitation (in $) | Rejected | Interval |