COMP5122M Data Science Classification practical
The answers are on the last page.
1 Loan write-off example
See lecture for further details about the dataset.
1.1 What is the entropy of the whole dataset?
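The dataset itself is in the lecture notes, but the entropy calculation can be sketched with a small helper. Note the 5-vs-7 class split below is an assumption used only to show that such a split reproduces the 0.9799 figure from the answer key; check the lecture data for the actual counts.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A 50/50 split gives the maximum entropy of 1 bit:
print(round(entropy(["yes", "yes", "no", "no"]), 4))   # 1.0

# A 5-vs-7 split of 12 records reproduces the 0.9799 figure
# (an assumption about the lecture dataset -- check your notes):
print(round(entropy(["yes"] * 5 + ["no"] * 7), 4))     # 0.9799
```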
1.2 Split the dataset
For each attribute (head shape, body shape, body colour), what is the weighted entropy after splitting on it?
1.3 Which is the most informative attribute?
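The most informative attribute is the one whose split leaves the lowest weighted entropy (equivalently, the highest information gain). A minimal sketch of the mechanics, using hypothetical records rather than the lecture data:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(records, attr):
    """Weighted average entropy of the class label after splitting on attr."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[attr], []).append(rec["class"])
    n = len(records)
    return sum(len(group) / n * entropy(group) for group in groups.values())

# Hypothetical records (not the lecture data), just to show the mechanics:
records = [
    {"head": "round",  "body": "round",  "class": "yes"},
    {"head": "round",  "body": "square", "class": "no"},
    {"head": "square", "body": "round",  "class": "yes"},
    {"head": "square", "body": "square", "class": "no"},
]

# "body" separates the classes perfectly, so its split entropy is 0
# and it is the most informative attribute in this toy example:
best = min(["head", "body"], key=lambda a: split_entropy(records, a))
print(best)  # body
```

For the real exercise, apply `split_entropy` to head shape, body shape, and body colour and pick the minimum; this is how the answer key arrives at body shape.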
2 Classification diagram
Consider the following figure when answering the next few questions. In the figure, X1 and X2 are the two features, and the data points are represented by dots (-1 denotes the negative class and +1 the positive class). The data is first split on feature X1 (say the splitting point is x11), shown in the figure as a vertical line. Every value less than x11 is predicted as the positive class, and every value greater than x11 as the negative class.
2.1 How many data points are misclassified in the above figure?
2.2 What splitting point on feature x1 will classify the data correctly?
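The exact coordinates are in the figure, but the idea behind the answer can be sketched with hypothetical points: when the classes alternate along x1, no single threshold on x1 can separate them.

```python
def stump_errors(points, threshold):
    """Count misclassifications for the rule: x1 < threshold -> +1, else -1."""
    errors = 0
    for x1, label in points:
        predicted = +1 if x1 < threshold else -1
        if predicted != label:
            errors += 1
    return errors

# Hypothetical points (the real ones are in the figure): the classes
# alternate along x1, so no single threshold separates them.
points = [(1.0, +1), (2.0, -1), (3.0, +1)]

# Every candidate threshold leaves at least one point misclassified:
print(min(stump_errors(points, t) for t in [0.5, 1.5, 2.5, 3.5]))  # 1
```

This is why a single split on x1 cannot classify such data correctly; a second split (e.g. on X2, or a further split on X1) would be needed.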
3 Decision tree
Here is a decision tree:
Classify records A – E.
Record | Colour | Height | Width | Class
A      | Red    | Short  | Thin  |
B      | Blue   | Tall   | Fat   |
C      | Green  | Short  | Fat   |
D      | Green  | Tall   | Thin  |
E      | Blue   | Short  | Thin  |
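The tree itself is shown in the lecture figure and is not reproduced here. As an illustration of walking records through a tree, the hypothetical single-split tree below (Tall -> Yes, Short -> No) happens to be consistent with the answer key; the actual tree may differ.

```python
def classify(record):
    # Hypothetical tree: a single split on Height (Tall -> Yes,
    # Short -> No). The actual tree is in the lecture figure.
    return "Yes" if record["Height"] == "Tall" else "No"

records = {
    "A": {"Colour": "Red",   "Height": "Short", "Width": "Thin"},
    "B": {"Colour": "Blue",  "Height": "Tall",  "Width": "Fat"},
    "C": {"Colour": "Green", "Height": "Short", "Width": "Fat"},
    "D": {"Colour": "Green", "Height": "Tall",  "Width": "Thin"},
    "E": {"Colour": "Blue",  "Height": "Short", "Width": "Thin"},
}

for name, rec in records.items():
    print(name, classify(rec))  # A No, B Yes, C No, D Yes, E No
```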
Answers
1 Loan write-off example
1.1 What is the entropy of the whole dataset?
Answer = 0.9799
1.2 Split the dataset
Answers:
Split by head shape = 0.9729
Split by body shape = 0.7842
Split by body colour = 0.9758
1.3 Which is the most informative attribute?
Answer = body shape
2 Classification diagram
2.1 How many data points are misclassified in the above figure?
Answer = one (the bottom -1)
2.2 What splitting point on feature x1 will classify the data correctly?
Answer = none!
3 Decision tree
Record | Class
A      | No
B      | Yes
C      | No
D      | Yes
E      | No
2023-02-01