COMP5122M Data Science

Classification practical

The answers are given at the end of this sheet.

Loan write-off example

[Figure: the loan write-off dataset (image not reproduced)]

See the lecture for further details about the dataset.

1.1 What is the entropy of the whole dataset?

1.2 Split the dataset

For each attribute, split the dataset on that attribute. What is the resulting entropy (the weighted average entropy of the resulting subsets)?

1.3 Which is the most informative attribute?

Classification diagram

Consider the following figure when answering the next few questions. In the figure, X1 and X2 are the two features, and each data point is shown as a dot (-1 is the negative class and +1 is the positive class). The data are first split on feature X1 at the splitting point x11, shown in the figure as a vertical line. Every point with X1 less than x11 is predicted as the positive class, and every point with X1 greater than x11 is predicted as the negative class.

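[Figure: +1 and -1 data points plotted on features X1 and X2, with a vertical line at the splitting point x11]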

2.1 How many data points are misclassified in the figure above?

2.2 What splitting point on feature X1 will classify the data correctly?

Decision tree

Here is a decision tree:

[Figure: decision tree (image not reproduced)]


Classify records A to E using the decision tree.

Record  Colour  Height  Width  Class
A       Red     Short   Thin   ?
B       Blue    Tall    Fat    ?
C       Green   Short   Fat    ?
D       Green   Tall    Thin   ?
E       Blue    Short   Thin   ?


Answers

Loan write-off example

1.1 What is the entropy of the whole dataset?

Answer = 0.9799
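As a worked check, the entropy of a dataset whose classes occur with proportions p_i is H = -Σ p_i log2(p_i). A minimal Python sketch follows. The class counts of the lecture dataset are not reproduced in this sheet, so the counts [7, 5] are an assumption for illustration only: a 7/5 split of 12 records gives 0.9799, matching the answer, but check the counts against the lecture data.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as class counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # by convention 0 * log2(0) contributes 0
            p = c / total
            h -= p * log2(p)
    return h


# Assumed class counts (illustrative): 7 records of one class, 5 of the other.
print(round(entropy([7, 5]), 4))  # 0.9799
```

A pure set (all records in one class) has entropy 0, and an even two-class split has entropy 1.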

1.2 Split the dataset

Answers:

Split by head shape = 0.9729

Split by body shape = 0.7842

Split by body colour = 0.9758
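The resulting entropy of a split is the average of the entropies of the child segments, each weighted by the fraction of records that falls into that segment. The sketch below assumes each segment is summarised by its class counts; the counts in the usage line are hypothetical, since the per-attribute breakdown of the lecture dataset is not given here.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def split_entropy(children):
    """Weighted average entropy of the segments produced by a split.

    `children` holds one list of class counts per attribute value."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * entropy(child) for child in children)


# Hypothetical split of 12 records into two segments with class counts
# [6, 1] and [1, 4].
print(round(split_entropy([[6, 1], [1, 4]]), 4))
```

A split that produces only pure segments has a resulting entropy of 0; a split whose segments all keep the parent's class mix leaves the entropy unchanged.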

1.3 Which is the most informative attribute?

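Answer = body shape (it has the lowest resulting entropy, 0.7842, and therefore the largest information gain: 0.9799 - 0.7842 = 0.1957)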
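Information gain is the parent entropy minus the resulting entropy after the split, so when the parent is fixed the attribute with the lowest resulting entropy has the highest gain. This short Python sketch applies that rule to the entropies given in the answers above; no other data is assumed.

```python
# Entropies taken from the answers above.
parent_entropy = 0.9799
split_entropies = {
    "head shape": 0.9729,
    "body shape": 0.7842,
    "body colour": 0.9758,
}

# Information gain = parent entropy - resulting entropy after the split.
gains = {attr: parent_entropy - h for attr, h in split_entropies.items()}
best = max(gains, key=gains.get)

for attr, gain in gains.items():
    print(f"{attr}: gain = {gain:.4f}")
print("most informative:", best)  # most informative: body shape
```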

Classification diagram

2.1 How many data points are misclassified in the figure above?

Answer = one (the -1 point at the bottom, which lies to the left of x11 and is therefore predicted as +1)
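The split rule from the question can be written as a one-line predictor; counting misclassifications is then a matter of comparing each prediction with the point's label. A minimal Python sketch, with two assumptions: the coordinates are hypothetical (the figure's points are not available), chosen so that one -1 point near the bottom lies to the left of the split as in the answer, and a point exactly at the threshold is predicted -1, since the sheet does not say.

```python
def predict(x1, threshold):
    """Split rule from the question: X1 < threshold -> +1, otherwise -1.
    Assumption: a point exactly at the threshold is predicted -1."""
    return 1 if x1 < threshold else -1


def count_misclassified(points, threshold):
    """Count points whose predicted class differs from their label.
    `points` is a list of (x1, x2, label) tuples with label in {+1, -1}."""
    return sum(1 for x1, _x2, label in points if predict(x1, threshold) != label)


# Hypothetical points, not the figure's actual coordinates.
points = [
    (1.0, 4.0, 1),
    (1.5, 0.5, -1),  # a -1 point near the bottom, left of the split
    (2.0, 3.0, 1),
    (4.0, 3.5, -1),
    (5.0, 2.0, -1),
]

print(count_misclassified(points, threshold=3.0))  # 1
```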

2.2 What splitting point on feature X1 will classify the data correctly?

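Answer = none! No single threshold on X1 separates the +1 points from the -1 points, so at least one further split is needed.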
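One way to confirm that no single threshold works is to try every distinct way of cutting the points along X1: one threshold below all points, one at each midpoint between neighbouring X1 values, and one above all points. The Python sketch below assumes the same rule as the question (X1 < t predicts +1) and uses hypothetical points in which a -1 point lies between two +1 points along X1.

```python
def perfect_split(points):
    """Return a threshold t for which the rule X1 < t -> +1, otherwise -1,
    classifies every (x1, x2, label) point correctly, or None if none exists."""
    if not points:
        return None
    xs = sorted({x1 for x1, _x2, _label in points})
    # Every distinct partition along X1 is covered by: one threshold below all
    # points, one midpoint between each pair of neighbouring values, one above all.
    candidates = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    for t in candidates:
        if all((1 if x1 < t else -1) == label for x1, _x2, label in points):
            return t
    return None


# Hypothetical points: a -1 point sits between two +1 points along X1.
points = [(1.0, 4.0, 1), (1.5, 0.5, -1), (2.0, 3.0, 1), (4.0, 3.5, -1), (5.0, 2.0, -1)]
print(perfect_split(points))  # None

# A separable hypothetical example for comparison.
print(perfect_split([(1.0, 0.0, 1), (2.0, 0.0, 1), (3.0, 0.0, -1)]))  # 2.5
```

Checking only these candidates is enough because any threshold between the same two neighbouring X1 values produces exactly the same predictions.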

Decision tree

Record  Class
A       No
B       Yes
C       No
D       Yes
E       No
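Classifying a record means starting at the root and, at each internal node, following the branch that matches the record's value for the tested attribute until a leaf is reached. The tree image is not reproduced here, so `HYPOTHETICAL_TREE` below is a made-up single split on Height, used only to illustrate the traversal. For records A to E it happens to give the same classes as the answer table, but it is not necessarily the tree from the figure.

```python
# A tree is either a leaf (a class label string) or a tuple
# (attribute, {attribute value: subtree}).
# HYPOTHETICAL: a single split on Height, not the tree from the figure.
HYPOTHETICAL_TREE = ("Height", {"Tall": "Yes", "Short": "No"})


def classify(tree, record):
    """Follow the matching branch at each internal node until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[record[attribute]]
    return tree


# Records A to E from the table.
records = {
    "A": {"Colour": "Red", "Height": "Short", "Width": "Thin"},
    "B": {"Colour": "Blue", "Height": "Tall", "Width": "Fat"},
    "C": {"Colour": "Green", "Height": "Short", "Width": "Fat"},
    "D": {"Colour": "Green", "Height": "Tall", "Width": "Thin"},
    "E": {"Colour": "Blue", "Height": "Short", "Width": "Thin"},
}

for name, record in records.items():
    print(name, classify(HYPOTHETICAL_TREE, record))
```

The same `classify` function works for trees of any depth; a record with a value that has no matching branch raises `KeyError`, so a practical tree would need a default branch.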