
COMP5122M Data Science

Evaluation practical

Unbalanced classes: Two model example

Two models each correctly classify 80% of a balanced population, but all of Model A's errors are false positives whereas all of Model B's errors are false negatives (see lecture slide for balanced class confusion matrices).

The actual population has a 9:1 (negative:positive) ratio. Calculate the confusion matrix for each model, and the accuracy of each model.

Confusion matrix

We have a dataset of pictures. There are 100 pictures of tables and 300 pictures of chairs. A person is sitting on 10% of the tables and 80% of the chairs. We also have a model that classifies a picture as a chair if a person is sitting on the object, and as a table if no one is sitting on it.

Running the model on the dataset produces the following confusion matrix:

                   Actual Chair   Actual Table
Predicted Chair    TP             FP
Predicted Table    FN             TN

2.1 What is the value of TP?

2.2 What is the value of FP?

2.3 What is the value of FN?

2.4 What is the value of TN?

2.5 What is the accuracy of the model?

Matrix of probabilities

The matrix of probabilities is:

                   Actual Chair   Actual Table
Predicted Chair    p(TP)          p(FP)
Predicted Table    p(FN)          p(TN)

3.1 What is the value of p(TP)?

3.2 What is the value of p(FP)?

3.3 What is the value of p(FN)?

3.4 What is the value of p(TN)?

Cost/benefit matrix

It costs £1 to run the model on each picture. You sell all the pictures that the model predicts are chairs, and are paid £5 for each picture that is a chair but fined £10 for each picture that is not.

The cost/benefit matrix is:

                   Actual Chair   Actual Table
Predicted Chair    b(TP)          b(FP)
Predicted Table    b(FN)          b(TN)

4.1 What is the value of b(TP) in pounds?

4.2 What is the value of b(FP) in pounds?

4.3 What is the value of b(FN) in pounds?

4.4 What is the value of b(TN) in pounds?

Expected profit

What is the total expected profit from all the pictures you sell?

Decision analytic thinking

A charity asks you to help fund-raise, but when you ask about the campaign’s aim you get three different answers:

a) We want as many donors as possible

b) We want to maximise the donations we receive

c) We want the campaign to be as profitable as possible

Explain how you would calculate the cost/benefit matrix for each aim.

Answers

Unbalanced classes: Two model example

Confusion matrix for Model A

                      Actual Positive   Actual Negative
Predicted Positive    100               360
Predicted Negative    0                 540

Therefore accuracy of Model A = 640 / 1000 = 64%

Confusion matrix for Model B

                      Actual Positive   Actual Negative
Predicted Positive    60                0
Predicted Negative    40                900

Therefore accuracy of Model B = 960 / 1000 = 96%
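The two matrices above can be reproduced by applying each model's per-class rates to the new class sizes. The sketch below is one way to check the arithmetic. The rates (Model A: true positive rate 1.0, false positive rate 0.4; Model B: true positive rate 0.6, false positive rate 0.0) are not stated in this sheet; they are inferred from the answer matrices, and the population of 1000 is the one the answers use. The function names are illustrative only.

```python
def scale_confusion(tpr, fpr, n_pos, n_neg):
    """Build a confusion matrix from per-class rates and class sizes."""
    tp = round(tpr * n_pos)   # positives the model catches
    fp = round(fpr * n_neg)   # negatives wrongly flagged as positive
    return {"TP": tp, "FP": fp, "FN": n_pos - tp, "TN": n_neg - fp}


def accuracy(cm):
    """Fraction of all cases on the TP/TN diagonal."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


# Assumed rates (true positive rate, false positive rate),
# inferred from the answer matrices rather than given in the sheet.
models = {"A": (1.0, 0.4), "B": (0.6, 0.0)}

for name, (tpr, fpr) in models.items():
    balanced = scale_confusion(tpr, fpr, 500, 500)   # 1:1 population
    skewed = scale_confusion(tpr, fpr, 100, 900)     # 9:1 negative:positive
    print(name, accuracy(balanced), skewed, accuracy(skewed))
```

Both models score 0.8 on the balanced population, but the rates are applied to very different class sizes once the population is 9:1, which is why the accuracies diverge.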

Confusion matrix

7.1 What is the value of TP?

Answer = 240

7.2 What is the value of FP?

Answer = 10

7.3 What is the value of FN?

Answer = 60

7.4 What is the value of TN?

Answer = 90

7.5 What is the accuracy of the model?

Answer = 0.825
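As a check on these counts, the short sketch below rebuilds the confusion matrix directly from the dataset description, treating "chair" as the positive class. The variable names are illustrative only.

```python
n_chairs, n_tables = 300, 100

# The model predicts "chair" exactly when a person is sitting on the object.
chairs_with_person = 80 * n_chairs // 100   # 80% of the chairs
tables_with_person = 10 * n_tables // 100   # 10% of the tables

tp = chairs_with_person          # chairs predicted as chairs
fn = n_chairs - tp               # chairs predicted as tables
fp = tables_with_person          # tables predicted as chairs
tn = n_tables - fp               # tables predicted as tables

accuracy = (tp + tn) / (n_chairs + n_tables)
print(tp, fp, fn, tn, accuracy)  # 240 10 60 90 0.825
```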

Matrix of probabilities

8.1 What is the value of p(TP)?

Answer = 0.600

8.2 What is the value of p(FP)?

Answer = 0.025

8.3 What is the value of p(FN)?

Answer = 0.150

8.4 What is the value of p(TN)?

Answer = 0.225
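Each probability is simply the corresponding count divided by the total number of pictures. A minimal sketch, reusing the counts from the confusion matrix answers (the names `counts` and `probs` are illustrative only):

```python
# Counts from the confusion matrix answers.
counts = {"TP": 240, "FP": 10, "FN": 60, "TN": 90}
total = sum(counts.values())  # 400 pictures

# Divide each cell by the total to get the matrix of probabilities.
probs = {cell: n / total for cell, n in counts.items()}

for cell, p in probs.items():
    print(f"p({cell}) = {p:.3f}")
```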

Cost/benefit matrix

9.1 What is the value of b(TP) in pounds?

Answer = 4

9.2 What is the value of b(FP) in pounds?

Answer = -11

9.3 What is the value of b(FN) in pounds?

Answer = -1

9.4 What is the value of b(TN) in pounds?

Answer = -1
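The four values follow from two rules: every picture costs £1 to run through the model, and only pictures predicted to be chairs are sold (and so paid for or fined). A minimal sketch of that reasoning, with illustrative variable names:

```python
run_cost = 1    # pounds, paid for every picture the model is run on
payment = 5     # pounds, received for a sold picture that is a chair
fine = 10       # pounds, paid for a sold picture that is not a chair

benefit = {
    "TP": payment - run_cost,   # sold and is a chair
    "FP": -fine - run_cost,     # sold but is a table
    "FN": -run_cost,            # not sold, only the running cost applies
    "TN": -run_cost,            # not sold, only the running cost applies
}
print(benefit)
```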

10 Expected profit

Answer = £700
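The £700 figure can be checked in two equivalent ways: sum count times benefit over the four cells, or compute the expected value per picture from the probabilities and multiply by the number of pictures. The sketch below does both, using the counts and cost/benefit values from the answers above (variable names are illustrative only).

```python
counts = {"TP": 240, "FP": 10, "FN": 60, "TN": 90}
benefit = {"TP": 4, "FP": -11, "FN": -1, "TN": -1}   # pounds per picture
total = sum(counts.values())

# Route 1: total profit as the sum of count * benefit over the cells.
profit = sum(counts[cell] * benefit[cell] for cell in counts)

# Route 2: expected value per picture, sum of p(cell) * b(cell).
expected_per_picture = sum(
    (counts[cell] / total) * benefit[cell] for cell in counts
)

print(profit)                                   # 700
print(round(expected_per_picture * total, 2))   # 700.0
```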

11 Decision analytic thinking

ANSWERS

For (c) you need to explain that the values in the cost/benefit matrix depend on the amount of a typical donation and the cost/person of doing the fund-raising. You would then need to explain how you would use the donation amount and cost/person to calculate the matrix’s values.

For (b) the only thing that matters is the amount of a typical donation; the cost of the fund-raising is ignored. Yes, that would be bad business (that’s one reason why the expected value framework is useful – it makes you think about your model and the data science task itself)!

For (a) the only thing that matters is the number of donors, which is even more stupid!!
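One possible way to turn the three aims into cost/benefit matrices is sketched below. It is an illustration under stated assumptions, not a model answer: "predicted positive" is taken to mean that a person is contacted, "actual positive" that they would donate if contacted, and people who are not contacted are assumed to contribute nothing. The donation of £20 and cost of £2 per person are hypothetical figures, and `cost_benefit` is an illustrative name.

```python
def cost_benefit(aim, donation, cost_per_person):
    """Return the cost/benefit matrix cells for aim 'a', 'b' or 'c'."""
    if aim == "a":        # as many donors as possible: count donors
        tp, fp = 1, 0
    elif aim == "b":      # maximise donations: costs are ignored
        tp, fp = donation, 0
    elif aim == "c":      # maximise profit: donation minus contact cost
        tp, fp = donation - cost_per_person, -cost_per_person
    else:
        raise ValueError(f"unknown aim: {aim!r}")
    # Assumption: people who are not contacted neither donate nor cost anything.
    return {"TP": tp, "FP": fp, "FN": 0, "TN": 0}


# Hypothetical figures, chosen only to make the matrices concrete.
for aim in ("a", "b", "c"):
    print(aim, cost_benefit(aim, donation=20, cost_per_person=2))
```

Only the matrix for (c) contains a negative cell, so it is the only one of the three that can penalise contacting people who will not donate.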