Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

COMPSCI 3DM3   Winter 2023

Introduction to Data Mining

Assignment 2

Total marks: 100 marks (5% of the final grade)

Note:for each of thefollowing questions, you can EITHER derive the answers by hand OR write a computer program in Python to produce the answers if possible.

•   If you choose to derive the answers manually, please write down/type your answers.

•   If you choose to write a computer program, please submit your program as a zipfile to avenue. The zipfile should include a main.pyfile, which can be directly executed to produce the results (i.e., print the results on screen)for each of the questions in an easy-to- read manner. If the answers are not easy to read or understand, marks will be deducted.

Assignment 2.1 (50 marks)

Given a minimum support threshold of 1, please try to solve the following questions.

a)   Show the FP-Tree with the lexicographic ordering a, b, c, d, e, f for the

following database of transactions. (15 marks)

TransID   Items

1         a,b,c,d

2         b,c,e,f

3         a,d,e,f

4         a,e,f

5         b,d,f

b)  Show the FP-tree for the same database with the reverse lexicographic ordering. (15 marks)

TransID   Items

1         d,c,b,a

2         f,e,c,b

3         f,e,d,a

4         f,e,a

5         f,d,b

c)  Why is the second FP-tree smaller, i.e. has fewer nodes? (20 marks)

Assignment 2.2 (50 marks)

Assume the following dataset of transactions, where every transaction records the transaction ID, the customerID and the items bought in the form Brand-Category separated by comma.

TransID              CustID Items

T100                   1      King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread

T200                    2      Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread

T300                    3       Westcoast-Apple, King’s-Crab, Tasty-Pie, Wonder-Bread

T400                    4       Wonder-Bread, Sunset-Milk, Dairyland-Chees

Assume that minimum relative support’ is set to 50% and minimum confidence to 80%.

The relative support of a pattern P is given by the support of P divided by the total number of transactions in the transactional database, that is,

Number of transactions containing P

Total number of transactions

The relative support of P is also called the frequency’ of P.

a) At the level of item categories, find all frequent itemsets, e.g.  {Cheese, Apple}. Report these itemsets together with their support. (15 marks)

b) Which of the frequent itemsets under a) are closed, which of them are maximal? (15 marks)

c) Let k be the maximum number for which there is a frequent k-itemset. For all frequent k-itemsets, determine all corresponding association rules (having minimum confidence) and their confidence. (20 marks)