COMPSCI 3DM3 Winter 2023 Introduction to Data Mining Assignment 2
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
COMPSCI 3DM3 Winter 2023
Introduction to Data Mining
Assignment 2
Total marks: 100 marks (5% of the final grade)
Note:for each of thefollowing questions, you can EITHER derive the answers by hand OR write a computer program in Python to produce the answers if possible.
• If you choose to derive the answers manually, please write down/type your answers.
• If you choose to write a computer program, please submit your program as a zipfile to avenue. The zipfile should include a main.pyfile, which can be directly executed to produce the results (i.e., print the results on screen)for each of the questions in an easy-to- read manner. If the answers are not easy to read or understand, marks will be deducted.
Assignment 2.1 (50 marks)
Given a minimum support threshold of 1, please try to solve the following questions.
a) Show the FP-Tree with the lexicographic ordering a, b, c, d, e, f for the
following database of transactions. (15 marks)
TransID Items
1 a,b,c,d
2 b,c,e,f
3 a,d,e,f
4 a,e,f
5 b,d,f
b) Show the FP-tree for the same database with the reverse lexicographic ordering. (15 marks)
TransID Items
1 d,c,b,a
2 f,e,c,b
3 f,e,d,a
4 f,e,a
5 f,d,b
c) Why is the second FP-tree smaller, i.e. has fewer nodes? (20 marks)
Assignment 2.2 (50 marks)
Assume the following dataset of transactions, where every transaction records the transaction ID, the customerID and the items bought in the form Brand-Category separated by comma.
TransID CustID Items
T100 1 King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread
T200 2 Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread
T300 3 Westcoast-Apple, King’s-Crab, Tasty-Pie, Wonder-Bread
T400 4 Wonder-Bread, Sunset-Milk, Dairyland-Chees
Assume that minimum ‘relative support’ is set to 50% and minimum confidence to 80%.
The relative support of a pattern P is given by the support of P divided by the total number of transactions in the transactional database, that is,
Number of transactions containing P
Total number of transactions
The relative support of P is also called the ‘frequency’ of P.
a) At the level of item categories, find all frequent itemsets, e.g. {Cheese, Apple}. Report these itemsets together with their support. (15 marks)
b) Which of the frequent itemsets under a) are closed, which of them are maximal? (15 marks)
c) Let k be the maximum number for which there is a frequent k-itemset. For all frequent k-itemsets, determine all corresponding association rules (having minimum confidence) and their confidence. (20 marks)
2023-02-17