DATA1001 2022T2 ASSIGNMENT 2
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
DATA1001 2022T2
ASSIGNMENT 2
Submission Instructions:
a) Submit your solution in a separate .PDF file and submit it to Moodle. (If you use MS Word. Docx, save as .PDF).
b) For your convenience, if you hand-write your solutions, make sure they are adequately readable in the final pdf file.
c) Name your submission: z1234567- data1001-cse-assignment.pdf
d) On late submissions: Assignments finalized after 22nd July 5:00pm will be penalized at 5% per 24 hours and will not be accepted after 7 days. If you
have a good reason for an extension, make an application online following
theSpecial Consideration Policy.
Notes:
a) Five questions covering major learning outcomes of CS component
b) Deadline: 22nd July (Friday of Week 8) 5:00pm (Sydney Time)
c) 15 marks total, 3 marks per question. Worth 15% of your course mark.
Q1. Binary Search Trees (BST) (3 marks total)
1. (1 marks) Starting from an empty binary search tree, what is the final binary search tree constructed by inserting the following elements in the specified order: {32, 87, 12, 3, 45, 11, 65, 15, 59}.
2. (1 marks) Draw a BST with same elements from (q1. 1) but has a maximum height of 3 instead.
3. (1 marks) Using the tree you built from (q. 1. 1), perform two more operations and draw the resulting BST. See the two operations below.
1) Add new node {14}
2) Remove node {32}
Q2. Decision Trees (3 marks total)
1. (3 marks) Consider the same PlayTennis dataset used in the slides. Suppose someone accidentally made the first split on Humidity instead of Outlook. Given this fixed root node. Complete the remaining decision tree using the ID3 Algorithm.
Notes:
i. For each split made, provide the conditional entropies for each remaining predictor variable and their respective information gains.
ii. You may exclude the attribute ‘Day’ from the dataset.
iii. If two or more predictor variables have the same maximum information gain, you may pick any of the predictor variables to split the decision node on.
Q3. Graph Data (3 marks total)
6
3
8
5
1. (1.5 mark) Give the BFS traversal order starting from node 3.
Provide a step-by-step explanation (what was enqueued/ dequeued, and how the queue changed from the step before).
2. (1.5 mark) Give the DFS traversal order starting from node 7.
Provide a step-by-step explanation (what was pushed/ popped, and how the stack changed from the step before).
Q4. Bloom Filter (3 marks total)
(Imaginary scenario) A company needs to quickly check if a password submitted might be a known weak password. It receives a steam of millions of password checks per minute. To tackle the infinite stream of passwords, they employ a bloom filter. They have a Bloom filter of size m = 16 and k = 3. The hash functions that they employ are publicly listed below: ℎ1, ℎ2, and ℎ3 .
ℎ1 (pwd) = (( cn − 1)) mod 16
ℎ2 (pwd) = (( 26 − cn )) mod 16
ℎ 3 (pwd) = l mod 16
Notes:
i. pwd denotes a single password entry.
ii. l denotes length of the password i.e., the number of characters in the letter.
iii. cn denotes the nth character of a pwd. e.g., for “1q2w3e”, c1 = ′1′, c2 = ′q′, c3 = ′2′, c4 = ′w′, c5 = ′3′, and c6 = ′e′.
iv. If cn is not a number, consult the following table below.
Tasks:
1. (2 mark) Given an empty bloom filter, initialize the bloom filter by inserting the following stream of 7 weak passwords from S1 into the Bloom filter. S1 = {“qwerty”, “12345”, “qwerty123”, “00000”, “iloveyou”, “password”, “1q2w3e”}.
2. (1 mark) With the initialized bloom filter from (q4. 1), give a password that would return a false positive (can be any length).
Q5. K-cores (3 marks total)
The k-core of a graph is the maximal subgraph such that every vertex in the k-core has a degree of at least k. We provide the following graph G with 12 vertices.
1. (1 marks) Draw the maximal 2-core of G.
2. (2 marks) Draw the maximal 3-core. Please try to give a step-by-step explanation ofhow you derived the 3-core from your solution in (q.5. 1).
2022-07-22