Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit


PA3 Worksheet

⚠️ The worksheet is an individual assignment, you are not allowed to collaborate with anyone and TAs won’t be able to help on these worksheet questions. (link to the worksheet google doc (https://docs.google.com/document/d/1TJfDOZZKor7yDtl8aR5ekWtuSRg2BZ82SSKoqv9DB-I/edit?usp=sharing)

1. Select all true statements about bloom filters in the box below (4 points):

A. If data A was never inserted to bloom filter, it is guaranteed that find(A) returns false

B. If data A was inserted to bloom filter, it is guaranteed that find(A) returns true

C. To detect malicious URLs, bloom filter is in general better than hash table because of its low false positive rate.

D. For a fixed size bloom filter, using more hash functions can always help lower false positive rate.

E. Assume the bloom filter is using the same hash functions, increasing the size of the bloom filter can always help lower false positive rate.





2. Write C++ code that implements the insert function in a bloom filter that uses two hash functions. You can assume hash1() and hash2() are already implemented. The code below should give you all the member variables and functions you need to complete insert(). (4 points) 


char* table; // the byte table used in bloom filter

unsigned int numByte; // number of byte in the table

BloomFilter(unsigned int numByte): table(new char[numByte]), numByte(numByte) {}

unsigned int hash1(const string& key);

unsigned int hash2(const string& key);


void insert(const string& key) {



}



3. First, build a HCTree based on the frequency of chars in a file: a(3), b(5), c(4), d(26), e(12). You should use the following rules when building your HCTree:

HCNode with smaller count will have higher priority, and if the count is the same, then the HCNode with the larger ASCII value symbol will have higher priority.

When popping two highest priority nodes from PQ, the higher priority node will be the ‘c0’ child of the new parent HCNode.

The symbol of any parent node should be taken from its ‘c0’ child.

Your HCTree drawing won’t be graded, but is required to answer the following question

List the encoding bits of all the symbols (a, b, c, d, e) based on your HCTree (4 points)

a

b

c

d

e



4. Suppose you have a file of size 2 megabyte which contains 32 different 8-bit chars of the same frequency. Using Huffman Encoding, what is the size of the compressed file in megabyte? (without considering header size). Note here we are assuming to encode in bits, not the same as what we implement in PA3 (4 points)



5. Suppose we are using a method of encryption that does as much as possible to hide any patterns of a file, including symbol frequencies. And suppose we also want to compress the same file. In terms of the compression performance (the size of compressed file), is it better to compress the file before encrypting it, or encrypt the file before compressing it? Briefly explain the reason (4 points)