2021 Semester 2 Exam

Question 1

2 pts

A medical practice uses a unique consultation code to represent each patient consultation. The code is made up of the following:

- The type of each consultation, represented by a sequence of two or three capital letters from the set A, B, C, D or E

- Two asterisk symbols

- The patient ID, represented by a six-digit number

- Another two asterisk symbols

- A four-digit number indicating how many times the patient has visited the practice.

For example, the following are all valid consultation codes:

ACE**234123**0001

DE**123456**1024

ABC**000111**9999

Write a regular expression that matches all valid consultation codes. Ensure that the six-digit patient ID can be extracted as the first capture group in the expression.
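As a way to check a candidate answer against the three example codes, the following minimal Python sketch may help. The pattern, the name `CODE_RE` and the helper `patient_id` are illustrative assumptions, not an official solution; it uses `[0-9]` rather than `\d` so that only ASCII digits are accepted.

```python
import re

# One candidate pattern (an assumption, not an official answer):
#   [A-E]{2,3}   two or three capital letters from A-E
#   \*\*         two literal asterisks
#   ([0-9]{6})   six-digit patient ID, captured as group 1
#   \*\*         two more literal asterisks
#   [0-9]{4}     four-digit visit count
CODE_RE = re.compile(r"[A-E]{2,3}\*\*([0-9]{6})\*\*[0-9]{4}")


def patient_id(code):
    """Return the six-digit patient ID, or None if the whole code is invalid."""
    match = CODE_RE.fullmatch(code)
    return match.group(1) if match else None


examples = ["ACE**234123**0001", "DE**123456**1024", "ABC**000111**9999"]
ids = [patient_id(code) for code in examples]
print(ids)
```

Using `fullmatch` anchors the pattern to the entire string, so codes with extra leading or trailing characters are rejected without adding `^` and `$` to the expression.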

Question 2

3 pts

The following regular expression is designed to be a sentence tokenizer:

[\s]*([^.]*)\.

a) Explain how the expression works to tokenize sentences. (1 mark)

b) Suggest two strings for which the expression may not fulfill its intended purpose and explain why. (2 marks)
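To explore how the expression behaves, it can be run over a few sample strings. The sketch below uses Python's `re.findall`, which returns the contents of the single capture group for each match; the sample strings and the name `tokenize` are assumptions chosen for illustration only.

```python
import re

# The expression from the question, unchanged.
SENTENCE_RE = re.compile(r"[\s]*([^.]*)\.")


def tokenize(text):
    """Return the text captured by group 1 for every match in `text`."""
    return SENTENCE_RE.findall(text)


# Simple case: each sentence ends in a single full stop.
simple = tokenize("Hello world. How are you.")

# Full stops inside abbreviations and numbers are treated as sentence ends.
abbreviations = tokenize("Dr. Smith arrived at 3.15 p.m. today.")

# Sentences that end in '?' or '!' contain no full stop, so nothing matches.
other_punctuation = tokenize("Is it late? Yes!")

print(simple)
print(abbreviations)
print(other_punctuation)
```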

Question 3

2 pts

Calculate the Sørensen-Dice similarity between the following words using character trigrams, including padding:

drive

drove

Enter the similarity as a numeric value in the box below.
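The calculation can be sketched in Python as follows. The padding convention is an assumption: this sketch pads each word with n - 1 copies of `#` on both sides (so `drive` becomes `##drive##`) and treats the n-grams as a set. A different convention, such as a single pad character per side, gives a different value. The function names `ngrams` and `dice` are illustrative.

```python
def ngrams(word, n=3, pad="#"):
    """Return the set of character n-grams of `word`.

    Assumption: the word is padded with n - 1 pad characters on each side.
    """
    padded = pad * (n - 1) + word + pad * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=3):
    """Sorensen-Dice similarity: 2 * |X & Y| / (|X| + |Y|)."""
    x, y = ngrams(a, n), ngrams(b, n)
    return 2 * len(x & y) / (len(x) + len(y))


shared = ngrams("drive") & ngrams("drove")
similarity = dice("drive", "drove")
print(sorted(shared), round(similarity, 4))
```

Under this padding assumption each word yields seven trigrams, four of which are shared, so the similarity is 2 × 4 / (7 + 7) ≈ 0.571.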

Question 4

2 pts

Match each of the following histograms to one boxplot letter:

Question 5

2 pts

A data scientist is given a JSON file containing the results of football fixtures, similar to the one you encountered in Assignment 1. He wishes to extract data from the file but does not have a library available to read JSON files. As a result, he uses an online 'JSON to CSV converter' tool to produce a CSV file, but his program is not able to parse the resultant file as he expects.

a) Explain the most likely reason why this might be the case. (1 mark)

b) Suggest another format he could use to represent the data and explain why it would be more suitable than CSV. (1 mark)

Question 6

1 pt

Max is having a conversation about data integration. He says, “Using blocking for record linkage between two datasets (dataset A and dataset B) is a bad idea. It is too time-consuming to assign the records to blocks. It is much better instead to directly compare the records in A against the records in B without using any blocking step.” Argue why Max’s statement is incorrect. (1 mark)
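The cost difference between the two approaches can be illustrated by counting pairwise comparisons. The sketch below is a toy example under stated assumptions: the records are plain name strings, the blocking key is the first letter, and the helper names `full_comparisons` and `blocked_comparisons` are hypothetical.

```python
from collections import defaultdict


def full_comparisons(a, b):
    """Comparisons needed when every record in A is compared with every record in B."""
    return len(a) * len(b)


def blocked_comparisons(a, b, key):
    """Comparisons needed when only records sharing a blocking key are compared."""
    blocks_a, blocks_b = defaultdict(list), defaultdict(list)
    for record in a:  # one pass over A to assign blocks
        blocks_a[key(record)].append(record)
    for record in b:  # one pass over B to assign blocks
        blocks_b[key(record)].append(record)
    shared_keys = blocks_a.keys() & blocks_b.keys()
    return sum(len(blocks_a[k]) * len(blocks_b[k]) for k in shared_keys)


dataset_a = ["alice", "adam", "bob", "carol"]
dataset_b = ["alicia", "bobby", "carl", "dave"]


def first_letter(name):
    """Toy blocking key (an assumption for illustration)."""
    return name[0]


without_blocking = full_comparisons(dataset_a, dataset_b)
with_blocking = blocked_comparisons(dataset_a, dataset_b, first_letter)
print(without_blocking, with_blocking)
```

Assigning blocks takes a single pass over each dataset, whereas comparing without blocking grows with the product of the two dataset sizes.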

Question 7

4 pts

Business X and Business Y have decided to conduct a joint marketing campaign.

For this marketing campaign, they need to determine how many customers they have in common (how many people are in the customer lists of both businesses). They implement the following two-party privacy-preserving protocol, making use of the SHA-256 one-way hashing function.