CMPT 354 Assignment 3

Please submit your assignment in Coursys. All answers should be typed in a PDF file.

Every student must complete the assignment independently. While you are encouraged to learn through discussion with the instructor, the TAs, and your peers, plagiarism is a serious violation of the university’s academic integrity policy. We have absolutely zero tolerance for such behavior. Using any tutoring platform for this course, such as Course Hero or 51zuoyejun, is strictly NOT ALLOWED. If detected, a student who uses such paid services in this course, or who uploads materials of this course to such platforms, will be regarded as dishonest and will be reported to the university as a plagiarism case.

This assignment covers Chapters 8, 10, and 11 in the textbook.

Question 1 (25 points)

Consider a database schema with a relation Emp whose attributes are as shown below, with types specified for multivalued attributes.

Emp = (ename, ChildrenSet multiset(Children), SkillSet multiset(Skills))

Children = (name, birthday)

Skills = (type, ExamSet setof(Exams))

Exams = (year, city)

Redesign the database as a relational database that is in first normal form and in fourth normal form. List any functional or multivalued dependencies that you assume. Also list all referential-integrity constraints that should be present in the first and fourth normal form schemas.

Question 2 (15 points)

The Google search engine provides a feature whereby web sites can display advertisements supplied by Google. The advertisements supplied are based on the contents of the page. Suggest how Google might choose which advertisements to supply for a page, given the page contents. Can the similarity measures discussed in this course, TF-IDF and cosine similarity, be useful here?
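As a reminder of how the two measures named above fit together, here is a minimal sketch in Python. It assumes whitespace tokenization, raw term counts for TF, and log(N / df) for IDF; the page text, the ad texts, and the function names are made up for illustration and are not part of the assignment.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (raw tf * log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical page and candidate advertisements.
page = "reviews of action cameras for hiking and cycling"
ads = [
    "discount action cameras and mounts",
    "cheap flights to europe",
    "hiking boots sale",
]

vectors = tf_idf_vectors([page] + ads)
scores = [cosine(vectors[0], v) for v in vectors[1:]]
best = max(range(len(ads)), key=lambda i: scores[i])
print(ads[best], scores)
```

The sketch ranks candidate texts by their similarity to the page. Whether that ranking alone is a good way to choose advertisements is the point the question asks you to discuss.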

Question 3 (20 points)

Consider tables S(A, B, C) and T(B, C, D) and the following SQL query:

select A, B, C, D
from S, T
where S.B = T.B and S.C = T.C

Design a MapReduce program to compute the join efficiently. Please provide the pseudocode.
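One common approach is a reduce-side join keyed on the join attributes. The sketch below simulates that idea in memory with plain Python. It is only an illustration under assumptions: the function names, the "S"/"T" tags, and the sample rows are hypothetical, and a dictionary stands in for the shuffle phase of a real MapReduce framework.

```python
from collections import defaultdict


def map_s(row):
    # Emit the join key (B, C) and tag the value with its source table.
    a, b, c = row
    yield (b, c), ("S", a)


def map_t(row):
    b, c, d = row
    yield (b, c), ("T", d)


def reduce_join(key, values):
    # All values here share the same (B, C); pair every S value with every T value.
    b, c = key
    s_side = [v for tag, v in values if tag == "S"]
    t_side = [v for tag, v in values if tag == "T"]
    for a in s_side:
        for d in t_side:
            yield (a, b, c, d)


def run_join(s_rows, t_rows):
    groups = defaultdict(list)  # stands in for the shuffle/group-by-key phase
    for row in s_rows:
        for key, value in map_s(row):
            groups[key].append(value)
    for row in t_rows:
        for key, value in map_t(row):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reduce_join(key, values))
    return out


# Hypothetical sample data.
S = [(1, "x", 10), (2, "x", 10), (3, "y", 20)]
T = [("x", 10, "p"), ("y", 99, "q"), ("x", 10, "r")]
result = sorted(run_join(S, T))
print(result)
```

Because only tuples with the same (B, C) value meet in a reducer, no pair that fails the join condition is ever compared. The sketch does not address skewed keys or memory limits, which an efficient design may also need to consider.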

Question 4 (20 points)

The MapReduce framework is quite useful for creating inverted indices on a set of documents. An inverted index stores, for each word, a list of the IDs of all documents in which it appears (offsets in the documents are also normally stored, but we shall ignore them in this question). For example, if the input document IDs and contents are as follows:

1: data clean

2: data base

3: clean base

then the inverted lists would be:

data: 1, 2

clean: 1, 3

base: 2, 3

Give pseudocode for map and reduce functions to create inverted indices on a given set of files (each file is a document). Assume the document ID is available using a function context.getDocumentID(), and the map function is invoked once per line of the document. The output inverted list for each word should be a list of document IDs separated by commas. The document IDs are normally sorted, but for the purpose of this question you do not need to sort them.
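To make the expected input and output concrete, the following Python sketch simulates the map and reduce steps in memory on the three example documents. It is an illustration under assumptions, not the requested pseudocode: the document ID is passed in as a parameter rather than read from context.getDocumentID(), a dictionary stands in for the shuffle phase, and all function names are hypothetical.

```python
from collections import defaultdict


def map_line(doc_id, line):
    # Called once per line; emit each distinct word on the line with the document ID.
    for word in set(line.split()):
        yield word, doc_id


def reduce_word(word, doc_ids):
    # Drop repeated IDs (a word may occur on several lines), keeping first-seen order.
    unique = list(dict.fromkeys(doc_ids))
    return word, ", ".join(str(d) for d in unique)


def build_index(docs):
    groups = defaultdict(list)  # stands in for the shuffle/group-by-key phase
    for doc_id, text in docs.items():
        for line in text.splitlines():
            for word, emitted_id in map_line(doc_id, line):
                groups[word].append(emitted_id)
    return dict(reduce_word(word, ids) for word, ids in groups.items())


docs = {1: "data clean", 2: "data base", 3: "clean base"}
index = build_index(docs)
print(index)
```

Deduplicating in the reducer matters because the map function runs once per line, so the same word and document ID pair can be emitted more than once for a multi-line document.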

Question 5 (20 points)

Consider the table sales as follows.

City	Season	Product	Amount
Vancouver	Spring	GoPro 9	100,000
Vancouver	Fall	GoPro 9	80,000
Toronto	Fall	GoPro 8	12,000
Victoria	Spring	GoPro 8	60,000

Suppose we consider only the two-level hierarchy, city – province, in the city dimension, and no hierarchy in the other dimensions. List all tuples in the relational representation of the data cube on the sales table, that is, the complete relational representation of all cross-tabs.
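One way to check a hand-written list of tuples is to enumerate the aggregates programmatically. The Python sketch below does this under stated assumptions: the value "all" marks an aggregated dimension, city names and province names share a single location column, sum is the aggregate function, and the city-to-province mapping is supplied by hand (Vancouver and Victoria in British Columbia, Toronto in Ontario). None of these representation choices is fixed by the question.

```python
from collections import defaultdict
from itertools import product

# Assumed city-to-province mapping for the hierarchy.
PROVINCE = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
}

SALES = [
    ("Vancouver", "Spring", "GoPro 9", 100_000),
    ("Vancouver", "Fall", "GoPro 9", 80_000),
    ("Toronto", "Fall", "GoPro 8", 12_000),
    ("Victoria", "Spring", "GoPro 8", 60_000),
]


def cube(rows):
    """Sum Amount for every combination of (city | province | all, season | all, product | all)."""
    totals = defaultdict(int)
    for city, season, product_name, amount in rows:
        locations = (city, PROVINCE[city], "all")
        # Each base row contributes to every generalization of its own key.
        for key in product(locations, (season, "all"), (product_name, "all")):
            totals[key] += amount
    return dict(totals)


totals = cube(SALES)
for key in sorted(totals):
    print(key, totals[key])
```

Each base row contributes to 3 × 2 × 2 generalized keys, and keys shared by several rows accumulate their amounts, which is why the grand total appears under ("all", "all", "all").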