闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

EXAMINATION

Semester One Final Examinations, 2021

INFS3200 Advanced Database Systems

Question 1. [12 Marks] Distributed Databases

Briefly answer the following questions.

Employee Table:

Eno	Ename	Salary	Dno
202101	Tom	20000	201
202102	Kelly	36000	202
202103	John	24000	203
202104	John	39000	205
202105	Sally	12000	204
202106	Elvin	67000	205

a) [4 Marks] You are given a set of predicates P = {P1, P2, P3}, where P1: Dno=201, P2: 201<Dno<205, P3: Dno=205.

(1) Perform a horizontal fragmentation of the Employee Table by listing the tuples of every fragments.

(2) Does this fragmentation meet the correctness criteria? Briefly explain.

b) [4 Marks] When performing horizontal fragmentation of the Employee Table using minterm predicates, we have a set of predicates P = {P1, P2}, where P1: Dno>203, P2: Dno<203.

(1) How many minterm predicates are possible?

(2) List all possible minterm predicates.

(3) Generate a set of minterm predicates that satisfy the correctness criteria.

c) [2 Marks] Given a distributed database over three sites, S1, S2, and S3 with Fragment

1 of Employee Table on S1, Fragment 2 of Employee Table on S2, and Project Table on S3.

S1: Fragment 1 of Employee Table

Eno	Ename	Salary	Dno
202101	Tom	20000	201
202102	Kelly	36000	202
202103	John	24000	203

S3: Project Table

S2: Fragment 2 of Employee Table

Eno	Ename	Salary	Dno
202104	John	39000	205
202105	Sally	12000	204
202106	Elvin	67000	205

Eno	Pno	Ename	Title
202101	2	Tom	Engineer
202102	7	Kelly	Programmer
202103	7	John	Programmer
202105	2	Sally	Engineer
202106	11	Elvin	Engineer

(1) Write an SQL statement for a database query that is to be issued on site S3: “Find the information about Salary of employees who are doing project with Pno=2” .

(2) Step-by-step, use plain English or SQL Query to describe the process of semi-join operation to be initiated at site S3.

d) [2 Marks] Using horizontal fragmentation of Employee Table as an example, after Employee Table is fragmented correctly, how do you insert a new tuple into Employee

Table with data integrity constraint (e.g., primary-key) maintained? Please briefly describe the process.

Employee Table:

Eno	Ename	Salary	Dno
202101	Tom	20000	201
202102	Kelly	36000	202
202103	John	24000	203
202104	John	39000	205
202105	Sally	12000	204
202106	Elvin	67000	205

Question 2. [8 Marks] Distributed Transaction Management

Briefly answer the following questions.

a) [2 Marks] In the Two-Phase Locking (2PL) protocol, should a transaction request a

lock after it releases one of its locks? When a transaction performs an Insert operation on a table, is it strictly necessary to write-lock on the table?

b) [2 Marks] For the Two-Phase Commit (2PC) protocol, after the coordinator asks cohorts to prepare for Commit operation, if cohorts respond with yes, can the cohorts commit by themselves? When 2PC is used as the commit protocol, explain what will happen when the coordinator site does not receive a unanimous yes?

c) [4 Marks] What is Write-Ahead-Logging (WAL)? How are the REDO, UNDO operations used in a recovery process?

Question 3. [9 Marks] Data Warehouse Design

Briefly answer the following questions.

a) [2 Marks] Briefly discuss the differences between OLAP and OLTP operations.

b) [2 Marks] In data warehouse modelling, both star schema and snowflake schema consist of a fact table and some dimension tables. Describe the differences between star schema and snowflake schema.

c) [1 Mark] When a set of fact tables share the same dimension tables what do we call this phenomena?

d) [2 Marks] Consider a data warehouse schema with 3 dimensions (a, b, c) and a data cube operation is applied on all these dimensions. Please list all possible groups of dimensions (i.e., cuboids) that can be implemented as materialized views.

e) [2 Marks] What is a Pivot Query in data warehouse operations? (Explain this concept in plain English or use examples in SQL statements.)

Question 4. [5 Marks] Data Warehouse Implementation

Briefly answer the following questions.

a) [2 Marks] Suppose that each of the following dimensions contains several levels in the hierarchy. How many materialized cuboids can be created?

Time: [day <month < quarter < year]

Item: [name < brand < type]

Location: [street <city < state < country]

b) [3 Marks] Assume we have three hierarchical dimensions and one measure below.

Time: [day <month < quarter < year]

Item: [name < brand < type]

Location: [street <city < state < country]

Measure: sales

Which of the following cuboids can be used to answer the GROUP_BY query on {quarter, state}? For each of the possible candidate cuboids, give a brief explanation.

Cuboid1: {month, type, city}

Cuboid2: {quarter, type}

Cuboid3: {quarter, type, country}

Cuboid4: {month, state}

Cuboid5: {quarter, city}

Question 5. [10 Marks] Database Integration and Data Linkage

Briefly answer the following questions.

a) [2 Marks] We encounter various challenges in data integration, especially when we need to link records from different datasets. Please explain the meaning of the following challenges respectively.

Schema heterogeneity

Data type heterogeneity

Value heterogeneity

Semantic heterogeneity

b) [2 Marks] Given two strings “University Queensland” and “Queensland University”,

(1) Calculate their similarity using the Jaccard Coefficient (with 3-gram) string-matching technique.

(2) Provide reasons on why the similarity by tokenization (q-gram) is a good approach in this situation.

c) [2 Marks] What is the problem solved by data linkage? Give an example to illustrate how numerical similarity is calculated.

d) [4 Marks] Given two strings, “NAVY” and “NVY”, please use the dynamic programming approach to calculate their edit distance matrix by filling out the following matrix E with all values of E[i,j]: E[1,1], E[1,2], E[1,3], E1,4], E[2,1], E[2,2], E[2,3], E[2,4], E[3,1], E[3,2], E[3,3], E[3,4]. Please also indicate the final result of the edit distance of these two strings.

j=1, 2, 3, 4

		N	A	V	Y
	0	1	2	3	4
N	1
V	2
Y	3

Question 6. [6 Marks] Data Quality Management

Briefly answer the following questions.

a) [2 Marks] Describe the differences between data quality governance and data quality management.

b) [4 Marks] In data quality governance, there are four basic steps. Describe these four steps and briefly explain each one of them.

Question 7. [4 Marks] Data Privacy

Briefly answer the following questions.

a) [2 Marks] Briefly describe the approach of Deferential Privacy for preserving data privacy and its advantages compared with other data privacy preservation approaches.

b) [2 Marks] There are many data privacy attacking methods, list three of them and briefly describe these methods.

Question 8. [6 Marks] Advanced Topics

Briefly answer the following questions.

a) [1 Mark] Describe the meanings of row storage and column storage in database systems.

b) [1 Mark] Row storage vs. column storage, what are their advantages and disadvantages?

c) [1 Mark] For the database aggregation operations why would column storage be the best option to use?

d) [1 Mark] What is the meaning of “Curse of Dimensionality” in data processing?

e) [1 Mark] Define the concept of “Scale-Free Networks” in data networks.

f) [1 Mark] What are the types of each one of the following data networks?