Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

EXAMINATION

Semester One Final Examinations, 2021

INFS3200 Advanced Database Systems

Question 1. [12 Marks] Distributed Databases

Briefly answer the following questions.

Employee Table:

Eno

Ename

Salary

Dno

202101

Tom

20000

201

202102

Kelly

36000

202

202103

John

24000

203

202104

John

39000

205

202105

Sally

12000

204

202106

Elvin

67000

205

a)  [4 Marks] You are given a set of predicates P = {P1, P2, P3}, where P1: Dno=201, P2: 201<Dno<205, P3: Dno=205.

(1) Perform a horizontal fragmentation of the Employee Table by listing the tuples of every fragments.

(2) Does this fragmentation meet the correctness criteria? Briefly explain.

b)  [4 Marks] When performing horizontal fragmentation of the Employee Table using minterm predicates, we have a set of predicates P = {P1, P2}, where P1: Dno>203, P2: Dno<203.

(1) How many minterm predicates are possible?

(2) List all possible minterm predicates.

(3) Generate a set of minterm predicates that satisfy the correctness criteria.

c)   [2 Marks] Given a distributed database over three sites, S1, S2, and S3 with Fragment

1 of Employee Table on S1, Fragment 2 of Employee Table on S2, and Project Table on S3.

S1:  Fragment 1 of Employee Table

Eno

Ename

Salary

Dno

202101

Tom

20000

201

202102

Kelly

36000

202

202103

John

24000

203

S3: Project Table

S2:  Fragment 2 of Employee Table

Eno

Ename

Salary

Dno

202104

John

39000

205

202105

Sally

12000

204

202106

Elvin

67000

205

Eno

Pno

Ename

Title

202101

2

Tom

Engineer

202102

7

Kelly

Programmer

202103

7

John

Programmer

202105

2

Sally

Engineer

202106

11

Elvin

Engineer

(1) Write an SQL statement for a database query that is to be issued on site S3: “Find the information about Salary of employees who are doing project with Pno=2” .

(2) Step-by-step, use plain English or SQL Query to describe the process of semi-join operation to be initiated at site S3.

d)  [2 Marks] Using horizontal fragmentation of Employee Table as an example, after Employee Table is fragmented correctly, how do you insert a new tuple into Employee

Table with data  integrity constraint  (e.g.,  primary-key)  maintained?  Please  briefly describe the process.

Employee Table:

Eno

Ename

Salary

Dno

202101

Tom

20000

201

202102

Kelly

36000

202

202103

John

24000

203

202104

John

39000

205

202105

Sally

12000

204

202106

Elvin

67000

205

Question 2. [8 Marks] Distributed Transaction Management

Briefly answer the following questions.

a)  [2 Marks] In the Two-Phase Locking (2PL) protocol, should a transaction request a

lock after it releases one of its locks? When a transaction performs an Insert operation on a table, is it strictly necessary to write-lock on the table?

b)  [2  Marks]  For the Two-Phase  Commit  (2PC)  protocol, after the coordinator asks cohorts to prepare for Commit operation, if cohorts respond with yes, can the cohorts commit by themselves? When 2PC is used as the commit protocol, explain what will happen when the coordinator site does not receive a unanimous yes?

c)   [4 Marks] What is Write-Ahead-Logging (WAL)? How are the REDO, UNDO operations used in a recovery process?

Question 3. [9 Marks] Data Warehouse Design

Briefly answer the following questions.

a)  [2 Marks] Briefly discuss the differences between OLAP and OLTP operations.

b)  [2 Marks] In data warehouse modelling, both star schema and snowflake schema consist of a fact table and some dimension tables. Describe the differences between star schema and snowflake schema.

c)   [1 Mark]  When a set of fact tables share the same dimension tables what do we call this phenomena?

d)  [2 Marks] Consider a data warehouse schema with 3 dimensions (a, b, c) and a data cube operation is applied on all these dimensions.  Please list all possible groups of dimensions (i.e., cuboids) that can be implemented as materialized views.

e)  [2 Marks] What is a Pivot Query in data warehouse operations? (Explain this concept in plain English or use examples in SQL statements.)

Question 4. [5 Marks] Data Warehouse Implementation

Briefly answer the following questions.

a)    [2 Marks] Suppose that each of the following dimensions contains several levels in the hierarchy.  How many materialized cuboids can be created?

Time: [day <month < quarter < year]

Item: [name < brand < type]

Location: [street <city < state < country]

b)   [3 Marks] Assume we have three hierarchical dimensions and one measure below.

Time: [day <month < quarter < year]

Item: [name < brand < type]

Location: [street <city < state < country]

Measure: sales

Which of the following cuboids can be used to answer the GROUP_BY query on {quarter, state}? For each of the possible candidate cuboids, give a brief explanation.

Cuboid1: {month, type, city}

Cuboid2: {quarter, type}

Cuboid3: {quarter, type, country}

Cuboid4: {month, state}

Cuboid5: {quarter, city}

Question 5. [10 Marks] Database Integration and Data Linkage

Briefly answer the following questions.

a)  [2 Marks] We encounter various challenges in data integration, especially when we need to  link  records from different datasets.  Please explain the  meaning of the following challenges respectively.

    Schema heterogeneity

    Data type heterogeneity

    Value heterogeneity

    Semantic heterogeneity

b)  [2 Marks] Given two strings University Queensland” and Queensland University”,

(1) Calculate their similarity using the Jaccard Coefficient (with 3-gram) string-matching technique.

(2) Provide reasons on why the similarity by tokenization (q-gram) is a good approach in this situation.

c)   [2 Marks] What is the problem solved by data linkage? Give an example to illustrate how numerical similarity is calculated.

d)  [4 Marks]  Given two strings, “NAVY” and NVY”, please use the dynamic programming approach to calculate their edit distance matrix by filling out the following matrix E with all values of E[i,j]: E[1,1], E[1,2], E[1,3], E1,4], E[2,1], E[2,2], E[2,3], E[2,4], E[3,1], E[3,2], E[3,3], E[3,4]. Please also indicate the final result of the edit distance of these two strings.

j=1, 2, 3, 4

 

 

N

A

V

Y

 

0

1

2

3

4

N

1

 

 

 

 

V

2

 

 

 

 

Y

3

 

 

 

 

Question 6. [6 Marks] Data Quality Management

Briefly answer the following questions.

a)  [2 Marks] Describe the differences between data quality governance and data quality management.

b)  [4 Marks]   In data quality governance, there are four basic steps. Describe these four steps and briefly explain each one of them.

Question 7. [4 Marks] Data Privacy

Briefly answer the following questions.

a)  [2 Marks] Briefly describe the approach of Deferential Privacy for preserving data privacy and its advantages compared with other data privacy preservation approaches.

b)  [2 Marks] There are many data privacy attacking methods, list three of them and briefly describe these methods.

Question 8. [6 Marks] Advanced Topics

Briefly answer the following questions.

a)   [1  Mark]  Describe the meanings of row storage and column storage in database systems.

b)   [1   Mark]   Row  storage  vs.  column  storage,  what  are  their  advantages  and disadvantages?

c)    [1 Mark] For the database aggregation operations why would column storage be the best option to use?

d)   [1 Mark] What is the meaning of Curse of Dimensionality” in data processing?

e)   [1 Mark] Define the concept of Scale-Free Networks” in data networks.

f)    [1 Mark] What are the types of each one of the following data networks?