Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Database Development and Design (DTS207TC)

Assessment 002: Individual Coursework

Due: Dec 26th, 2022 @ 17:00

Weight: 50%

Maximum Marks: 100

Overview

This assignment aims to gain experience in understanding the internal functionality of different database management systems, including RDBMS, XML, Object-relational databases, and gain experience in designing data warehouses to perform OLAP operations. The coursework will be assessed for the following learning outcomes:

A. Identify and apply the principles underpinning transaction management within DBMS.

B. Demonstrate an understanding of advanced SQL topics.

C. Illustrate the issues related to Web technologies and DBMS and XML as a semi-structured data representation formalism.

D. Identify the principles underlying object-relational models.

E. State the main concepts in data warehousing and data mining.

Assessment Tasks

Your task is to answer every question by carefully reading the questions. Record your thoughts and assumptions, where necessary, while reporting your answers. There are the following five parts for this coursework corresponding to five learning outcomes of this module:

1. DBMS Indexing

2. Transactions Management

3. XML Data Modeling

4. Object-Relational Database

5. Data Warehousing and OLAP

Marking Criteria

This coursework will be graded out of 50% of the total marks with a maximum of 100 marks. Five (5) parts and twenty (20) marks are available for each part. Marks will be awarded based on the level of correctness of each answer. For example, 100% for fully correct with required explanation/justification, partial marks based on the level of incorrectness or missing required details.

1: DBMS Indexing (20 Marks)

Consider a disk with block size B = 512 bytes. A block pointer is P = 8 bytes long, and a record pointer is PR = 10 bytes long. A file has r = 50,000 STUDENTS records of fixed length. Each record has the following fields: Name (30 bytes), ID (10 bytes), department_code (10 bytes), Address (40 bytes), Phone (10 bytes), and Birth_date (10 bytes). (5*4=20 marks.)

Q1(a)  How many blocks are required to search for a record if the data file is ordered on ID as a key field? Compare with the search performance of unordered file organization.

Q1 (b) Suppose the file is ordered by the key field ID, and we have a primary index on ID. How many blocks are required to search and retrieve a record from the data file given its ID value using the primary index?

Q1(c) Suppose we want to construct a multi-level index. How many index entries and the number of blocks are required for each level of the multi-level index? For example, the number of first-level index entries and the number of first-level index blocks, the number of second-level index entries, and the number of second-level index blocks, and so on. What would be the maximum number of blocks required to search for and retrieve a record from the data file with the multi-level index on the ID field?

Q1(d) Suppose the file is not ordered by the key field ID and we want to construct a secondary index on ID. Repeat the previous exercise (part c) for the secondary index and compare it with the primary index.

Q1(e) Suppose we create a B+ index on ID, and the tree order is 50. What is the maximum capacity of the B+ tree of height 4, and how many blocks accesses are needed to search for and retrieve a record from the file--given its ID value—using the 4-level B+ tree index?

2: Transaction Management (20 Marks)

Consider a database with objects x, y, and z and three concurrent transactions T1, T2, and T3 (given below), which perform read(r) and write(w) operations on database objects. We use the notation from the lecture, where ri (x,y,z) and wi (x,y,z) mean that transaction Ti reads and writes objects x,y,z, respectively.

Transactions

T1

r1(x); r1(y); w1(x)

T2

r2(z); r2(y); w2(y); w2(z);

T3

r3(z); r3(x); w3(x)

Below, two schedules, S1 and S2, are given. Your task is to answer the questions given for each schedule:

S1

T1

T2

T3

 

 

 

r1(x)

 

r1(y)

 

 

w1(x)

 

 

 

 

 

 

 

 

r2(z)

 

 

 

r2(y)

 

 

 

 

 

 

 

w2(y)

w2(z)

 

 

 

 

 

 

r3(z)

 

 

 

r3(x)

w3(x)

 

 

 

 

 

S2

T1

T2

T3

r1(x)

 

 

 

r1(z)

 

 

 

w1(x)

 

r2(z)

 

 

 

r2(y)

 

 

 

w2(z)

 

w2(y)

 

 

r3(x)

 

 

 

r3(y)

 

 

 

w3(y)

 

Q2(a) Consider the schedule S1 to answer the following questions (10 marks):

i. Explain each pair of conflicting statements for schedule S1 and draw serializability (precedence) graphs based on conflicting statements in T1, T2, and T3.  (4 marks)

ii. State whether the schedule is conflict serializable (conflict-free) or not? If not conflict-free state which statement of which transaction is conflicting with other transactions. If the schedule is conflict-free state the reasons (4 marks)  

iii. If the schedule is serializable, write down the equivalent serial schedule. (2 marks)

Q2(b) Consider the schedule S2 to answer the below questions (10 marks):

i. Explain each pair of conflicting statements for schedule S2 and draw serializability (precedence) graphs based on conflicting statements in T1, T2, and T3.  (4 marks)

ii. State whether the schedule is conflict serializable (conflict-free) or not? If not conflict-free state which statement of which transaction is conflicting with other transactions. If the schedule is conflict-free state the reasons (4 marks)  

iii. If the schedule is serializable, write down the equivalent serial schedule. (2 marks)

3: XML Data Modeling (20 Marks)

Consider the below relational schema and answer the below questions.

 

Q3(a) Create an XML schema document that corresponds to the given relational schema. The XSD should be validated using the XML validation tool. (12 marks)

Q3(b) Create an XML instance document corresponding to the XSD you have created in the first step. The XML document must have at least one record for every element specified in the XSD. Validate the instance document against the XSD using the XML validation tool. (8 marks) 

4: Object-Relational Database (20 Marks)

Suppose we want to develop a data model for a database application system for a mid-size insurance company to keep track of patient information, care provider information, information about patient visits to their doctor as well as prescription drugs prescribed to patients for the insurance claim.

Each user of the system is identified by a unique id, a name (first and last), address and date of birth. Each user can either be a patient, doctor, nurse or technician. Information about a patient requires date of visit, visit type, insurance ID and Insurance company as well as his/her primary care doctor. A patient may be associated with a nurse and/or technician information if attended by a nurse or a specific procedure such as X-ray, CT Scan etc. has been carried out. There are different visit types such as initial visit which requires initial diagnosis status; follow-up visit, which requires diagnosis status; Routine check-up, which requires height, weight, blood pressure and diagnosis. Information about doctors requires speciality (medical officer, consultant, general physician etc.,). Information about a primary care nurse requires rank (head nurse, midwife, emergency room nurse etc.,). Information about technicians requires speciality (biochemist, haematologist, pathologist, radiologist etc.,). Furthermore, for each doctor, nurse and technician additional information is required for the hospital ( hospital name, address etc.,) they are affiliated with.

The prescriptions given to each patient by a healthcare provider also need to be tracked in this particular database to determine claim eligibility (refundable or non-refundable) including some basic information on the drug being prescribed to make sure there are no conflicts with a patient’s other prescriptions. We need to know each drug’s name, purpose/use and possible side effects.

Q4(a) Draw a UML class diagram for modeling information for the above system. The diagram must capture all possible attributes, primary keys, and associations for each entity. If necessary make assumptions using the domain knowledge and explain them to justify your answer (10 marks)

Q4(b):  Translate the UML class diagram you have just created in the previous question into a relational schema. Use the most appropriate design choice for subclasses mapping and justify why the selected design choice is the best. Write only the relation names and their columns; for example, PATIENT(ID first_name, last_name, dob; Doctor_ID (FK references DOCTOR); do not write the field types. Underline the key/foreign keys and write FK  constraints as shown in the example for foreign keys. (10 marks) 

5: Data Warehousing (20 Marks)

Suppose a company would like to design a data warehouse to facilitate vehicle analysis in an online analytical processing manner. The company registers huge amounts of vehicles’ data in the format of (Vehicle ID, Driver ID, location, speed, and time). Each Vehicle ID represents a vehicle associated with information, such as vehicle type, plate number, etc., each Driver ID represents information such as ID, name, age etc.,  and each location may be associated with a street in a city. Assume that a street map is available for the city. Speed represents miles/hour and time represents day, month, quarter and year. The company is interested to measure the total number of vehicles at a specific location and time as well as average mileage.  

Q5(a) State which schema (star, snowflake etc.) is the most appropriate to model the above data warehouse. Give your opinion of which might be more empirically useful and state the reasons behind your answer. (6 marks) 

Q5(b) Draw a schema diagram for the above data warehouse using one of the most appropriate schema diagrams. Dimension and fact tables must have all primary keys by introducing the attribute that identifies each entity.  Furthermore, a complete schema diagram must capture associations among all the entities to perform OLAP operations. (8 marks)

Q5(c):   A driver may like to perform online analytical processing to determine how to move fastest from one location to another location at a particular time. Discuss how this can be done efficiently by using data stored in the warehouse. (6 marks)

Submission Guidelines:

Please submit the report in PDF format (‘StudentId’.pdf) via LMO. Any supplementary files should be zipped in one folder and submit separately along with your report.