Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

COMS W 4111-002

W4111 - Introduction to Databases

Section 003/V03, Fall 2022

Take Home Final

Exam Instructions

We will publish instructions on Ed.

Environment Setup and Test

MySQL

Replace  root and  dbuserdbuser for the correct values for you MySQL instance from previous homework assignments and exams.

You will need the sample database (https://www.db-book.com/university-lab-dir/sample_tables-dir/index.html) that comes with the recommended textbook

to execute the setup test.

You should have already installed the database because you need for previous assignments.

I named my database

In  [1]:


%load_ext sql


In  [22]:


%sql mysql+pymysql://root:dbuserdbuser@localhost


In  [3]:


%sql select * from db_book.student


* mysql+pymysql://root:***@localhost

13 rows affected.

Out[3]:

ID

00128

12345

19991

23121

44553

45678

54321

55739

70557

76543

76653

98765

98988

name

Zhang Shankar Brandt Chavez Peltier Levy Williams Sanchez Snow

Brown Aoi

Bourikas

Tanaka

dept_name

Comp. Sci. Comp. Sci. History Finance Physics Physics Comp. Sci. Music Physics Comp. Sci. Elec. Eng. Elec. Eng.

Biology

tot_cred

102

32

80

110

56

46

54

38

0

58

60

98

120

Neo4j

Please set the values for your Neo4j database below.

Make sure that your database is active. If you have not used it for a while, you need to log in through the website and restart the database.

In  [4]:


neo4j_url = "neo4j+s://aae9f847.databases.neo4j.io"

neo4j_user = "neo4j"

neo4j_password = 'g2li6DznTfqReEIIiCgW34LRnJZbRiLS0188LQAcf9Q'


In  [10]:


from py2neo import Graph


In  [11]:


def t1():

graph = Graph(neo4j_url, auth= (neo4j_user, neo4j_password))

q = "match (r:Person) where r.name='Tom Hanks' return r"

res = graph.run(q)

for r in res:

print(r)

Please rerun the following cell.

In  [13]:


t1()


Node('Person', born=1956, name='Tom Hanks')

MongoDB

Please set your URL for MongoDB Atlas and make sure that your cluster is not suspended.

In  [14]:


mongodb_url = "mongodb+srv://dff9:[email protected]/?retryWrites=true&w=majority"


In  [15]:


import pymongo


In  [19]:

def connect():

client = pymongo.MongoClient(

mongodb_url

)

return client

def t_connect():

c = connect()

print("Databases = ", list(c.list_databases()))

In  [20]:


#

# Note, you list of local databases will be different. The values do not matter.

#

t_connect()


Databases =  [{'name': 'GoT', 'sizeOnDisk': 532480, 'empty': False}, {'name': 'GoTR', 'sizeOnDisk': 491520, 'empty': False},  {'name': 'classicmodels', 'sizeOnDisk': 262144, 'empty': False}, {'name': 'cool_db', 'sizeOnDisk': 90112, 'empty': False}, {'n ame': 'hw4', 'sizeOnDisk': 405504, 'empty': False}, {'name': 'hw4recitation', 'sizeOnDisk': 40960, 'empty': False}, {'name':  'mydb', 'sizeOnDisk': 102400, 'empty': False}, {'name': 'mydbd', 'sizeOnDisk': 49152, 'empty': False}, {'name': 's22_final',  'sizeOnDisk': 81920, 'empty': False}, {'name': 'sss', 'sizeOnDisk': 40960, 'empty': False}, {'name': 'testdb', 'sizeOnDisk': 7 3728, 'empty': False}, {'name': 'xc2601', 'sizeOnDisk': 45056, 'empty': False}, {'name': 'xc2601_final', 'sizeOnDisk': 45056, 'empty': False}, {'name': 'zz2870', 'sizeOnDisk': 135168, 'empty': False}, {'name': 'zz2870_final', 'sizeOnDisk': 94208, 'empt y': False}, {'name': 'admin', 'sizeOnDisk': 344064, 'empty': False}, {'name': 'local', 'sizeOnDisk': 17138139136, 'empty': Fal se}]

Written Questions General Knowledge

The written questions require a short, succinct answer.

Remember, "If you can't explain it simply, you don't understand it well enough."

Some questions will research using the web, lecture slides, etc. You cannot cut and paste from sources. Your answer must show that you and understand the concept.

If you use a source other than lecture material, please provide a URL to the source(s) you read.

G1

Question: List at least two reasons why database systems support data manipulation using a declarative query language such as SQL, instead of just providing a library of C or C + + functions to carry out data manipulation.

Answer:

Enter answer.

G2

Question: List four significant differences between:

Processing data by writing programs that manipulate files.

Using a database management system and query language.

Answer:

Enter answer.

G3

Question: List five responsibilities (functionality provided) of a database-management system. For each responsibility, explain the potential problems that would occur with the functionality.

Answer:

Enter answer.

G4

Question: We all use SSOL to choose and register for classes. Another option would be to have a single Google sheet (shared spreadsheet) that we all use to register for classes. What are problems with using a shared spreadsheet?

Answer:

Enter answer.

G5

Question: NoSQL databases have become increasingly popular for supporting applications. List 3 benefits of or reasons for using NoSQL databases versus SQL/relational databases. List 3 benefits of relational databases versus NoSQL databases.

Answer:

Enter answer.

Relational Model

R1

Question: A column in a relation (table) has a type. Consider implementing a  date as  CHAR(10) in the format  YYYY-MM-DD. The lecture material states that attributes (column values) come from a domain. Using  date explain the differenc between a domain and a type.

Answer:

Enter answer.

R2

Question: The domain for a relation (table) attribute (column) should be atomic. Why?

Answer:

Enter answer.

R3

Question: "In the US Postal System, a delivery point is a specific set of digits between 00 and 99 assigned to every address. When combined with the ZIP + 4 code, the delivery point provides a unique identifier for every deliverable address served by the United States Postal Service."

The lecture 2 slides provide a notation for representing a relation's schema. Assume we want to define a relation for US mailing addresses, and that the columns are:

Zip code

+4 code

delivery_point

address_line_1

address_line_2

city

state

Use the notation to define the schema for an address. A simple example of an address's column values might be:

Zip code: 10027

+4 code: 6623

delivery_point: 99

address_line_1: 520 W 120th St

address_line_2: Room 402

city: New York

state: NY

Answer:

Enter answer.

R4

Note: Use the RelaX (https://dbis-uibk.github.io/relax/calc/gist/4f7866c17624ca9dfa85ed2482078be8/relax-silberschatz-english.txt/0) calculator and the

schema associated with the recommended textbook to answer this question. Your answer should contain:

The text for the query.

An image showing the query execution and result.

An example of the format is:

Query

σ capacity >= 50 (classroom)

Execution

Question: Translate the following SQL statement into an equivalent relational algebra statement.

select

*

from

instructor

where

dept_name in (select dept_name from department where budget >= 100000)

Answer:

Enter answer.

R5

Use the same format to answer this question.

Question

Use the following query to compute a new table.

section_and_time =

π course_id, sec_id, semester, year,

day, start_hr, start_min (section ⨝  time_slot)

Using only section_and_time, write a relational algebra expression that returns a relation of overlapping courses of the form

(course_id_1, sec_id_1, semester_1, year_1, course_id_2, sec_id_2, semester_2, year_2) .

Your table cannot container duplicates. For example, a result containing

(BIO-101, 1, fall, 2022, MATH-101, 2, fall, 2022)

(MATH-101, 2, fall, 2022, BIO-101, 1, fall, 2022)

is incorrect.

Answer:

Query

Execution

SQL

You will use the Classic Models tutorial database (https://www.mysqltutorial.org/mysql-sample-database.aspx), which you

MySQL.

should have

already

loaded into

S1

Question: Create a view  employee_customer_sales with the following information:

employeeNumber

employeeLastname

employeeFirstName

customerNumber

customerName

revenue

The employee information is for the employee that is the  customer.customerRepEmployeeNumber .

revenue is the total revenue over all of the customer's orders.

The revenue for an  order is  priceEach*quantityOrdered for each  orderdetails in the order.

Answer:

In  [26]:

* mysql+pymysql://root:***@localhost

0 rows affected.

0 rows affected.

Out[26]:

[]

Test Answer:

In  [27]:

%sql select * from employee_customer_sales ;

* mysql+pymysql://root:***@localhost

98 rows affected.

Out[27]:

employeeNumber

1337

1337

1337

1337

1337

1337

1501

1501

1501

S2

Question:

Below, there is a query that creates a view. Run the query.

Using the view, write a query that produces a table of the form   (productCode, productName) for products that no customer in Asia has ordered. For this questions purposes, the Asian countries are:

Japan

Singapore

Philipines

Hong King

You must not use a JOIN.

In  [29]:

#

# Create the view

#

%sql create or replace view orders_all as \

select * from orders join orderdetails using(orderNumber)

* mysql+pymysql://root:***@localhost

0 rows affected.

Out[29]:

[]

Answer:

In  [63]:

%%sql

* mysql+pymysql://root:***@localhost

S3

Question:

Use the  customers and  orders for this query.

Shipping days is the number of days between  orderDate and  shippedDate.

Product a table of the form:

customerNumber

customerName

noOfOrders is the number of orders the customer placed.

averageShippingDays , which is the average shipping days.

minimumShippingDays , which is the minimum shipping days.

maximumShippingDays , which is the maximum shipping days.

The table should only contain entries where:

localhost:8888/notebooks/Desktop/F22_W4111_003_FInal_Exam.ipynb

noOfOrders >= 3

averageShippingDays >= 5 or  maximumShippingDays >= 10.

Answer:

In  [64]: