COMS W 4111-002 W4111 - Introduction to Databases Section 003/V03, Fall 2022
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
COMS W 4111-002
W4111 - Introduction to Databases
Section 003/V03, Fall 2022
Take Home Final
Exam Instructions
We will publish instructions on Ed.
Environment Setup and Test
MySQL
Replace root and dbuserdbuser for the correct values for you MySQL instance from previous homework assignments and exams.
You will need the sample database (https://www.db-book.com/university-lab-dir/sample_tables-dir/index.html) that comes with the recommended textbook
to execute the setup test.
You should have already installed the database because you need for previous assignments.
I named my database
In [1]:
%load_ext sql
In [22]:
%sql mysql+pymysql://root:dbuserdbuser@localhost
In [3]:
%sql select * from db_book.student
* mysql+pymysql://root:***@localhost
13 rows affected.
Out[3]:
ID 00128 12345 19991 23121 44553 45678 54321 55739 70557 76543 76653 98765 98988 |
name Zhang Shankar Brandt Chavez Peltier Levy Williams Sanchez Snow Brown Aoi Bourikas Tanaka |
dept_name Comp. Sci. Comp. Sci. History Finance Physics Physics Comp. Sci. Music Physics Comp. Sci. Elec. Eng. Elec. Eng. Biology |
tot_cred 102 32 80 110 56 46 54 38 0 58 60 98 120 |
Neo4j
Please set the values for your Neo4j database below.
Make sure that your database is active. If you have not used it for a while, you need to log in through the website and restart the database.
In [4]:
neo4j_url = "neo4j+s://aae9f847.databases.neo4j.io"
neo4j_user = "neo4j"
neo4j_password = 'g2li6DznTfqReEIIiCgW34LRnJZbRiLS0188LQAcf9Q'
In [10]:
from py2neo import Graph
In [11]:
def t1(): graph = Graph(neo4j_url, auth= (neo4j_user, neo4j_password)) q = "match (r:Person) where r.name='Tom Hanks' return r" res = graph.run(q) for r in res: print(r) |
Please rerun the following cell.
In [13]:
t1()
Node('Person', born=1956, name='Tom Hanks')
MongoDB
Please set your URL for MongoDB Atlas and make sure that your cluster is not suspended.
In [14]:
mongodb_url = "mongodb+srv://dff9:[email protected]/?retryWrites=true&w=majority"
In [15]:
import pymongo
In [19]:
def connect(): client = pymongo.MongoClient( mongodb_url ) return client
def t_connect(): c = connect() print("Databases = ", list(c.list_databases())) |
In [20]:
#
# Note, you list of local databases will be different. The values do not matter.
#
t_connect()
Databases = [{'name': 'GoT', 'sizeOnDisk': 532480, 'empty': False}, {'name': 'GoTR', 'sizeOnDisk': 491520, 'empty': False}, {'name': 'classicmodels', 'sizeOnDisk': 262144, 'empty': False}, {'name': 'cool_db', 'sizeOnDisk': 90112, 'empty': False}, {'n ame': 'hw4', 'sizeOnDisk': 405504, 'empty': False}, {'name': 'hw4recitation', 'sizeOnDisk': 40960, 'empty': False}, {'name': 'mydb', 'sizeOnDisk': 102400, 'empty': False}, {'name': 'mydbd', 'sizeOnDisk': 49152, 'empty': False}, {'name': 's22_final', 'sizeOnDisk': 81920, 'empty': False}, {'name': 'sss', 'sizeOnDisk': 40960, 'empty': False}, {'name': 'testdb', 'sizeOnDisk': 7 3728, 'empty': False}, {'name': 'xc2601', 'sizeOnDisk': 45056, 'empty': False}, {'name': 'xc2601_final', 'sizeOnDisk': 45056, 'empty': False}, {'name': 'zz2870', 'sizeOnDisk': 135168, 'empty': False}, {'name': 'zz2870_final', 'sizeOnDisk': 94208, 'empt y': False}, {'name': 'admin', 'sizeOnDisk': 344064, 'empty': False}, {'name': 'local', 'sizeOnDisk': 17138139136, 'empty': Fal se}]
Written Questions − General Knowledge
The written questions require a short, succinct answer.
Remember, "If you can't explain it simply, you don't understand it well enough."
Some questions will research using the web, lecture slides, etc. You cannot cut and paste from sources. Your answer must show that you and understand the concept.
If you use a source other than lecture material, please provide a URL to the source(s) you read.
G1
Question: List at least two reasons why database systems support data manipulation using a declarative query language such as SQL, instead of just providing a library of C or C + + functions to carry out data manipulation.
Answer:
Enter answer.
G2
Question: List four significant differences between:
Processing data by writing programs that manipulate files.
Using a database management system and query language.
Answer:
Enter answer.
G3
Question: List five responsibilities (functionality provided) of a database-management system. For each responsibility, explain the potential problems that would occur with the functionality.
Answer:
Enter answer.
G4
Question: We all use SSOL to choose and register for classes. Another option would be to have a single Google sheet (shared spreadsheet) that we all use to register for classes. What are problems with using a shared spreadsheet?
Answer:
Enter answer.
G5
Question: NoSQL databases have become increasingly popular for supporting applications. List 3 benefits of or reasons for using NoSQL databases versus SQL/relational databases. List 3 benefits of relational databases versus NoSQL databases.
Answer:
Enter answer.
Relational Model
R1
Question: A column in a relation (table) has a type. Consider implementing a date as CHAR(10) in the format YYYY-MM-DD. The lecture material states that attributes (column values) come from a domain. Using date explain the differenc between a domain and a type.
Answer:
Enter answer.
R2
Question: The domain for a relation (table) attribute (column) should be atomic. Why?
Answer:
Enter answer.
R3
Question: "In the US Postal System, a delivery point is a specific set of digits between 00 and 99 assigned to every address. When combined with the ZIP + 4 code, the delivery point provides a unique identifier for every deliverable address served by the United States Postal Service."
The lecture 2 slides provide a notation for representing a relation's schema. Assume we want to define a relation for US mailing addresses, and that the columns are:
Zip code
+4 code
delivery_point
address_line_1
address_line_2
city
state
Use the notation to define the schema for an address. A simple example of an address's column values might be:
Zip code: 10027
+4 code: 6623
delivery_point: 99
address_line_1: 520 W 120th St
address_line_2: Room 402
city: New York
state: NY
Answer:
Enter answer.
R4
Note: Use the RelaX (https://dbis-uibk.github.io/relax/calc/gist/4f7866c17624ca9dfa85ed2482078be8/relax-silberschatz-english.txt/0) calculator and the
schema associated with the recommended textbook to answer this question. Your answer should contain:
The text for the query.
An image showing the query execution and result.
An example of the format is:
Query
σ capacity >= 50 (classroom)
Execution
Question: Translate the following SQL statement into an equivalent relational algebra statement.
select
*
from
instructor
where
dept_name in (select dept_name from department where budget >= 100000)
Answer:
Enter answer.
R5
Use the same format to answer this question.
Question
Use the following query to compute a new table.
section_and_time =
π course_id, sec_id, semester, year,
day, start_hr, start_min (section ⨝ time_slot)
Using only section_and_time, write a relational algebra expression that returns a relation of overlapping courses of the form
(course_id_1, sec_id_1, semester_1, year_1, course_id_2, sec_id_2, semester_2, year_2) .
Your table cannot container duplicates. For example, a result containing
(BIO-101, 1, fall, 2022, MATH-101, 2, fall, 2022)
(MATH-101, 2, fall, 2022, BIO-101, 1, fall, 2022)
is incorrect.
Answer:
Query
Execution
SQL
You will use the Classic Models tutorial database (https://www.mysqltutorial.org/mysql-sample-database.aspx), which you MySQL. |
should have |
already |
loaded into |
S1
Question: Create a view employee_customer_sales with the following information:
employeeNumber
employeeLastname
employeeFirstName
customerNumber
customerName
revenue
The employee information is for the employee that is the customer.customerRepEmployeeNumber .
revenue is the total revenue over all of the customer's orders.
The revenue for an order is priceEach*quantityOrdered for each orderdetails in the order.
Answer:
In [26]:
* mysql+pymysql://root:***@localhost
0 rows affected.
0 rows affected.
Out[26]:
[]
Test Answer:
In [27]:
%sql select * from employee_customer_sales ;
* mysql+pymysql://root:***@localhost
98 rows affected.
Out[27]:
employeeNumber
1337
1337
1337
1337
1337
1337
1501
1501
1501
S2
Question:
Below, there is a query that creates a view. Run the query.
Using the view, write a query that produces a table of the form (productCode, productName) for products that no customer in Asia has ordered. For this questions purposes, the Asian countries are:
Japan
Singapore
Philipines
Hong King
You must not use a JOIN.
In [29]:
# # Create the view # %sql create or replace view orders_all as \ select * from orders join orderdetails using(orderNumber) |
* mysql+pymysql://root:***@localhost
0 rows affected.
Out[29]:
[]
Answer:
In [63]:
%%sql |
* mysql+pymysql://root:***@localhost
S3
Question:
Use the customers and orders for this query.
Shipping days is the number of days between orderDate and shippedDate.
Product a table of the form:
customerNumber
customerName
noOfOrders is the number of orders the customer placed.
averageShippingDays , which is the average shipping days.
minimumShippingDays , which is the minimum shipping days.
maximumShippingDays , which is the maximum shipping days.
The table should only contain entries where:
localhost:8888/notebooks/Desktop/F22_W4111_003_FInal_Exam.ipynb
noOfOrders >= 3
averageShippingDays >= 5 or maximumShippingDays >= 10.
Answer:
In [64]:
2023-01-03