COMS W4111: Introduction to Databases Spring 2023, Sections 002
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
COMS W4111: Introduction to Databases
Spring 2023, Sections 002
Take Home Midterm
Overview
Instructions
Due Date: Sunday, 2023-MAR-12 at 11:59pm
You may not use late days.
You have one week to complete the take home portion of the midterm. All of the work must be your own, you may not work in groups or teams. You may use outside sources so long as you cite them and provide links.
Points will be taken off for any answers that are extremely verbose. Try to stay between 2-3 sentences for definitions and 5 sentences for longer questions. You may post privately on Ed or attend OH for clairification questions. TAs will not be providing hints.
There is a post on Ed (https://edstem.org/us/courses/32981/discussion/2716284) that:
Provides submission instructions.
Clarifications and corrections on questions.
Students are responsible for reading and monitoring the post.
Environment Setup
Note: You will need to change the MySQL userID and password in some of the cells below to match your configuration.
Environments
Different applications and different scenarios use different ways to interaction with databases.
We use three different connection/interaction models to give students experience with the various options.
ipython-SQL
In [1]:
%load_ext sql
In [2]:
# # Set the userid and password for connecting to your instance of SQL. # mysql_user = "root" mysql_password = "dbuserdbuser" mysql_url = f"mysql+pymysql://{mysql_user}: {mysql_password}@localhost" print("Your connection URL is", mysql_url) |
Your connection URL is mysql+pymysql://root:dbuserdbuser@localhost
In [18]:
#
# Connect. See the ipython-sql documentation for the $variable syntax.
#
%sql $mysql_url
SQL Alchemy and Pandas
In [4]:
# # Yes, I know the cool kids import as pd. I am not cool. # import pandas |
In [5]:
# # Pandas SQL operations require a SQL Alchemy engine. # from sqlalchemy import create_engine |
In [19]:
%sql use classicmodels ;
pymysql
In [7]:
%sql show tables ;
In [20]:
pymysql_con = pymysql.connect( user= mysql_user, password= mysql_password, host= "localhost", port= 3306, autocommit= True, cursorclass= pymysql.cursors.DictCursor) |
Data Loading
Classic Models
We will use the Classic Models (https://www.mysqltutorial.org/mysql-sample-database.aspx) sample database for many of the questions on this exam. The directory containing this notebook contains a file classic-models-sample.sql .
Load the data:
Open the file in DataGrip using File -> Open dialog.
Select all of the text/SQL in the file.
Click the green arrowhead to run the files contents.
Running the following queries will test if the load worked.
In [10]:
%sql use classicmodels ;
* mysql+pymysql://root:***@localhost
0 rows affected.
Out[10]:
[]
In [11]:
%sql show tables ;
* mysql+pymysql://root:***@localhost
8 rows affected.
Out[11]:
Tables_in_classicmodels
customers
employees
offices
orderdetails
orders
payments
productlines
products
In [12]:
%sql select count(*) as count from orders join orderdetails using(orderNumber)
* mysql+pymysql://root:***@localhost
1 rows affected.
Out[12]:
count
2996
Lahman's Baseball Database
You previously loaded information from Lahman's Baseball Database. (https://www.seanlahman.com/)
If you have not done so, the following code will load the data into a new schema lahmansdb_midterm .
In [22]:
%sql create schema lahmansdb_midterm
* mysql+pymysql://root:***@localhost
1 rows affected.
Out[22]:
[]
In [25]:
people_df = pandas.read_csv("./People.csv") people_df.to_sql ("people", schema="lahmansdb_midterm", con=sql_engine,index=False, if_exists="replace") |
Out[25]:
20370
In [26]:
batting_df = pandas.read_csv("./Batting.csv")
batting_df.to_sql ("batting", schema="lahmansdb_midterm", con=sql_engine,index=False, if_exists="replace")
Out[26]:
110495
In [32]:
pitching_df = pandas.read_csv("./Pitching.csv")
pitching_df.to_sql ("pitching", schema="lahmansdb_midterm", con=sql_engine,index=False, if_exists="replace")
Out[32]:
49430
This will test the data loading.
In [29]:
%sql select count(*) as people_count from lahmansdb_midterm.people ;
* mysql+pymysql://root:***@localhost
1 rows affected.
Out[29]:
people_count
20370
In [30]:
%sql select count(*) as batting_count from lahmansdb_midterm.batting ;
* mysql+pymysql://root:***@localhost
1 rows affected.
Out[30]:
batting_count
110495
In [33]:
%sql select count(*) as pitching_count from lahmansdb_midterm.pitching ;
* mysql+pymysql://root:***@localhost
1 rows affected.
Out[33]:
pitching_count
49430
Written Questions
W1
Question
Define the concept of immutable column and key.
Why do some sources recommend that a primary key should be immutable?
How would to implement immutability for a primary key in a table?
Answer
Type Markdown and LaTeX: a2
W2
Question
Views are a powerful concept in relational database management systems. List and briefly explain 3 benefits of/reasons for creating a view.
Answer
Type Markdown and LaTeX: a2
W3
Question
Briefly explain the concepts of procedural language and declarative language. SQL is primarily a declarative language. SQL added procedure language capabilities in functions, procedures and triggers. What is a reason for this addition?
Answer
In [ ]:
W4
Question
The following diagram is a simple representation of the architecture of a Jupyter notebook using MySQL. Is this a two-tier architecture or a three-tier architecture? Explain your answer briefly.
Answer
Type Markdown and LaTeX: a2
W5
Question
Consider a US Social Security Number. An example is "012-34-6789".
The data type is character string.
The relational model requires that columns (attributes) are from a domain.
Use the Social Security Number example to explain the difference between a type and a domain.
Answer
Type Markdown and LaTeX: a2
W6
Question
Briefly explain the differences between:
Database stored procedure
Database function
Database trigger
Answer
Type Markdown and LaTeX: a2
W7
Question
Briefly explain:
Natural join
Equi-join
Theta join
Self-join
Answer
Type Markdown and LaTeX: a2
W8
Question
Briefly explain the difference between a unique (key) constraint and a primary key constraint?
Answer
Type Markdown and LaTeX: a2
W9
Question
Give two reasons for using an associative entity to implement a relationship instead of using a foreign key.
Answer
Type Markdown and LaTeX: a2
W10
Question
Briefly explain the concepts of:
Conceptual model
Logical model
Physical model
For data modeling.
Answer
Type Markdown and LaTeX: a2
W11
Question
Briefly explain the concepts of:
Data manipulation language
Data definition language
Given an example statement in SQL for DML and for DDL.
Answer
Type Markdown and LaTeX: a2
W12
Question
Codd's 4th rule is:
Rule 4 - Dynamic online catalog based on the relational model:
The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language
to its interrogation as they apply to the regular data.
Explain what this means, and use SQL to provide examples.
Answer
Type Markdown and LaTeX: a2
W13
Question
The formal definition of a theta join is
Briefly explain the definition and give an example.
T ⋈e S = Ge(T × S)
Why is the fact that the relational algebra is closed is important to this definition?
Answer
Type Markdown and LaTeX: a2
W14
Question
Consider two different statements in the relational algebra or SQL.
Despite being different statements, the statements may be equivalent. Briefly explain what this means.
Answer
Type Markdown and LaTeX: a2
W15
Question
Consider the following relation definitions.
Customers(ID, last_name, first_name)
Accounts(ID, balance, customer_ID)
What is problem with using natural join on the two tables?
Answer
Type Markdown and LaTeX: a2
Entity Relationship Modeling
ER-1
Question
This question tests your ability to "bottom up" model or "reverse engineering" a SQL schema to produce an explanatory ER-diagram. Use Lucidchart to draw a Crow's Foot notation diagram representing the following SQL.
You can use the simple table names, e.g. students instead of s23_w4111_midterm.students .
drop schema if exists s23_midterm;
create schema s23_midterm;
use s23_midterm;
drop table if exists departments;
create table if not exists departments
(
dept_code varchar(4) not null
primary key,
dept_name varchar(128) not null
);
drop table if exists instructors;
create table if not exists instructors
(
UNI varchar(12) not null
primary key,
last_name varchar(128) not null,
first_name varchar(128) not null,
dept_code varchar(4) null,
constraint instructor_dept
foreign key (dept_code) references departments (dept_code)
);
drop table if exists students;
create table if not exists students
(
UNI varchar(12) not null
primary key,
last_name varchar(128) null,
first_name varchar(128) null
);
drop table if exists students_advisors;
create table if not exists students_advisors
(
student_uni varchar(12) not null,
instructor_uni varchar(12) not null,
advising_start_date date not null,
advising_end_date date null,
primary key (student_uni, instructor_uni, advising_start_date),
constraint student_advisor_instructor
foreign key (instructor_uni) references instructors (UNI),
constraint student_advisors_student
foreign key (student_uni) references students (UNI)
);
Answer
Put your screen capture in the same directory as the midterm.
Add in the Markdown cell, using the actual file name.
ER-2
Question
This question tests your ability to convert a human language description of a data model into a Crow's Foot ER-Diagram.
Consider the data model for Classic Models that you loaded.
orders has a column comments.
In [15]:
%sql select * from classicmodels.orders limit 10 ;
* mysql+pymysql://root:***@localhost
10 rows affected.
Out[15]:
orderNumber |
orderDate |
requiredDate |
shippedDate |
2023-03-07