Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

COMS W4111: Introduction to Databases

Spring 2023, Sections 002

Take Home Midterm

Overview

Instructions

Due Date: Sunday, 2023-MAR-12 at 11:59pm

You may not use late days.

You have one week to complete the take home portion of the midterm. All of the work must be your own, you may not work in groups or teams. You may use outside sources so long as you cite them and provide links.

Points will be taken off for any answers that are extremely verbose. Try to stay between 2-3 sentences for definitions and 5 sentences for longer questions. You may post privately on Ed or attend OH for clairification questions. TAs will not be providing hints.

There is a post on Ed (https://edstem.org/us/courses/32981/discussion/2716284) that:

Provides submission instructions.

Clarifications and corrections on questions.

Students are responsible for reading and monitoring the post.

Environment Setup

Note: You will need to change the MySQL userID and password in some of the cells below to match your configuration.

Environments

Different applications and different scenarios use different ways to interaction with databases.

We use three different connection/interaction models to give students experience with the various options.

ipython-SQL

In  [1]:


%load_ext sql


In  [2]:

#

# Set the userid and password for connecting to your instance of SQL.

#

mysql_user = "root"

mysql_password = "dbuserdbuser"

mysql_url = f"mysql+pymysql://{mysql_user}: {mysql_password}@localhost"

print("Your connection URL is", mysql_url)

Your connection URL is mysql+pymysql://root:dbuserdbuser@localhost

In  [18]:


#

# Connect. See the ipython-sql documentation for the $variable syntax.

#

%sql $mysql_url


SQL Alchemy and Pandas

In  [4]:

#

# Yes, I know the cool kids import as pd. I am not cool.

#

import pandas

In  [5]:

#

# Pandas SQL operations require a SQL Alchemy engine.

#

from sqlalchemy import create_engine

In  [19]:


%sql use classicmodels ;


pymysql

In  [7]:


%sql show tables ;


In  [20]:

pymysql_con = pymysql.connect(

user= mysql_user,

password= mysql_password,

host= "localhost",

port= 3306,

autocommit= True,

cursorclass= pymysql.cursors.DictCursor)

Data Loading

Classic Models

We will use the Classic Models (https://www.mysqltutorial.org/mysql-sample-database.aspx) sample database for many of the questions on this exam. The directory containing this notebook contains a file  classic-models-sample.sql .

Load the data:

Open the file in DataGrip using  File -> Open dialog.

Select all of the text/SQL in the file.

Click the green arrowhead to run the files contents.

Running the following queries will test if the load worked.

In  [10]:


%sql use classicmodels ;


* mysql+pymysql://root:***@localhost

0 rows affected.

Out[10]:

[]

In  [11]:


%sql show tables ;


* mysql+pymysql://root:***@localhost

8 rows affected.

Out[11]:

Tables_in_classicmodels

customers

employees

offices

orderdetails

orders

payments

productlines

products

In  [12]:


%sql select count(*) as count from orders join orderdetails using(orderNumber)


* mysql+pymysql://root:***@localhost

1 rows affected.

Out[12]:

count

2996

Lahman's Baseball Database

You previously loaded information from Lahman's Baseball Database. (https://www.seanlahman.com/)

If you have not done so, the following code will load the data into a new schema  lahmansdb_midterm .

In  [22]:


%sql create schema lahmansdb_midterm


* mysql+pymysql://root:***@localhost

1 rows affected.

Out[22]:

[]

In  [25]:

people_df = pandas.read_csv("./People.csv")

people_df.to_sql ("people", schema="lahmansdb_midterm", con=sql_engine,index=False, if_exists="replace")

Out[25]:

20370

In  [26]:


batting_df = pandas.read_csv("./Batting.csv")

batting_df.to_sql ("batting", schema="lahmansdb_midterm", con=sql_engine,index=False, if_exists="replace")


Out[26]:

110495

In  [32]:


pitching_df = pandas.read_csv("./Pitching.csv")

pitching_df.to_sql ("pitching", schema="lahmansdb_midterm", con=sql_engine,index=False, if_exists="replace")


Out[32]:

49430

This will test the data loading.

In  [29]:


%sql select count(*) as people_count from lahmansdb_midterm.people ;


* mysql+pymysql://root:***@localhost

1 rows affected.

Out[29]:

people_count

20370

In  [30]:


%sql select count(*) as batting_count from lahmansdb_midterm.batting ;


* mysql+pymysql://root:***@localhost

1 rows affected.

Out[30]:

batting_count

110495

In  [33]:


%sql select count(*) as pitching_count from lahmansdb_midterm.pitching ;


* mysql+pymysql://root:***@localhost

1 rows affected.

Out[33]:

pitching_count

49430

Written Questions

W1

Question

Define the concept of immutable column and key.

Why do some sources recommend that a primary key should be immutable?

How would to implement immutability for a primary key in a table?

Answer

Type Markdown and LaTeX: a2

W2

Question

Views are a powerful concept in relational database management systems. List and briefly explain 3 benefits of/reasons for creating a view.

Answer

Type Markdown and LaTeX: a2

W3

Question

Briefly explain the concepts of procedural language and declarative language. SQL is primarily a declarative language. SQL added procedure language capabilities in functions, procedures and triggers. What is a reason for this addition?

Answer

In  [  ]:

W4

Question

The following diagram is a simple representation of the architecture of a Jupyter notebook using MySQL. Is this a two-tier architecture or a three-tier architecture? Explain your answer briefly.

Answer

Type Markdown and LaTeX: a2

W5

Question

Consider a US Social Security Number. An example is "012-34-6789".

The data type is character string.

The relational model requires that columns (attributes) are from a domain.

Use the Social Security Number example to explain the difference between a type and a domain.

Answer

Type Markdown and LaTeX: a2

W6

Question

Briefly explain the differences between:

Database stored procedure

Database function

Database trigger

Answer

Type Markdown and LaTeX: a2

W7

Question

Briefly explain:

Natural join

Equi-join

Theta join

Self-join

Answer

Type Markdown and LaTeX: a2

W8

Question

Briefly explain the difference between a unique (key) constraint and a primary key constraint?

Answer

Type Markdown and LaTeX: a2

W9

Question

Give two reasons for using an associative entity to implement a relationship instead of using a foreign key.

Answer

Type Markdown and LaTeX: a2

W10

Question

Briefly explain the concepts of:

Conceptual model

Logical model

Physical model

For data modeling.

Answer

Type Markdown and LaTeX: a2

W11

Question

Briefly explain the concepts of:

Data manipulation language

Data definition language

Given an example statement in SQL for DML and for DDL.

Answer

Type Markdown and LaTeX: a2

W12

Question

Codd's 4th rule is:

Rule 4 - Dynamic online catalog based on the relational model:

The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language

to its interrogation as they apply to the regular data.

Explain what this means, and use SQL to provide examples.

Answer

Type Markdown and LaTeX: a2

W13

Question

The formal definition of a theta join is

Briefly explain the definition and give an example.


T  ⋈e  S = Ge(T × S)

Why is the fact that the relational algebra is closed is important to this definition?

Answer

Type Markdown and LaTeX: a2

W14

Question

Consider two different statements in the relational algebra or SQL.

Despite being different statements, the statements may be equivalent. Briefly explain what this means.

Answer

Type Markdown and LaTeX: a2

W15

Question

Consider the following relation definitions.

Customers(ID, last_name, first_name)

Accounts(ID, balance, customer_ID)

What is problem with using natural join on the two tables?

Answer

Type Markdown and LaTeX: a2

Entity Relationship Modeling

ER-1

Question

This question tests your ability to "bottom up" model or "reverse engineering" a SQL schema to produce an explanatory ER-diagram. Use Lucidchart to draw a Crow's Foot notation diagram representing the following SQL.

You can use the simple table names, e.g.  students instead of  s23_w4111_midterm.students .

drop schema if exists s23_midterm;

create schema s23_midterm;

use s23_midterm;

drop table if exists departments;

create table if not exists departments

(

dept_code varchar(4)   not null

primary key,

dept_name varchar(128) not null

);

drop table if exists instructors;

create table if not exists instructors

(

UNI        varchar(12)  not null

primary key,

last_name  varchar(128) not null,

first_name varchar(128) not null,

dept_code  varchar(4)   null,

constraint instructor_dept

foreign key (dept_code) references departments (dept_code)

);

drop table if exists students;

create table if not exists students

(

UNI        varchar(12)  not null

primary key,

last_name  varchar(128) null,

first_name varchar(128) null

);

drop table if exists students_advisors;

create table if not exists students_advisors

(

student_uni         varchar(12) not null,

instructor_uni      varchar(12) not null,

advising_start_date date        not null,

advising_end_date   date        null,

primary key (student_uni, instructor_uni, advising_start_date),

constraint student_advisor_instructor

foreign key (instructor_uni) references instructors (UNI),

constraint student_advisors_student

foreign key (student_uni) references students (UNI)

);

Answer

Put your screen capture in the same directory as the midterm.

Add in the Markdown cell, using the actual file name.

ER-2

Question

This question tests your ability to convert a human language description of a data model into a Crow's Foot ER-Diagram.

Consider the data model for Classic Models that you loaded.

orders has a column  comments.

In  [15]:


%sql select * from classicmodels.orders limit 10 ;


* mysql+pymysql://root:***@localhost

10 rows affected.

Out[15]:

orderNumber

orderDate

requiredDate

shippedDate