Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Business  Data Management Individual Project

Fall 2023


Photo by Jack Moreh from Freerange Stock

The goal of this project is for you to sharpen your SQL skills and gain experience in a realistic data analysis project.  The database in this project is about company financial information for nine companies (Amazon, Apple, Cisco, Meta, Microsoft, Netflx, Qualcomn, Starbucks, and Tesla). Since all these companies are listed on the NASDAQ, the company ticker is sufficient for identifying the company in this assignment.  The following sections describe important aspects of the project assignment.  Read all the sections before you begin.

1. Data Source and Data Dictionary

The source data and data dictionary for this project along with this document are available in Fa_2023_Project_Files.zip.  A link to this file is provided in the Canvas Assignment.  A description of the source files follows:

a) Company_src.csv which contains information for each company including the company’s ticker symbol.

b) Files named ticker_income.csv in the Income Folder where ticker is the ticker for one of the nine companies. This file contains 10 years of income statement data for the company.

c) Files named ticker_HistoricalData.csv in the TenYearHistorical Folder where ticker is the ticker for one of the nine companies. This file contains 10 years of stock price data for the company.

The data dictionary is called Fa_2023_Project_Data_Dictionary.xlsx.   A pdf copy is also provided.  The data dictionary describes each attribute for the tables above, and their provenance.  The income files are derived from files downloaded from Morningstar as described in the data dictionary.  The MorningstartSourceFiles are provided in case you need to explore the source of the data.

2. Required Target Tables

You are required to create target tables from the source tables.  The target tables must the proper types, and all primary key and foreign key constraints.  The target tables to create from the source tables are:

a) Company(Ticker, Company, Fiscal_Year_End, Morningstar_Size, Morningstart_Style, Sector, Industry)

b) Income(Ticker, Year, Revenue, Operating_Income, Net_Income, Basic_EPS, Diluted_EPS, Basic_WASO, Diluted_WASO)

c) Stock(Ticker, Date, Close, Volume, Open, High, Low)

Note that the Income and Stock Data each contain the data for all 9 companies identified by ticker.

3. Tips on Data Integration

The following are tips for doing your data integration:

a) You need to enclose each column named after a year in backquotes, for example `2015` to refer to the column in MySQL.  This distinguishes the column from an integer value.

b) If you click in your script where you want a table name to appear, and then double-click on the table name in the schema list, the table name is inserted into your script (in most versions of MySQL Workbench).

4. Important Rules

You must follow these rules, or you may lose credit for the assignment:

a) This project MUST be done individually.

b) Do NOT edit the source csv files.  You can use SQL to update the source tables once the files are loaded.

c) You are required to create the same relation and attribute names as those described above.  Changing the names makes it harder for us to grade.  

d) Do not run the queries for questions 2-9 on the source tables you first loaded.  Run your queries on the target tables.

e) Do not use the data type TEXT in your target tables.

f) Use numeric and date types when appropriate in your target tables.

g) You must submit your work in the appropriate Gradescope Assignment as described below in the Deadlines Section.

h) Submit the SQL you used to create the target tables.  Create the target tables with CREATE commands and INSERT.  DO NOT USE ALTER TABLE to change the tables in place.  You can use the Data Import Wizard to load the SRC tables.  You do not need to write SQL code to upload the source csv files.

i) Submit the SQL for each query in the box in the question for that query.  Use text, not a picture, to submit the SQL.

j) Submit a picture of the result of running the query or paste the table that results from running the query in the box after the SQL for the query.

k) Keep a script for all your work and submit it as the answer to Question 10 on the Gradescope.  

l) Your SQL answer must work for any data or changes to data in this database.  Your data answer must be in a single table you generated using SQL, not several tables that need to be combined manually to get the answer to the question.

m) You cannot use data values in a query unless they are specified in the question.  If you need another value, generate it with a query and use that query or its resulting table in the query that ultimately produces the answer.

n) Unless specified otherwise, remove duplicates from your answers as appropriate.

If you do not submit individual answers to Gradescope Questions 1-9, but only a script to Gradescope Question 10, we may not be able to grade your work before final grades are due.

5. Deadlines

As you know from the School Exercise, it is important to load the data and integrate it into the target tables as soon as possible.  To encourage you to do that, you are required to submit the project in two phases as given in Figure 1.

 

Figure 1: Project Schedule and Due Dates 

The Gradescope Submitter for Question 1 is called Individual Project Quesiton 1 (Data Integration) Submission.”  The Gradescope Submitter for Questions 2-10 is called “Individual Project Questions 2-10 Submission.”

There are significant penalties for submitting either phase of the assignment later than the due dates given above, see the syllabus for more information.

6. The Questions to Answer as They Appear on Gradescope

Answer the following questions by performing SQL operations on the database.

Provide the SQL for creating the target tables described in the assignment and loading the source data into the tables.  Be sure to include any key or foreign key constraints that you see when creating the relations.  Include any transformations you do on the data in the source tables to clean and integrate the data.  If you are changing a subset of records in the source tables, show the queries you use to determine which records to change, or you will not receive full credit.  Be sure to show how you copy the data from the source tables to the target tables described above.  Note: you need to transfer the tuples from the source tables to the target tables using SQL.  Do NOT use ALTER TABLE to transform the source tables into the target tables. I would like to see that you can use the INSERT command to accomplish this transfer. (18 points)

List the sector, industry and name for each company in ascending order by sector and then industry.  (10 points)

List the ticker, company name, date, volume, open price, close price for the company named “Amazon.com Inc” for the months of October (10) and November (11) in 2023. Format the volume with commas and no decimal point.  List the result in descending order by date.   (10 points)

List the ticker, company name, minimum net income in millions, and maximum net income in millions for each company in ascending order by company name.  The net income columns should be called “Minimum Net Income in Millions” and “Maximum Net Income in Millions” respectively. Round the net income to the nearest million and format it with commas for readability.  (10 points)

List the ticker, company name, date, Morningstar_size in a column called “Size”, Morningstar_style in a column called “Style”, closing price in a column called “Closing Price”, Basic WASO in Millions in a column called “Basic WASO in Millions”, Diluted WASO in Millions in a column called “Diluted WASO in Millions”, Market Cap in Millions in a column of that name, and Diluted Market Cap in Millions in a column of that name for trading days in October (10) and November (11) for every available year.  Format the WASO and Market Cap numbers to include commas as appropriate.  Include dollar signs in the formatting of price and market caps.  WASO numbers should be integers.  Market Cap Numbers should be rounded to two decimal places.  You can learn how to calculate these market caps at this web site:  

https://www.investopedia.com/terms/m/marketcapitalization.asp  

Order your result in ascending order by company and descending order by date.  Note that you will only be able to report results for years where the company has an income statement in the database.  (10 points)

List the ticker, name, date, closing price in a column called “Closing Price” for the company with the highest closing price and the company with the lowest closing price in 2023. Include a dollar sign in the formatting of the price. List the result in descending order by price.  (10 points)

List the ticker, name, and dates for two consecutive dates where the closing price on the second day is at least 12% less than the opening price on the first day.  Name the dates First_Date and Last_Date. Include in the result the opening price on the first date and the closing price on the second date. The percentage decline should be a negative number.  Format the percentage decline as a percent, for example -12% instead of -0.12.  Round the percent to the nearest integer.  Tip:  See https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-add  

Note:  “Consecutive dates” means the second date is one day after the first date.  This query does not need to consider declines across weekends or holidays.  (10 points)

Use some or all of your query from Question 7 to list the companies that did not decline. (10 points)

List the  Morninstar_size,  Morningstar_style, ticker, company, fiscal_year_end, year, net income in millions in a column called “Net Income in Millions”, and a Comment column for the company or companies with the highest net income and the company or companies with the lowest net income in each combination of Morningstar_size and Morningstar_style in the year 2022.  Round the net income to the nearest integer and include a dollar sign and any appropriate commas.  The Comment column should say “Highest Net Income” or “Lowest Net Income” as appropriate. (10 points)

Submit your SQL script file for all the SQL you executed for this project. (2 points)