EE104 Module 6 Project Ideas

Version 04/06/2021

(This is a working file: more projects will be added frequently. Check back often for the latest version.)

If you don’t have a project idea yet, maybe one of the ideas below will spark your interest.


Table of Contents

1 - Trend Prediction

2 - Data Science

3 - Data Science – Data Relationship

4 - Data Science Statistics


Do Problem #1 (60%) and any one of Problems #2, #3, or #4 (40%). No extra credit.


1 - Trend Prediction

Search for COVID-19 data in CSV file format or any format that can be converted to CSV.

Select three months' worth of data (e.g. March to May). Plot a selected data criterion for those three months and predict the trend for the following three months (e.g. June to August). Compare your prediction with reality and state whether your prediction was correct. Research and present the reasons for the match or mismatch.
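
One possible starting point for the prediction step is a straight-line fit. The sketch below assumes the selected criterion is available as a list of daily values and uses made-up numbers in place of real COVID-19 data; the helper names fit_line, predict, and mean_abs_error are illustrative only. A linear model is just one option, and real data may call for a different curve.

```python
# Minimal sketch: least-squares line through ~3 months of daily values,
# extrapolated over the following ~3 months. All numbers are placeholders.

def fit_line(ys):
    """Fit y = slope * x + intercept by least squares, with x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, start, count):
    """Evaluate the fitted line on days start, start + 1, ..., start + count - 1."""
    return [slope * x + intercept for x in range(start, start + count)]


def mean_abs_error(predicted, actual):
    """Average absolute gap between prediction and reality."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Placeholder "March to May" series: 92 days, starting at 100 and rising 5 per day.
observed = [100 + 5 * day for day in range(92)]
slope, intercept = fit_line(observed)

# Extrapolate over the next 92 days ("June to August").
forecast = predict(slope, intercept, start=len(observed), count=92)

# Placeholder "reality": the same trend shifted up by 10, so the error is 10.
actual_next = [100 + 5 * day + 10 for day in range(92, 184)]
error = mean_abs_error(forecast, actual_next)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, mean abs error={error:.2f}")
```

With real data, replace observed and actual_next with the values read from your CSV file. A large error is a cue to research what changed between the two periods.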

Submit source(s), code, and output screenshots to Canvas.


2 - Data Science

Download a CSV file from this website: http://www.creditriskanalytics.net/datasets-private2.html

Write a Python program to analyze the risk factors that cause loan defaults, and provide a report to the bank covering three groups: low risk, medium risk, and high risk.
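
As a rough illustration of the grouping step, the sketch below computes the default rate for each value of one risk factor and maps that rate to one of the three groups. Everything in it is an assumption: the records are made up, the band names "A", "B", and "C" stand in for whatever factor you choose from the dataset, and the 10% and 30% cutoffs are arbitrary and should be justified from your own analysis.

```python
from collections import defaultdict

# Made-up records: (band of a hypothetical risk factor, 1 if the loan defaulted else 0).
loans = (
    [("A", 0)] * 10
    + [("B", 1)] * 2 + [("B", 0)] * 8
    + [("C", 1)] * 5 + [("C", 0)] * 5
)


def default_rate_by_band(records):
    """Return {band: fraction of loans in that band that defaulted}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for band, defaulted in records:
        totals[band] += 1
        defaults[band] += defaulted
    return {band: defaults[band] / totals[band] for band in totals}


def risk_label(rate, low_cutoff=0.10, high_cutoff=0.30):
    """Map a default rate to a risk group. The cutoffs are arbitrary assumptions."""
    if rate < low_cutoff:
        return "low risk"
    if rate < high_cutoff:
        return "medium risk"
    return "high risk"


rates = default_rate_by_band(loans)
report = {band: (rate, risk_label(rate)) for band, rate in rates.items()}

for band in sorted(report):
    rate, label = report[band]
    print(f"band {band}: default rate {rate:.0%} -> {label}")
```

Repeating this for several factors shows which ones separate the three groups most clearly.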


3 - Data Science – Data Relationship

Using data from the website below, write a Python program to find the relationships among three data criteria of your choice using the Pearson correlation coefficient and the chi-square test of independence. Show plot(s) to support your findings.

https://guides.lib.berkeley.edu/publichealth/healthstatistics/rawdata
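
To make the two statistics concrete, here is a minimal sketch that computes both by hand on made-up numbers. The function names pearson_r and chi_square_statistic are illustrative, the sample values are not taken from the website above, and 3.841 is the standard chi-square critical value for 1 degree of freedom at the 5% significance level.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chi_square_statistic(table):
    """Chi-square statistic (no continuity correction) and degrees of freedom
    for a contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up numeric criteria: y is exactly 2 * x, so r should be 1.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Made-up 2x2 table of counts for two categorical criteria.
stat, dof = chi_square_statistic([[30, 10], [10, 30]])

CRITICAL_5PCT_DOF1 = 3.841  # chi-square critical value, dof = 1, alpha = 0.05
reject_independence = stat > CRITICAL_5PCT_DOF1

print(f"r = {r:.3f}, chi-square = {stat:.2f}, dof = {dof}, "
      f"reject independence: {reject_independence}")
```

Pearson applies to numeric criteria, while the chi-square test works on counts of categorical criteria, so a numeric column has to be binned into categories first. SciPy's scipy.stats.pearsonr and scipy.stats.chi2_contingency also return p-values; note that chi2_contingency applies a continuity correction to 2x2 tables by default, so its statistic will differ from the uncorrected one here.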


4 - Data Science Statistics

Using the Salaries.csv file from the link below for the San Francisco area, clean the data as needed, and calculate the salary values at ±1 sigma, ±2 sigma, and ±3 sigma from the mean. Plot the data to support your findings.

https://www.kaggle.com/kaggle/sf-salaries?select=Salaries.csv
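
The sigma calculation can be sketched as follows, assuming the cleaned salary column has already been loaded into a plain list of numbers. The five values below are made up, the function name sigma_bands is illustrative, and the population standard deviation is used; statistics.stdev gives the sample version if that is what you need.

```python
import statistics


def sigma_bands(values, max_k=3):
    """Return the mean, the standard deviation, and for k = 1..max_k the
    (lower bound, upper bound, share of values inside) of mean +/- k * sigma."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    bands = {}
    for k in range(1, max_k + 1):
        low = mean - k * sigma
        high = mean + k * sigma
        share = sum(low <= v <= high for v in values) / len(values)
        bands[k] = (low, high, share)
    return mean, sigma, bands


# Made-up salaries standing in for the cleaned column from Salaries.csv.
salaries = [40_000, 50_000, 60_000, 70_000, 80_000]
mean, sigma, bands = sigma_bands(salaries)

print(f"mean = {mean:.2f}, sigma = {sigma:.2f}")
for k, (low, high, share) in bands.items():
    print(f"+/-{k} sigma: {low:.2f} to {high:.2f} ({share:.0%} of values)")
```

For normally distributed data the three shares would be close to 68%, 95%, and 99.7%. Comparing the shares from the real file against those figures, and marking the bounds on a histogram, is one way to present the findings.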