CS544 Final Project
Picking the Data Set
Browse the following sites for examples and select a data set that interests you.
1. https://www.kaggle.com/datasets
2. https://github.com/fivethirtyeight/data
3. http://www.kdnuggets.com/datasets/index.html
4. Any other source of your choice
Preparing the data
• Import the data set into R.
• Document the steps of the import process and any preprocessing that had to be done before or after the import. Include any R code used in the process.
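The import and preprocessing steps might look like the following minimal sketch. It is only an illustration: the CSV contents, the column names (id, group, score), and the choice to drop incomplete rows are assumptions made for the example, not requirements of the project. A real data set would be read from the downloaded file instead of a temporary one.

```r
# Hypothetical stand-in for a downloaded file: write a tiny CSV to a temp path
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score",
             "1,A,10.5",
             "2,B,",
             "3,A,7.25",
             "4,,9"), csv_path)

# Import: treat empty strings and "NA" as missing values
raw <- read.csv(csv_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Preprocessing (one possible choice): drop rows with any missing value,
# then convert the categorical column to a factor
clean <- raw[complete.cases(raw), ]
clean$group <- factor(clean$group)

# Inspect the structure of the cleaned data
str(clean)
```

Recording how many rows were removed, for example by comparing nrow(raw) with nrow(clean), is one simple way to document the effect of the preprocessing.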
Analyzing the data
• Do the analysis as in Module 3 for at least one categorical variable and at least one numerical variable. Show appropriate plots for your data.
• Do the analysis as in Module 3 for at least one set of two or more variables. Show appropriate plots for your data.
• Pick one variable with numerical data and examine the distribution of the data.
• Draw various random samples of the data and show the applicability of the Central Limit Theorem for this variable.
• Show how various sampling methods can be used on your data. What are your conclusions if these samples are used instead of the whole dataset?
• Implementation of any feature(s) not mentioned in the above specification.
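The analysis steps above can be sketched roughly as follows, using base R only. This is an illustrative outline under stated assumptions, not the Module 3 procedure itself. The built-in mtcars data stands in for your own data set, cyl and mpg are arbitrary choices for the categorical and numerical variables, and the seed, sample sizes, and the helper sample_means are hypothetical.

```r
set.seed(544)  # arbitrary seed so the sketch is reproducible

# Stand-in data (assumption): mtcars, with cyl as categorical and mpg as numerical
df <- mtcars
df$cyl <- factor(df$cyl)

# One categorical and one numerical variable
cyl_counts <- table(df$cyl)
mpg_summary <- summary(df$mpg)

# Two variables together: mean mpg within each cylinder group
mpg_by_cyl <- tapply(df$mpg, df$cyl, mean)

# Central Limit Theorem: distribution of sample means for increasing sample sizes
# (sample_means is a hypothetical helper defined here for illustration)
sample_means <- function(x, size, reps = 1000) {
  replicate(reps, mean(sample(x, size, replace = TRUE)))
}
sizes <- c(5, 10, 20)
clt_sd <- sapply(sizes, function(n) sd(sample_means(df$mpg, n)))
names(clt_sd) <- sizes
# The spread of the sample means should shrink roughly like sd(x) / sqrt(n)

# Simple random sampling without replacement
srs <- df[sample(nrow(df), 12), ]

# Stratified sampling: about half of each cylinder group
strata <- split(seq_len(nrow(df)), df$cyl)
strat_idx <- unlist(lapply(strata, function(idx) {
  idx[sample.int(length(idx), ceiling(length(idx) / 2))]
}))
stratified <- df[strat_idx, ]

# Systematic sampling: every k-th row from a random start
k <- 4
start <- sample(k, 1)
systematic <- df[seq(start, nrow(df), by = k), ]

# Compare the sample means against the full-data mean
c(full = mean(df$mpg), srs = mean(srs$mpg),
  stratified = mean(stratified$mpg), systematic = mean(systematic$mpg))
```

For the required plots, barplot() on the frequency table, boxplot() for the numerical variable by group, and hist() on the sample means for each sample size are natural choices. Comparing the final vector of means is one way to frame the conclusions about using samples instead of the whole data set.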
Presenting the Project
• You will do your project presentation with the Facilitator using Zoom.
• Each presentation lasts at most 10 minutes. A signup sheet will be provided later.
Grading Rubric:
• Preparing the Data and documenting the data preparation (15 points)
• Analyzing the Data and documenting the same (50 points)
• Implementation of any feature(s) not mentioned in the specification (10 points)
• Presenting the project in the Live Classroom with Facilitator (25 points)
Submitting the Project
Upload a zip file (CS544Final_lastName.zip) containing all the code as RMarkdown (Rmd file), the presentation document (PDF or PPT, if any), and all the results as an RMarkdown HTML file.
2022-02-22