DATA 110

Data Project 2

For this assignment, you will incorporate the techniques we have learned so far in class. You may extend your Project 2 (due during week 11) into your final project, if you choose to do so.

You will select a dataset. You may be allowed to use a dataset from the course Google Drive, as long as I have not already used that dataset in notes or homework.

You must get my approval for your dataset via email to be sure it is appropriate for this assignment, and if you need help finding datasets, I can help you with this individually. You may not re-use the dataset from Project 1. You will submit your dataset choice in a separate assignment dropbox by Friday, April 14th.

Steps

1. Find a topic with a dataset; be sure to get my approval on it before you start working on your project, and be sure to indicate the source of the dataset. If you get the dataset from the course Google Drive, be sure to ask me for the original data source! Submit your topic in the assignment dropbox by Friday, April 14th.

2. Introduce your topic and dataset in a paragraph or two at the beginning of your markdown document. Be sure you describe any variables included, what kind of variables they are, where the data came from and how you cleaned it up (be detailed and specific, using proper terminology where appropriate). Be sure to explain why you chose this topic and dataset – what meaning does it have for you?

3. Clean and explore the data variables and keep track of all of your cleaning and explorations in an R Markdown document. Be sure to comment your actions in each chunk.

4. Your R code must use at least one dplyr command (select, filter, summarize, mutate, group_by, arrange, etc.).

5. Perform at least one statistical analysis from class notes (outlier analysis, histogram, boxplots, correlation assessment, or linear or multiple regression analysis). Comment on the results of your statistical analysis.

6. Explore both quantitative and categorical variables with simple plots to determine what you want to focus on for your final visualization.

7. Create one or more of the visualizations we have discussed throughout the course, which may or may not include GIS information. During your exploration, keep a running commentary in the Markdown text area of what you are doing and why you are doing it. New elements that were not required for project 1 are highlighted in red font below. Your visualization must have:

a. at least one dplyr command

b. at least one statistical component that we have used in the class so far (outlier analysis, histogram, correlation assessment, boxplots, or linear or multiple regression analysis)

c. meaningful labels for axes

d. detailed title

e. some sort of legend to make sense of colors, shapes, and sizes that describe any variables.

f. a non-default ggplot theme (any other theme of your choice)

g. interactivity (highcharter, plotly, etc.)
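As an illustration only, here is a minimal sketch of how requirements a through g might fit together in one chunk. It assumes R's built-in mtcars data as a stand-in for your own approved dataset, and the object names (cars, r, fit, p, interactive_plot) are made up for this example. Your own variables, statistics, and plot type will differ.

```r
# Illustrative sketch only: mtcars stands in for your own approved dataset.
library(dplyr)
library(ggplot2)
library(plotly)

# (a) dplyr: keep three variables, make cylinders categorical, sort by mpg
cars <- mtcars %>%
  select(mpg, wt, cyl) %>%
  mutate(cyl = factor(cyl)) %>%
  arrange(desc(mpg))

# (b) statistical component: correlation and a simple linear regression
r <- cor(cars$wt, cars$mpg)
fit <- lm(mpg ~ wt, data = cars)
summary(fit)

# (c)-(f) labels, title, legend, and a non-default theme
p <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point(size = 2) +
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ x,
              se = FALSE, color = "gray40") +
  labs(
    title = "Heavier Cars Get Fewer Miles per Gallon (1974 Motor Trend Data)",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  ) +
  theme_minimal()

# (g) interactivity: convert the ggplot object into a plotly widget
interactive_plot <- ggplotly(p)
```

In a knitted R Markdown document, putting interactive_plot on its own line in a chunk should render the interactive version. Remember to comment on what the correlation and regression output tell you.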

Write a brief essay, which answers the following three questions. Intersperse the essay content in appropriate places throughout your markdown file.

a. The topic of the data, any variables included, what kind of variables they are, where the data came from and how you cleaned it up (be detailed and specific, using proper terminology where appropriate). Be sure to explain why you chose this topic and dataset – what meaning does it have for you? This part of the essay must go at the beginning, before you load data and libraries.

b. Incorporate brief background research about this topic. This background should draw on information you find in an article, website, or book. Please cite the source of this background information within the essay or, if you have multiple sources, include a bibliography. I am not particular about the format of this bibliography. If you need help finding articles, I am happy to help you and/or show you how to search the MC Library Database.

c. What the visualization represents, any interesting patterns or surprises that arise within the visualization, and anything that could have been shown that you could not get to work or that you wished you could have included.

Knit your Markdown and publish it on RPubs. Submit the link in the Assignment Dropbox by 11:59 pm on Tuesday, April 18th. Be prepared to present briefly during the following class (2-3 minutes).

Submit a video presentation of at most 3 minutes on the discussion thread.

Rubric for Evaluation of Project 2

Each evaluation level is listed below with its points allotted and its criterion.

Sophisticated (100%)

1. Provide documentation on your data cleaning.

2. Incorporate at least one dplyr command.

3. Provide at least one statistical component with description.

4. Provide documentation about your exploration of variables along with simple plots to determine what you want to focus on for your final visualization.

5. Create a ggplot visualization with proper labels, title, legend and non-default theme. You should have more than one color in your graph.

6. Incorporate interactivity (plotly, highcharter, or Tableau).

7. Fully answer the three questions in the essay, including background information on your topic.

8. Edit your essay for typos/grammar/sentence structure.

9. Submit final version in RPubs from knitted Markdown.

10. Submit your project on time.

Acceptable (80%)

One or two of the 9 steps above are omitted; or two or three of the 9 steps are underdeveloped.

Developing Competence (60%)

Three of the 9 steps are omitted; or four or five of the 9 steps are underdeveloped.

Inadequate (40% or lower)

The project has at least one serious weakness. Fewer than half of the requirements have been completed.