Programming R for Analytics 94-842

Final Project Description / Rubric

The final project is your chance to use and demonstrate the R / RStudio and R Markdown skills you have learned in class throughout the mini. You do NOT need to use every tool and technique we have covered, but you must demonstrate some of them.

We will be discussing the project requirements many times before it is due. As noted in the syllabus, the work you do MUST be your own. You may discuss your work and dataset with other students, but the code, results, analysis and story must be yours and yours alone. The code used to generate your results should be clearly included in the final, finished document.

The source of your data must be clearly referenced. If the dataset is not publicly available online, then the full dataset must be submitted with the final work for verification (unless it is proprietary, in which case we can discuss the suitability of the dataset).

If you find any code snippets that help with your analysis, they too should be referenced. A simple comment line is fine, such as: “I found and modified the function used in this chunk to sort and color code the barplot on Stack Overflow” – see notes in syllabus about plagiarism. While looking for and working on a dataset, think about telling a good story and not only about the size of the dataset or the number of lines of code.

55 Total points available:

10 points for correct type of dataset

Recommended minimum size: 5000 rows and 10 variables

Smaller size datasets may be used with prior approval*

Should be related to public policy issues/concerns (see note below)
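
As a quick way to compare a dataset against the recommended size, something like the sketch below may help. The helper name `meets_recommended_size` and the placeholder data frame are assumptions made for illustration only; in a real project the data would come from your own source (for example via `read.csv()`).

```r
# Hypothetical helper (not part of the course materials): compares a data
# frame against the recommended minimum of 5000 rows and 10 variables.
meets_recommended_size <- function(df, min_rows = 5000, min_vars = 10) {
  nrow(df) >= min_rows && ncol(df) >= min_vars
}

# Placeholder data standing in for a real dataset.
set.seed(1)
example_data <- as.data.frame(matrix(rnorm(5000 * 10), nrow = 5000, ncol = 10))

dim(example_data)                     # number of rows, number of columns
meets_recommended_size(example_data)  # TRUE for this 5000 x 10 placeholder
meets_recommended_size(mtcars)        # FALSE: the built-in mtcars has only 32 rows
```

A dataset that returns FALSE here is not automatically ruled out, but it would need prior approval as described in the note at the end.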

15 points for correct use of statistics in analysis

Examples include use of means, standard deviation/variance, t-tests, ANOVA, others
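
A minimal sketch of these statistics in base R is shown below. It uses the built-in `mtcars` dataset purely as a stand-in (it would not meet the dataset requirements above), and the variable names are illustrative assumptions. Which tests are appropriate depends on your own data and hypotheses.

```r
# mtcars is a small built-in dataset, used here only as a stand-in.

# Descriptive statistics for miles per gallon
mpg_mean <- mean(mtcars$mpg)
mpg_sd   <- sd(mtcars$mpg)
mpg_var  <- var(mtcars$mpg)

# Welch two-sample t-test: does mpg differ between
# automatic (am = 0) and manual (am = 1) transmissions?
mpg_ttest <- t.test(mpg ~ am, data = mtcars)
mpg_ttest

# One-way ANOVA: does mpg differ across cylinder counts?
# cyl is numeric in mtcars, so it is converted to a factor first.
mpg_anova <- aov(mpg ~ factor(cyl), data = mtcars)
summary(mpg_anova)
```

In the R Markdown report, each result should be accompanied by a sentence or two explaining what it means for your question, not just the printed output.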

15 points for strong, clear graphics (minimum of 3 graphs)

Examples include histograms, barplots, boxplots, x-y scatter plots, others

Graphs should be well labeled and informative
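
As one possible illustration of well-labeled graphs, the sketch below draws a histogram, a boxplot and a scatter plot with base R graphics. The built-in `mtcars` dataset and the title and axis text are assumptions chosen for the example; other plotting packages would work equally well.

```r
# mtcars is a small built-in dataset, used here only as a stand-in.

# Histogram with a title and axis labels
mpg_hist <- hist(mtcars$mpg,
                 main = "Distribution of Fuel Economy",
                 xlab = "Miles per gallon",
                 ylab = "Number of cars")

# Boxplot of mpg grouped by number of cylinders
cyl_box <- boxplot(mpg ~ cyl, data = mtcars,
                   main = "Fuel Economy by Number of Cylinders",
                   xlab = "Cylinders",
                   ylab = "Miles per gallon")

# x-y scatter plot of weight against mpg
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel Economy vs. Vehicle Weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
```

Each graph carries a title and both axis labels with units where they apply, which is the minimum a reader needs to interpret it without consulting the code.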

15 points for strong story-telling and use of R Markdown for annotating and reporting

The entire project must be well detailed. Explain the background of the project: Why were you interested in this particular dataset? What does the dataset cover (time, variables, data types)? What are your thoughts or hypotheses about the dataset before the analysis, and what are you trying to show, prove or disprove? What are the results and what supports those results? What are your final conclusions or recommendations?

The final project should be written in R, using RStudio and R Markdown, and saved as a final knit file that shows appropriate header information (name, title, date) in a suitable file format (HTML preferred, but other formats such as PDF or slides are acceptable). The final knit file should be saved with your Andrew ID in the file name.

You are responsible for finding your own dataset. The data should be in an area around public policy – preferably something that you are personally interested in. It could be race or gender related, voting/politics, health-care/medicine, financial/personal earnings/salaries and benefits, etc. Other types of datasets, such as sports, science/technology, or social media, might work with my prior approval*.

Examples of places to look for data:

CMU libraries: https://guides.library.cmu.edu/az.php 

https://catalog.data.gov/dataset 

https://www.icpsr.umich.edu/web/pages/

https://archive.ics.uci.edu/ml/index.php 

https://www.pewresearch.org/download-datasets/

https://www.google.com/publicdata/directory

*Prior approval means you request a waiver from me by email or through Canvas at least one (1) week before the final project is due. Any datasets not meeting the recommended guidelines will have points deducted.