PPOL 565 Project Guidelines
Overview
The final project will allow you to delve deeper into a topic of interest. You have two options. Option 1: develop a research design in which you describe a research question or identify a critique and work through the issues you would need to confront in answering the question or resolving the critique, using the techniques that we will cover in class. Option 2: find an existing analysis and replicate it, describing what the author did and exploring alternative approaches and diagnostics. You may use Python or R. The project will account for 35% of your course grade.
Requirement | Due | Points |
Memo 1: Topic and Data | March 3 | 15 |
Memo 2: Analysis Plan | April 5 | 20 |
Presentation Slides | April 21 | 20 |
Reflection | April 28 | 10 |
White Paper & Replication Code | May 10 | 75 |
Project Total | | 140 |
Project Memos
You will submit two memos to Professor Brodnax detailing the progress on the project.
Memo 1: Topic & Data - Due March 3
This memo to Professor Brodnax will detail your progress on the final project. Specifically, it will be a first draft of the Background and Data sections of the white paper.
• Background Provide an overview of your research question and a brief literature review of the topic. Discuss the current methodological approaches as well as reasons why these may not be optimal. Potential areas to critique include: research questions, datasets, units of analysis, techniques, and interpretation of findings. Discuss the broader context: what social, political, and/or financial factors make this an important question?
• Data Discuss the types of data that would be ideal for analyzing this question. Identify at least one dataset that could be used to answer your research question. There are many sources of data, including Data.gov, ICPSR, Kaggle, and the US Census Bureau. You can also look for data on city- or state-level open data portals.
Note: If you are replicating an existing study, you do not need to use the same dataset from the study.
Discuss the limitations of the dataset(s) for answering your research question.
– The dataset(s) must be publicly accessible.
– Your bibliography must include a dataset citation with a web address.
– Each dataset must have at least 500 observations, not including time series. For example, suppose you find a survey dataset where the unit of analysis is household. There must be at least 500 individual households represented in the survey. A survey with 150 households that were surveyed 5 times would not be acceptable, even though the total number of observations for all time series exceeds 500.
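The unit-count rule above can be checked with a few lines of code. The following is a minimal sketch in Python using only the standard library and made-up records that mirror the 150-household example; the names `records`, `count_units`, and `MIN_UNITS` are illustrative assumptions, not part of any required workflow.

```python
# Hypothetical panel records: (household_id, survey_wave). Not real data.
# 150 households, each surveyed 5 times, as in the example above.
records = [(hh, wave) for hh in range(1, 151) for wave in range(1, 6)]

MIN_UNITS = 500  # threshold stated in the guidelines


def count_units(rows, unit_index=0):
    """Return the number of distinct units of analysis in the rows."""
    return len({row[unit_index] for row in rows})


n_rows = len(records)            # total rows, counting repeated waves
n_units = count_units(records)   # distinct households only
meets_requirement = n_units >= MIN_UNITS

print(f"rows: {n_rows}, distinct households: {n_units}, acceptable: {meets_requirement}")
```

The point of the sketch is that the comparison uses distinct units, not total rows: the row count here exceeds 500, but the number of households does not.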
Your memo should be approximately 600 to 800 words and be in professional memo format. Your memo should be clearly written, with proper spelling and grammar. You must also include in-text citations and a bibliography formatted in APA style. The bibliography will not be included in the 600- to 800-word guideline.
Memo 2: Analysis Plan - Due April 5
This second memo to Professor Brodnax will detail your progress on the final project. Specifically, it will be a second draft of the Data section and a first draft of the Methodology section of the white paper. Be sure to incorporate any feedback from the first memo.
• Background At this point your research question should be finalized. Provide an update to the Background section if needed.
• Data At this point your data selection should be finalized. Discuss your preliminary review of the data. When selecting what variables to include in your memo, consider only those needed to answer the question or guide interpretation of the results.
– Create numerical summaries of relevant variables (minimum of two). If your dataset contains a large number of variables, select a subset of variables that you think are most relevant to your research question. If the variables are continuous, include a table of descriptive statistics and discuss how the data are distributed. If the variables are categorical, include frequency tabulations and discuss patterns across observations.
– Create graphical summaries of relevant variables or relationships between variables (minimum of two). Provide an interpretation of each plot. What key information or limitation does it reveal about the data with respect to your research question?
– Your bibliography must include a dataset citation with a web address.
– Each dataset must have at least 500 observations, not including time series.
• Methodology Discuss your analytical plan.
– Indicate which variables will comprise your target and features.
– You must select at least one parametric and at least one non-parametric technique. What techniques do you plan to use? Give a justification for your choice based on the research question (prediction vs inference) and the outcome of interest (numerical or categorical).
– Provide a brief explanation of the intuition behind each model’s algorithm. Provide a brief literature review of each technique. The literature review should include at least one example of how each technique was applied in a different setting (journal article, news article, blog, video, etc.).
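To make the parametric versus non-parametric distinction in the Methodology bullets concrete, here is a minimal sketch in Python using only the standard library. It contrasts a simple least-squares line (parametric: the fit is summarized by an intercept and a slope) with k-nearest-neighbors regression (non-parametric: predictions come directly from nearby training points). The toy data and the function names `fit_linear` and `predict_knn` are assumptions made for illustration; they are not techniques or data prescribed by the course.

```python
from statistics import mean

# Hypothetical toy data (not from any real dataset): one feature, one numerical target.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]


def fit_linear(xs, ys):
    """Parametric: ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_knn(xs, ys, x_new, k=2):
    """Non-parametric: average the targets of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return mean(target for _, target in nearest)


intercept, slope = fit_linear(x, y)
linear_pred = intercept + slope * 3.5    # prediction from two fitted parameters
knn_pred = predict_knn(x, y, 3.5, k=2)   # prediction from the two closest points

print(f"linear: {linear_pred:.3f}, k-NN: {knn_pred:.3f}")
```

The linear model can be described entirely by its two coefficients, which helps with inference; the k-NN prediction has no such summary and depends on the choice of `k`, which is a hyperparameter you would need to justify.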
Your memo should be approximately 1,200 to 1,500 words (including background material from Project Memo 1) and be in professional memo format. Your memo should be clearly written, with proper spelling and grammar. You must also include in-text citations and a bibliography formatted in APA style. The bibliography will not be included in the 1,200- to 1,500-word guideline.
Expectations
• Numerical values must be neatly formatted in tables (no screenshots). Numbers should have no more than three digits after the decimal place.
• Plots must include titles, axis labels, and element labels. Take care that axis values are neatly formatted and easy to interpret. Plots should be easy to understand without outside information.
• Labels and descriptions within tables or plots should be brief and meaningful with proper spacing and capitalization (no variable names with underscores).
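One way to meet the table expectations above is to build the summary table in code rather than pasting raw output. The sketch below, in Python with only the standard library, maps raw variable names to readable labels and rounds floats to three decimal places. The variable names, values, and labels are hypothetical, and the pipe-separated printout is only one possible layout.

```python
from statistics import mean, median, stdev

# Hypothetical columns keyed by raw variable names (illustrative values only).
data = {
    "hh_income": [41250.5, 38900.0, 52310.25, 47000.0],
    "commute_min": [22.5, 35.0, 18.25, 41.0],
}

# Map raw names to brief, readable labels (no underscores in the output).
labels = {
    "hh_income": "Household Income ($)",
    "commute_min": "Commute Time (Minutes)",
}


def summarize(values):
    """Descriptive statistics for one continuous variable."""
    return {
        "N": len(values),
        "Mean": mean(values),
        "Median": median(values),
        "Std. Dev.": stdev(values),
    }


def format_value(value):
    """Integers print as-is; floats are rounded to three decimal places."""
    return str(value) if isinstance(value, int) else f"{value:,.3f}"


header = ["Variable", "N", "Mean", "Median", "Std. Dev."]
rows = []
for name, values in data.items():
    stats = summarize(values)
    rows.append([labels[name]] + [format_value(stats[col]) for col in header[1:]])

for row in [header] + rows:
    print(" | ".join(row))
```

Keeping the label mapping in one dictionary means the same readable names can be reused for plot titles and axis labels.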
Presentation Slides - Due April 21
During the final week of classes, the class will give project presentations in the format of a data science conference. The presentations will be organized into panels by topic, with each panel running approximately 30 minutes. Panels may be shorter or longer depending on the number of students presenting on that topic.
Each student will give a 4- to 5-minute presentation providing a concise summary of their project, including the following:
1. Research question and background
2. Data sources, target, and important features
3. Parametric and non-parametric techniques used
4. Evaluation and interpretation
5. Conclusion and limitations
Reflection - Due April 28
Following the presentations, you will write a brief reflection based on your attendance at a minimum of three panels, including your own. You may earn extra credit of up to 5 points by attending more than three panels and discussing them in your reflection.
White Paper & Replication Code - Due May 10
The final project must be uploaded to Canvas and must include (1) the white paper in PDF or Microsoft Word format; (2) all code used to generate the analyses, tables, and plots; and (3) all datasets as imported into Python or R scripts/notebooks. Note: The white paper is not a memo and should not be submitted in memo format. The white paper must include the following:
• Project Title and Student Name
• Executive Summary
• Introduction
– Provide an overview of your topic.
– If your project is a replication, discuss the authors’ methodological approach and the reasons why it may not be optimal. Potential areas to critique include: research questions, datasets, units of analysis, techniques, and interpretation of findings. Explain how your approach differs from current approaches.
• Data
– Discuss the types of data that would be ideal for answering your research question.
– Provide a review of the datasets used for the analysis, including relevant numerical and graphical summaries.
– Discuss the limitations of the datasets.
• Methodology
– Discuss the intuition behind the technique(s) you are using. You must utilize at least one parametric technique and at least one non-parametric technique.
– Provide a literature review with examples of how these techniques were applied in a different setting (journal article, news article, blog, video, etc.).
– Discuss the strengths and weaknesses of each technique.
• Findings
– Discuss the details of your analysis, including estimation and evaluation.
– Depending on your model, evaluation discussion can include hyper-parameter selection, training size selection, and performance (accuracy, cross-validation results, confusion matrices, ROC/AUC, etc.).
– Provide an interpretation of the results of your analysis. What insights did you generate from the analysis?
• Conclusion
– Discuss the implications of your insights. How might these insights be applied in your topic domain?
– Discuss limitations (ethical, computational, data, etc.) and future considerations.
• Bibliography
– The bibliography must be in APA format.
• Implementation Appendix
– Use this section to discuss any relevant, interesting, or innovative aspects of your technical implementation.
– For example, did you scrape the web or use an API? Develop any new measures? Reshape and/or pre-process the data?
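For the performance measures mentioned under Findings, the following minimal sketch shows how a confusion matrix and accuracy relate for a binary outcome. It is written in Python with only the standard library; the labels in `y_true` and `y_pred` are made up, and the helper `confusion_counts` is a hypothetical name used for illustration.

```python
from collections import Counter

# Hypothetical true and predicted labels for a binary outcome (1 = positive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    pairs = Counter((a == positive, p == positive) for a, p in zip(actual, predicted))
    return {
        "tp": pairs[(True, True)],    # actual positive, predicted positive
        "fp": pairs[(False, True)],   # actual negative, predicted positive
        "fn": pairs[(True, False)],   # actual positive, predicted negative
        "tn": pairs[(False, False)],  # actual negative, predicted negative
    }


cm = confusion_counts(y_true, y_pred)
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])

print(cm)
print(f"accuracy: {accuracy:.3f}, precision: {precision:.3f}, recall: {recall:.3f}")
```

Reporting the four counts alongside accuracy shows which kind of error the model makes, which a single accuracy figure hides.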
Rubric
Area | Expectations |
Submission | All components were submitted on time: (1) report (PDF/Word), (2) data, and (3) replication code. Final dataset includes at least 500 observations. |
Coverage | Report content comprises all components of the report outline above, including: an introduction of the problem and contextual background; description of the datasets as well as numerical and visual summaries; explanation of at least two analytical techniques, including a review of examples from outside of class; interpretation of the analysis and evaluation; and discussion of implications and limitations. Report sections are clearly indicated via formatting. |
Data | Each dataset includes at least 500 observations. Summaries are provided for all relevant variables and/or relationships (minimum of two). Only those summaries needed to answer the question or guide interpretation are included. Numerical values are neatly formatted in tables. The numbers displayed in tables have no more than three digits after the decimal place. Interpretations are provided for all tables. |
Graphics | Plots are provided for all relevant variables and/or relationships (minimum of two). Only those plots needed to answer the question or guide interpretation are included. The plots include titles, axis labels, and element labels when needed. The labels and table headings are brief and meaningful. The plots are easy to understand without outside information. Interpretations are provided for all plots. |
Techniques | Analysis includes at least two techniques covered in PPOL 565; at least one must be parametric and at least one must be non-parametric. Content includes an explanation of each technique written for a non-technical audience. The discussion demonstrates understanding of technique and does not regurgitate documentation. Examples of technique applications do not use data or examples discussed in class. |
Interpretation | The interpretation discusses key insights derived from the analysis, as well as an evaluation of techniques. |
2023-02-28