Project Guidelines

Overview

The final project will allow you to delve deeper into a topic of interest. Option 1: a research design in which you describe a research question or identify a critique and work through various issues you would need to confront in answering the question or resolving the critique using the techniques that we will cover in class. Option 2: find an existing analysis and replicate it, describing what the author did and exploring alternative approaches and diagnostics. You may use Python or R. The project will comprise 35% of your course grade.

Requirement                       Due         Points

Memo 1: Topic and Data            March 3     15
Memo 2: Analysis Plan             April 5     20
Presentation Slides               April 21    20
Reflection                        April 28    10
White Paper & Replication Code    May 10      75
Project Total                                 140

Project Memos

You will submit two memos to Professor Brodnax detailing the progress on the project.

Memo 1: Topic & Data - Due March 3

This memo to Professor Brodnax will detail your progress on the final project. Specifically, it will be a first draft of the Background and Data sections of the white paper.

• Background Provide an overview of your research question and a brief literature review of the topic. Discuss the current methodological approaches as well as reasons why these may not be optimal. Potential areas to critique include: research questions, datasets, units of analysis, techniques, and interpretation of findings. Discuss the broader context: what social, political, and/or financial factors make this an important question?

• Data Discuss the types of data that would be ideal for analyzing this question. Identify at least one dataset that could be used to answer your research question. There are many sources of data, including Data.gov, ICPSR, Kaggle, and the US Census Bureau. You can also look for data on city- or state-level open data portals.

Note: If you are replicating an existing study, you do not need to use the same dataset from the study.

– Discuss the limitations of the dataset(s) for answering your research question.

– The dataset(s) must be publicly accessible.

– Your bibliography must include a dataset citation with a web address.

– Each dataset must have at least 500 observations, not including time series. For example, suppose you find a survey dataset where the unit of analysis is household. There must be at least 500 individual households represented in the survey. A survey with 150 households that were surveyed 5 times would not be acceptable, even though the total number of observations for all time series exceeds 500.
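One way to check the 500-observation rule for a repeated survey is to count distinct units of analysis rather than rows. The sketch below mirrors the household example above. The records and the field names `household_id` and `wave` are hypothetical and stand in for whatever identifiers your dataset actually uses.

```python
# Hypothetical panel data: 150 households, each surveyed 5 times.
# The field names (household_id, wave) are assumptions for illustration.
records = [
    {"household_id": hh, "wave": wave}
    for hh in range(1, 151)
    for wave in range(1, 6)
]

MIN_UNITS = 500  # minimum number of distinct units required by the guidelines


def count_units(rows, id_field):
    """Count distinct units of analysis, ignoring repeated observations."""
    return len({row[id_field] for row in rows})


total_rows = len(records)                        # every household-wave row
n_units = count_units(records, "household_id")   # distinct households only
meets_requirement = n_units >= MIN_UNITS

print(f"Rows: {total_rows}, distinct households: {n_units}")
print("Meets 500-observation requirement:", meets_requirement)
```

Here the row count exceeds 500, but the number of distinct households does not, so this dataset would not qualify.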

Your memo should be approximately 600 to 800 words and be in professional memo format. Your memo should be clearly written, with proper spelling and grammar.  You must also include in-text citations and a bibliography formatted in APA style.  The bibliography will not be included in the 600- to 800-word guideline.

Memo 2: Analysis Plan - Due April 5

This second memo to Professor Brodnax will detail your progress on the final project. Specifically, it will be a second draft of the Data section and a first draft of the Methodology section of the white paper. Be sure to incorporate any feedback from the first memo.

• Background At this point your research question should be finalized. Provide an update to the Background section if needed.

• Data At this point your data selection should be finalized. Discuss your preliminary review of the data. When selecting what variables to include in your memo, consider only those needed to answer the question or guide interpretation of the results.

– Create numerical summaries of relevant variables (minimum of two). If your dataset contains a large number of variables, select a subset of variables that you think are most relevant to your research question. If the variables are continuous, include a table of descriptive statistics and discuss how the data are distributed. If the variables are categorical, include frequency tabulations and discuss patterns across observations.

– Create graphical summaries of relevant variables or relationships between variables (minimum of two). Provide an interpretation of each plot. What key information or limitation does it reveal about the data with respect to your research question?

– Your bibliography must include a dataset citation with a web address.

– Each dataset must have at least 500 observations, not including time series.

• Methodology Discuss your analytical plan.

– Indicate which variables will comprise your target and features.

– You must select at least one parametric and at least one non-parametric technique. What techniques do you plan to use? Give a justification for your choice based on the research question (prediction vs inference) and the outcome of interest (numerical or categorical).

– Provide a brief explanation of the intuition behind each model’s algorithm. Provide a brief literature review of each technique. The literature review should include at least one example of how each technique was applied in a different setting (journal article, news article, blog, video, etc.).
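To illustrate the parametric versus non-parametric distinction, the sketch below fits both kinds of model to the same toy data. Linear regression and k-nearest neighbors are used here only as familiar examples and are an assumption, not a statement of which techniques the course covers. The data values are made up. A parametric model summarizes the data with a fixed set of parameters (here an intercept and a slope), while a non-parametric model predicts directly from nearby training observations without assuming a functional form.

```python
# Toy data, made up for illustration: y is roughly 2 * x + 1.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]


def fit_line(xs, ys):
    """Parametric: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def knn_predict(xs, ys, x_new, k=2):
    """Non-parametric: average the outcomes of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


intercept, slope = fit_line(x_train, y_train)
x_new = 3.5
linear_prediction = intercept + slope * x_new
knn_prediction = knn_predict(x_train, y_train, x_new, k=2)

print(f"Linear model: y = {intercept:.3f} + {slope:.3f} * x")
print(f"Prediction at x = {x_new}: "
      f"linear {linear_prediction:.3f}, k-NN {knn_prediction:.3f}")
```

The fitted slope and intercept can be interpreted directly, which suits inference questions. The nearest-neighbor prediction has no coefficients to interpret, but it can follow patterns that a straight line would miss.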

Your memo should be approximately 1,200 to 1,500 words (including background material from Project Memo 1) and be in professional memo format.  Your memo should be clearly written, with proper spelling and grammar.  You must also include in-text citations and a bibliography formatted in APA style. The bibliography will not be included in the 1,200- to 1,500-word guideline.

Expectations

• Numerical values must be neatly formatted in tables (no screenshots). Numbers should have no more than three digits after the decimal place.

• Plots must include titles, axis labels, and element labels. Take care that axis values are neatly formatted and easy to interpret.  Plots should be easy to understand without outside information.

• Labels and descriptions within tables or plots should be brief and meaningful with proper spacing and capitalization (no variable names with underscores).
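The formatting expectations above can be sketched with the Python standard library alone. The example below builds a small descriptive statistics table with values rounded to three decimal places and readable labels in place of raw variable names. The variable names, labels, and values are hypothetical.

```python
import statistics

# Hypothetical data; the variable names and values are made up for illustration.
data = {
    "median_hh_income": [52.314159, 61.5, 48.27182, 75.0, 66.125],
    "pct_renters": [0.412345, 0.385, 0.5021, 0.299999, 0.45],
}

# Map raw variable names to brief, meaningful labels (no underscores).
LABELS = {
    "median_hh_income": "Median Household Income",
    "pct_renters": "Share of Renters",
}

STAT_NAMES = ["Mean", "Std. Dev.", "Min", "Max"]


def describe(values):
    """Return the mean, sample standard deviation, minimum, and maximum."""
    return {
        "Mean": statistics.mean(values),
        "Std. Dev.": statistics.stdev(values),
        "Min": min(values),
        "Max": max(values),
    }


def format_table(columns):
    """Build a plain-text table with values shown to three decimal places."""
    header = f"{'Variable':<26}" + "".join(f"{s:>12}" for s in STAT_NAMES)
    lines = [header]
    for name, values in columns.items():
        stats = describe(values)
        cells = "".join(f"{stats[s]:>12.3f}" for s in STAT_NAMES)
        lines.append(f"{LABELS[name]:<26}" + cells)
    return "\n".join(lines)


table = format_table(data)
print(table)
```

The `.3f` format specifier fixes the display at three decimals without altering the underlying values, so later calculations still use full precision.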

Presentation Slides - Due April 21

During the final week of classes, the class will give project presentations in the format of a data science conference. The presentations will be organized into panels by topic, with each panel running approximately 30 minutes. Panels may be shorter or longer depending on the number of students presenting on that topic.

Each student will give a 4- to 5-minute presentation providing a concise summary of their project, including the following:

1. Research question and background

2. Data sources, target, and important features

3. Parametric and non-parametric techniques used

4. Evaluation and interpretation

5. Conclusion and limitations

Reflection - Due April 28

Following the presentations, you will write a brief reflection based on your attendance at a minimum of three panels, including your own. You may earn extra credit of up to 5 points by attending more than three panels and discussing them in your reflection.

White Paper & Replication Code - Due May 10

The final project must be uploaded to Canvas and must include (1) the white paper in PDF or Microsoft Word format; (2) all code used to generate the analyses, tables, and plots; and (3) all datasets as imported into Python or R scripts/notebooks. Note: The white paper is not a memo and should not be submitted in memo format. The white paper must include the following:

• Project Title and Student Name

• Executive Summary

• Introduction

– Provide an overview of your topic.

– If your project is a replication, discuss the authors’ methodological approach and the reasons why it may not be optimal. Potential areas to critique include: research questions, datasets, units of analysis, techniques, and interpretation of findings. Explain how your approach differs from current approaches.

• Data

– Discuss the types of data that would be ideal for analyzing this project.

– Provide a review of the datasets used for the analysis, including relevant numerical and graphical summaries.

– Discuss the limitations of the datasets.

• Methodology

– Discuss the intuition behind the technique(s) you are using. You must utilize at least one parametric technique and at least one non-parametric technique.

– Provide a literature review with examples of how these techniques were applied in a different setting (journal article, news article, blog, video, etc.).

– Discuss the strengths and weaknesses of each technique.

• Findings

– Discuss the details of your analysis, including estimation and evaluation.

– Depending on your model, the evaluation discussion can include hyper-parameter selection, training size selection, and performance (accuracy, cross-validation results, confusion matrices, ROC/AUC, etc.).

– Provide an interpretation of the results of your analysis. What insights did you generate from the analysis?

• Conclusion

– Discuss the implications of your insights. How might these insights be applied in your topic domain?

– Discuss limitations (ethical, computational, data, etc.) and future considerations.

• Bibliography

– The bibliography must be in APA format.

• Implementation Appendix

– Use this section to discuss any relevant, interesting, or innovative aspects of your technical implementation.

– For example, did you scrape the web or use an API? Develop any new measures? Reshape and/or pre-process the data?
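Some of the evaluation items listed under Findings (confusion matrices, accuracy, and cross-validation) can be sketched in plain Python as follows. The labels and predictions are hypothetical, and the helper functions `confusion_matrix` and `k_fold_indices` are written here for illustration rather than taken from any library. In practice you would likely use the equivalents in your chosen Python or R packages.

```python
# Hypothetical true labels and model predictions for a binary outcome.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for labels coded 0 and 1."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def k_fold_indices(n_obs, k):
    """Split row indices 0..n_obs-1 into k contiguous (train, test) folds."""
    fold_sizes = [n_obs // k + (1 if i < n_obs % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_obs) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


cm = confusion_matrix(y_true, y_pred)
accuracy = (cm["tp"] + cm["tn"]) / len(y_true)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])
folds = k_fold_indices(len(y_true), k=5)

print("Confusion matrix counts:", cm)
print(f"Accuracy: {accuracy:.3f}, precision: {precision:.3f}, recall: {recall:.3f}")
print("Test indices per fold:", [test for _, test in folds])
```

The folds here are contiguous blocks of rows for simplicity. With real data, shuffle the rows first so that each fold is representative, and fit the model on each training set before scoring it on the matching test set.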


Rubric

Area                          Expectations


Submission

All components were submitted on time: (1) report (PDF/Word), (2) data, and (3) replication code. Final dataset includes at least 500 observations.

Coverage

Report content comprises all components of the report outline above, including: an introduction of the problem and contextual background; description of the datasets as well as numerical and visual summaries; explanation of at least two analytical techniques, including a review of examples from outside of class; interpretation of the analysis and evaluation; and discussion of implications and limitations. Report sections are clearly indicated via formatting.

Data

Each dataset includes at least 500 observations. Summaries are provided for all relevant variables and/or relationships (minimum of two). Only those summaries needed to answer the question or guide interpretation are included. Numerical values are neatly formatted in tables. The numbers displayed in tables have no more than three digits after the decimal place. Interpretations are provided for all tables.

Graphics

Plots are provided for all relevant variables and/or relationships (minimum of two). Only those plots needed to answer the question or guide interpretation are included. The plots include titles, axis labels, and element labels when needed. The labels and table headings are brief and meaningful. The plots are easy to understand without outside information. Interpretations are provided for all plots.

Techniques

Analysis includes at least two techniques covered in PPOL 565; at least one must be parametric and at least one must be non-parametric. Content includes an explanation of each technique written for a non-technical audience. The discussion demonstrates understanding of technique and does not regurgitate documentation. Examples of technique applications do not use data or examples discussed in class.

Interpretation

The interpretation discusses key insights derived from the analysis, as well as an evaluation of techniques.