PUBPOL 5750 – Group Empirical Project

Spring 2026

You will form groups of 3 or 4 students, come up with your own substantive research question, and answer it with real data by applying the econometric methods discussed in class. The question you choose must be feasible and policy-relevant and, more importantly, it should be a question you care about! Once your group is formed, please give it a cool name! When I discuss your projects, I will refer to your group names.

Grading distribution and Deadlines

Overall, the project will make up 20% of your final class grade. Your group must submit two written reports over the course of the semester. The first submission (proposal and data description) will not be graded; its purpose is to ensure that your project is moving forward smoothly and that you are on the right track. I will review it and tell your group how to improve your paper. The due dates are:

- Submission 1: Project Proposal and Data. Due March 20. 0% of final grade.
- Submission 2: Final Report. Due May 8. 20% of final grade.

You must submit your GROUP NAME, GROUP MEMBERS, AND A TENTATIVE TITLE by Thursday, February 26 via Canvas. I will use this to keep records and track your progress as a group.

Submission 1: Project Proposal and Data

You should write two to four paragraphs (approximately one page) about the question you hope to answer. Describe why this question is interesting. What do you hypothesize the answer to be, and why? Some examples of feasible questions are:

- How does family paid leave affect unemployment?
- How do characteristics of owners influence revenues of a firm?
- How does formal education affect people’s knowledge about current events?
- What kinds of families own multiple cars?
- Does smoking affect grades in high school?
- What characteristics determine whether a movie will win an Oscar?
- How does NBA players’ performance influence their salaries?

Your proposal should cover the following points:

1. Research Question: State your research question explicitly, formulated like the examples above. To get ideas, you might think of policies that vary across states and examine their consequences. Another approach is to look at different publicly available datasets, see which variables they contain, and get inspiration from there.

2. General data description: Provide details of your dataset(s). These data should be appropriate for the research question you are trying to answer. The data should also be publicly available and easily accessible. For this part you are not required to provide any regression analysis yet, but you should provide your descriptive analysis (descriptive tables and figures).

You are not allowed to conduct any experiments or collect your own survey data. You must use publicly available data that already exist.

You should describe the dataset in general terms: Is it a cross-section? What is the unit of observation (e.g., individuals, households, firms, counties, states)?

3. Population of interest: You might want to focus your study on only certain groups of people, geographic areas, etc. This can be only a subset of the respondents covered by the dataset. Be clear about what your population of interest is.

4. Confounders and selection: Are there variables that might influence both your independent and dependent variables and thus induce a correlation? For example, people with high IQs might be more likely to get more formal education and also to keep up better with current events. People from poor families might be more likely to smoke and also to receive poor grades in school. You will need to account for these potential confounding variables in your analysis.

You may modify or change your research question later if you need to. Submitting the first report is important because you will receive feedback from me.

Dataset: IPUMS is a great data source. It offers many different publicly available datasets. Please explore the site and see what is available as soon as you form your group.

Data Description

Create your sample from the raw data. What are the relevant variables for your analysis? You should have about 10 to 15 variables. How many observations do you have for each? Make sure to have at least 100 observations (ideally more). Describe each variable’s type (categorical or numerical) and distribution. Are your continuous variables normally distributed? At least one variable should be a dummy variable (taking values of 0 or 1). Describe the distribution of one of your continuous variables separately for observations where one of your dummy variables equals zero and where it equals one. Do not show histograms for categorical variables. Talk about the variables themselves rather than the names or numbers they might have in the survey (e.g., do not refer to a variable as q325, but rather as income or whatever the variable measures).
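To make the conditional description concrete, here is a minimal sketch of the calculation. It is written in Python purely for illustration (your submitted analysis must still be a STATA .do file), and the wage and union-membership values are made-up toy numbers, not real data.

```python
from statistics import mean, stdev

# Hypothetical toy records: (hourly_wage, union_member).
# union_member is a 0/1 dummy; the values are invented for illustration.
records = [
    (18.0, 0), (22.0, 0), (20.0, 0), (24.0, 0),
    (25.0, 1), (27.0, 1), (29.0, 1),
]

def summarize_by_dummy(rows):
    """Return {dummy_value: (n, mean, sample_sd)} for the continuous variable."""
    summary = {}
    for d in (0, 1):
        values = [x for x, flag in rows if flag == d]
        summary[d] = (len(values), mean(values), stdev(values))
    return summary

summary = summarize_by_dummy(records)
for d, (n, m, sd) in summary.items():
    print(f"dummy={d}: n={n}, mean={m:.2f}, sd={sd:.2f}")
```

Reporting the number of observations, mean, and standard deviation side by side for the two groups is one simple way to present this comparison in a descriptive table.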

Your first report should be primarily a written discussion of your motivation, research question, and empirical plan to answer your question. It should also include a descriptive summary of your data, along with supporting descriptive tables and figures. It should not be more than 5 pages in total. Your tables should look pretty (like those presented in research papers) and not be copy-pasted from STATA. In addition to your report, you should create and hand in a single STATA .do file that performs all the descriptive analysis.

Submission 2: Final Paper

Use an appropriate combination of multiple regression models and (if appropriate) more advanced methods to answer your research question. You should estimate regression models with one to three different dependent variables and control for potential confounding factors. Carefully interpret the signs and magnitudes of coefficients in your preferred regression specifications. If there are other potential confounders that you do not have in your data, discuss how omitting them could bias your results.
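The discussion of omitted confounders can be organized around the standard omitted-variable-bias formula. The sketch below assumes the simplest case, with one regressor of interest $x$ and one omitted confounder $z$; the symbols are illustrative notation chosen here, not notation taken from a specific lecture.

```latex
\begin{align*}
% True model; z_i is the confounder missing from the data
y_i &= \beta_0 + \beta_1 x_i + \beta_2 z_i + u_i \\
% Auxiliary regression of the confounder on the regressor of interest
z_i &= \delta_0 + \delta_1 x_i + v_i \\
% Large-sample result from regressing y on x alone
\operatorname{plim} \hat{\beta}_1^{\text{short}} &= \beta_1 + \beta_2 \delta_1
\end{align*}
```

The bias term $\beta_2 \delta_1$ is positive when the confounder's effect on the outcome and its association with the regressor have the same sign, and negative when the signs differ. In the smoking example, if family income raises grades ($\beta_2 > 0$) and is negatively related to smoking ($\delta_1 < 0$), omitting income makes the estimated effect of smoking on grades look more negative than it is.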

In your final submission:
- Include the title (Group name, your names), research question, and motivation you developed earlier
- Include the equation for your preferred regression model specification
- Include tables of regression results (including coefficients and standard errors)
- The maximum length for your final paper is 12 single-spaced pages, including tables and figures. You will likely produce more during your research process, but it is important to know how to identify the most important results.
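As an illustration of how a preferred specification might be written, here is a sketch based on the smoking and grades example from the list of feasible questions. The variable names and the control vector $\mathbf{X}_i$ are hypothetical.

```latex
\begin{equation*}
\text{Grade}_i = \beta_0 + \beta_1 \text{Smoker}_i + \beta_2 \text{FamilyIncome}_i
  + \mathbf{X}_i' \boldsymbol{\gamma} + \varepsilon_i
\end{equation*}
```

Here $\text{Smoker}_i$ is a 0/1 dummy, so $\beta_1$ is the difference in expected grades between smokers and non-smokers, holding the other included variables fixed.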

In addition to your final submission, you should hand in a single .do file that performs all of your tests and regression analysis.

Your term paper will be graded according to the following rubric (100 points in total):

Overall exposition (10 points).

Regression models and Presentation of Results (40 points): [at least 3] Dependent variable makes sense? (5pts), “Main” independent variables make sense? (5pts), Include potential confounders? hope so! (10pts), Interpret signs and magnitudes (10pts), Interpret statistical significance (10pts).

More advanced methods and discussions (10 points): Creative approaches and methods are welcome. Attempts to establish causality will be graded favorably.

Discussion/Conclusion (40 points): Main takeaway (10pts), Limitations of your empirical approach (10pts), Proposed hypothetical approach to identify causal impact (10pts), Policy relevance (10pts).

Note: Well-written papers with pretty tables and figures will be graded more favorably. Do not rely on AI for your paper. It does such a bad job that it is impossible for an experienced empirical researcher to miss. Use your own words and enjoy writing a joint policy paper!

SINCE THIS IS A CAUSAL IMPACT EVALUATION COURSE, PAPERS THAT AIM TO ESTIMATE CAUSAL EFFECTS WILL BE GRADED MORE FAVORABLY!