PUBPOL 5750
Causal Analysis and Impact Evaluation in Public Policy
PROBLEM SET 2
Spring 2026
Due 11:59pm, Thursday March 12
1. Taking a sample of data and examining how estimates compare with the “population”. In this problem we will treat the dataset CPS-ASEC-2017.dta as if it were the population of interest, and imagine what we might estimate if we only had access to small samples from that dataset. In this exercise, assume the error term is homoscedastic.
a. (8 points) First, run a regression of WAGE on EDUCATION.
Stata Code:
b. (8 points) Next, create a dataset that contains only the following two variables: WAGE and EDUCATION. (Use Stata’s “keep” command.) Then save this dataset under its own name. For example, I might use “save ps2prob1.dta , replace”.
Stata Code:
c. (10 points) Next load up the data and take a random sample of 50 observations. You can do this with the commands:
i. “use ps2prob1.dta , clear”
ii. “bsample 50”
Summarize the new dataset to confirm you have 50 observations. Run a regression of WAGE on EDUCATION. How do the results compare to the regression on the full dataset in part a?
Stata Code:
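The keep, save, use, and bsample steps described in parts b and c can be sketched on simulated data. This is only an illustration under stated assumptions: the file name sketch_pop.dta, the lowercase variable names, and the data-generating coefficients are all made up, and the variable names in CPS-ASEC-2017.dta may differ. Note that bsample draws observations with replacement.

```stata
* Sketch on simulated data; names and coefficients are hypothetical.
clear
set seed 12345
set obs 5000
* Integer years of schooling from 8 to 20
generate education = floor(8 + 13*runiform())
generate wage = 5 + 2*education + rnormal(0, 10)
generate other = runiform()

* Keep only the two variables of interest and save the "population" file
keep wage education
save sketch_pop.dta, replace
regress wage education

* Reload the file and draw a random sample of 50 observations
use sketch_pop.dta, clear
bsample 50
summarize
regress wage education
```

Because no seed is reset before bsample, rerunning the last four commands draws a different sample each time, which is what part d relies on.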
d. (10 points) Next, repeat step 1.c three more times. Each time you will draw a different sample of data and get different regression output. Use this output to fill in the rows for Samples 1 through 4 of the table below. (Fill in the “Population Slope” row with the results from problem 1.a.)
|            | N      | Population Slope |
| Population | 61,305 |                  |

| Sample | N   | Estimated Slope | Estimated Std. Error of Slope | 95% CI for Slope |
| 1      | 50  |                 |                               |                  |
| 2      | 50  |                 |                               |                  |
| 3      | 50  |                 |                               |                  |
| 4      | 50  |                 |                               |                  |
| 5      | 200 |                 |                               |                  |
| 6      | 200 |                 |                               |                  |
| 7      | 200 |                 |                               |                  |
| 8      | 200 |                 |                               |                  |
e. (8 points) Next, take 4 samples of size 200. In each sample, run the same regression (using EDUCATION to predict WAGE), and fill in the remaining rows of the table.
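Repeating the draw-and-regress step by hand works, but a loop makes the repetition explicit. The following is a minimal sketch on simulated data, not the required approach: the variable names, coefficients, and seed are assumptions, and the tempfile requires running the block from a do-file.

```stata
* Sketch on simulated data; names and coefficients are hypothetical.
clear
set seed 2026
set obs 5000
generate education = floor(8 + 13*runiform())
generate wage = 5 + 2*education + rnormal(0, 10)
tempfile pop
save `pop'

forvalues s = 1/4 {
    * Start from the full simulated "population" each time
    use `pop', clear
    bsample 200
    quietly regress wage education
    * 95% CI from the t distribution with the residual degrees of freedom
    local lo = _b[education] - invttail(e(df_r), 0.025)*_se[education]
    local hi = _b[education] + invttail(e(df_r), 0.025)*_se[education]
    display "Sample `s': slope = " %6.3f _b[education] ///
        "  se = " %6.3f _se[education] ///
        "  95% CI = [" %6.3f `lo' ", " %6.3f `hi' "]"
}
```

The interval computed here should match the one that regress prints in its own output table, so the loop is only a convenience for collecting the numbers in one place.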
f. (8 points) For the N=50 samples, how do the “estimated” slopes compare to the “true” population slope? How much variability do they have? (Give a statistical measure of variability.)
g. (8 points) Answer the same questions for the N=200 samples. Additionally, explain how the N=200 samples compare to the N=50 samples.
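As background for parts f and g, the textbook standard error of the slope under homoscedasticity is written out below. The notation is an assumption borrowed from Problem 2: β1 is the intercept, β2 the slope, x_i is EDUCATION for observation i, and û_i is the regression residual.

```latex
\widehat{\mathrm{se}}(\hat\beta_2)
  = \sqrt{\frac{\hat\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}},
\qquad
\hat\sigma^2 = \frac{1}{N-2} \sum_{i=1}^{N} \hat{u}_i^2 .
```

The sum in the denominator grows roughly in proportion to N, so the standard error shrinks roughly like 1/√N. Any single pair of samples can depart from this pattern because both the numerator and the denominator are themselves estimated from the sample.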
2. Dummy variables in bivariate regressions. For this problem, return to the main dataset CPS-ASEC-2017.dta. For this question, assume the error term is homoscedastic.
a. (5 points) Summarize the dummy variable for FEMALE. Describe what the variable means and what the average tells us.
Stata Code:
b. (5 points) Regress wage on this dummy. What do the regression results tell us?
Stata Code:
c. (5 points) Write down the “population line” model that corresponds to this regression. What must be true of β2 (the “slope” on FEMALE) in this model if men and women are paid the same? Test this hypothesis. Clearly state your hypothesis, show your steps, and clearly state and interpret your conclusion. (You can implement the test however you would like.)
Model:
What must be true of β2:
Research Hypothesis:
Decision (including the numbers you used to come to your conclusion):
d. (5 points) Use the regression output to determine the mean wages for men. Check this against a direct measure of the average.
Stata Code:
e. (5 points) Use the regression output to determine the mean wages for women. Check this against a direct measure of the average.
Stata Code:
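Parts c through e rest on a standard identity for a regression on a single 0/1 dummy, written out here as a reminder. The subscript convention (β1 for the intercept, β2 for the coefficient on FEMALE) is assumed from the wording of part c, and the conditional-mean statements require E[u | FEMALE] = 0.

```latex
\mathrm{WAGE}_i = \beta_1 + \beta_2\,\mathrm{FEMALE}_i + u_i,
\qquad
E[\mathrm{WAGE}_i \mid \mathrm{FEMALE}_i = 0] = \beta_1,
\qquad
E[\mathrm{WAGE}_i \mid \mathrm{FEMALE}_i = 1] = \beta_1 + \beta_2 .
```

In the estimated regression the same relationships hold exactly for the sample means of the two groups, which is why the regression output can be checked against direct group averages.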
f. (5 points) What does the regression let us do that we could not easily do with simple summarize commands?
g. (5 points) Some researchers suppose that the observed differences are due to differences in education, age, etc. Select a subsample of those with exactly a BA/BS degree who are aged 25-35. (Use Stata’s “keep if …” command.) How does the wage gap in this sample compare with the overall wage gap?
Stata Code:
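The subsample restriction in part g can be sketched on simulated data as follows. Everything here is an assumption made for illustration: the variable names female, age, and educ_level, the coding of educ_level (with 4 standing in for a BA/BS), and the coefficients. The real dataset's education variable and its coding need to be checked before writing the actual condition.

```stata
* Sketch on simulated data; names, codings, and coefficients are hypothetical.
clear
set seed 5750
set obs 5000
generate female = runiform() < 0.5
* Integer ages from 18 to 64
generate age = floor(18 + 47*runiform())
* Education categories 1 through 6; 4 stands in for BA/BS
generate educ_level = floor(1 + 6*runiform())
generate wage = 20 + 3*educ_level + 0.2*age - 4*female + rnormal(0, 8)

* Wage gap in the full simulated data
regress wage female

* Restrict to one education level and an age band, then re-estimate the gap
keep if educ_level == 4 & inrange(age, 25, 35)
regress wage female
```

In this simulation the gap is fixed at -4 by construction and female is independent of age and education, so the two regressions differ only by sampling noise. Nothing about the real data can be inferred from it.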
h. (5 points) In part (g) we were able to “hold constant” education and age by restricting our sample based on these two variables. What other variables would you like to “hold constant” when examining the differences in wages for men and women?
2026-03-10