
PUBPOL 5750
Causal Analysis and Impact Evaluation in Public Policy
PROBLEM SET 3
Due 11:59pm, Thursday March 26 via Canvas
For all questions with graphs, answer with all of your Stata code only. Do not copy and paste graphs unless this is explicitly asked for.

1. For this problem, use our CPS data CPS-ASEC-2017.dta that has information on earnings and demographics.

a. (15 points) Create a variable that is the natural logarithm of hourly earnings, LNWAGE. Then regress LNWAGE on age, a dummy variable for sex (either “male” or “female”) and years of education. Circle the best interpretation of the regression below.

A. A one year increase in age is predicted to increase hourly wage by 1.18%. Females are predicted to earn 26.62% less than males. A one year increase in education is predicted to increase hourly wage by 10.49%.

B. A one year increase in age is predicted to increase hourly wage by 1.18 dollars. Females are predicted to earn 26.62 dollars less than males. A one year increase in education is predicted to increase hourly wage by 10.49 dollars.

C. A one year increase in age is predicted to increase hourly wage by 1.18%, holding constant gender and years of education. Females are predicted to earn 26.62% less than males, holding constant age and years of education. A one year increase in education is predicted to increase hourly wage by 10.49%, holding constant age and gender.

D. A one year increase in age is predicted to increase hourly wage by 1.18 dollars, holding constant gender and years of education. Females are predicted to earn 26.62 dollars less than males, holding constant age and years of education. A one year increase in education is predicted to increase hourly wage by 10.49 dollars, holding constant age and gender.
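
The setup for part (a) might be sketched in Stata as follows. The variable names hrwage, female, and educ_years are placeholders assumed for illustration, as is the lowercase spelling of lnwage and age; substitute the names actually used in CPS-ASEC-2017.dta.

```stata
* Sketch only: hrwage, female, and educ_years are assumed names,
* not confirmed by the dataset
use "CPS-ASEC-2017.dta", clear

* Natural log of hourly earnings; ln() returns missing for
* zero or negative wages, so those observations drop out of the regression
generate lnwage = ln(hrwage)

* Log-wage regression on age, the sex dummy, and years of education
regress lnwage age female educ_years
```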

b. (15 points) Let’s focus on young workers. Run the same regression as in (a), restricting the sample to individuals aged 25-30. How do the results compare to the regression that is estimated over the full age range? (Describe and interpret the differences in the coefficients for age, education, and the gender dummy.)

Coefficient Interpretations (use the interpretation you chose in part a as a template):

Differences:
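
One way to restrict the sample for part (b) is an if qualifier on the regression, which leaves the data in memory unchanged. This is a sketch under the same assumptions as before: lnwage is taken to exist already from part (a), and female and educ_years are placeholder names.

```stata
* Sketch only: female and educ_years are assumed variable names;
* lnwage is assumed to have been created in part (a)

* inrange(age, 25, 30) is true for ages 25 through 30 inclusive
regress lnwage age female educ_years if inrange(age, 25, 30)
```

Using an if qualifier rather than dropping observations keeps the full sample available for the later parts.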

c. (20 points) A common pattern in labor economics is a nonlinear relationship between age and LNWAGE. To allow for this pattern, create a new variable AGESQUARED, which is AGE². Then (on the full set of data for all ages) regress LNWAGE on AGE, AGESQUARED, years of schooling, and the gender dummy. How can we interpret the coefficients on AGE and AGESQUARED? What is the predicted impact of one additional year of age on earnings at age 28? What about at age 40? At what age is the predicted impact of an additional year of age equal to zero?
Age 28 Interpretation (3):
Age 40 Interpretation (2):
AGESQUARED Interpretation (5):
AGE Interpretation (5):
Stata Code (3): 
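
With a quadratic in age, the predicted effect of one more year of age on LNWAGE is the derivative b_AGE + 2 × b_AGESQUARED × AGE, which equals zero at AGE = -b_AGE / (2 × b_AGESQUARED). A possible Stata sketch of these calculations is below; female and educ_years are assumed placeholder names, and lnwage is taken to exist from part (a).

```stata
* Sketch only: female and educ_years are assumed variable names;
* lnwage is assumed to have been created in part (a)
generate agesquared = age^2
regress lnwage age agesquared educ_years female

* Marginal effect of age = b_age + 2 * b_agesquared * age
display "Effect at age 28: " _b[age] + 2 * _b[agesquared] * 28
display "Effect at age 40: " _b[age] + 2 * _b[agesquared] * 40

* Age at which the marginal effect is zero
display "Turning point:    " -_b[age] / (2 * _b[agesquared])
```

The _b[varname] expressions read the coefficients from the most recent regression, so the display lines must be run immediately after regress.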

2. Multiple dummy variables

For this problem we will continue to work with the dataset from Problem 1. We want to learn how wages vary across the US regions.
a. (5 points) We have a variable “statefip”, but not a variable that gives us the region. Statefip is a numerically encoded categorical label. Try “describe statefip”. The storage type of “byte” tells us that the data is recorded as a number. The “value label” tells us that Stata has “labels” associated with each number. Tabulate statefip to see how the state labels are presented. Then try “tabulate statefip, nolabel”. The “nolabel” option will just list out the numbers. Which number corresponds to New York State?
b. (15 points) To “add” in the information on “region”, we will need to merge in data from another dataset. I have uploaded a dataset “states.dta”. That dataset has one observation per state, and has the state name, two-letter symbol, the State FIPS code, and some region codes. Before opening that dataset up, let’s prepare the current dataset to merge in. First, sort the data by statefip. Next, save the dataset under a new name, for example: “save ps5p4.dta , replace”.

Now, load up states.dta, and list out the data to get a sense of what is in it. We will only need the variables stfips and region4. The “region4” variable groups all the states into one of 4 regions: Northeast (region4=1), Midwest (region4=2), South (region4=3) and West (region4=4). Keep only these two variables. To be able to merge the data, the state FIPS variable will have to have the same name in both datasets. Make this happen with “rename stfips statefip”.

Next sort the data on statefip. Then merge the data together with “merge 1:m statefip using ps5p4.dta”. Be sure to check that the merge was successful before proceeding!
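
Put together, the steps in part (b) could be sketched as one Stata sequence. The commands follow the instructions above; the only additions are reloading the CPS file at the top (on the assumption that it is the dataset to start from) and tabulating _merge, the variable that merge creates, as one way to check the result.

```stata
* Start from the CPS data and prepare it for merging
use "CPS-ASEC-2017.dta", clear
sort statefip
save ps5p4.dta, replace

* Load the state-level file and inspect it
use states.dta, clear
list

* Keep only the merge key and the region code, then align the key name
keep stfips region4
rename stfips statefip
sort statefip

* One state row matches many CPS rows
merge 1:m statefip using ps5p4.dta

* Check the merge: _merge == 3 marks observations found in both files
tabulate _merge
```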

Question 1(c): At what age is the predicted impact 0? (2):

c. (15 points) Now, create 4 regional dummy variables, based on the region4 data. Regress hourly wage (not log wage) on three of these dummy variables. Choose the 3 variables that will make the Northeast the “reference group”. Interpret the coefficients. What does the estimated intercept tell us?
Interpretations:
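
One possible way to build the dummies for part (c) is sketched below. The dummy names and hrwage (standing in for the hourly wage variable) are assumptions for illustration; region4 is taken to be in memory from the merge in part (b).

```stata
* Sketch only: hrwage is an assumed name for the hourly wage variable
* Each dummy is 1 in its region, 0 elsewhere, and missing if region4 is missing
generate northeast = (region4 == 1) if !missing(region4)
generate midwest   = (region4 == 2) if !missing(region4)
generate south     = (region4 == 3) if !missing(region4)
generate west      = (region4 == 4) if !missing(region4)

* Leaving out northeast makes the Northeast the reference group
regress hrwage midwest south west
```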
d. (15 points) What happens if you try to include all 4 of the dummy variables in the regression?
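
To try part (d), one option is to let tabulate generate the full set of dummies and then include all of them. In general, a complete set of category dummies together with a constant is perfectly collinear, so expect Stata to print a note about an omitted variable. The prefix rdum and the name hrwage are assumptions for illustration.

```stata
* Sketch only: hrwage is an assumed name for the hourly wage variable
* Creates rdum1 through rdum4, one dummy per value of region4
tabulate region4, generate(rdum)

* Include all four dummies plus the constant and read the output notes
regress hrwage rdum1 rdum2 rdum3 rdum4
```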