
ECON 121, Applied Econometrics and Data Analysis

Summer 2022

PROBLEM SET 3: EFFECTS OF HEAD START

Instructions:

• Use the provided “pset3_submission.R” template file to complete this assignment. Do not modify the file name for your submission. The autograder requires this filename to grade your assignment.

• Use the “setwd()” command to set your working directory so that the data files can be read in locally. Comment out the “setwd()” command before you submit to Gradescope. Do not modify the provided code in the template file that loads the data; doing so will cause an error with the autograder.

• Only use the packages loaded in “pset3_submission.R” when executing the tasks for the problem set. The autograder is only configured to use these packages and may not work if you use others.

Problem Set:

Head Start is an early-childhood development program run by the US federal government.  It provides health, nutrition, and education services to children from disadvantaged backgrounds.

The dataset https://github.com/credpath/econ121/raw/main/nlsy_deming.rda contains a sample of children of NLSY 79 participants, some of whom participated in Head Start. All of the sample children have at least one sibling also in the sample. The variables are ordered as follows:

•  head_start - sibdiff relate to Head Start participation.

•  mom_id - lnbw were determined prior to Head Start participation. (Note: the PPVT is an early-childhood cognitive test.)

•  comp_score_5to6 - comp_score_11to14 correspond to test scores in childhood.

•  repeat - fphealth deal with outcomes in the teenage years and young adulthood.

For various reasons, some variables have missing data. Let’s ignore that for now.

1. Summarize the data. Create a new variable called rep that is identical to repeat. Write nlsy <- nlsy %>% mutate(rep = `repeat`). (repeat is a reserved word in R, so the column name has to be wrapped in backticks; copying it to rep makes it easier to work with.) Create an object called nlsy_summary that contains group-level means of specific variables. Use the following code:

nlsy_summary <-
  nlsy %>%
  group_by(head_start) %>%
  summarize(
    ...
  )

Replace ... with the means of the following variables: black, lninc_0to3, momed, rep, and somecoll. What can you say about the backgrounds of children who participated in Head Start relative to those who did not? (Answer in code and writing.)
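As a rough illustration of the group_by() and summarize() pattern above, here is a minimal sketch on a made-up data frame. The object names toy and toy_summary and all values are hypothetical and are not part of the assignment; passing na.rm = TRUE to drop missing values is also an assumption, not something the instructions require.

```r
library(dplyr)

# Toy stand-in for nlsy (made-up values, NOT the real data)
toy <- tibble(
  head_start = c(0, 0, 1, 1),
  black      = c(0, 1, 1, 1),
  momed      = c(12, 14, 10, NA)
)

# One row per head_start group, one column of means per variable.
# na.rm = TRUE (an assumption) skips missing values when averaging.
toy_summary <-
  toy %>%
  group_by(head_start) %>%
  summarize(
    black = mean(black, na.rm = TRUE),
    momed = mean(momed, na.rm = TRUE)
  )

print(toy_summary)
```

Each mean of a 0/1 variable such as black can be read as the share of children in that group with the characteristic.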

2. Let’s compare the POLS, RE, and FE univariate associations between Head Start participation and age 5-6 test scores using the plm command.

(a) As a first step, estimate the association between Head Start participation and age 5-6 test scores using pooled OLS. Assign the output to an object called reg_pols. Make sure you estimate standard errors correctly. (Write coeftest(reg_pols, vcov = function(x) vcovG(x, type = "sss", cluster = "group")) to see the correct output.) If we assume Head Start participation is exogenous, what can we conclude about the effects of Head Start on test scores? Be sure to explain the magnitude of the estimated effect. Is it reasonable to assume that Head Start participation is exogenous? (Answer in code and writing.)

(b) Now estimate the same association using a random effects model (with mother random effects). Assign the output to an object called reg_re. How do the results compare with OLS? Does the comparison make you more or less confident that OLS or random effects can shed light on the causal effect of Head Start on test scores? (Hint: in the absence of family-level omitted variables, OLS and random effects are both unbiased estimators, so they should be similar.) (Answer in code and writing.)

(c) Now estimate a mother fixed effects model. Assign the output to an object called reg_fe. What do the results imply about the effects of Head Start on test scores? If the fixed effects results are different from those in your answer from question (2b), explain why. (Answer in code and writing.)
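To make the mechanics of the three estimators concrete, the sketch below fits pooled OLS, random effects, and mother fixed effects with plm on simulated data. Everything about the data is an assumption for illustration: the object names (toy, toy_pols, toy_re, toy_fe, clustered), the outcome name score, and the simulated effect sizes are hypothetical and say nothing about the real NLSY results. The final lines check that the "within" coefficient matches a regression on mother-demeaned variables, which is what fixed effects does under the hood.

```r
library(plm)
library(lmtest)

set.seed(121)
n_moms <- 300
mom_effect <- rnorm(n_moms)  # unobserved family background (simulated)

# Two children per mother
toy <- data.frame(mom_id = rep(seq_len(n_moms), each = 2))
# Assumption: children with lower mom_effect are more likely to participate
toy$head_start <- as.numeric(runif(nrow(toy)) < plogis(-mom_effect[toy$mom_id]))
# Simulated scores: a participation effect plus family background plus noise
toy$score <- 2 * toy$head_start + 5 * mom_effect[toy$mom_id] + rnorm(nrow(toy), sd = 3)

# Same formula, three estimators; index = "mom_id" defines the groups
toy_pols <- plm(score ~ head_start, data = toy, index = "mom_id", model = "pooling")
toy_re   <- plm(score ~ head_start, data = toy, index = "mom_id", model = "random")
toy_fe   <- plm(score ~ head_start, data = toy, index = "mom_id", model = "within")

# Standard errors clustered by mother, as in the hint for (2a)
clustered <- function(x) vcovG(x, type = "sss", cluster = "group")
print(coeftest(toy_pols, vcov = clustered))
print(coeftest(toy_re,   vcov = clustered))
print(coeftest(toy_fe,   vcov = clustered))

# Fixed effects by hand: subtract each mother's mean, then regress without an intercept
demean <- function(v) v - ave(v, toy$mom_id)
b_manual <- unname(coef(lm(demean(toy$score) ~ 0 + demean(toy$head_start)))[1])
b_plm    <- unname(coef(toy_fe)["head_start"])
print(c(manual = b_manual, plm = b_plm))
```

Because the simulation builds in a family-level omitted variable, comparing the three printed coefficients shows how estimators that use between-family variation can diverge from the within-family estimate.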

3. Now let’s estimate mother fixed effects models of the association between Head Start participation and age 5-6 test scores using the feols command. Run three fixed effects regressions: a “univariate” regression (call it reg1), a regression with all pre-Head Start control variables other than PPVT (call it reg2), and another “univariate” regression but on the subset of data that has no missing values for pre-Head Start control variables (call it reg3). (Hint: check the introduction to the problem set to see when variables are determined. Do not use PPVT.) Which control variables can you include in the fixed effects regression, and which can’t you include? Why? (Answer in code and writing.)
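The three-regression setup can be sketched with feols on simulated data as follows. The data frame toy, the outcome score, the control male, and the toy_reg1 to toy_reg3 object names are all hypothetical placeholders, not the variables or results of the assignment. The point is only the pattern: fixed effects go after the | in the formula, and the third regression reuses the first formula on the rows with no missing controls.

```r
library(dplyr)
library(fixest)

set.seed(121)
n_moms <- 200
toy <- tibble(
  mom_id     = rep(seq_len(n_moms), each = 2),   # two children per mother
  head_start = rbinom(2 * n_moms, 1, 0.4),
  male       = rbinom(2 * n_moms, 1, 0.5),       # hypothetical child-level control
  score      = 2 * head_start + rnorm(2 * n_moms)
)
# Make the control missing for the first 20 mothers (40 rows)
toy$male[toy$mom_id <= 20] <- NA

# Univariate regression with mother fixed effects, full sample
toy_reg1 <- feols(score ~ head_start | mom_id, data = toy, cluster = ~mom_id)

# Adding a control: feols drops rows with missing values and reports it in a note
toy_reg2 <- feols(score ~ head_start + male | mom_id, data = toy, cluster = ~mom_id)

# Univariate regression again, restricted to rows where the control is observed
toy_complete <- toy %>% filter(!is.na(male))
toy_reg3 <- feols(score ~ head_start | mom_id, data = toy_complete, cluster = ~mom_id)

print(etable(toy_reg1, toy_reg2, toy_reg3))
```

Comparing the second and third columns holds the sample fixed, so any change in the head_start coefficient reflects the added control rather than the dropped rows.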

4. Some advocates for early-childhood education suggest that the effects of programs like Head Start are long-lasting. Carry out fixed effects analyses (without control variables like reg1) of test scores at later ages. Assign the outputs to objects called reg_5to6, reg_7to10, and reg_11to14. Does Head Start participation have similar effects on test scores in later childhood, or do the effects fade out with age?

To make the test scores comparable across ages, standardize them by dividing them by their standard deviations. (Hint: write mutate(std_score_ = comp_score_ / sd(comp_score_, na.rm = TRUE)) for each score.) (Answer in code and writing.)
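The standardization in the hint can be tried out on a small made-up example. The data frame toy and its values are hypothetical, and the full column names std_score_5to6 and std_score_7to10 are an assumed way of completing the std_score_ prefix from the hint.

```r
library(dplyr)

# Made-up scores (NOT the real data); one value is missing on purpose
toy <- tibble(
  comp_score_5to6  = c(10, 20, 30, NA),
  comp_score_7to10 = c(5, 15, 25, 35)
)

# Divide each score by its own standard deviation, ignoring missing values
toy <- toy %>%
  mutate(
    std_score_5to6  = comp_score_5to6  / sd(comp_score_5to6,  na.rm = TRUE),
    std_score_7to10 = comp_score_7to10 / sd(comp_score_7to10, na.rm = TRUE)
  )

print(toy)
# Each standardized column now has a standard deviation of 1
print(sd(toy$std_score_5to6, na.rm = TRUE))
```

After this step a regression coefficient on a standardized score is measured in standard deviations of that score, which is what makes the estimates comparable across age groups.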

5. Estimate similar fixed effects models of the effect of Head Start but on longer-term outcomes besides test scores.  (Hint: Check the introduction of the problem set to see when variables are determined. Ignore HS2_FE90.)  Assign each regression output to an object called reg_varname, where varname is the name of the outcome variable.  Many of these outcomes are binary, but use linear models.  Interpret your results. (Answer in code and writing.)

6. Do the effects of Head Start participation on longer-term outcomes vary by race/ethnicity? By sex? Re-run the same six regressions from question (5) but with three interaction terms and their base levels. Assign the outputs to objects called reg_varname_heterogeneity, where varname is the name of the outcome variable. (Answer in code and writing.)

7. The Biden administration advocates expanding federal funding for early-childhood education programs, while the Trump administration argued for cuts. Based on your results, which position seems better supported by evidence? Would you feel comfortable using your results to predict the effects of such an expansion? Why or why not? (Answer in words only.)