EMET8005 Assignment #2

Instructions The assignment is due 12 noon on Tuesday 17 May 2021. Your report should be uploaded to Wattle using the Turnitin link provided. Late submissions will receive a mark of 0 unless an extension has been granted before the deadline, as per the course outline.

Your report must be all your own original work. Your report should be typed and the file should be in either Word or PDF format. Part of the assignment is to present results 'professionally'. This means that there should be no Stata commands or Stata output in the main text. Extract the information you need from the Stata output, and create nice tables and figures similar to those you see in textbooks and journal articles. The Turnitin link has tabs for uploading the report, the do file and the log file separately. The do file must be annotated with explanatory comments, so that it is clear what results are sought, and it must run without syntax errors (assuming the data file is in the current working directory).

Some of the questions are fairly broad and can be answered in different ways. Better answers show a higher level of understanding of how to analyse and interpret data. There is no strict word limit but, everything else equal, a clear and concise writing style will attract higher marks. We anticipate most reports will be between 600 and 1400 words (excluding tables).

If you have any questions about the assignment, please email [email protected]. There is no penalty for clarification questions.

Conditional cash transfers The idea of conditional cash transfer (CCT) programs is to reduce poverty and improve children's outcomes at the same time. Eligible poor families can receive cash payments on conditions that typically involve ensuring their children have regular health checks and regularly attend school.

Colombia implemented a new CCT program in 2001-2002, called Familias en Accion (FeA), where, among other things, eligible poor families could receive approximately US$15 per month if the children they registered for the program attended the health checks recommended by the government (see Table 1 below). The health checks were carried out by a nurse and involved measurements of the child's development, advice on nutrition, check of compliance with the vaccination schedule, etc. The only cost for parents was transportation to the health centre and the value of the time involved.

Table 1: Preventive Care Visits Schedule

Age in months    | < 12 | 13-24 | 25-36 | 37-60 | 61-72 | 73-84
Number of visits |  5   |  3    |  2    |  2    |  2    |  2

Note: The table shows the schedule of preventive care visits in FeA according to the child's age in months.

A special quirk of the implementation is that the conditioning only involved children who were aged 0-7 at the time of the family's registration date (FRD). That is, the family will receive payments if all children born before the FRD attend their health checks, while children born after the FRD do not matter for the payments. Over a four-year period, a total of 622 out of 1098 municipalities were deemed eligible. The implementation of the program was staggered, meaning that not all eligible municipalities participated from the beginning.

To evaluate whether the program achieved its objectives, a series of surveys were carried out in 57 of the 622 municipalities. The first survey wave took place during the summer of 2002, the second wave between July and November 2003, and the third wave took place between December 2005 and March 2006. At the time of the first survey, 26 of the 57 municipalities had implemented the program, while 31 were waiting to start.

The original study of the FeA program is published in a paper by Attanasio, Oppedisano, and Vera-Hernandez, Should Cash Transfers Be Conditional? Conditionality, Preventive Care, and Health Outcomes, American Economic Journal: Applied Economics 7(2), 35-52, 2015.

In this exercise, we look at just one of the program objectives and investigate whether children who are subject to the conditionality requirements of the program are taken to more preventive care visits than children who are not. Because the children born after the FRD are very young at the time of the last survey, we limit our analysis to children aged 0-36 months, including those born before the FRD.

Preliminaries

• Download the file fea2022as02.dta from Wattle. The file has data from all three surveys. Each row corresponds to a child in a participating household. The labels in the dataset provide a brief explanation for each variable.

• To familiarise yourself with the data, verify the following are true:

(a) The 81 birth cohorts range from July 1999 to March 2006. These are coded as Stata dates 14426, 14457, ..., 16861 corresponding to 1 July 1999, 1 August 1999, ..., 1 March 2006; see the variable month_year_birth. Note there is also a variable age with age in years at the time of the interview.

(b) The 57 municipalities in the dataset are assigned unique ID codes like 5034, 5154, ..., 85125; see the variable cod_mpio.

(c) The 3 surveys are numbered 1, 2, and 3, corresponding to the summer of 2002, July-November 2003, and December 2005 to March 2006; see the variable survey.

(d) At the time of the first survey, only 26 of the 57 municipalities participated in FeA; see the variable tipo.

Using e.g. summarize, tabulate, or codebook, make sure you understand the basic characteristics of each variable (range, categories, missing values). Be aware that Stata dates are internally coded as the number of days since 1 January 1960 (day 0); the date you see in listings is determined by the display format, which is set to %dCYND; see help format and help datetime.
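The checks in (a) to (d) could be sketched in a do file along the following lines. This is only a minimal sketch, not a required approach. It assumes the data file is in the current working directory and that tipo takes a single value within each municipality; the helper variable one_per_mpio is a hypothetical name introduced for illustration.

```stata
* Load the data (assumes fea2022as02.dta is in the current working directory)
use fea2022as02.dta, clear

* (a) Birth cohorts: codebook reports the range and the number of unique values
codebook month_year_birth
display %td 14426        // shows 01jul1999
display %td 16861        // shows 01mar2006

* (b) Municipality codes: check the number of unique values
codebook cod_mpio

* (c) Survey waves, including any missing values
tabulate survey, missing

* (d) Count municipalities (not children) by tipo:
*     tag one observation per municipality, then tabulate the tagged rows
egen one_per_mpio = tag(cod_mpio)     // hypothetical helper variable
tabulate tipo if one_per_mpio == 1
```

Tagging one row per municipality matters in (d) because each row of the dataset is a child, so a plain tabulate tipo would count children rather than municipalities.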

• Children are subject to the conditionality requirements if they are born before the FRD. Whether a child is born before or after the family-specific FRD is indicated by the variables beforefrd_ori and afterfrd_ori. Unfortunately, there are many missing values in the family-specific FRD and hence in beforefrd_ori and afterfrd_ori. There may also be a selection issue associated with these two variables. For example, households that delay registration may be those that would invest less in their children anyway. The original authors addressed these concerns by using the median registration date for the municipality instead of the family-specific date. The variables beforefrd and afterfrd indicate whether a child is born before or after the median FRD for the municipality. We will follow the original study and focus on the latter pair in the first part of this exercise.

• In this analysis, the basic observational entity is a child (as opposed to a family or a municipality), because within each family some children may be subject to conditionality requirements while others are not. The dataset constitutes a panel, because the same families were surveyed up to three times, and therefore there may be multiple observations for each child. We therefore subscript by t those variables whose values may change over time, where t indicates the survey wave. To examine the effect of program participation on attending preventive care visits, consider the model:

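$$
\begin{aligned}
\text{ncyd}_t ={}& \alpha\,\text{afterfrd} + \sum_{k=1}^{81} \beta_k\, 1(\text{month\_year\_birth} = \text{cohort}_k) + \sum_{a=0}^{2} \gamma_a\, 1(\text{age}_t = a) \\
&+ \sum_{j=1}^{57} \lambda_j\, 1(\text{cod\_mpio} = \text{munic}_j) + \sum_{s=1}^{3} \delta_s\, 1(\text{survey}_t = s) \qquad (*) \\
&+ \sum_{j=1}^{57} \sum_{s=1}^{3} \omega_{js}\, 1(\text{cod\_mpio} = \text{munic}_j) \times 1(\text{survey}_t = s) + \kappa + \pi' X_t + u_t,
\end{aligned}
$$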

where cohort1,..., cohort81 are the possible birth cohort values, munic1,..., munic57 are the possible municipality code values, and the Greek letters are parameters. We have omitted a subscript t on month_year_birth and cod_mpio, since these variables are time invariant. The variables included in the vector Xt will vary as explained below.

• To get valid standard errors, you need to cluster at the municipality level in all regressions. This allows the error terms to be arbitrarily correlated within each municipality, and to have different variances. This is important, because people in the same municipality share the same living environment and the same government services (common omitted variables). Instead of vce(robust), use vce(cluster cod_mpio).

• To avoid information overload, please round all estimates and standard errors to two digits after the decimal point. Probably only one digit is practically meaningful in this context, so reporting two digits is plenty.
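As an illustration of the clustering and rounding conventions above, a regression without any Xt variables might be set up roughly as follows. This is a sketch under stated assumptions: the name ncyd for the dependent variable is a guess and should be checked against the variable labels in the dataset, and age is assumed to be stored as a non-negative integer so that it can be used with factor-variable notation.

```stata
* Model (*) without the X_t controls
* NOTE: ncyd is an ASSUMED name for the dependent variable; check the labels
* i.cod_mpio##i.survey adds municipality dummies, survey dummies,
* and their interactions in one term
regress ncyd afterfrd i.month_year_birth i.age ///
    i.cod_mpio##i.survey, vce(cluster cod_mpio)

* Coefficient on afterfrd and its clustered standard error,
* rounded to two digits after the decimal point
display %5.2f _b[afterfrd] "  (" %5.2f _se[afterfrd] ")"
```

Stata drops collinear dummies automatically, so some cohort, municipality, or interaction terms may be reported as omitted; that is expected with this many indicator variables.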

Questions

(a) The main parameter of interest is α, the coefficient on afterfrd, but model (*) includes many control variables. Consider first a 'short' version that has no Xt variables but includes everything else in model (*).

Discuss why we might want to include dummies for cohort and age, municipality dummies, and survey dummies even in a 'short' model.

(b) Consider a 'long' version of model (*) with additional controls for child characteristics, parental characteristics, and family structure. Specifically, Xt now includes the logarithm of birth order (birth_ordern); gender (female); maternal and paternal education dummies (edu_h_b, edu_h_c, edu_m_b, edu_m_c); household size (famsize); number of siblings in the 0-6, 7-13, and 14-17 age groups (sib0_6, sib7_13, sib14_17); and an indicator for rural area residence (rural).

Discuss why we might want to compare a short and a long model in general, and speculate briefly on the motivation behind including these particular variables in the long model. If we could get more information, are there other variables you would like to include in the long model?

(c) Create a table with the estimates of the coefficient on afterfrd (α) and their cluster-robust standard errors from the short and long models considered in parts (a) and (b).

Discuss the results. Are the differences between the short and the long models large or small? Is the magnitude of the estimates large or small compared with the overall average number of preventive health care visits a child receives? Are the estimates reasonably precise?

(d) Consider an extended long model that is the same as in part (b), but additionally includes in Xt the number of siblings born before the FRD (nbeforefrd) and the interaction term between being born after the FRD and the number of siblings being born before the FRD (afterfrd x nbeforefrd). (Treat nbeforefrd as a continuous variable.)

Discuss why the number of siblings who are born before the FRD may affect the number of visits a child born after the FRD receives. (Remember, the CCT depends only on the children born before the FRD attending their check-ups.)

(e) Create a table with selected results from estimating the model in (d). It suffices to show the estimates of the coefficients of afterfrd, nbeforefrd, and the interaction term nbeforefrd x afterfrd, together with their cluster-robust standard errors.

Also, compute the predicted number of preventive health care visits for the 10 cases made up by setting afterfrd equal to 0 or 1 and setting nbeforefrd equal to 0, 1, 2, 3, or 4. You can compute these predictions taking all other variables at their overall means. Present your predictions in a table or a graph (preferably with standard errors or confidence intervals). (For example you can use the margins and the marginsplot commands.)

Interpret the parameter estimates and the predictions.
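One possible way to obtain the ten predictions in part (e) with margins and marginsplot is sketched below. It is not the only valid approach. The dependent variable name ncyd is an assumption, as are the sibling variable names sib7_13 and sib14_17; check all of them against the dataset. The sketch also assumes birth_ordern already holds the logarithm of birth order.

```stata
* Controls for the long model, as listed in part (b)
local controls birth_ordern female edu_h_b edu_h_c edu_m_b edu_m_c ///
    famsize sib0_6 sib7_13 sib14_17 rural

* Extended long model. Writing the interaction in factor-variable notation
* (i.afterfrd##c.nbeforefrd) lets margins keep track of it correctly.
* NOTE: ncyd is an ASSUMED name for the dependent variable
regress ncyd i.afterfrd##c.nbeforefrd `controls' ///
    i.month_year_birth i.age i.cod_mpio##i.survey, vce(cluster cod_mpio)

* Predicted visits for afterfrd = 0, 1 and nbeforefrd = 0, 1, 2, 3, 4,
* holding all other covariates at their means
margins afterfrd, at(nbeforefrd = (0(1)4)) atmeans

* Plot the ten predictions with confidence intervals
marginsplot
```

If the interaction were instead created by hand as a separate product variable, margins would treat it as an unrelated covariate and hold it at its mean, which would give misleading predictions.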

(f) As mentioned, we have two pairs of variables indicating whether children are born before or after the FRD. Neither pair is perfect, so it makes sense to try both and compare. Since there are many missing values in beforefrd_ori and afterfrd_ori, we begin by investigating the effect of using a smaller sample. That is, we re-estimate the two models in (c), still with beforefrd and afterfrd on the right-hand side, but using only the subsample of observations where afterfrd_ori is non-missing.

Create a table of selected estimates and discuss how the estimated effects of conditionality change. Do they change a lot, or just a little?

(g) Estimate the models in (c) using afterfrd_ori instead of afterfrd. Create a table as before.

Discuss whether the estimated effects of conditionality change a lot, or just a little.

(h) One way to address the potential endogeneity of afterfrd_ori is to use IV estimation with afterfrd as the instrument. Create a table as in part (g), but also add relevant information about the first stage.

Discuss the results. Does the first stage suggest that the instrument is strong? Do the estimated second-stage effects of conditionality change a lot, or just a little? Compare the OLS and the IV estimates and discuss the direction of the selection bias.
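The estimation steps in parts (f) to (h) could be organised roughly as in the sketch below, shown here for the long model only; the short model follows by dropping the controls. The variable names ncyd, sib7_13, and sib14_17 are assumptions to be checked against the dataset, and the local macro names are hypothetical.

```stata
* Hypothetical local macros collecting the right-hand-side terms
local controls birth_ordern female edu_h_b edu_h_c edu_m_b edu_m_c ///
    famsize sib0_6 sib7_13 sib14_17 rural
local dummies i.month_year_birth i.age i.cod_mpio##i.survey

* (f) Same regressor as in (c), restricted to rows where afterfrd_ori is observed
* NOTE: ncyd is an ASSUMED name for the dependent variable
regress ncyd afterfrd `controls' `dummies' if !missing(afterfrd_ori), ///
    vce(cluster cod_mpio)

* (g) OLS with the family-specific indicator
regress ncyd afterfrd_ori `controls' `dummies', vce(cluster cod_mpio)

* (h) 2SLS: afterfrd_ori is treated as endogenous, afterfrd is the instrument;
*     the option first also displays the first-stage regression
ivregress 2sls ncyd `controls' `dummies' (afterfrd_ori = afterfrd), ///
    vce(cluster cod_mpio) first

* First-stage diagnostics (including the F statistic) for judging instrument strength
estat firststage
```

Regressions that include afterfrd_ori automatically drop observations where it is missing, so (g) and (h) use the same subsample as (f), which keeps the three sets of estimates comparable.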