Problem Set #3

This exercise continues our examination of the effect of TSPs pollution on infant mortality. Here we explore violations of the assumptions underlying the Gauss-Markov theorem, which guarantees that OLS is BLUE. This exercise will help (or perhaps force) you to practice the solutions to these violations that we have discussed in class.

Once again, feel free to work cooperatively, but each person is required to turn in his/her own problem set that provides the solutions in his/her own words.

Data Source: imrtsp7172.dta

The unit of observation is the county and there are 462 observations of 19 variables. Each observation records the change between 1972 and 1971 (i.e., the 1972 minus the 1971 value) for each of the variables. The lone exception is the tbirth variable, which equals the sum of the 1971 and 1972 numbers of births. This variable should be used as a weight (in STATA language this means w=tbirth) in ALL regressions in this exercise.
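To make the role of the weight concrete, here is a minimal Python sketch of what a weighted bivariate regression computes: the intercept and slope that minimize the weighted sum of squared residuals. It is an illustration only. The data are synthetic, the names (weighted_ols, d_tsp, d_imr, births) are hypothetical, and nothing here comes from imrtsp7172.dta or replaces the STATA commands used in this exercise.

```python
def weighted_ols(x, y, w):
    """Weighted bivariate regression of y on x (with intercept).

    Returns (intercept, slope) minimizing sum_i w_i * (y_i - a - b * x_i)^2.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Synthetic, made-up "county" data (NOT taken from imrtsp7172.dta).
d_tsp = [-10.0, -5.0, 0.0, 5.0, 10.0]            # change in TSPs
d_imr = [-1.0, -2.0, 0.5, 0.0, 3.0]              # change in infant mortality
births = [500.0, 20000.0, 1500.0, 8000.0, 300.0]  # weight

a_u, b_u = weighted_ols(d_tsp, d_imr, [1.0] * len(d_tsp))  # equal weights
a_w, b_w = weighted_ols(d_tsp, d_imr, births)              # birth weights
print("unweighted slope:", b_u)
print("weighted slope:  ", b_w)
```

With equal weights the function reduces to ordinary least squares, and multiplying every weight by the same constant leaves the estimates unchanged, so only the relative sizes of the weights matter.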

The relevant variables (with descriptions in quotations) are:

dimr7271 “# inf death per 1,000 births 72-71”

tbirth “total births 71 & 72”

dwhite “% births, white mom 72-71”

dothr “% births, nonwhite/nonblack mom 72-71”

dfemale “% female births 72-71”

dedudad “father yrs of ed 72-71”

dedumom “mother yrs of ed 72-71”

dlwght “% births with weight<2,500 g 72-71”

dmaried “% mother married 72-71”

dunmard “% mother unmarried 72-71”

dagemom “mother age 72-71”

dpcare1 “% mom began month 1 or 2 72-71”

dpcare2 “% mom began 3rd month 72-71”

dpcare3 “% mom began 4-6th month 72-71”

dpcare4 “% mom began 7-9th month 72-71”

dpcinc “county-level per cap income 72-71”

dmtspgm “county-level tsps concen 72-71”

fstate “fips state code”

reg_tsp “=1 if county regulated for tsps”

1.   Introduction to the New Data and Problems with the Residual

a.   Plot dimr7271 against dmtspgm. Does it look like there is an association between changes in infant mortality and TSPs? Repeat this exercise with the weight set equal to tbirth. Now, does there appear to be a relationship?

b.   Based on the scatterplots from part a., is there any evidence of homoskedasticity or heteroskedasticity in the model for the change in the infant mortality rate?

c.   Suppose that there is heteroskedasticity in the residuals of the change-in-IMR regression. Is the OLS estimator of the effect of TSPs unbiased and consistent? Efficient? Is the “conventional” estimator of the variance of the estimated effect of TSPs unbiased/consistent?

d.   Regress dimr7271 on dmtspgm. Is there evidence of a relationship here? Now add the complete set of control variables (i.e., dwhite dothr dfemale dedudad dedumom dmaried dumard dagemom dlwght dpcare0 dpcare1 dpcare2 dpcare3 dpcare4 dpcinc). Have your conclusions changed? What might explain the differences between these estimates and the cross-sectional ones in the last problem set?

e.   Now repeat the complete set of controls regression, but use the “robust” option in STATA to calculate the White heteroskedasticity-consistent standard errors (reg dimr7271 dmtspgm dwhite dothr dpcare0 dpcare1 dpcare2 dpcare3 dpcare4 dpcinc [w=tbirth], robust). Explain briefly how these estimates of the standard errors are corrected for heteroskedasticity. How do they compare to the “uncorrected” (conventional) LS estimates of the standard errors? Is there evidence of heteroskedasticity?
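As a companion to part e., the following Python sketch contrasts the conventional standard error with a White-type one in the simplest case: an unweighted bivariate regression on synthetic data. It uses the HC0 form, with no small-sample scaling, so its numbers should not be expected to match STATA output exactly. The function name ols_with_se and all data values are hypothetical.

```python
import math


def ols_with_se(x, y):
    """Bivariate OLS of y on x (with intercept).

    Returns (slope, conventional_se, white_se); white_se is the HC0
    form with no small-sample scaling.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional: a single error variance s^2 shared by all observations.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    conventional_se = math.sqrt(s2 / sxx)

    # White (HC0): each observation's own squared residual stands in
    # for its error variance.
    meat = sum((xi - xbar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    white_se = math.sqrt(meat / sxx ** 2)
    return slope, conventional_se, white_se


# Synthetic data (an assumption for illustration): the errors are larger
# where |x| is larger.
x = [-2.0, -2.0, -1.0, -1.0, 1.0, 1.0, 2.0, 2.0]
u = [3.0, -3.0, 0.5, -0.5, 0.5, -0.5, 3.0, -3.0]
y = [1.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

slope, conventional_se, white_se = ols_with_se(x, y)
print("slope:", slope)
print("conventional se:", conventional_se)
print("White (HC0) se: ", white_se)
```

The point estimate is the same under both calculations; only the variance formula differs. In this made-up example the largest residuals sit at the most extreme x values, which is the situation in which the two standard errors diverge most.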

f.   Using the “predict” STATA command (predict [variable name], residual), save the residuals from the LS complete set of controls regression. Now regress these residuals on the complete set of controls. Explain why the R-squared and estimated coefficients from the regression are virtually zero.

g.   Now let’s apply a more formal test for heteroskedasticity. Regress the squared values of the residuals from the LS complete set of controls regression on the complete set of controls. Use the R-squared to test for heteroskedasticity. What do you find? Now include the complete set of controls and their squares. Does this change your conclusions? Now apply White’s special test for heteroskedasticity. Does this change your conclusions?
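One common reading of “use the R-squared to test” is the n times R-squared statistic from the auxiliary regression of the squared residuals, compared with a chi-squared critical value whose degrees of freedom equal the number of auxiliary regressors (excluding the constant). The Python sketch below computes that statistic twice, first on x alone and then on x and its square. The squared residuals are hypothetical numbers supplied directly, and the helper names (solve, r_squared) are not from the course materials.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def r_squared(columns, y):
    """R-squared from OLS of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
           for r in range(k)]
    Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ssr / sst


# Hypothetical regressor and squared residuals (illustration only).
x = [-2.0, -2.0, -1.0, -1.0, 1.0, 1.0, 2.0, 2.0]
e2 = [9.0, 6.0, 0.25, 1.0, 0.25, 1.0, 9.0, 6.0]
x_sq = [xi ** 2 for xi in x]
n_obs = len(x)

CRIT_1DF = 3.841  # 5% chi-squared critical value, 1 degree of freedom
CRIT_2DF = 5.991  # 5% chi-squared critical value, 2 degrees of freedom

lm_linear = n_obs * r_squared([x], e2)
lm_quadratic = n_obs * r_squared([x, x_sq], e2)
print("n*R2, x only:     ", lm_linear, "reject:", lm_linear > CRIT_1DF)
print("n*R2, x and x^2:  ", lm_quadratic, "reject:", lm_quadratic > CRIT_2DF)
```

In this made-up example the error variance depends on the size of x but not its sign, so the regression on x alone has no explanatory power while the one that adds the square does. That is one reason the choice of auxiliary regressors can change the conclusion of the test.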

h.   Now suppose that someone (call her God) tells you that var(ei) = c * dmtspgm. Is this evidence of heteroskedasticity? If so, what would you do to return to the Gauss-Markov assumptions? What are the advantages of this approach relative to White standard errors? In practice, what are the potential problems with this approach?

2.  Bias and Inconsistency and a Potential Solution [NOTICE: We have not yet covered this material in lecture but will do so before the problem set is due].

a.   Suppose that dmtspgm is an imperfect measure of the change in TSPs due to misreporting. In addition, suppose that this measurement error is “classical” in the sense that it is independently and identically distributed. What does this imply about the bias in the estimated effect of dmtspgm from the regression that we have been examining in question 1?

b.   Suppose God has told us (or that we suspect) that there are many unmeasured/unobservable confounding factors that determine both dimr7271 and dmtspgm. Examples of these types of variables include: health insurance status, rates of smoking across mothers, and parents’ income. Explain how this could lead to “omitted variables” bias in the LS equation. Show this in the derivation of the LS parameter estimate of the influence of TSPs on IMR.

c.   In 1970 the federal government passed the Clean Air Act Amendments. This legislation set air quality standards for TSPs that all counties are required to attain. In counties that did not meet the standards, TSP emitters were subject to harsh regulations that required them to reduce their emissions of TSPs, while emitters in “clean” counties were relatively free from regulation. Previous researchers collected the information on whether a county was heavily regulated and coded it as a dummy variable that is equal to 1 for heavily regulated counties and 0 otherwise. The variable name is reg_tsp. Under what conditions would this variable qualify as a valid instrumental variable?

d.   Present evidence that reg_tsp might be a valid instrumental variable. In particular, test whether TSPs declined more in heavily regulated counties. What was the effect of the Clean Air Act Amendments on TSP concentrations? Can you devise any suggestive evidence that reg_tsp meets the other condition required of an instrument? (Hint: Recall how Woodbury and Spiegelman showed that randomization was effective.)

e.   Now run the following STATA command: ivreg dimr7271 (dmtspgm = reg_tsp). This performs two-stage least squares estimation (which is equivalent to instrumental variables estimation here) of the effect of changes in TSPs on changes in IMRs, without controlling for any other explanatory variables. {Read the STATA manual to make sure that you understand this command}. What do the results suggest? What must have happened to infant mortality rates in heavily regulated counties?

f.   Under the assumption that reg_tsp is a valid instrument, is it possible to use the Hausman test to determine whether OLS is biased? What do you think this test will reveal? Now implement the Hausman test. (Recall this is done in two steps. Use tbirth as a weight in both steps. Also continue to use the bivariate regression.) Interpret the results of the 2nd stage of this testing procedure. How does the parameter estimate on dmtspgm compare with the parameter estimate from the instrumental variables approach? Discuss the parameter estimate on the residual from the 1st stage. Can it be used to sign the “omitted-variables” bias?
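The two-step procedure in part f. can be sketched mechanically in Python, assuming the usual regression-based form of the test: step 1 regresses the endogenous variable on the instrument and saves the residual, and step 2 regresses the outcome on the endogenous variable and that residual, using the same weights in both steps. All data below are synthetic and the names (solve, wls, z, x, y, w) are hypothetical. The sketch returns point estimates only, with no standard errors.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    out = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * out[c] for c in range(r + 1, n))
        out[r] = (M[r][n] - tail) / M[r][r]
    return out


def wls(columns, y, w):
    """Weighted least squares of y on an intercept plus the given columns.

    Returns the coefficient list [intercept, b_1, b_2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtWX = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n))
             for c in range(k)] for r in range(k)]
    XtWy = [sum(w[i] * X[i][r] * y[i] for i in range(n)) for r in range(k)]
    return solve(XtWX, XtWy)


# Synthetic data (NOT from imrtsp7172.dta).
z = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]                      # instrument (dummy)
x = [-12.0, -8.0, -10.0, -2.0, 1.0, -1.0]               # endogenous regressor
y = [-1.5, -0.5, -1.0, 0.2, 0.4, -0.3]                  # outcome
w = [1000.0, 3000.0, 2000.0, 1500.0, 2500.0, 4000.0]    # weights

# Step 1: regress x on z and save the residual.
a1, pi1 = wls([z], x, w)
vhat = [xi - a1 - pi1 * zi for xi, zi in zip(x, z)]

# Step 2: regress y on x and the first-step residual.
b0, b_x, b_v = wls([x, vhat], y, w)

# For comparison: the IV estimate as a ratio of instrument slopes.
iv = wls([z], y, w)[1] / pi1

print("step-2 coefficient on x:       ", b_x)
print("step-2 coefficient on residual:", b_v)
print("IV estimate:                   ", iv)
```

Printing the step-2 coefficient on x next to the IV estimate makes the comparison asked for in part f. easy to carry out on the real data once the same two regressions are run in STATA.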

g.   Write down the structural equation of interest that has been implied in e. and f. What are the two reduced-form equations? Is the structural equation overidentified, exactly identified, or underidentified? Why? Estimate the two reduced-form equations (remember to use tbirth as a weight in the estimation of both of them). Discuss the parameter estimates obtained in both of these equations (e.g., how did IMRs and TSPs change in heavily regulated counties relative to less regulated ones?). What does the ratio of these two parameter estimates equal? What name is given to this approach?
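For part g., a small Python sketch can show how the two reduced-form slopes on a dummy instrument are computed with weights, and how their ratio can be set beside a two-stage estimate obtained by regressing the outcome on first-stage fitted values. The data are synthetic and the names (weighted_fit, z, x, y, w) are hypothetical.

```python
def weighted_fit(x, y, w):
    """Weighted regression of y on x; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Synthetic data (NOT from imrtsp7172.dta).
z = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]                      # regulation dummy
x = [-12.0, -8.0, -10.0, -2.0, 1.0, -1.0]               # change in TSPs
y = [-1.5, -0.5, -1.0, 0.2, 0.4, -0.3]                  # change in IMR
w = [1000.0, 3000.0, 2000.0, 1500.0, 2500.0, 4000.0]    # births

# Reduced form for x: with a dummy regressor, the slope is the difference
# in weighted means between the z = 1 and z = 0 groups.
a_x, first_stage = weighted_fit(z, x, w)

# Reduced form for y.
a_y, reduced_form_y = weighted_fit(z, y, w)

# Ratio of the two reduced-form slopes.
ratio = reduced_form_y / first_stage

# Two-stage estimate: regress y on the first-stage fitted values.
x_hat = [a_x + first_stage * zi for zi in z]
_, tsls = weighted_fit(x_hat, y, w)

print("reduced form, x on z:", first_stage)
print("reduced form, y on z:", reduced_form_y)
print("ratio:               ", ratio)
print("two-stage estimate:  ", tsls)
```

Because the same weights enter every regression, the comparison between the ratio and the two-stage estimate is like for like. Mixing weighted and unweighted regressions would break it.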

h.   Now estimate: ivreg dimr7271 (dmtspgm = reg_tsp) dwhite dothr dfemale dedudad dedumom dmaried dumard dagemom dlwght dpcare0 dpcare1 dpcare2 dpcare3 dpcare4 dpcinc [w=tbirth]. Does the IV parameter estimate change by a “meaningful” amount when all the other control variables are included? Is this informative about the validity of reg_tsp as an instrument?

i.   Do you think that TSPs affect infant mortality rates? Has your opinion changed since the last problem set? Can we conclude that TSPs cause infant mortality? Why or why not?