
Problem Set #2

This exercise examines methods of summarizing the relationship between two variables: a simple graphical analysis, the bivariate linear regression model, and the multivariate linear regression model. The application is to the relationship between infant mortality rates (IMRs) and total suspended particulates (TSPs) air pollution. The Environmental Protection Agency recently toughened the regulations that limit firms’ ability to emit TSPs because of the presumed health effects of TSPs. Whether IMRs and TSPs are causally related is an issue of tremendous importance to public policy.

Feel free to work cooperatively but each person is required to turn in their own problem set that provides the solutions in their own words. Late problem sets will not be accepted.

Data Source: imrtsp71.dta and imrtsp72.dta

imrtsp71.dta is a data file from 1971. The unit of observation is the county and there are 715 observations of 21 variables.

This Stata-format data file contains county-level information on the number of infant deaths per 1000 births (IMR), the natural log of this same number, TSPs concentrations, number of births, characteristics of new parents (e.g., race of mother, years of education, marital status of mother, mother’s age), the share of infants considered to have a “low birth weight” (an indicator of poor infant health), the month of the pregnancy in which the mother initiated prenatal care, and mean per-capita income.

The relevant variables with descriptions in quotations are:

imr71 " # inf deaths per 1000 births 71"

lnimr71 "ln(# inf death per 1000 births 71)"

mtspar71 "county-level tsps concentration, measured in micrograms per cubic meter 71"

tsp_sq "the square of mtspar71"

birth71 "# births 71"

white71 "% births, white mom 71"

othr71 "% births, nonwhite/nonblack mom 71"

female71 "% female births 71"

edudad71 "mean father years of ed 71"

edumom71 "mean mother years of ed 71"

maried71 "% mother married 71"

umard71 "% mother unmarried 71"

agemom71 "mean mother age 71"

lwght71 "% births with weight<2500 g 71"

pcare171 "% mother began prenatal care in 1st or 2nd month 71"

pcare271 "% mother began prenatal care in 3rd month 71"

pcare371 "% mother began prenatal care in 4th-6th month 71"

pcare471 "% mother began prenatal care in 7th-9th month 71"

pcinc71 "county-level per cap income 71"

location "5-digit county fips code"

fstate "2 digit state fips code"

[Note: There may be a few extra variables in the data file, but you should ignore them.]

imrtsp72.dta is structured exactly the same way except that the observations are from 1972 and all the appropriate variable names end with “72” instead of “71”. Again, the unit of observation is the county and here there are 983 observations of 22 variables.

1. The multivariate linear infant mortality rate regression model.

a. Now regress imr71 on a constant, mtspar71 and all the other variables. Briefly interpret the meaning of the parameters on mtspar71, lwght71, the prenatal care variables (i.e., pcare171, pcare271, pcare371, and pcare471), edumom71, and edudad71. By how many units would TSPs have to increase in order to have the same magnitude effect on imr71 as an increase of 0.01 in the lwght71 variable? Is this a large difference in TSPs? Is low birthweight status (i.e., a birth weight less than 2500 grams) or TSPs concentrations a more important predictor of infant mortality rates in this regression?
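As a hint for the "same magnitude" comparison, the calculation can be set up by equating the two predicted changes in imr71. The symbols below are notation introduced here for illustration (they are not variable names in the data file): \hat\beta_{TSP} and \hat\beta_{lwght} denote the estimated coefficients on mtspar71 and lwght71 from the regression in this part.

```latex
\hat\beta_{TSP}\,\Delta TSP \;=\; \hat\beta_{lwght}\times 0.01
\quad\Longrightarrow\quad
\Delta TSP \;=\; \frac{0.01\,\hat\beta_{lwght}}{\hat\beta_{TSP}}
```

If the two coefficients differ in sign, compare absolute values, since the question asks about the magnitude of the effect.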

b. Compare the TSPs parameter estimate in this regression and the bivariate regression. Are they different? What might this imply about how TSPs is distributed across the country? Can you think of variables that we have not controlled for that may be related to both TSPs and infant mortality rates? Does this affect your interpretation of the TSPs parameter estimate?

c. Devise a test of the null hypothesis that the coefficients on all the explanatory variables besides TSPs are equal to zero. What are the results? At what confidence level can this null be rejected? Now do the same test on the coefficients of all the variables except TSPs and lwght71. What are the results of this test?
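One common way to test a set of joint zero restrictions is an F-test that compares the sum of squared residuals from the restricted and unrestricted regressions. The following is only a minimal sketch in Python with NumPy on synthetic data, not the imrtsp71.dta file; the names tsp, controls, imr, ssr and f_statistic are hypothetical and chosen for illustration.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def f_statistic(X_unrestricted, X_restricted, y):
    """F statistic for the restrictions that turn X_unrestricted into X_restricted."""
    n, k = X_unrestricted.shape
    q = k - X_restricted.shape[1]      # number of restrictions
    ssr_u = ssr(X_unrestricted, y)
    ssr_r = ssr(X_restricted, y)
    f = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
    return f, q, n - k

# Synthetic stand-in data (assumption: NOT the problem-set data).
rng = np.random.default_rng(1)
n = 300
controls = rng.normal(size=(n, 2))
tsp = rng.normal(size=n)
imr = 20.0 + 0.5 * tsp + controls @ np.array([1.0, -2.0]) + rng.normal(size=n)

const = np.ones(n)
X_u = np.column_stack([const, tsp, controls])  # constant, TSPs, all controls
X_r = np.column_stack([const, tsp])            # controls' coefficients set to zero

F, q, df_resid = f_statistic(X_u, X_r, imr)
print(F, q, df_resid)
```

Under the null and the usual assumptions, the statistic is compared with the F distribution with q and n - k degrees of freedom. For the second test in this part, the restricted regression would keep lwght71 in addition to the constant and TSPs.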

d. In lecture, we showed that the parameter estimate on any variable from a multivariate regression is interpreted as the effect of that variable, holding constant all other variables. The goal of this problem is to show that this is the case for the TSPs variable. Begin by “purging” the TSPs variable of any covariance with the other explanatory variables and then regressing imr71 on the “purged” TSPs variable. Does this two-step regression technique produce the same parameter estimate on TSPs as the multivariate regression?
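The two-step "purging" procedure can be sketched as follows. This is a minimal illustration in Python with NumPy on synthetic data, not the imrtsp71.dta file; the names tsp, controls, imr and tsp_purged are hypothetical. The first step regresses the pollution variable on a constant and the other regressors and keeps the residual; the second step regresses the outcome on that residual.

```python
import numpy as np

# Synthetic stand-in data (assumption: NOT the problem-set data).
rng = np.random.default_rng(0)
n = 500
controls = rng.normal(size=(n, 2))
# The pollution variable is built to be correlated with the controls.
tsp = 1.0 + controls @ np.array([0.5, -0.3]) + rng.normal(size=n)
imr = 20.0 + 0.8 * tsp + controls @ np.array([1.5, 2.0]) + rng.normal(size=n)

const = np.ones(n)

# Full multivariate regression: imr on constant, tsp, and controls.
X_full = np.column_stack([const, tsp, controls])
beta_full = np.linalg.lstsq(X_full, imr, rcond=None)[0]

# Step 1: regress tsp on the constant and controls; keep the residual.
Z = np.column_stack([const, controls])
gamma = np.linalg.lstsq(Z, tsp, rcond=None)[0]
tsp_purged = tsp - Z @ gamma

# Step 2: regress imr on the purged variable alone. No constant is
# needed because the residual is orthogonal to the constant in Z.
beta_two_step = (tsp_purged @ imr) / (tsp_purged @ tsp_purged)

print(beta_full[1], beta_two_step)
```

Comparing the two printed numbers on the actual data is the check this part asks for. Note that the standard errors from the second-step regression are not automatically the same as those from the full regression, because the degrees of freedom differ.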

2. Compare the Regression Results from 1971 and 1972

a. Repeat 3a and 3e in problem set #1 with the 1972 data (i.e., imrtsp72.dta).

b. Repeat 1a in problem set #2 with the 1972 data.

c. Compare and contrast the 1971 and 1972 results. Do the 1972 results cause you to change your interpretation of the 1971 results?

3. Derive the sampling distribution for beta in a multivariate regression both with and without knowing the variance of the error term.
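As a starting point, the setup and the results to be derived can be written as below. The notation is an assumption introduced here, not taken from the problem set: y is the n x 1 outcome vector, X the n x k regressor matrix, b the OLS estimator, e the residual vector, and the errors are assumed normal and homoskedastic conditional on X.

```latex
y = X\beta + \varepsilon, \qquad \varepsilon \mid X \sim N(0, \sigma^2 I_n)

b = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon
\quad\Longrightarrow\quad
b \mid X \sim N\!\left(\beta,\; \sigma^2 (X'X)^{-1}\right)

s^2 = \frac{e'e}{n-k}, \qquad
\frac{b_j - \beta_j}{\sqrt{s^2\,[(X'X)^{-1}]_{jj}}} \sim t_{n-k}
```

The first distribution applies when \sigma^2 is known; the t result applies when \sigma^2 is replaced by its estimate s^2. The derivation should justify each step, including why the ratio in the last line has a t distribution.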

4. Derive the test statistic associated with the null hypothesis that β2 + β3 = 1.
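One way to organize the derivation is to write the hypothesis as a single linear restriction R\beta = r. The notation below is an assumption introduced for illustration: b is the OLS estimator, s^2 = e'e/(n-k) is the estimated error variance, R is a 1 x k row vector with ones in the positions of \beta_2 and \beta_3 and zeros elsewhere, and r = 1.

```latex
H_0:\; \beta_2 + \beta_3 = 1
\quad\Longleftrightarrow\quad
R\beta = r

t \;=\; \frac{Rb - r}{\sqrt{s^2\, R (X'X)^{-1} R'}}
\;=\; \frac{b_2 + b_3 - 1}
{\sqrt{\widehat{\mathrm{Var}}(b_2) + \widehat{\mathrm{Var}}(b_3) + 2\,\widehat{\mathrm{Cov}}(b_2, b_3)}}
```

Under the null hypothesis and normal homoskedastic errors, this statistic has a t distribution with n - k degrees of freedom; its square is the corresponding F statistic with 1 and n - k degrees of freedom.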