Econ 184b Solutions to the final exam practice problems
Part I: Multiple choice
1. c
2. b
3. a
4. c
5. a
6. a
7. a
8. a
9. b
10. a
Part II: True/False/Uncertain
1. False. Since this is a probit model, the coefficient estimate must be transformed using the standard normal cumulative distribution function in order to interpret the effect. If β1 = 0.10, one would calculate the predicted probabilities Φ(β0 + 0.10) for male students and Φ(β0) for female students, looking up each value in the CDF table. The difference between the two, Φ(β0 + 0.10) - Φ(β0), is the difference in probability of enrollment for male students compared with female students.
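The probit calculation in answer 1 can be sketched numerically. This is only an illustration: the exam does not give the intercept, so the value of beta0_assumed below is a hypothetical placeholder, and the function names are made up for this sketch.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_effect(beta0: float, beta1: float) -> float:
    """Change in predicted probability when a binary regressor goes from 0 to 1."""
    return std_normal_cdf(beta0 + beta1) - std_normal_cdf(beta0)

beta1 = 0.10            # coefficient quoted in the answer
beta0_assumed = -0.50   # hypothetical intercept, NOT given on the exam

effect = probit_effect(beta0_assumed, beta1)
print(f"difference in enrollment probability: {effect:.4f}")
```

Because Φ is nonlinear, the result depends on the intercept, which is why the raw coefficient 0.10 cannot be read directly as a 10 percentage point effect.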
2. False. A logit model is used in the case of a binary dependent variable, and is unrelated to omitted variable bias. Instrumental variables strategies, or sometimes panel data, are used to correct for omitted variable bias.
3. True. To implement an instrumental variables estimation one must have at least one instrument for each endogenous regressor in the regression; when the number of instruments equals the number of endogenous regressors, this is ‘exact’ identification. One can also estimate an IV regression if it is overidentified, i.e. there are more instruments than endogenous variables. One cannot estimate an IV regression if it is underidentified (fewer instruments than endogenous regressors).
4. True. The test statistic is distributed χ² with 1 degree of freedom (the number of instruments minus the number of endogenous regressors), and the critical values are 6.63 and 3.84 at the 1% and 5% significance levels, respectively [given in a table on the exam]. The J-statistic of 2.58 is less than both critical values. Hence you cannot reject the null hypothesis that both the instruments are exogenous; in other words, your instruments appear to be exogenous and are therefore valid instruments.
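A minimal sketch of the comparison in answer 4, assuming the J-statistic has a chi-squared distribution with 1 degree of freedom (the distribution for which 3.84 and 6.63 are the 5% and 1% critical values). The helper name is made up for this sketch.

```python
import math

def chi2_1df_cdf(x: float) -> float:
    """CDF of a chi-squared variable with 1 df: P(Z^2 <= x) = erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2.0))

j_stat = 2.58                      # J-statistic from the problem
crit_5pct, crit_1pct = 3.84, 6.63  # critical values given on the exam

p_value = 1.0 - chi2_1df_cdf(j_stat)
reject_5pct = j_stat > crit_5pct
reject_1pct = j_stat > crit_1pct

print(f"p-value = {p_value:.3f}")
print(f"reject at 5%: {reject_5pct}, reject at 1%: {reject_1pct}")
```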
5. False. The rule-of-thumb for checking for weak instruments is that the first-stage F-statistic must be greater than 10. This indicates a strong correlation between the instrument(s) and the endogenous regressor.
6. True. The conditional mean independence assumption only requires that the conditional mean of the error term does not depend on the treatment variable (the variable of interest) X; the error term can depend on the control variables W1i, …, Wri. This is a less stringent assumption than requiring that the error term be uncorrelated with all of the regressors in a regression.
7. True. We use MSPE = variance + bias² to assess prediction accuracy, which shows the tradeoff between variance and bias². A shrinkage estimator (such as Ridge or Lasso) shrinks the estimator toward zero, which introduces bias but also reduces variance. The variance of the estimator can be reduced by enough to more than compensate for the increase in bias², reducing the overall MSPE.
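The tradeoff in answer 7 can be illustrated with a much simpler stand-in than Ridge or Lasso: shrinking a sample mean toward zero by a factor c. This is an assumed toy setup, not the estimator from the course; the values of mu, sigma2, and n are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def mse_of_shrunk_mean(c: float, mu: float, sigma2: float, n: int):
    """Return (variance, squared bias, MSE) of the estimator c * sample_mean.

    The sample mean of n draws has variance sigma2 / n, so scaling it by c
    gives variance c^2 * sigma2 / n and bias (c - 1) * mu.
    """
    variance = c ** 2 * sigma2 / n
    bias_sq = ((c - 1.0) * mu) ** 2
    return variance, bias_sq, variance + bias_sq

# Hypothetical population values, chosen only for illustration
mu, sigma2, n = 1.0, 4.0, 4

for c in (1.0, 0.5):
    variance, bias_sq, mse = mse_of_shrunk_mean(c, mu, sigma2, n)
    print(f"c = {c}: variance = {variance}, bias^2 = {bias_sq}, MSE = {mse}")
```

In this toy case the unshrunk estimator (c = 1.0) has no bias but a larger total error than the shrunk one (c = 0.5), which is the same logic as the variance and bias² tradeoff in the MSPE.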
Part III: Short answer problems
1. a. Change in treatment group (NJ): +0.59; change in control group (PA): -2.16.
β1(diffs-in-diffs) = 0.59 – (-2.16) = 2.75. Standard economic theory suggests a negative, not positive, change since an increase in the minimum wage should reduce employment, all else equal.
b. The overall change of 2.75 is primarily due to the change in Eastern Pennsylvania (-2.16), i.e., the control group. Following standard economic theory, if employment fell in Eastern Pennsylvania, then you would expect employment in New Jersey to fall by even more than in Eastern Pennsylvania. Not only did employment in New Jersey not fall by more, it actually increased.
c. The t-statistic is 2.75/1.36 = 2.02, thereby making the coefficient statistically significant at the 5% level (two-sided test). (Again note that this is surprising because one would expect a negative and statistically significant effect of the minimum wage on employment; as discussed in the box on p. 491 of the textbook, this is why this study was well-publicized and controversial.)
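The arithmetic in parts a and c can be checked with a few lines. The numbers are the ones quoted in the answers; the variable names are made up for this sketch, and 1.96 is the usual two-sided 5% critical value from the standard normal distribution.

```python
change_nj = 0.59    # change in the treatment group (New Jersey)
change_pa = -2.16   # change in the control group (Eastern Pennsylvania)
std_error = 1.36    # standard error of the diffs-in-diffs estimate

# Differences-in-differences: treatment change minus control change
did = change_nj - change_pa
t_stat = did / std_error
significant_5pct = abs(t_stat) > 1.96

print(f"diffs-in-diffs = {did:.2f}, t = {t_stat:.2f}, significant at 5%: {significant_5pct}")
```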
d. In some applications the assumption E(ui | Xi, W1i, ..., Wri) = 0 is not likely to hold, in which case the differences-in-differences estimator will not be unbiased or consistent. However, the differences-in-differences estimator will be unbiased and consistent under the weaker assumption of conditional mean independence. Including the additional characteristics (W variables) can also improve efficiency (reduce standard errors). Furthermore, adding these variables allows the researcher to perform tests for randomization, since Xi should be uncorrelated with the W variables.
Answers will vary by student, but some additional factors one would want to control for include the state unemployment rate, the region the restaurant is located in, and characteristics of the restaurants (e.g. whether it is a chain, whether its workers are unionized, etc.).
2. a. No. This is a problem of simultaneous causality: we don’t know if we are estimating demand or supply or a combination of the two. The error term is correlated with the price because of the supply relationship, so the coefficient will be biased and inconsistent.
b. Instrument validity has two components: instrument relevance (corr(Zi, Xi) ≠ 0) and instrument exogeneity (corr(Zi, ui) = 0). Wind speed is likely to be a good instrument because it is correlated with the price of fish (it is easier to catch fish if it is not stormy, so the quantity of fish is higher and the price lower); this satisfies the instrument relevance criterion. Wind speed also satisfies the exogeneity requirement, since it is not correlated with the error term. The error term comes from the demand equation, and there is no reason to think there is a relationship between the demand for fish and wind speed. In other words, it is not likely that people want to eat more fish on windy days.
c. The economist cannot test if the instrument is exogenous, because the equation is exactly identified, i.e. there is only one instrument (wind speed) for one endogenous variable (price of fish). She can test if the instrument is weak by looking at the F-statistic on the first-stage regression, i.e. the F-statistic testing the hypothesis that the coefficients on the instrument equal zero in the first stage. If the first-stage F-statistic is less than 10, the instruments are weak. If the instruments are weak, the TSLS estimator is not reliable, since it can be biased if the instruments are weak enough (even if they are exogenous).
3. a. About 1.06% of applicants are late. To find the fraction of white applicants that are late, note:
Pr(white and late) = Pr(white)*Pr(late|white)
0.00856 = 0.845*Pr(late|white)
Pr(late|white) is about 1.01%
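The conditional probability above is a single division, sketched here with the two figures quoted in the answer (the variable names are made up for this sketch).

```python
pr_white_and_late = 0.00856  # Pr(white and late)
pr_white = 0.845             # Pr(white)

# Pr(late | white) = Pr(white and late) / Pr(white)
pr_late_given_white = pr_white_and_late / pr_white

print(f"Pr(late | white) = {pr_late_given_white:.4f}")
```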
b. Recall that when a variable has been omitted, we can predict the direction of the bias from the product of the correlation between the omitted variable and the included regressor and the correlation between the omitted variable and the dependent variable. We know from regression 3 that the sign on white is positive, so white is positively correlated with approve. Since the coefficient on ltotinc is much larger in regression 2 than in regression 3, the coefficient on ltotinc is biased upward when white is omitted from the regression. The positive bias comes from the sign of the product of corr(white, approve) and corr(white, ltotinc), so the latter correlation must be positive.
c. A one percent increase in income raises the probability of being approved for a loan by approximately 0.018 percentage points, holding previous payment behavior and race constant (equivalently, a 100% increase in income raises the probability by 1.8 percentage points). In regression 2, the variables white and late were not controlled for, so the interpretation is not “holding all else constant” with respect to late and white.
d. The coefficient on late implies that, when white=0 (in other words, for non-whites), a late mortgage payment in the past reduces the probability of getting a loan by 49.7 percentage points. The coefficient on whitelate implies that the effect of late on getting a loan is 40.8 percentage points higher (closer to zero) for whites than for non-whites. In other words, whites are penalized much less for late mortgage payments than non-whites.
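A short sketch of how the two coefficients in part d combine, assuming the linear probability model includes late and the interaction whitelate as described. The coefficient values are the ones implied by the answer (49.7 and 40.8 percentage points); the function name is made up for this sketch.

```python
coef_late = -0.497        # effect of late when white = 0
coef_white_late = 0.408   # coefficient on the interaction white x late

def effect_of_late(white: int) -> float:
    """Change in approval probability from a late payment, by race indicator."""
    return coef_late + coef_white_late * white

print(f"non-whites: {effect_of_late(0):+.3f}")
print(f"whites:     {effect_of_late(1):+.3f}")
```

Under these coefficients the penalty for whites is the sum of the two terms, which is much closer to zero than the penalty for non-whites.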
e. The regression is trying to determine if the impact of late mortgage payments on getting a loan is different for whites and non-whites. The results indicate that the effects are statistically different for whites and non-whites (assuming the t-tests are approximately valid even though we are using the linear probability model without robust standard errors). These results may bolster the claim of discrimination by race in lending markets.
4. a. The fact that children born after an arbitrary cutoff date are eligible while children born before the cutoff date are not creates a useful “natural experiment.” When evaluating the effect of the legislation on health insurance coverage, a researcher might be concerned that other factors besides the legislation could affect health insurance coverage. But the targeting of the program creates a “treatment group” and a “control group” which plausibly only differ in eligibility status, which can be compared to determine the effect of the program.
b. Coveredi = β0 + β1Poori + β2Afteri + β3Poori×Afteri + ui
where
Covered = 1 if child i is covered by Medicaid, 0 otherwise
Poor = 1 if child i is poor, 0 otherwise
After = 1 if child i is born after the cutoff (October 1, 1983), 0 otherwise
Poori×Afteri = 1 if child i is both poor and born after the cutoff, 0 otherwise (this is the interaction of poor and after).
c. From the table in the paper we are given: Near-poor, Before = 8.7 which is β0
Near-poor, After = 16.9 which is β0 + β2
So β2 = 16.9 – 8.7 = 8.2
We also have:
Poor, After = 40.6 which is β0 + β1 + β2 + β3
Poor, Before = 18.1 which is β0 + β1
So β2 + β3 = 40.6 – 18.1 = 22.5
And β3 = 22.5 – 8.2 = 14.3 (this can also be seen directly from the table as the D-in-D estimate)
And finally, we can calculate β1 as (β0 + β1 ) - β0 = 18.1 – 8.7 = 9.4
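The four coefficients in part c can be recovered from the four cell means in one pass. The sketch below uses the coverage rates quoted above; the dictionary layout and names are assumptions made for illustration.

```python
# Medicaid coverage rates (percent) by group and birth cohort, as quoted above
coverage = {
    ("near_poor", "before"): 8.7,
    ("near_poor", "after"): 16.9,
    ("poor", "before"): 18.1,
    ("poor", "after"): 40.6,
}

beta0 = coverage[("near_poor", "before")]
beta1 = coverage[("poor", "before")] - beta0
beta2 = coverage[("near_poor", "after")] - beta0
# Diffs-in-diffs: change for the poor minus change for the near-poor
beta3 = (coverage[("poor", "after")] - coverage[("poor", "before")]) - beta2

print(f"b0 = {beta0:.1f}, b1 = {beta1:.1f}, b2 = {beta2:.1f}, b3 = {beta3:.1f}")
```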
d. The difference-in-differences estimate says that the policy change caused a 14.3 percentage point increase in Medicaid coverage.
e. The t-statistic is 14.3/3.0 = 4.77, which is statistically significant at conventional levels. It is economically significant as well, since it represents almost a doubling of the level of coverage compared to poor children not eligible for the expansion.
5. a. β0 is the enrollment rate of individuals without HIV, which is 80 percent. β1 = -.08 tells us that individuals with HIV are 8 percentage points less likely to enroll in school.
b. No, as discussed on the practice problems for midterm 2, this estimate cannot be interpreted as causal (see question 5 on the practice problems for the complete explanation).
c. The variable Z, an indicator for whether or not an individual participated in an HIV prevention program, can be used to estimate a two stage least squares regression for the effects of HIV on school enrollment. The equations one would estimate are:
1st stage: HIVi = π0 + π1Zi + vi
Calculate the predicted values HIVhat_i = π̂0 + π̂1Zi from the first stage.
2nd stage: Schooli = β0 + β1HIVhat_i + ui
The two assumptions one would need to make about Z are the 2 instrument validity assumptions:
(1) Instrument relevance: Corr(Programi, HIVi) ≠ 0 and
(2) Instrument exogeneity: Corr(Programi, ui) = 0
The first condition is likely to be true if the HIV education program was effective in teaching individuals how to avoid HIV infection. This condition can be tested using a first-stage F-test.
The second condition is satisfied by the randomized assignment to the program: there are no characteristics of individuals (such as poverty) that are correlated with being in the program that also affect enrollment in school, as long as the assignment is truly random. The instrument exogeneity condition cannot be tested in this case because we have exact identification (the number of instruments is equal to the number of endogenous variables) rather than overidentification.
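The two-stage procedure in part c can be sketched in plain Python. Everything numeric here is hypothetical: the eight observations are invented purely to make the mechanics visible and have nothing to do with the exam's data, and the helper functions are made up for this sketch. With one instrument and one endogenous regressor, the second-stage slope should coincide with the ratio cov(Z, School)/cov(Z, HIV), which the last lines check.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

def ols(x, y):
    """Intercept and slope from a one-regressor OLS regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    return intercept, slope

# Hypothetical toy data (NOT from the exam): program indicator, HIV status, enrollment
z = [1, 1, 1, 1, 0, 0, 0, 0]
hiv = [0, 0, 0, 1, 0, 1, 1, 1]
school = [1, 1, 1, 0, 1, 0, 0, 1]

# 1st stage: regress HIV on Z and form the predicted values
pi0_hat, pi1_hat = ols(z, hiv)
hiv_hat = [pi0_hat + pi1_hat * z_i for z_i in z]

# 2nd stage: regress School on the predicted HIV values
beta0_hat, beta1_hat = ols(hiv_hat, school)

# With a single instrument, TSLS equals the ratio of covariances
ratio_estimate = cov(z, school) / cov(z, hiv)

print(f"first-stage slope  = {pi1_hat}")
print(f"TSLS slope (beta1) = {beta1_hat}")
print(f"cov ratio          = {ratio_estimate}")
```

Running the second stage by hand like this gives the correct point estimate but not correct standard errors, so in practice one would use a dedicated TSLS routine for inference.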
2021-12-05