Project II

BUSN 5000

Problem 1 (15 points)

Consider the cross-section regression model,

$$y_i = \beta_0 + \delta D_i + \beta_1 x_{i1} + u_i, \qquad i = 1, \dots, N, \tag{1}$$

where $y_i$ is some outcome of interest, $D_i$ is a binary treatment indicator for unit $i$, and $x_{i1}$ is a unit-specific characteristic that is controlled for in the model.

(a) What is the key identifying assumption for interpreting the OLS estimator of $\delta$ as a causal effect?

(b) Write down the formula for the OLS estimator of $\delta$. Your answer should reflect the application of the FWL theorem.

(c) Write down the formula for the test statistic to test $H_0: \delta = 0$. How is this test statistic justified? By default, does lm produce the correct version of the test statistic? Why or why not?

Problem 2 (20 points)

Figure 1 reproduces the Project STAR class-size effects estimates from assignment 2. Recall that the dependent variable is a student test score. Use the table (in the “figure”) to answer the questions below.

(a) Write a sentence that interprets the estimated effects of class size and teacher aides on test scores in Column (1). Are the estimated effects statistically significant at the 5% level? Are they causal?

(b) Write a formal expression of the regression estimated in Column (2). Does adding teacher experience affect your answer in part (a)? Is the estimated effect of teacher experience statistically significant at the 5% level? Write a sentence comparing the value of a move to a small class with that of having a teacher with 10 years of experience.

(c) What is the difference between the specifications in Columns (2) and (3)? How does the specification in Column (3) affect the estimated class-size and teacher-experience effects?

(d) Column (4) adds controls for student gender, race and eligibility for free lunch.  How much do they add to the overall explanatory power of the regression? Are the estimated class-size effects robust to their inclusion?

Problem 3 (15 points)

(a) Briefly describe the intuition behind the regression discontinuity (RD) research design for estimating the effect of drinking on mortality in assignment 3. What is the key identifying assumption for the results from such a design to have a causal interpretation?

(b) Figure 2 reproduces the regression discontinuity (RD) plots for mva, suicide and homicide from assignment 3. What is the main message of the plots?

(c) Figure 2 reproduces the estimated effects of MLDA laws on mva, suicide and homicide from assignment 3. Write a sentence that interprets the RD estimates for motor-vehicle accidents. Repeat this exercise for suicides and homicides. Write a sentence indicating which results are statistically significant at the 5% level.

Problem 4 (15 points)

(a) Briefly describe the intuition behind the difference-in-differences (DD) research design for estimating the effect of workers' compensation on injury duration in assignment 4. What is the key identifying assumption for the results from such a design to have a causal interpretation?

(b) Write down the population regression model corresponding to Column (1), explicitly defining each variable.

(c) Figure 3 reproduces the difference-in-differences (DD) estimates of the effect of workers' compensation on injury duration from assignment 4. Write a sentence that interprets the simple DD estimate of the effect of the WBA increase in KY. How does adding covariates to the model affect the DD estimate? How does the simple DD evidence from MI compare with the results from KY?

Problem 5 (15 points)

(a) What is the rationale for “regularizing” OLS regression for prediction purposes? What does lasso stand for and what does the lasso penalty do?

(b) Figure 4 reproduces the lasso CV(M) plot from assignment 5. What do the red dots represent? Give the steps in the cross-validation process behind their calculation.

(c) Explain how to use the results depicted in Figure 4 to choose the best model for out-of-sample prediction.

Problem 6 (20 points)

In this final problem you will replicate some of the analysis in B.3 of assignment 1 using a different sample from the NLSYM. The data come from Blackburn and Neumark (1992) (hereafter, BN) and are available in the wage2 dataset of the wooldridge package. BN’s sample is based on the 1980 survey year, but it is otherwise similar to the Card (1995) sample you used in assignment 1. You will find a description of the variables in the referenced dataset through the Help tab in the Plot pane of RStudio. You will also find a description in their paper.

Unlike in the homework assignments, you will be a little more “off the chain” here. First, you will have to fill out the code chunk on your own. Don’t fret though, because you have everything you need in assignment 1. Plus, there is TAL if you get really stuck. Second, the analysis write-up is less scripted by the instructions, so you will have to string the relevant sentences together on your own.

(a) Begin by constructing a table of summary statistics for the main model variables (wage, educ, exper, black, south, urban) that reports the mean, standard deviation, min and max. Write a short paragraph describing the sample based on the table you constructed.

(b) Estimate the return to schooling controlling for exper and its square, black, south, and urban. Then, as in assignment 1, address the concern that the estimated education coefficient is biased because the model does not control for unobserved ability by adding IQ as a proxy. The sample also provides each young man’s Knowledge of the World of Work (KWW) score. In a third and final regression, add KWW as an additional proxy for unobserved ability.

Present your results in a proper table using modelsummary, and report standard errors that are robust to heteroscedasticity. Write a short paragraph interpreting the returns-to-schooling estimates from this analysis, being sure to indicate whether they are statistically significant. An obvious approach to this paragraph would be to start with a sentence about the finding in Column (1) and then proceed to Columns (2) and (3), highlighting how the results change.


# Construct table of summary statistics.

# Estimate regression models.

# Create coefficient map with variable labels.

# Create goodness-of-fit map.

# Estimate the models and construct a table of results.
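
For orientation only, the general call pattern behind the steps above might be sketched as follows. This is a minimal illustration on the built-in mtcars data, not the wage2 specification asked for in this problem: the variables, labels and model names are stand-ins chosen for the example, and it assumes the modelsummary and sandwich packages are installed.

```r
# Sketch of the table workflow on stand-in data (mtcars ships with R).
# Variables and labels are illustrative, NOT the wage2 specification.
library(modelsummary)  # robust vcov options rely on the sandwich package

# Summary statistics: mean, standard deviation, min and max.
datasummary(mpg + wt + hp ~ Mean + SD + Min + Max, data = mtcars)

# Two nested regressions with a log dependent variable.
models <- list(
  "(1)" = lm(log(mpg) ~ wt, data = mtcars),
  "(2)" = lm(log(mpg) ~ wt + hp, data = mtcars)
)

# Coefficient map: renames and orders the rows shown in the table.
cm <- c("wt" = "Weight (1,000 lbs)",
        "hp" = "Horsepower",
        "(Intercept)" = "Constant")

# Goodness-of-fit map: keeps only the listed fit statistics.
gm <- c("nobs", "r.squared")

# vcov = "HC1" requests heteroskedasticity-robust standard errors.
modelsummary(models, vcov = "HC1", coef_map = cm, gof_map = gm)
```

The same structure carries over once the stand-in formulas, data and labels are replaced with those for the problem at hand.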

 

Figure 1: Class-size effect estimates, kindergartners

 

Figure 2: Effects of MLDA laws on MVA deaths, suicides and homicides

 

Figure 3: Effects of workers' comp on injury duration

 

Figure 4: Lasso CV(M) plot for house-price prediction