Applied Microeconometrics, EMET3006/4301/8001
Semester 2, 2022
Tutorial 4 (Week 5)
Write a program that you can use to replicate the figures and tables in Lee, Moretti and Butler (2004). Use my lecture slides to help you write the code.
1. If you haven’t already, save the lmb_data.dta data set. (You can download the data from Wattle.)
2. You will need to install a few packages in order to do the full RDD analysis. Type:

install.packages("estimatr")
install.packages("rdd")
install.packages("rdrobust")
install.packages("rddensity")
install.packages("stats")
3. If you haven’t already, you will need to install the McCrary density test. Instructions for this are in Tutorial 3.
4. We begin by replicating the first three columns of Table 1 in LMB. Read the footnotes carefully so that you get the correct sample size.
(a) Column 1 is the estimate of γ. Recall, it is the voting record in time t + 1 given a Democrat won in time t, minus the voting record in time t + 1 given a Republican won in time t. It is therefore the difference in voting records between Democrats and Republicans in the second session after they won their elections.
(b) Write down the regression equation you need to estimate this.
(c) Run the regression in R, clustering the standard errors on id2.
(d) Estimate π1 and (Pt+1 | Dt = 1) − (Pt+1 | Dt = 0), columns 2 and 3 of Table 1.
(e) Why are there only 915 observations when the full data set has 13,588 observations? Is it a good idea to limit the regression to only those 915 observations?
(f) R tips: The “stargazer” package doesn’t work with the “estimatr” package. The “estimatr” package is useful for clustering and calculating robust standard errors, so we need a new way of presenting our regression results. Install the “cli” and “texreg” packages. Then, after you have run your three regressions, type:
cli::cli_text("Original results based on ADA scores -- close elections sample")
texreg::screenreg(list(lm_1, lm_2, lm_3), type = "text")
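A minimal sketch of the three regressions is below. The variable names (score, democrat, lagdemocrat, lagdemvoteshare), the cluster variable id2, and the close-elections window of 0.48–0.52 are all assumptions to check against the lecture slides and the LMB footnotes:

```r
# Sketch only: variable names and the close-elections window are assumptions
library(haven)     # read_dta()
library(estimatr)  # lm_robust() with clustered standard errors

lmb_data <- read_dta("lmb_data.dta")

# Close elections: lagged Democrat vote share between 0.48 and 0.52
lmb_subset <- subset(lmb_data,
                     lagdemvoteshare > 0.48 & lagdemvoteshare < 0.52)

# Column 1: gamma, effect of a Democrat win at t on the ADA score at t+1
lm_1 <- lm_robust(score ~ lagdemocrat, data = lmb_subset, clusters = id2)

# Column 2: pi_1, effect of a Democrat win at t on the ADA score at t
lm_2 <- lm_robust(score ~ democrat, data = lmb_subset, clusters = id2)

# Column 3: (Pt+1 | Dt = 1) - (Pt+1 | Dt = 0), the incumbency effect
lm_3 <- lm_robust(democrat ~ lagdemocrat, data = lmb_subset, clusters = id2)
```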
5. Repeat the regressions for columns 1-3 but this time include all the data. What is the effect on the coefficient? What is the effect on the standard error? What is the effect on the confidence interval?
6. Center vote share at 0.5, that is create a new running variable where the cutoff is at zero.
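For example, a one-line sketch, assuming the data frame is lmb_data and the running variable is demvoteshare:

```r
# New running variable centred so that the cutoff sits at zero
lmb_data$demvoteshare_c <- lmb_data$demvoteshare - 0.5
```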
7. Rerun the regressions but include the centred running variable. Use whichever bandwidth you prefer.
8. We want to allow for different slopes either side of the cutoff, not just different intercepts. To do that we need to interact treatment with the running variable.
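A sketch of both specifications, assuming a centred running variable demvoteshare_c and the (assumed) cluster variable id2:

```r
library(estimatr)

# Common slope: treatment shifts the intercept only
lm_level <- lm_robust(score ~ democrat + demvoteshare_c,
                      data = lmb_data, clusters = id2)

# Different slopes on each side of the cutoff:
# democrat * demvoteshare_c expands to both main effects plus the interaction
lm_slopes <- lm_robust(score ~ democrat * demvoteshare_c,
                       data = lmb_data, clusters = id2)

summary(lm_slopes)
```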
9. Compare and contrast the regression results when you don’t include the running variable, when you do and when you allow the slopes to differ.
10. We next experiment with bandwidths. Run three different regressions allowing for different slopes with a bandwidth of:
(a) +/-0.1 from the cutoff
(b) +/-0.05
(c) +/-0.01
Discuss your results.
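One way to sketch this, assuming the centred running variable demvoteshare_c and the interacted-slopes specification from above:

```r
library(estimatr)

# Re-estimate the interacted model inside three windows around the cutoff
for (h in c(0.10, 0.05, 0.01)) {
  sub <- subset(lmb_data, abs(demvoteshare_c) < h)
  fit <- lm_robust(score ~ democrat * demvoteshare_c,
                   data = sub, clusters = id2)
  cat("\nbandwidth +/-", h, " n =", nrow(sub), "\n")
  print(summary(fit))
}
```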
11. We now include all the data but construct polynomial terms. Generate polynomial terms up to order 5 of the running variable. Do not centre them.
12. Run a regression of score on treatment and include up to 5 polynomial terms.
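A sketch of the polynomial specification, again assuming the variable names score, democrat and demvoteshare:

```r
library(estimatr)

# Polynomial terms of the (uncentred) running variable, up to order 5
lmb_data$x1 <- lmb_data$demvoteshare
lmb_data$x2 <- lmb_data$demvoteshare^2
lmb_data$x3 <- lmb_data$demvoteshare^3
lmb_data$x4 <- lmb_data$demvoteshare^4
lmb_data$x5 <- lmb_data$demvoteshare^5

lm_poly <- lm_robust(score ~ democrat + x1 + x2 + x3 + x4 + x5,
                     data = lmb_data, clusters = id2)
summary(lm_poly)
```

For the centred version, regenerate x1–x5 from demvoteshare − 0.5 and rerun the same regression.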
13. Now centre all the polynomial terms, that is you regenerate the poly- nomial terms but this time using the centred running variable. Repeat the regressions.
Of course, we haven’t even plotted the data yet.
14. First just look at the raw data. What can you say about this graph?
15. Compare this figure with Figure I from LMB. What’s different?
16. Add a linear trend.
17. Include a lowess smoothed line.
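A base-graphics sketch covering the raw scatter, the linear trends and the lowess smooths, assuming the variables demvoteshare and score with no missing values after the subsetting:

```r
# Raw scatter of ADA score against Democrat vote share
plot(lmb_data$demvoteshare, lmb_data$score,
     pch = 20, cex = 0.3,
     xlab = "Democrat vote share", ylab = "ADA score")
abline(v = 0.5, lty = 2)  # cutoff

# Fit linear trends separately on each side of the cutoff
left  <- subset(lmb_data, demvoteshare <  0.5 & !is.na(score))
right <- subset(lmb_data, demvoteshare >= 0.5 & !is.na(score))
abline(lm(score ~ demvoteshare, data = left),  col = "blue")
abline(lm(score ~ demvoteshare, data = right), col = "red")

# Lowess smooths on each side
lines(lowess(left$demvoteshare,  left$score),  col = "blue", lty = 3)
lines(lowess(right$demvoteshare, right$score), col = "red",  lty = 3)
```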
18. Replicate Figure IIA from LMB but this time aggregate the data.
19. And again for (Pt+1 | Dt = 1) − (Pt+1 | Dt = 0).
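A sketch of the aggregation step, assuming bins of width 0.01 in the vote share (the bin width is a choice, not something the source specifies):

```r
# Aggregate ADA scores into vote-share bins of width 0.01
lmb_data$bin <- floor(lmb_data$demvoteshare * 100) / 100
agg <- aggregate(score ~ bin, data = lmb_data, FUN = mean)

plot(agg$bin, agg$score,
     xlab = "Democrat vote share (binned)", ylab = "Mean ADA score")
abline(v = 0.5, lty = 2)  # cutoff
```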
20. Compare the difference in the vertical distance at the cutoff for the linear fit model and the quadratic fit model. Notice how the linear model is heavily influenced by outliers that pull down the slope at higher values of demvoteshare. Because this causes the regression slope coefficient to “pivot”, notice how the highest point of the fitted line at the top left rises. This is the sort of “bias” problem that we contend with in RDD.
21. Hahn, Todd and Van der Klaauw (2001) showed that one-sided kernel estimation (like LOWESS) may have poor properties because the point of interest is at a boundary (i.e., the cutoff). This is the “boundary problem”. They proposed instead to use a “local linear nonparametric regression”.
Plot a kernel weighted regression with bandwidth 0.1 and using a “box” kernel (see my code in the slides). What do you notice?
Note that local regression is a smoothing method.
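Base R’s ksmooth() supports exactly the “box” and “normal” kernels mentioned here. A sketch, fitting the smooth separately on each side of the cutoff (variable names assumed as before):

```r
cc    <- subset(lmb_data, !is.na(score) & !is.na(demvoteshare))
left  <- subset(cc, demvoteshare <  0.5)
right <- subset(cc, demvoteshare >= 0.5)

plot(cc$demvoteshare, cc$score, pch = 20, cex = 0.2, col = "grey",
     xlab = "Democrat vote share", ylab = "ADA score")

# Kernel-weighted smooth: box kernel, bandwidth 0.1, each side separately
lines(ksmooth(left$demvoteshare,  left$score,
              kernel = "box", bandwidth = 0.1), col = "blue", lwd = 2)
lines(ksmooth(right$demvoteshare, right$score,
              kernel = "box", bandwidth = 0.1), col = "red",  lwd = 2)
abline(v = 0.5, lty = 2)
```

Swapping kernel = "box" for kernel = "normal" gives the variant suggested in the next item.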
22. Repeat for the second and third estimations. Play around with the bandwidth. Play around with the kernel type: try using “normal” .
23. Next, get the treatment effect at the cutoff where demvoteshare = 0.5. I just looked at the data around demvoteshare = 0.5: smooth_dem1 − smooth_dem2 = 65.62 − 20.56 = 45.06.
24. Our last task is to allow R to choose the optimal bandwidth and to run a non-parametric estimation with that optimal bandwidth.
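The rdrobust package (installed earlier) does this: it selects a data-driven optimal bandwidth and runs the local-polynomial estimation at the cutoff. A sketch, with the variable names assumed as before:

```r
library(rdrobust)

# Let rdrobust pick the optimal bandwidth and estimate the effect
# at the cutoff demvoteshare = 0.5
rd_out <- rdrobust(y = lmb_data$score, x = lmb_data$demvoteshare, c = 0.5)
summary(rd_out)
```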
We haven’t yet actually checked for sorting on the running variable.
25. Run a McCrary density test using the command from tutorial 3 but replacing the variables with the ones in lmb.
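With the rdd package from the install list, a sketch of the test (running-variable name assumed):

```r
library(rdd)

# McCrary density test for manipulation of the running variable
# at the cutoff demvoteshare = 0.5; plots the density and returns a p-value
DCdensity(lmb_data$demvoteshare, cutpoint = 0.5)
```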