Applied Microeconometrics, EMET3006/4301/8001
Semester 2, 2022
Tutorial 4 (Week 5)
Write a program that you can use to replicate the figures and tables in Lee, Moretti and Butler (2004). Use my lecture slides to help you write the code.
1. If you haven’t already, save the lmb_data.dta data set. (You can download the data from Wattle.)
2. You will need to install a few packages in order to do the full RDD analysis. Type:

install.packages("estimatr")
install.packages("rdd")
install.packages("rdrobust")
install.packages("rddensity")
install.packages("stats")
3. If you haven’t already, you will need to install the McCrary density test. Instructions for this are in Tutorial 3.
4. We begin by replicating the first three columns of Table 1 in LMB. Read the footnotes carefully so that you get the correct sample size.
(a) Column 1 is the estimate of γ. Recall, it is the voting record in time t + 1 given a Democrat won in time t, minus the voting record in time t + 1 given a Republican won in time t. It is therefore the difference in voting records between Democrats and Republicans in the second session after they won their elections.
(b) Write down the regression equation you need to estimate this.
(c) Run the regression in R, clustering the standard errors on id2.
(d) Estimate π1 and (Pt+1 | Dt = 1) − (Pt+1 | Dt = 0), columns 2 and 3 of Table 1.
(e) Why are there only 915 observations when the full data set has 13,588 observations? Is it a good idea to limit the regression to only those 915 observations?
(f) R tips: The “stargazer” package doesn’t work with the “estimatr” package. The “estimatr” package is useful for clustering and calculating robust standard errors, so we need a new way of presenting our regression results. Install the “cli” and “texreg” packages. Then, after you have run your three regressions, type:
cli::cli_text("Original results based on ADA scores -- close elections sample")
texreg::screenreg(list(lm_1, lm_2, lm_3), type = "text")
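A minimal sketch of the three regressions is below. The variable names (score, democrat, lagdemocrat, lagdemvoteshare), the cluster variable id2, and the close-elections window of 0.48–0.52 are all assumptions to check against the lecture slides and the LMB footnotes:

```r
# Sketch only: variable names and the close-elections window are assumptions
library(haven)     # read_dta()
library(estimatr)  # lm_robust() with clustered standard errors

lmb_data <- read_dta("lmb_data.dta")

# Close elections: lagged Democrat vote share between 0.48 and 0.52
lmb_subset <- subset(lmb_data,
                     lagdemvoteshare > 0.48 & lagdemvoteshare < 0.52)

# Column 1: gamma, effect of a Democrat win at t on the ADA score at t+1
lm_1 <- lm_robust(score ~ lagdemocrat, data = lmb_subset, clusters = id2)

# Column 2: pi_1, effect of a Democrat win at t on the ADA score at t
lm_2 <- lm_robust(score ~ democrat, data = lmb_subset, clusters = id2)

# Column 3: (Pt+1 | Dt = 1) - (Pt+1 | Dt = 0), the incumbency effect
lm_3 <- lm_robust(democrat ~ lagdemocrat, data = lmb_subset, clusters = id2)
```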
5. Repeat the regressions for columns 1-3 but this time include all the data. What is the effect on the coefficient? What is the effect on the standard error? What is the effect on the confidence interval?
6. Center vote share at 0.5, that is create a new running variable where the cutoff is at zero.
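For example, a one-line sketch, assuming the data frame is lmb_data and the running variable is demvoteshare:

```r
# New running variable centred so that the cutoff sits at zero
lmb_data$demvoteshare_c <- lmb_data$demvoteshare - 0.5
```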
7. Rerun the regressions but include the centred running variable. Use whichever bandwidth you prefer.
8. We want to allow for different slopes either side of the cutoff, not just different intercepts. To do that we need to interact treatment with the running variable.
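A sketch of both specifications, assuming a centred running variable demvoteshare_c and the (assumed) cluster variable id2:

```r
library(estimatr)

# Common slope: treatment shifts the intercept only
lm_level <- lm_robust(score ~ democrat + demvoteshare_c,
                      data = lmb_data, clusters = id2)

# Different slopes on each side of the cutoff:
# democrat * demvoteshare_c expands to both main effects plus the interaction
lm_slopes <- lm_robust(score ~ democrat * demvoteshare_c,
                       data = lmb_data, clusters = id2)

summary(lm_slopes)
```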
9. Compare and contrast the regression results when you don’t include the running variable, when you do and when you allow the slopes to differ.
10. We next experiment with bandwidths. Run three different regressions allowing for different slopes with a bandwidth of:
(a) +/-0.1 from the cutoff
(b) +/-0.05
(c) +/-0.01
Discuss your results.
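One way to sketch this, assuming the centred running variable demvoteshare_c and the interacted-slopes specification from above:

```r
library(estimatr)

# Re-estimate the interacted model inside three windows around the cutoff
for (h in c(0.10, 0.05, 0.01)) {
  sub <- subset(lmb_data, abs(demvoteshare_c) < h)
  fit <- lm_robust(score ~ democrat * demvoteshare_c,
                   data = sub, clusters = id2)
  cat("\nbandwidth +/-", h, " n =", nrow(sub), "\n")
  print(summary(fit))
}
```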
11. We now include all the data but construct polynomial terms. Generate polynomial terms up to order 5 of the running variable. Do not centre them.
12. Run a regression of score on treatment and include up to 5 polynomial terms.
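A sketch of the polynomial specification, again assuming the variable names score, democrat and demvoteshare:

```r
library(estimatr)

# Polynomial terms of the (uncentred) running variable, up to order 5
lmb_data$x1 <- lmb_data$demvoteshare
lmb_data$x2 <- lmb_data$demvoteshare^2
lmb_data$x3 <- lmb_data$demvoteshare^3
lmb_data$x4 <- lmb_data$demvoteshare^4
lmb_data$x5 <- lmb_data$demvoteshare^5

lm_poly <- lm_robust(score ~ democrat + x1 + x2 + x3 + x4 + x5,
                     data = lmb_data, clusters = id2)
summary(lm_poly)
```

For the centred version, regenerate x1–x5 from demvoteshare − 0.5 and rerun the same regression.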
13. Now centre all the polynomial terms, that is you regenerate the poly- nomial terms but this time using the centred running variable. Repeat the regressions.
Of course, we haven’t even plotted the data yet.
14. First just look at the raw data. What can you say about this graph?
15. Compare this figure with Figure I from LMB. What’s different?
16. Add a linear trend.
17. Include a lowess smoothed line.
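A base-graphics sketch covering the raw scatter, the linear trends and the lowess smooths, assuming the variables demvoteshare and score with no missing values after the subsetting:

```r
# Raw scatter of ADA score against Democrat vote share
plot(lmb_data$demvoteshare, lmb_data$score,
     pch = 20, cex = 0.3,
     xlab = "Democrat vote share", ylab = "ADA score")
abline(v = 0.5, lty = 2)  # cutoff

# Fit linear trends separately on each side of the cutoff
left  <- subset(lmb_data, demvoteshare <  0.5 & !is.na(score))
right <- subset(lmb_data, demvoteshare >= 0.5 & !is.na(score))
abline(lm(score ~ demvoteshare, data = left),  col = "blue")
abline(lm(score ~ demvoteshare, data = right), col = "red")

# Lowess smooths on each side
lines(lowess(left$demvoteshare,  left$score),  col = "blue", lty = 3)
lines(lowess(right$demvoteshare, right$score), col = "red",  lty = 3)
```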
18. Replicate Figure IIA from LMB but this time aggregate the data.
19. And again for (Pt+1 | Dt = 1) − (Pt+1 | Dt = 0).
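A sketch of the aggregation step, assuming bins of width 0.01 in the vote share (the bin width is a choice, not something the source specifies):

```r
# Aggregate ADA scores into vote-share bins of width 0.01
lmb_data$bin <- floor(lmb_data$demvoteshare * 100) / 100
agg <- aggregate(score ~ bin, data = lmb_data, FUN = mean)

plot(agg$bin, agg$score,
     xlab = "Democrat vote share (binned)", ylab = "Mean ADA score")
abline(v = 0.5, lty = 2)  # cutoff
```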
20. Compare the difference in the vertical distance at the cutoff for the linear fit model and the quadratic fit model. Notice how the linear model is heavily influenced by outliers that pull down the slope at higher values of demvoteshare. Because this causes the regression slope coefficient to “pivot”, notice how the highest point of the fitted line at the top left rises. This is the sort of “bias” problem that we contend with in RDD.
21. Hahn, Todd and Van der Klaauw (2001) showed that one-sided kernel estimation (like LOWESS) may have poor properties because the point of interest is at a boundary (i.e., the cutoff). This is the “boundary problem”. They proposed instead to use a “local linear nonparametric regression”.
Plot a kernel weighted regression with bandwidth 0.1 and using a “box” kernel (see my code in the slides). What do you notice?
Note that local regression is a smoothing method.
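Base R’s ksmooth() supports exactly the “box” and “normal” kernels mentioned here. A sketch, fitting the smooth separately on each side of the cutoff (variable names assumed as before):

```r
cc    <- subset(lmb_data, !is.na(score) & !is.na(demvoteshare))
left  <- subset(cc, demvoteshare <  0.5)
right <- subset(cc, demvoteshare >= 0.5)

plot(cc$demvoteshare, cc$score, pch = 20, cex = 0.2, col = "grey",
     xlab = "Democrat vote share", ylab = "ADA score")

# Kernel-weighted smooth: box kernel, bandwidth 0.1, each side separately
lines(ksmooth(left$demvoteshare,  left$score,
              kernel = "box", bandwidth = 0.1), col = "blue", lwd = 2)
lines(ksmooth(right$demvoteshare, right$score,
              kernel = "box", bandwidth = 0.1), col = "red",  lwd = 2)
abline(v = 0.5, lty = 2)
```

Swapping kernel = "box" for kernel = "normal" gives the variant suggested in the next item.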
22. Repeat for the second and third estimations. Play around with the bandwidth. Play around with the kernel type: try using “normal” .
23. Next, get the treatment effect at the cutoff where demvoteshare = 0.5. I just looked at the data around demvoteshare = 0.5: smooth_dem1 − smooth_dem2 = 65.62 − 20.56 = 45.06.
24. Our last task is to allow R to choose the optimal bandwidth and to run a non-parametric estimation with that optimal bandwidth.
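The rdrobust package (installed earlier) does this: it selects a data-driven optimal bandwidth and runs the local-polynomial estimation at the cutoff. A sketch, with the variable names assumed as before:

```r
library(rdrobust)

# Let rdrobust pick the optimal bandwidth and estimate the effect
# at the cutoff demvoteshare = 0.5
rd_out <- rdrobust(y = lmb_data$score, x = lmb_data$demvoteshare, c = 0.5)
summary(rd_out)
```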
We haven’t yet actually checked for sorting on the running variable.
25. Run a McCrary density test using the command from tutorial 3 but replacing the variables with the ones in lmb.
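With the rdd package from the install list, a sketch of the test (running-variable name assumed):

```r
library(rdd)

# McCrary density test for manipulation of the running variable
# at the cutoff demvoteshare = 0.5; plots the density and returns a p-value
DCdensity(lmb_data$demvoteshare, cutpoint = 0.5)
```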