
PPOL 561

Problem Set

Regression Discontinuity Analysis

Due by 5pm EST on Monday April 17, 2023

Please submit your problem set answers (pdf, doc) and R code (Rmarkdown) via Canvas.

You have been hired as a poverty specialist for the RAND Corporation. For your first project, you have been asked to analyze the impacts of the Poverty Assistance Program in the country of Wakanda. The program was tested in 2019, when it provided cash benefits (“Poverty Assistance Benefits”) to households with incomes below the federal poverty limit, which varies with household size according to the following schedule:

Household Size    Federal Poverty Limit
1                 $12490
2                 $16910
3                 $21330
4+                $25750

For each household that qualifies, the cash benefit amount is 20% of the Federal Poverty Limit for its household size. Households with incomes above the Federal Poverty Limit receive no cash benefits.
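The schedule can be expressed as a small function. The following is a minimal Python sketch for illustration only (the problem set asks for R); `FPL_BY_SIZE` and the function names are hypothetical, and it assumes a household with income exactly at the limit does not qualify, since the text only covers incomes below and above the limit.

```python
# Federal poverty limits from the 2019 schedule; sizes of 4 or more use the "4+" row.
FPL_BY_SIZE = {1: 12490, 2: 16910, 3: 21330, 4: 25750}


def federal_poverty_limit(nhhld: int) -> int:
    """FPL for a household of nhhld people (4+ collapsed into one category)."""
    return FPL_BY_SIZE[min(nhhld, 4)]


def poverty_assistance_benefit(income: float, nhhld: int) -> float:
    """Cash benefit: 20% of the FPL if income is below the FPL, otherwise 0.

    Assumption: income exactly equal to the FPL does not qualify.
    """
    fpl = federal_poverty_limit(nhhld)
    return fpl / 5 if income < fpl else 0.0  # fpl / 5 == 20% of the FPL
```

For example, a three-person household with income of $20000 is below the $21330 limit and would receive $4266; with income of $21331 it would receive nothing, so the benefit drops by 20% of the limit at the threshold.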

The goal of this analysis is to estimate the impacts of benefits from 2019 on employment in 2020.

1. Explain how this benefit schedule creates the opportunity to apply the Regression Discontinuity research design to study the impacts of cash benefits from 2019 on employment in 2020. What is the intuition behind the identifying assumptions in this context?

Download rd_problem_set.csv. This dataset has one observation per household and the following variables:

female: indicator for the gender of the person interviewed in the household

age: age of the person interviewed in the household

college: indicator for college attendance of the person interviewed in the household

nhhld: number of people in the household

inc2019: household income in 2019

pab2019: amount of poverty assistance benefits received in 2019

emp2020: indicator for employment in 2020 of the person interviewed in the household

Use this dataset to answer the following questions:

2. Preliminaries:

A. Create a variable “fpl” that has the federal poverty limit for each household.

B. Use this variable to create the running variable “runvar” which captures income relative to the household-specific federal poverty limit.

C. Create a binned version of the running variable (runvarbin) that rounds the values of the running variable to the nearest $100.

D. Create an indicator D equal to 1 for income above the federal poverty limit (given the household’s size) and 0 otherwise.

E. Create an indicator T equal to 1 if poverty assistance benefits are positive and 0 otherwise.

F. Unless otherwise stated, use a bandwidth of +/- $5000 around the fpl for the analysis.
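Parts A through F can be sketched in Python as below. This is illustrative rather than a reference solution: `load_rows`, `prepare`, and `FPL_BY_SIZE` are hypothetical names, it assumes every column in rd_problem_set.csv is numeric with no missing values, and exact $50 ties in part C are rounded up (other rounding conventions may assign them differently).

```python
import csv
import math

# 2019 schedule; households of 4 or more share the "4+" limit.
FPL_BY_SIZE = {1: 12490, 2: 16910, 3: 21330, 4: 25750}


def load_rows(path="rd_problem_set.csv"):
    """Read the CSV into a list of dicts with string values."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def prepare(rows, bandwidth=5000):
    """Add fpl, runvar, runvarbin, D and T (parts A-E); keep |runvar| <= bandwidth (part F)."""
    out = []
    for row in rows:
        rec = {k: float(v) for k, v in row.items()}  # assumes numeric columns, no missing values
        fpl = FPL_BY_SIZE[min(int(rec["nhhld"]), 4)]  # A: household-specific FPL
        runvar = rec["inc2019"] - fpl                  # B: income relative to the FPL
        rec["fpl"] = fpl
        rec["runvar"] = runvar
        rec["runvarbin"] = 100 * math.floor(runvar / 100 + 0.5)  # C: nearest $100, ties up
        rec["D"] = int(runvar > 0)                     # D: income above the FPL
        rec["T"] = int(rec["pab2019"] > 0)             # E: positive benefits
        if abs(runvar) <= bandwidth:                   # F: bandwidth restriction
            out.append(rec)
    return out
```

Typical use would be `data = prepare(load_rows())`; passing `bandwidth=10000` keeps the wider sample needed for the bandwidth analysis in question 8.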

 

3. Sharp RD First Stage regressions and plots:

Estimate the following regressions

 

 

Cluster the standard errors based on the binned running variable.

Based on the estimates, how do treatment status and average benefit amounts change for households above and below the 2019 federal poverty limit?
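A minimal Python sketch of the point estimate, assuming a local linear specification in which the slope on runvar may differ above and below the limit (outcome = α + β·D + γ·runvar + δ·D·runvar + ε). Under that assumption the OLS coefficient β equals the difference between the intercepts of separate linear fits on each side, evaluated at runvar = 0. It does not compute the standard errors clustered on runvarbin, which would come from a regression package; `linear_fit` and `rd_estimate` are hypothetical helpers, and rows are assumed to be dicts with numeric values.

```python
def linear_fit(xs, ys):
    """OLS of y on an intercept and x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_estimate(data, outcome):
    """Jump in `outcome` at runvar = 0 (D = 1 side minus D = 0 side).

    Equals the coefficient on D in a regression of outcome on D, runvar and D*runvar.
    """
    intercepts = {}
    for side in (0, 1):
        rows = [r for r in data if r["D"] == side]
        a, _ = linear_fit([r["runvar"] for r in rows], [r[outcome] for r in rows])
        intercepts[side] = a  # fitted value at runvar = 0
    return intercepts[1] - intercepts[0]
```

With the prepared sample, `rd_estimate(data, "T")` and `rd_estimate(data, "pab2019")` give the two first-stage jumps; because T switches from 1 to 0 at the limit under the program rules, the T estimate should be close to -1 if the rule is applied as described.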

Calculate the means of T and pab2019 within each bin of the binned running variable, and then create two first-stage plots by plotting each of these outcomes (y-axis) against the values of the binned running variable (x-axis). Are these graphs consistent with your regression results?
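The binned means behind the first-stage plots can be computed as below (Python sketch; `binned_means` is a hypothetical name, and rows are assumed to carry numeric `runvarbin` and outcome values).

```python
from collections import defaultdict


def binned_means(data, outcome, bin_var="runvarbin"):
    """Mean of `outcome` within each bin, as sorted (bin value, mean) pairs for plotting."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in data:
        sums[r[bin_var]] += r[outcome]
        counts[r[bin_var]] += 1
    return [(b, sums[b] / counts[b]) for b in sorted(sums)]
```

Calling it with "T" and then "pab2019" yields the two series; each can be drawn as a scatter plot with bins on the x-axis and a vertical line at 0 marking the limit. The same function gives the emp2020 means for the reduced-form plot.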

4. Reduced form regressions and plots:

Estimate the following regressions

 

Cluster the standard errors based on the binned running variable.

Based on the estimates, how do 2020 employment outcomes change for households above and below the 2019 federal poverty limit?

Calculate the means of emp2020 within each bin of the binned running variable, and then create a reduced form plot by plotting these means (y-axis) against values of the binned running variable (x-axis). Is this graph consistent with your regression results? Interpret the regression and graph in terms of impacts of poverty assistance benefits on the outcomes.

Interpret the coefficient estimates. Using the ratio of the reduced-form estimate to the first-stage estimate for benefit amounts (pab2019), how much does an additional $1000 of benefits change the probability of employment?
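The ratio of the reduced form to the first stage (a Wald-type estimator) can be written as a one-line function. In this Python sketch `effect_per_1000` is a hypothetical name, and both inputs are assumed to be coefficients on D estimated with the same specification and bandwidth. Because D = 1 above the limit, the first stage for pab2019 is negative, and the division handles the sign.

```python
def effect_per_1000(reduced_form: float, first_stage_benefit: float) -> float:
    """Change in the probability of employment per additional $1000 of benefits.

    reduced_form: jump in emp2020 at the FPL (coefficient on D, question 4).
    first_stage_benefit: jump in pab2019 at the FPL in dollars (coefficient on D, question 3).
    Both jumps are measured moving from below to above the limit.
    """
    return 1000 * reduced_form / first_stage_benefit
```

With purely hypothetical inputs (not estimates from the data), `effect_per_1000(0.03, -4000)` returns -0.0075: a 3 percentage point employment jump above the limit, where benefits are $4000 lower, would imply that each additional $1000 of benefits lowers the probability of employment by 0.75 percentage points.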

5. Frequencies plot:

Count the number of observations within each bin of the running variable (Nobs). Using one observation per bin value, estimate the following regression:

$$\text{Nobs}_b = \alpha + \beta D_b + g(\text{runvarbin}_b) + \varepsilon_b$$

where b indexes each bin and g(.) is a cubic polynomial of the binned running variable value. Is the coefficient on the indicator variable D significant?

Plot Nobs and the fitted values from this regression. If households could manipulate the running variable to qualify for treatment, what would you expect to see? Is there any evidence that households are able to manipulate the running variable to qualify for treatment?
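A Python sketch of the bin-count regression, with conventional (non-clustered) standard errors since there is one observation per bin. It assumes D_b = 1 for bins above zero (bin 0 contains households on both sides of the limit and is treated as below), and rescales bin values to [-1, 1] before forming the cubic, which changes the polynomial coefficients but not the coefficient on D. `ols` and `density_test` are hypothetical helpers; solving the normal equations is adequate here, though a QR-based routine is more robust in general.

```python
import math
from collections import Counter


def ols(X, y):
    """OLS by Gauss-Jordan inversion of X'X; returns (coefficients, conventional SEs)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    A = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]  # [X'X | I]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    inv = [row[k:] for row in A]  # (X'X)^-1
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return beta, [math.sqrt(s2 * inv[i][i]) for i in range(k)]


def density_test(data, bin_var="runvarbin"):
    """Regress bin counts (Nobs) on D and a cubic in the bin value; return (coef on D, SE)."""
    counts = Counter(r[bin_var] for r in data)
    scale = max(abs(b) for b in counts)  # rescale bins to [-1, 1] for numerical stability
    X, y = [], []
    for b, nobs in sorted(counts.items()):
        x = b / scale
        X.append([1.0, float(b > 0), x, x ** 2, x ** 3])  # D_b = 1 for bins above 0
        y.append(float(nobs))
    beta, se = ols(X, y)
    return beta[1], se[1]
```

The returned coefficient divided by its standard error gives the t-statistic for a jump in the number of households per bin at the limit; the fitted values for the plot are `X` times the coefficients.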

6. Covariate predicted employment

Regress employment in 2020 on a cubic polynomial in age, female, college, dummies for household size, and a cubic polynomial in 2019 household income. Obtain the predicted values and use these predicted values to estimate the same regression as in (4). How do these results compare to the result in (4)? How do these results relate to the RD identifying assumptions and the interpretation of your results from (4)?
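The prediction step can be sketched in Python as below. `solve_ols`, `covariates`, and `add_predicted_employment` are hypothetical names; the sketch assumes household-size dummies with size 1 omitted and sizes of 4 or more pooled (matching the 4+ row of the schedule), centers and rescales age and income before cubing them (the centering constants are arbitrary and do not change the fitted values), and fits the model on whatever sample is passed in.

```python
def solve_ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]


def covariates(r):
    """Cubic in age, female, college, size dummies (size 1 omitted, 4+ pooled), cubic in income."""
    a = (r["age"] - 40) / 10              # centered/rescaled age (arbitrary constants)
    inc = (r["inc2019"] - 20000) / 10000  # centered/rescaled 2019 income
    size = min(int(r["nhhld"]), 4)
    return [1.0, a, a ** 2, a ** 3, r["female"], r["college"],
            float(size == 2), float(size == 3), float(size == 4),
            inc, inc ** 2, inc ** 3]


def add_predicted_employment(data):
    """Fit emp2020 on the covariates and store the fitted values as 'emp_hat'."""
    X = [covariates(r) for r in data]
    beta = solve_ols(X, [r["emp2020"] for r in data])
    for r, x in zip(data, X):
        r["emp_hat"] = sum(b * v for b, v in zip(beta, x))
    return data
```

The resulting `emp_hat` can then replace emp2020 as the outcome in the question 4 regression.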

7. Sensitivity analysis: polynomial specification

So far, we have assumed a linear polynomial specification of the running variable. How do the results change if you use quadratic or cubic polynomial specifications of the running variable?
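One way to implement higher-order specifications, sketched in Python: fit a separate polynomial of the chosen order on each side of the limit and take the difference in intercepts, which matches the coefficient on D when D is fully interacted with the polynomial terms. Whether the assignment intends interacted polynomials is an assumption here; `solve_ols` and `poly_rd_estimate` are hypothetical helpers.

```python
def solve_ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]


def poly_rd_estimate(data, outcome, order=1):
    """Jump at runvar = 0 from separate polynomial fits of the given order on each side.

    Equivalent to the coefficient on D when D is fully interacted with the polynomial.
    """
    scale = max(abs(r["runvar"]) for r in data)  # rescale to [-1, 1]; intercepts unaffected
    intercepts = {}
    for side in (0, 1):
        rows = [r for r in data if r["D"] == side]
        X = [[(r["runvar"] / scale) ** p for p in range(order + 1)] for r in rows]
        intercepts[side] = solve_ols(X, [r[outcome] for r in rows])[0]
    return intercepts[1] - intercepts[0]
```

Comparing `poly_rd_estimate(data, "emp2020", order=k)` for k = 1, 2, 3 gives the sensitivity check; `order=1` reproduces the linear specification with separate slopes on each side.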

8. Sensitivity analysis: bandwidth

So far, we have used a bandwidth of +/- $5000 around the household-specific federal poverty limit. Vary the bandwidth from +/- $400 to +/- $10000 and plot the estimates and standard errors. How do the estimates vary as the bandwidth increases? What is the minimum bandwidth for which the estimates look stable? What is the minimum bandwidth for which the estimate is statistically significant (different from 0)?
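The bandwidth loop can be sketched in Python as follows. It re-estimates only the linear point estimate for each bandwidth (the clustered standard errors for the plot would still come from a regression package), the $200 grid step is an arbitrary illustrative choice, and the helper names are hypothetical. The input sample must extend to at least $10000 on either side of each household's limit.

```python
def linear_fit(xs, ys):
    """OLS of y on an intercept and x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_estimate(data, outcome):
    """Linear RD jump at runvar = 0 with separate slopes on each side."""
    intercepts = {}
    for side in (0, 1):
        rows = [r for r in data if r["D"] == side]
        intercepts[side], _ = linear_fit([r["runvar"] for r in rows],
                                         [r[outcome] for r in rows])
    return intercepts[1] - intercepts[0]


def bandwidth_path(data, outcome, bandwidths=range(400, 10001, 200)):
    """(bandwidth, estimate) pairs, re-estimating on |runvar| <= h for each h."""
    path = []
    for h in bandwidths:
        sub = [r for r in data if abs(r["runvar"]) <= h]
        path.append((h, rd_estimate(sub, outcome)))
    return path
```

Plotting the estimates against the bandwidth, with confidence bands from the clustered standard errors, shows where the estimate settles and where it first becomes distinguishable from 0.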

9. Sensitivity analysis: permutation test

One of your colleagues at RAND points out that there may be some special features about the income values that are highlighted by the federal poverty limits. To address this, you implement the following permutation test. You randomly draw household size (1 through 4), assign the federal poverty limit given the above schedule, and then re-run your analysis based on household income relative to the randomly assigned federal poverty limit. You run 500 iterations and compare your estimate based on the actual data to the permutation estimates. Show these results. How do these results address your colleague’s concerns?
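A Python sketch of the permutation test described above. In each iteration, every household receives a random size from 1 to 4 and the corresponding limit, and the linear reduced form is re-estimated on income relative to that placebo limit. The helper names, the fixed seed, and the two-sided p-value definition (share of placebo estimates at least as large in absolute value as the actual one) are assumptions for illustration. The input should be the full sample rather than one already restricted around the actual limit, because each placebo limit defines a different window.

```python
import random

# 2019 schedule, indexed by the randomly drawn household size (1 through 4).
FPL_BY_SIZE = {1: 12490, 2: 16910, 3: 21330, 4: 25750}


def linear_fit(xs, ys):
    """OLS of y on an intercept and x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_estimate(data, outcome):
    """Linear RD jump at runvar = 0 with separate slopes on each side."""
    intercepts = {}
    for side in (0, 1):
        rows = [r for r in data if r["D"] == side]
        intercepts[side], _ = linear_fit([r["runvar"] for r in rows],
                                         [r[outcome] for r in rows])
    return intercepts[1] - intercepts[0]


def permutation_estimates(data, outcome, n_iter=500, bandwidth=5000, seed=0):
    """Placebo reduced-form estimates based on randomly assigned household sizes."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iter):
        placebo = []
        for r in data:
            rv = r["inc2019"] - FPL_BY_SIZE[rng.randint(1, 4)]  # placebo running variable
            if abs(rv) <= bandwidth:
                placebo.append({"runvar": rv, "D": int(rv > 0), outcome: r[outcome]})
        estimates.append(rd_estimate(placebo, outcome))
    return estimates


def permutation_p_value(actual, estimates):
    """Share of placebo estimates at least as large in absolute value as the actual estimate."""
    return sum(abs(e) >= abs(actual) for e in estimates) / len(estimates)
```

Passing the actual emp2020 estimate and the list of placebo estimates to `permutation_p_value`, together with a histogram of the placebo estimates with the actual estimate marked, summarizes the comparison.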