PLSC 30600: Final

2022

Problem 1 (20 points)

In “New Evidence on the Impact of Sustained Exposure to Air Pollution on Life Expectancy from China’s Huai River Policy” (2017), Ebenstein et al. build on the Huai River Policy RD design described in the earlier 2013 PNAS paper to evaluate the effect of air pollution on life expectancy. While the life expectancy data are not publicly available, this problem will have you examine their RD estimates on air quality as measured by PM10 concentration (particulate matter with a diameter of less than 10 micrometers).

The original source for this data is

Ebenstein, A., Fan, M., Greenstone, M., He, G., & Zhou, M. (2017). New evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River Policy. Proceedings of the National Academy of Sciences, 114(39), 10384-10389.

The code below will read in the dataset

library(dplyr) # provides %>% and filter()

river <- haven::read_dta("DSP_PM10.dta")

river <- river %>% filter(!is.na(pm10)) # Drop stations with missing PM10

Each row in the data contains a measurement of air quality in a particular geographic location along with the distance in degrees of latitude from the Huai River boundary.

The relevant variables of interest are

● pm10 - Air quality as measured by PM10 concentration (micrograms per cubic meter, µg/m³)

● dist_huai - Degrees of latitude north of the Huai river boundary

● north_huai - An indicator for whether the location is north of the Huai river boundary (dist_huai > 0)

Part A (5 points)

Estimate the local average effect of the Huai River policy on PM10 concentration at the river boundary using a local quadratic regression with a triangular kernel. Use a bandwidth of h = 8 to the left and right of the cut-point. Provide a 95% confidence interval and discuss your results.
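As a rough illustration of the mechanics, the sketch below fits a kernel-weighted quadratic on each side of a cut-point using base R. It runs on simulated stand-in data, not the Huai River data, and every variable name (`x`, `d`, `y`, `sim`, `rd_est`) is an assumption made for the example. The interval comes from the classical `lm` variance; a heteroskedasticity-robust variance or a dedicated RD package may be preferable in practice.

```r
# Simulated stand-in data (NOT the Huai River data); the true jump at 0 is 30.
set.seed(1)
n <- 5000
x <- runif(n, -12, 12)                  # running variable, cut-point at 0
d <- as.numeric(x > 0)                  # indicator for being above the cut-point
y <- 100 + 2 * x + 0.1 * x^2 + 30 * d + rnorm(n, sd = 5)

h <- 8                                  # bandwidth on each side of the cut-point
w <- pmax(0, 1 - abs(x) / h)            # triangular kernel weights
sim <- data.frame(y, x, d, w)[w > 0, ]  # keep observations inside the bandwidth

# Interacting d with the quadratic lets each side have its own curve;
# the coefficient on d is the estimated jump at x = 0.
fit <- lm(y ~ d * (x + I(x^2)), data = sim, weights = w)
rd_est <- unname(coef(fit)["d"])
rd_ci <- confint(fit, "d", level = 0.95)  # classical (homoskedastic) interval
```

Because the running variable is centered at the cut-point, the intercept shift `d` is the difference between the two fitted curves exactly at the boundary.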

Part B (5 points)

Overlay your regression estimates from Part A onto a binned scatterplot and discuss how well you think the local quadratic regression approximates the conditional expectation function.
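A binned scatterplot averages the outcome within bins of the running variable. The sketch below computes bin means on simulated data (not the assignment data), with bin edges chosen so that no bin straddles the cut-point; the one-unit bin width and all variable names are assumptions for illustration.

```r
# Simulated stand-in data with a jump of 30 at the cut-point 0.
set.seed(2)
x <- runif(1000, -12, 12)
y <- 100 + 2 * x + 30 * (x > 0) + rnorm(1000, sd = 10)

breaks <- seq(-12, 12, by = 1)           # 0 is a bin edge, so bins never mix sides
bin <- cut(x, breaks = breaks, include.lowest = TRUE)
bin_x <- as.numeric(tapply(x, bin, mean))  # average running variable per bin
bin_y <- as.numeric(tapply(y, bin, mean))  # average outcome per bin

# Draw only in an interactive session; fitted curves could be added with lines().
if (interactive()) {
  plot(bin_x, bin_y, pch = 19, xlab = "Running variable", ylab = "Binned mean outcome")
  abline(v = 0, lty = 2)
}
```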

Part C (5 points)

Now estimate the effect at the cut-point using a bandwidth of h = 16. Provide a 95% confidence interval and interpret your result. Overlay the regressions onto a binned scatterplot as in Part B. Compare your results to what you find in Part A and discuss the possible reasons for any differences or similarities that you observe.

Part D (5 points)

Assess whether there is bunching near the discontinuity. Use any appropriate analytical technique or techniques and interpret your results. Is there evidence to suggest that the density of the running variable in this dataset is discontinuous at the cut-point?
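One simple check, shown here only as a sketch on simulated data, compares the number of observations just below and just above the cut-point with a binomial test. If the density is continuous at the cut-point, observations in a narrow symmetric window should fall on either side with probability close to one half. The window half-width `win` is a hypothetical choice, and more formal density tests exist.

```r
# Simulated running variable with no manipulation (NOT the assignment data).
set.seed(7)
x <- runif(3000, -12, 12)

win <- 1                                   # hypothetical window half-width
n_right <- sum(x > 0 & x <= win)           # observations just above the cut-point
n_left <- sum(x >= -win & x <= 0)          # observations just below the cut-point

# Under a continuous density, each side has probability of roughly 0.5.
bt <- binom.test(n_right, n_left + n_right, p = 0.5)
p_val <- bt$p.value
```

A small p-value would suggest an imbalance in counts around the cut-point, though the conclusion can be sensitive to the window width.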

Problem 2 (20 points)

Does having a daughter (as opposed to a son) affect how U.S. legislators vote on women’s issues? Washington (2008; American Economic Review) finds that having a daughter causes a legislator to vote more liberally, especially on issues related to women. You will examine this using the washington.dta dataset. While the original paper looks at the 105th through 108th Congresses, this dataset will focus on representatives in the 105th (1997-1999).

The original source for this paper is

Washington, E. L. (2008). Female socialization: how daughters affect their legislator fathers. American Economic Review, 98(1), 311-32.

The code below will load the data

washington <- haven::read_dta("washington.dta")

The variables you will need are:

● aauw - Outcome variable - Legislator’s voting score as assigned by the American Association of University Women (AAUW) (proxy for feminist/liberal-leaning voting record). Positive values indicate more liberal/feminist voting behavior.

● ngirls - Number of female children

● nboys - Number of male children

● totchi - Total number of children

Part A (5 points)

Our treatment of interest is a multi-valued treatment: the number of female children of a legislator is a count variable ranging from 0 to 7. While we could estimate the effects for each possible comparison (e.g. the effect of having 5 girls vs. 2 girls or 3 girls vs. 0 girls or any girls vs. 0 girls), this could yield very high-variance estimates. Instead, we would like to pool our effect estimates into a single summary estimate of the Average Treatment Effect of having one additional daughter on the legislator’s AAUW score.

Let’s define a set of potential outcomes $Y_i(d)$ for all possible values of the treatment $d \in \mathcal{D}$. We again assume consistency: for a unit with treatment level $D_i = d$, the observed outcome $Y_i$ equals the potential outcome $Y_i(d)$.

Assume that the potential outcomes take on the following form

$$Y_i(d) = Y_i(0) + \tau_i d$$

What is the average treatment effect of having 3 daughters versus having 1 daughter on a legislator’s AAUW score? How about the average treatment effect of having 5 daughters versus 2 daughters? What assumption are we making about the treatment effects by writing the potential outcomes this way? How do we interpret $E[\tau_i]$?
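One way to see what the linear form implies is to difference two potential outcomes at arbitrary treatment levels $d$ and $d'$ (the symbol $d'$ is introduced here purely for illustration):

```latex
\begin{aligned}
E[Y_i(d) - Y_i(d')] &= E\big[(Y_i(0) + \tau_i d) - (Y_i(0) + \tau_i d')\big] \\
                    &= (d - d')\, E[\tau_i]
\end{aligned}
```

Under this form, any contrast between two treatment levels depends only on the gap $d - d'$, scaled by the average per-unit effect.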

Part B (5 points)

Estimate the average treatment effect of having one additional child on AAUW score assuming that the number of female children is completely ignorable. Provide a 95% confidence interval and interpret your results.

Part C (5 points)

Assume instead that the number of female children is conditionally ignorable given the number of total children.

Subset the sample to representatives with at least 1 child (of any sex) and no more than 5 total children (as there are very few representatives with 6+ children). We’ll be working with this sample for the rest of the problem including in Part D.

Estimate the conditional average treatment effects of having one additional child on AAUW score conditional on the total number of children. Provide a 95% confidence interval for each CATE and interpret/discuss your results.

Part D (5 points)

Without making any additional assumptions on the outcome model, estimate the average treatment effect of having one additional child on AAUW score under the assumption of conditional ignorability. Provide a 95% confidence interval and interpret/discuss your results.
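A stratification approach can be sketched as follows: estimate a slope within each stratum of the conditioning variable, then average the stratum-specific estimates using the stratum shares as weights. The example uses simulated data rather than washington.dta, all variable names are assumptions, and the within-stratum slope relies on the linear potential-outcomes form from Part A. The pooled standard error treats the stratum shares as fixed, which is a simplification.

```r
# Simulated stand-in data (NOT washington.dta); true per-daughter effect = 2.
set.seed(3)
n <- 10000
total <- sample(1:5, n, replace = TRUE)        # stratum: total children
girls <- rbinom(n, size = total, prob = 0.5)   # treatment: number of daughters
y <- 40 + 3 * total + 2 * girls + rnorm(n, sd = 5)
sim <- data.frame(y, girls, total)

# Conditional effect within each stratum, with its standard error and share.
by_stratum <- lapply(sort(unique(sim$total)), function(s) {
  fit <- lm(y ~ girls, data = sim[sim$total == s, ])
  data.frame(total = s,
             cate = unname(coef(fit)["girls"]),
             se = sqrt(vcov(fit)["girls", "girls"]),
             share = mean(sim$total == s))
})
cates <- do.call(rbind, by_stratum)
cates$lo <- cates$cate - 1.96 * cates$se       # 95% interval per stratum
cates$hi <- cates$cate + 1.96 * cates$se

# Share-weighted average of the stratum effects.
ate <- sum(cates$share * cates$cate)
ate_se <- sqrt(sum(cates$share^2 * cates$se^2))  # treats shares as fixed
ate_ci <- ate + c(-1.96, 1.96) * ate_se
```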

Problem 3 (10 points)

Consider a setting with $N$ observations indexed by $i \in \{1, 2, \ldots, N\}$, a binary treatment $D_i$, an outcome $Y_i$, and pre-treatment covariates $X_i$. Assume consistency/SUTVA ($Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$), positivity ($0 < \Pr(D_i = 1 \mid X_i) < 1$), and conditional ignorability ($\{Y_i(1), Y_i(0)\} \perp\!\!\!\perp D_i \mid X_i$).

Let $\pi(X_i) = \Pr(D_i = 1 \mid X_i)$ denote the true propensity score function.

Let $\mu_1(X_i) = E[Y_i(1) \mid X_i]$ and $\mu_0(X_i) = E[Y_i(0) \mid X_i]$ denote the true regression functions for the potential outcome under treatment and control respectively.

Our estimand is the ATE

$$\tau = E[Y_i(1)] - E[Y_i(0)]$$

As a general hint for this problem, you will find the law of total expectation very useful:

$$E[Y_i] = E[E[Y_i \mid X_i]]$$
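The identity can be checked numerically with a discrete covariate: averaging the within-group means, weighted by group shares, reproduces the overall mean. The sketch below uses simulated data with hypothetical group labels.

```r
# Simulated data with a three-level covariate; labels and means are arbitrary.
set.seed(4)
x <- sample(c("a", "b", "c"), 1e5, replace = TRUE, prob = c(0.2, 0.3, 0.5))
y <- rnorm(1e5, mean = c(a = 1, b = 5, c = 10)[x])

cond_means <- tapply(y, x, mean)           # sample analogue of E[Y | X = x]
shares <- table(x) / length(x)             # sample analogue of Pr(X = x)

# Outer expectation: weight each conditional mean by its group share.
iterated <- sum(as.numeric(cond_means) * as.numeric(shares[names(cond_means)]))
overall <- mean(y)
```

In the sample, `iterated` and `overall` agree up to floating-point error, which mirrors the population identity.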

Part A (2 points)

Consider the IPTW estimator

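$$\hat{\tau}_{ipw} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{D_i Y_i}{\hat{\pi}(X_i)} - \frac{(1 - D_i) Y_i}{1 - \hat{\pi}(X_i)} \right)$$

Show that if we know the true propensity score ($\hat{\pi}(X_i) = \pi(X_i)$), $\hat{\tau}_{ipw}$ is unbiased for the ATE.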
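A small simulation can illustrate the claim without replacing the proof. In the sketch below the propensity score is known by construction, the true ATE is 2, and the inverse-probability-weighted estimate is compared with a naive difference in means. The data-generating process and all names are assumptions made for the example.

```r
# Simulated data with confounding through x; true ATE = 2.
set.seed(5)
n <- 20000
x <- rnorm(n)
pi_x <- plogis(0.5 * x)                  # true propensity score, known by construction
d <- rbinom(n, 1, pi_x)                  # treatment depends on x
y0 <- 1 + x + rnorm(n)                   # potential outcome under control
y1 <- y0 + 2                             # potential outcome under treatment
y <- d * y1 + (1 - d) * y0               # observed outcome (consistency)

# IPTW estimator using the true propensity score.
tau_ipw <- mean(d * y / pi_x - (1 - d) * y / (1 - pi_x))

# Naive difference in means, which ignores confounding by x.
tau_naive <- mean(y[d == 1]) - mean(y[d == 0])
```

In this setup the weighted estimate should land near 2, while the naive comparison is pulled upward because treated units tend to have larger `x`.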

Part B (4 points)

Consider another estimator

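$$\hat{\tau}_{dr} = \frac{1}{N} \sum_{i=1}^{N} \left[ \left( \hat{\mu}_1(X_i) + \frac{D_i (Y_i - \hat{\mu}_1(X_i))}{\hat{\pi}(X_i)} \right) - \left( \hat{\mu}_0(X_i) + \frac{(1 - D_i)(Y_i - \hat{\mu}_0(X_i))}{1 - \hat{\pi}(X_i)} \right) \right]$$

For this problem, treat $\hat{\mu}_1(X_i)$ and $\hat{\mu}_0(X_i)$ as constants given $X_i$.

Show that if we know the true propensity score ($\hat{\pi}(X_i) = \pi(X_i)$), $\hat{\tau}_{dr}$ will be unbiased for the ATE even if $\hat{\mu}_1(X_i) \neq \mu_1(X_i)$ and $\hat{\mu}_0(X_i) \neq \mu_0(X_i)$.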

Part C (4 points)

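Show that if we know the true regression models, $\hat{\mu}_1(X_i) = \mu_1(X_i)$ and $\hat{\mu}_0(X_i) = \mu_0(X_i)$, the estimator $\hat{\tau}_{dr}$ from Part B is unbiased for the ATE even if we misspecify the propensity score (for this part, treat $\hat{\pi}(X_i)$ as a constant given $X_i$, but with $\hat{\pi}(X_i) \neq \pi(X_i)$).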
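The two robustness properties in Parts B and C can be explored by simulation. The sketch below assumes the Part B estimator takes the standard augmented inverse-probability-weighted form, and it plugs in deliberately wrong nuisance functions one at a time. The helper `aipw`, the data-generating process, and the specific "wrong" inputs are all hypothetical choices for illustration, and a simulation is not a substitute for the requested derivation.

```r
# Augmented IPW estimator (assumed form of the Part B estimator).
aipw <- function(y, d, pi_hat, mu1_hat, mu0_hat) {
  mean((mu1_hat + d * (y - mu1_hat) / pi_hat) -
       (mu0_hat + (1 - d) * (y - mu0_hat) / (1 - pi_hat)))
}

# Simulated data with confounding through x; true ATE = 2.
set.seed(6)
n <- 20000
x <- rnorm(n)
pi_x <- plogis(0.5 * x)                  # true propensity score
d <- rbinom(n, 1, pi_x)
mu0 <- 1 + x                             # true regression function under control
mu1 <- 3 + x                             # true regression function under treatment
y <- ifelse(d == 1, mu1, mu0) + rnorm(n)

wrong_mu <- rep(0, n)                    # deliberately wrong regression function
wrong_pi <- rep(0.5, n)                  # deliberately wrong propensity score

est_true_pi <- aipw(y, d, pi_x, wrong_mu, wrong_mu)       # Part B setting
est_true_mu <- aipw(y, d, wrong_pi, mu1, mu0)             # Part C setting
est_both_wrong <- aipw(y, d, wrong_pi, wrong_mu, wrong_mu)
```

In this setup the first two estimates should sit close to 2, while the version with both nuisance functions wrong is expected to be biased.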