
ECOM30002/90002 ECONOMETRICS 2 SEMESTER 2, 2022 ASSIGNMENT 3

Published: 2022-09-09



Section 1

In this section, you will carry out a simulation experiment in R to investigate the statistical properties of OLS and IV in a causal model with heterogeneous effects. Report all numbers to three decimal places.

Important

– Fix the random seed in R with set.seed(2022) so that your results are replicable.

– You must append a complete copy of the R script that you have used to generate your results to your solutions to this assignment.

– The appended R script needs to replicate all your simulation results in this section. That is, if your appended R script is copy-pasted into RStudio and run, all your results in this section need to be produced and printed without the person replicating it needing to change anything manually.

Consider the following data generating process (DGP). There is a binary treatment which defines two potential outcomes for every unit i: Y1i and Y0i. The potential outcomes follow the bivariate normal distribution

(Y1i, Y0i)\ I N ╱╱ \0(1) , ╱2σg2 \\ ,

with σ ∈ {1, 3/2}. There is also a simulated flip of a coin. Let

Zi =

where Ri is a uniform random variable, Ri ~ Uniform(0, 1). The resulting random variable Zi represents the flip of a coin, with P(Zi = 1) = 0.5.

You study three ways in which treatment can be assigned to units:

A. Units are assigned to treatment by flipping the (simulated) coin:

XAi = Zi

where XAi is a binary indicator of having received the treatment.

B. Units with a positive treatment effect receive the treatment:

$$X_{Bi} = \begin{cases} 1 & \text{if } \beta_i > 0, \\ 0 & \text{otherwise,} \end{cases}$$

where βi denotes the individual treatment effect, βi = Y1i − Y0i.

C. A combination of A and B: if Zi = 0, then XCi = 0. If Zi = 1, then

$$X_{Ci} = \begin{cases} 1 & \text{if } \beta_i > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Each of the above three ways A–C of assigning treatment gives rise to a particular observed dependent variable: YAi = Y0i + βiXAi; YBi = Y0i + βiXBi; and YCi = Y0i + βiXCi.
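As a language-neutral illustration of the logic (the assignment itself asks for R), the three assignment rules can be applied to a single unit as below. This is a minimal sketch under an assumption: the coin is coded as Zi = 1 when Ri > 0.5, since the exact coding of Zi is not reproduced here (any rule with P(Zi = 1) = 0.5 has the same distribution). The helper name `assign_treatments` is hypothetical.

```python
import random


def assign_treatments(y1, y0, r):
    """Apply assignment rules A-C to one unit and return the observed outcomes.

    Hypothetical helper. Assumes the coin is coded as Z = 1 when R > 0.5.
    """
    beta = y1 - y0                  # individual treatment effect
    z = 1 if r > 0.5 else 0         # simulated coin flip (assumed coding)
    x_a = z                         # A: treatment by coin flip
    x_b = 1 if beta > 0 else 0      # B: treat exactly the units with beta > 0
    x_c = z * x_b                   # C: heads AND a positive effect
    return {
        "beta": beta, "z": z, "x_a": x_a, "x_b": x_b, "x_c": x_c,
        "y_a": y0 + beta * x_a,     # observed outcome under A
        "y_b": y0 + beta * x_b,     # observed outcome under B
        "y_c": y0 + beta * x_c,     # observed outcome under C
    }


# Usage with made-up potential outcomes and a seeded uniform draw.
rng = random.Random(2022)
unit = assign_treatments(y1=1.3, y0=0.2, r=rng.random())
```

Writing rule C as the product Zi·XBi makes explicit that treatment under C requires both a heads and a positive effect.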

1. Derive the distribution of the individual treatment effects, βi. If the distribution depends on σ, give the distribution for each of σ = 1 and σ = 3/2.

Remember the following property of normal random variables: let W ~ N(µw, σw²) and V ~ N(µv, σv²) be jointly normal; then

$$W + V \sim N\left(\mu_w + \mu_v,\ \sigma_w^2 + \sigma_v^2 + 2\sigma_{wv}\right), \qquad \sigma_{wv} = \operatorname{Cov}(W, V).$$
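Applied to βi = Y1i − Y0i, the property is used with W = Y1i and V = −Y0i, so that Var(V) = Var(Y0i) and Cov(W, V) = −Cov(Y1i, Y0i). A small Python sketch of that bookkeeping follows; the function name is hypothetical and the numbers in the usage line are arbitrary illustrations, not the parameters of the DGP above.

```python
def diff_of_normals(mu_w, var_w, mu_v, var_v, cov_wv):
    """Mean and variance of W - V for jointly normal W and V.

    Uses the sum property with -V in place of V:
    E(-V) = -mu_v, Var(-V) = var_v and Cov(W, -V) = -cov_wv.
    """
    mean = mu_w - mu_v
    variance = var_w + var_v - 2.0 * cov_wv
    return mean, variance


# Arbitrary illustrative inputs: E(W) = 2, Var(W) = 3, E(V) = 0.5,
# Var(V) = 1, Cov(W, V) = 0.5.
print(diff_of_normals(2.0, 3.0, 0.5, 1.0, 0.5))  # (1.5, 3.0)
```

A variance of zero would mean that the difference takes the same value for every unit, i.e. a homogeneous effect.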

2. What is the average treatment effect, E(βi), in this DGP? What are the average treatment effects on the treated (ATT) for the treatments A and B? That is, what are E(βi|XAi = 1) and E(βi|XBi = 1)? If any of the treatment effects depends on σ, give the values for both σ = 1 and σ = 3/2.

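To find the ATT for B, use the following property of normal variables. Let V ~ N(µv, σv²); then

$$E(V \mid V > c) = \mu_v + \sigma_v \, \lambda\!\left(\frac{c - \mu_v}{\sigma_v}\right), \qquad \lambda(\cdot) = \frac{\phi(\cdot)}{1 - \Phi(\cdot)},$$

where φ(·) and Φ(·) denote the standard normal pdf and cdf, respectively (in R: dnorm() and pnorm()).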
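In Python's standard library, `statistics.NormalDist` provides `pdf` and `cdf` methods that play the role of `dnorm()` and `pnorm()`. A minimal sketch of the formula (the function names are hypothetical):

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills(x):
    """lambda(x) = phi(x) / (1 - Phi(x)); uses Phi(-x) = 1 - Phi(x) for accuracy."""
    return STD_NORMAL.pdf(x) / STD_NORMAL.cdf(-x)


def truncated_mean_above(mu, sd, c):
    """E(V | V > c) for V ~ N(mu, sd^2), where sd is the standard deviation."""
    return mu + sd * inverse_mills((c - mu) / sd)


# For a standard normal truncated at 0 the result is sqrt(2 / pi).
print(round(truncated_mean_above(0.0, 1.0, 0.0), 4))  # 0.7979
```

For the ATT under B, the truncation point is c = 0, and the mean and standard deviation are those of βi. Note that the second argument is the standard deviation, not the variance that appears inside N(·, ·).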

3. Draw one random sample of size n = 100,000 from the DGP with σ = 3/2 and empirically calculate E(βi), E(βi|XAi = 1), E(βi|XBi = 1) and E(βi|XCi = 1) in the simulated data. Run the three simple OLS regressions of YKi on XKi, where K ∈ {A, B, C}, and report the slope coefficients. In which case or cases do the coefficients identify the respective ATTs? Explain.
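With a binary regressor, the OLS slope from regressing Y on X (with an intercept) equals the difference in mean outcomes between treated and untreated units. Writing Yi = Y0i + βiXi, that difference equals E(βi|Xi = 1) + [E(Y0i|Xi = 1) − E(Y0i|Xi = 0)], i.e. the ATT plus a selection term. A Python sketch of the equivalence on toy data (function names are illustrative):

```python
from statistics import fmean


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def diff_in_means(x, y):
    """Mean of y among units with x == 1 minus mean of y among units with x == 0."""
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    untreated = [yi for xi, yi in zip(x, y) if xi == 0]
    return fmean(treated) - fmean(untreated)


# Toy data: both quantities equal 3.5.
x = [0, 0, 1, 1, 1]
y = [1.0, 2.0, 4.0, 5.0, 6.0]
print(round(ols_slope(x, y), 6), round(diff_in_means(x, y), 6))  # 3.5 3.5
```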

4. Now you will conduct a simulation experiment based on repeated samples, focusing only on the DGP with the third way of assigning treatment, C. For each of the four combinations of σ ∈ {1, 3/2} and sample size n ∈ {50, 500}, draw 1,000 repeated samples (aka replications) and run both an OLS and an IV regression of YCi on XCi. For the IV regressions, use Zi as an instrument for XCi. Present a table with the mean and standard deviation of the estimated OLS and IV slope coefficients.
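With a single binary instrument and no other regressors, the IV slope is the ratio of sample covariances, Cov(Z, Y)/Cov(Z, X), which coincides with the Wald estimator: the difference in mean outcomes between Zi = 1 and Zi = 0, divided by the corresponding difference in treatment rates. A Python sketch of both forms (the function names are illustrative):

```python
from statistics import fmean


def iv_slope(z, x, y):
    """Just-identified IV slope: sample Cov(z, y) / sample Cov(z, x)."""
    mz, mx, my = fmean(z), fmean(x), fmean(y)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    return szy / szx


def wald_estimator(z, x, y):
    """(mean y | z=1 - mean y | z=0) / (mean x | z=1 - mean x | z=0)."""
    def group_mean(values, flag):
        return fmean([v for zi, v in zip(z, values) if zi == flag])
    reduced_form = group_mean(y, 1) - group_mean(y, 0)
    first_stage = group_mean(x, 1) - group_mean(x, 0)
    return reduced_form / first_stage


z = [0, 0, 1, 1]
x = [0, 0, 1, 0]
y = [1.0, 3.0, 6.0, 2.0]
print(iv_slope(z, x, y), wald_estimator(z, x, y))  # 4.0 4.0
```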

5. For each of the four simulated DGPs, present a figure with the estimated densities of β̂1(OLS) and β̂1(IV). (Note: See Tutorial 7 for figures of estimated densities.)
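A kernel density estimate smooths the simulated slope estimates into a curve. The sketch below uses a Gaussian kernel with a rule-of-thumb bandwidth of the form 0.9·min(sd, IQR/1.34)·n^(−1/5), the form of R's default bandwidth for density() (bw.nrd0). It illustrates the technique only and is not a replacement for density() in R; the function names are hypothetical.

```python
from statistics import NormalDist, quantiles, stdev


def rule_of_thumb_bandwidth(data):
    """0.9 * min(sd, IQR / 1.34) * n^(-1/5); falls back to sd, then 1.0, if zero."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    spread = min(stdev(data), (q3 - q1) / 1.34) or stdev(data) or 1.0
    return 0.9 * spread * len(data) ** (-0.2)


def gaussian_kde(data, grid, bandwidth=None):
    """Gaussian kernel density estimate of `data`, evaluated at each grid point."""
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(data)
    kernel = NormalDist()  # standard normal kernel
    n = len(data)
    return [sum(kernel.pdf((g - xi) / h) for xi in data) / (n * h) for g in grid]


# Usage: evaluate the density of a few made-up estimates on a coarse grid.
estimates = [-1.0, -0.5, 0.0, 0.5, 1.0]
grid = [-2.0 + 0.5 * k for k in range(9)]
density = gaussian_kde(estimates, grid)
```

Plotting the OLS and IV densities on a common grid for each (σ, n) design makes their centres and spreads directly comparable.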

6. In each of the replications from part 4, also perform two tests (at the 5% significance level) involving the IV estimate of the slope coefficient. The first test is of the null hypothesis H0: β1(IV) = E(βi|βi > 0), and the second of H0: β1(IV) = E(βi) (remember that you calculated the values of these causal effects in part 2). Present a table with the rejection frequencies for each of the tests in each of the four simulated DGPs.
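Each replication yields a reject/not-reject decision, and the rejection frequency is the share of replications in which H0 is rejected. A sketch of a two-sided t-test decision at the 5% level, using the standard normal critical value (about 1.96) as a large-sample approximation; the standard error is taken as given (e.g. from the IV regression output), and the function names and toy numbers are hypothetical.

```python
from statistics import NormalDist

CRITICAL_5PCT = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value, about 1.96


def reject_h0(estimate, std_error, null_value, critical=CRITICAL_5PCT):
    """Two-sided t-test: reject H0 when |(estimate - null) / se| exceeds the critical value."""
    t_stat = (estimate - null_value) / std_error
    return abs(t_stat) > critical


def rejection_frequency(decisions):
    """Share of replications in which H0 was rejected."""
    return sum(decisions) / len(decisions)


# Toy decisions from four hypothetical replications, testing H0: slope = 1.0.
draws = [(1.5, 0.2), (1.1, 0.2), (0.9, 0.2), (0.4, 0.2)]
decisions = [reject_h0(b, se, 1.0) for b, se in draws]
print(decisions, rejection_frequency(decisions))  # [True, False, False, True] 0.5
```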

7. Discuss your findings from parts 4 and 5. In which ways are the OLS and IV results similar or different? How do the results you present relate to the theoretical properties of the estimators?

8. Discuss your findings from part 6. What does the rejection frequency represent in each of the two tests? How do the features of the DGPs (σ, n) influence the tests?

Tips for coding the simulation in R

● The simulation (from part 4 on) is similar to the ones you have discussed in the tutorials. You can use those simulations as a guide on how to set up and structure your simulation. In particular, you can write your simulation around two nested for() loops:

– the outer loop just needs to loop over the four values of (σ, n). You can also write this as two loops, one over σ and one over n.

– the inner loop needs to loop over the 1,000 repeated samples. In each sample, draw the n observations of the Y1i, Y0i and Ri variables, generate the corresponding Zi, XCi and YCi variables, run the OLS and IV regressions, perform the hypothesis tests, and store the slope coefficients and the test decisions (reject or not reject) in appropriate, previously created storage matrices. (Instead of the decisions, you can alternatively store the test statistics or p-values in the storage matrices and apply the decision rule (reject/not reject) to the elements of those matrices afterwards.)

● General coding tips:

– Try not to write a for() loop in one go. Make sure that the code works for a single sample, then slowly generalise it to a for() loop.

– When testing for() loops, you can use a small number of replications so that the code runs faster. Just remember to change it back when you are sure it works!

– Leave comments on your code regularly.
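The loop structure described in these tips can be sketched as follows. This is a Python illustration only: the assignment requires R, and Python's random-number stream will not reproduce the results of R's set.seed(2022). The function `placeholder_outcome_params` is HYPOTHETICAL and returns made-up means, standard deviations and correlation for (Y1i, Y0i); they must be replaced by the parameters of the DGP. The coin is coded as Zi = 1 when Ri > 0.5, which is also an assumption. Test decisions would be stored the same way as the slope estimates.

```python
import random
from statistics import fmean, stdev


def placeholder_outcome_params(sigma):
    """HYPOTHETICAL (mean1, mean0, sd1, sd0, corr) of (Y1, Y0); not the assignment's values."""
    return 1.0, 0.0, sigma, 1.0, 0.5


def slope(w, x, y):
    """sum (w - w_bar)(y - y_bar) / sum (w - w_bar)(x - x_bar): OLS if w is x, IV if w is z."""
    mw, mx, my = fmean(w), fmean(x), fmean(y)
    num = sum((wi - mw) * (yi - my) for wi, yi in zip(w, y))
    den = sum((wi - mw) * (xi - mx) for wi, xi in zip(w, x))
    return num / den


def draw_sample_c(n, params, rng):
    """Draw n units, apply assignment rule C, and return the lists (z, x_c, y_c)."""
    mu1, mu0, sd1, sd0, corr = params
    z, x, y = [], [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Bivariate normal draw via a Cholesky factor of the correlation matrix.
        y1 = mu1 + sd1 * e1
        y0 = mu0 + sd0 * (corr * e1 + (1 - corr ** 2) ** 0.5 * e2)
        zi = 1 if rng.random() > 0.5 else 0          # assumed coin coding
        xi = 1 if (zi == 1 and y1 - y0 > 0) else 0   # rule C
        z.append(zi)
        x.append(xi)
        y.append(y0 + (y1 - y0) * xi)                # observed Y_C
    return z, x, y


def simulate(sigmas=(1.0, 1.5), sizes=(50, 500), reps=1000, seed=2022):
    """Return {(sigma, n): (mean OLS, sd OLS, mean IV, sd IV)} over `reps` replications."""
    rng = random.Random(seed)
    results = {}
    for sigma in sigmas:                  # outer loops over the (sigma, n) designs
        params = placeholder_outcome_params(sigma)
        for n in sizes:
            ols_est, iv_est = [], []      # storage for this design
            for _ in range(reps):         # inner loop over replications
                z, x, y = draw_sample_c(n, params, rng)
                ols_est.append(slope(x, x, y))
                iv_est.append(slope(z, x, y))
            results[(sigma, n)] = (fmean(ols_est), stdev(ols_est),
                                   fmean(iv_est), stdev(iv_est))
    return results


if __name__ == "__main__":
    for design, summary in simulate(reps=100).items():
        print(design, [round(v, 3) for v in summary])
```

Using one `slope` helper for both estimators keeps the OLS and IV computations visibly parallel: only the variable that multiplies the deviations changes.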

Section 2

JD Angrist and WN Evans (1998, “Children and their parents’ labor supply: Evidence from exogenous variation in family size”, American Economic Review, 88(3), 450–477) used the sex composition of a family’s first two children as an instrumental variable to estimate the causal effect of having a third child on mothers’ labour supply, using a random sample from US census data. In this section, you will use a subset of their data to estimate models similar to theirs. This sample, included in the file pums80.csv, consists of observations measured in the year 1980 on 254,654 married women. The variables from the data used in this section are

morekids	=1 if mom had more than 2 kids
boy1st	=1 if 1st kid was a boy
boy2nd	=1 if 2nd kid was a boy
samesex	=1 if 1st two kids same sex
agem1	age of mom at census
agefstm	mom's age when she 1st gave birth
black	=1 if mom is black
hispan	=1 if mom is hispanic
othrace	=1 if mom is of other race
incomem	labor income per week, 1979, constant $

The causal model of interest is

$$\text{incomem}_i = \beta_0 + \beta_1\,\text{morekids}_i + X_i'\beta + U_i, \qquad (6)$$

where Xi contains agem1, agefstm, black, hispan and othrace.

Consider the (blank) table of estimation results below and answer Questions 9–15. For questions requiring numerical answers, give estimated coefficients, standard errors and test statistics to one decimal place, and p-values to three decimal places.

Table A: The effect of family size on women’s income
Dependent variable: incomem

                                               OLS                 2SLS
                                               (1)       (2)       (3)       (4)
morekids
Exogenous covariates                            √         √         √         √
Instruments
  samesex                                                 √
  twoboys, twogirls                                                 √
  twoboys, twogirls and interactions of these
  with Xi (12 instruments in total)                                           √
Tests
  IV relevance, test statistic
  IV relevance, p-value
  OIR, test statistic
  OIR, p-value
  Hausman, test statistic
  Hausman, p-value
Number of observations                     254,654   254,654   254,654   254,654

Note: Cells in the first panel show estimated coefficients for morekids and robust standard errors (in parentheses). Exogenous covariates (Xi): dummy variables for black, hispanic and other race (excluded: dummy for white); mother’s age at first birth; mother’s age.

9. Explain the rationale for assuming that samesex is a valid instrument, and discuss if you think this rationale is plausible.

Generate the dummy variables twoboys and twogirls, which indicate that samesex = 1 and the first two children are both boys or both girls, respectively. Does the rationale for instrument validity extend to these variables?
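Because twoboys and twogirls are mutually exclusive, samesex = twoboys + twogirls, so including all three in one regression would be perfectly collinear. A minimal Python sketch of the construction from boy1st and boy2nd, assuming both are coded 0/1 as in the variable list (the function name is illustrative):

```python
def sex_composition(boy1st, boy2nd):
    """Return (samesex, twoboys, twogirls) from the 0/1 indicators boy1st and boy2nd."""
    twoboys = int(boy1st == 1 and boy2nd == 1)
    twogirls = int(boy1st == 0 and boy2nd == 0)
    samesex = twoboys + twogirls  # the two cases are mutually exclusive
    return samesex, twoboys, twogirls


for first, second in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    print(first, second, sex_composition(first, second))
```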

10. Run the regressions indicated in Table A and fill in the first panel of the table. Interpret the estimated coefficients on morekids from Columns (1) and (2).

11. Test for instrument relevance and ﬁll in the corresponding cells in the table. Interpret the results of the tests.

12. The number of IVs increases from Column (2) to (3) and from (3) to (4). Assuming the instruments used in Column (3) are valid, are the instruments used in (4) valid? What is an advantage of having more instruments? Can you see evidence of this advantage in the results? (Hint: You can read the section “Application to the Demand for Cigarettes” in Chapter 12.2 “The General IV Regression Model” of Stock and Watson (2015), to learn about the advantage of having more instruments.)

13. Carry out OIR tests for the appropriate columns in Table A and fill in the results in the appropriate rows. State the null hypothesis of this test in words, give a concise explanation of what the test consists of, and interpret the results of the tests.
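In Stock and Watson's homoskedasticity-only version, the J statistic is obtained by regressing the 2SLS residuals on all instruments and exogenous regressors, computing the F statistic for the hypothesis that the instrument coefficients are all zero, and setting J = m·F, where m is the number of instruments; under the null, J is approximately χ² with m − k degrees of freedom, k being the number of endogenous regressors. The sketch below (hypothetical function names) only converts a given F statistic into J and a p-value, using a series for the χ² tail so that it needs just the standard library; in R, pchisq(J, df, lower.tail = FALSE) gives the same tail probability.

```python
import math


def chi2_sf(x, df):
    """P(chi2_df > x) via the series for the regularized lower incomplete gamma.

    Intended for small degrees of freedom, such as those in Table A.
    """
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    if h > 700.0:
        return 0.0  # for small df the tail is numerically zero here
    term = total = 1.0 / a  # n = 0 term of sum h^n / (a (a+1) ... (a+n))
    n = 1
    while term > total * 1e-16:
        term *= h / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(h) - h - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def j_test(f_stat, num_instruments, num_endogenous):
    """Return (J, df, p-value) with J = m * F and df = m - k."""
    df = num_instruments - num_endogenous
    if df < 1:
        raise ValueError("the J test needs more instruments than endogenous regressors")
    j_stat = num_instruments * f_stat
    return j_stat, df, chi2_sf(j_stat, df)
```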

14. Read about weak instruments in the section “Assumption 1: Instrument Relevance” in Chapter 12.3 “Checking Instrument Validity” of Stock and Watson (2015). Discuss the results of the tests for instrument relevance from part 11 in light of what you read about weak instruments. (Remember that in large samples, χ² = d1·F, where d1 denotes the F distribution’s numerator degrees of freedom, i.e. the number of restrictions being tested.)
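The large-sample conversion and Stock and Watson's rule of thumb (a first-stage F statistic below 10 signals weak instruments) can be written as two small helpers. The names are illustrative, and the conversion is the approximation F ≈ χ²/q, with q the number of restrictions (excluded instruments) tested.

```python
def chi2_to_f(chi2_stat, num_restrictions):
    """Large-sample approximation: F is about chi2 / q, q = number of restrictions."""
    return chi2_stat / num_restrictions


def weak_instruments(first_stage_f, threshold=10.0):
    """Rule of thumb: a first-stage F statistic below the threshold signals weak instruments."""
    return first_stage_f < threshold


# Hypothetical first-stage Wald statistic of 50 for 2 excluded instruments.
f_stat = chi2_to_f(50.0, 2)
print(f_stat, weak_instruments(f_stat))  # 25.0 False
```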

15. Test for exogeneity of morekids using a Hausman test and fill in the corresponding cells in the table. Interpret the results of the tests.
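For a single endogenous regressor, the classic Hausman statistic compares the 2SLS and OLS estimates of β1: H = (β̂1,2SLS − β̂1,OLS)² / [Var(β̂1,2SLS) − Var(β̂1,OLS)], approximately χ²(1) under exogeneity. This form assumes that OLS is efficient under the null (homoskedastic errors). With heteroskedasticity-robust standard errors the denominator need not be positive, and a regression-based version (adding the first-stage residuals to the OLS regression and testing their coefficient with a robust t-test) is a common alternative. A sketch of the classic form, with hypothetical names and numbers:

```python
from statistics import NormalDist


def hausman_one_coef(b_iv, se_iv, b_ols, se_ols):
    """Classic Hausman statistic for a single coefficient and its chi2(1) p-value.

    Assumes OLS is efficient under H0, so that Var(IV) - Var(OLS) > 0.
    """
    var_diff = se_iv ** 2 - se_ols ** 2
    if var_diff <= 0:
        raise ValueError("Var(IV) - Var(OLS) must be positive for this version")
    h = (b_iv - b_ols) ** 2 / var_diff
    p_value = 2.0 * NormalDist().cdf(-(h ** 0.5))  # P(chi2_1 > h) = P(|N(0,1)| > sqrt(h))
    return h, p_value


# Hypothetical estimates, not results from pums80.csv.
h, p = hausman_one_coef(b_iv=-5.0, se_iv=1.0, b_ols=-3.0, se_ols=0.6)
print(round(h, 3), round(p, 3))  # 6.25 0.012
```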