ECON7360 Causal Inference for Microeconometrics Problem Set 1
Instructions
• When you are asked to explain or discuss something, your response should be concise (no more than four sentences). Please clearly label all your answers.
• Use Stata to conduct the empirical analysis and include your do-file as an “Appendix” at the end of your report.
• You should upload your work as a PDF file via the “Turnitin” submission link (in the “Problem Set 1” folder under “Assessment”) by 4 pm on the due date, September 8, 2023.
• You are allowed to work on this assignment in groups; that is, you can discuss how to answer these questions with your group members. However, this is not a group assignment, which means that you must answer all the questions in your own words and submit your report separately. The marking system will check the similarity, and UQ’s student integrity and misconduct policies on plagiarism apply.
• The maximum possible mark allocated for this problem set is 112. Its contribution towards your final grade will be (your mark / 112) × 15, rounded up to 2 decimal places.
1 OLS
1.1 Effect of Class Size on Students’ Performance (15)
Consider a randomized experiment where students and teachers are randomly assigned to either a small class (15 students) or a regular class (24 students). We want to estimate the effect of smaller classes in primary school and use the following linear model:
Score = β0 + β1ClassSize + Controls + u
where Score is the student’s academic score, ClassSize is a dummy for a small class, and Controls include free lunch status, race, gender, teacher characteristics, and so on.
However, you estimate the following model instead:
Score = α0 + α1ClassSize + v
(i) Provide conditions for the OLS estimator for α1 to be unbiased. (3 marks)
(ii) Provide Gauss-Markov assumptions for the OLS estimator for α1. (3 marks)
(iii) Evaluate the sign and the magnitude of the bias in the OLS estimator for α1 if teachers’ experience has a positive effect on scores and more experienced teachers are more likely to be assigned to regular classes. (3 marks)
(iv) Suppose that teachers and students are randomly assigned to either a small class (15 students) or a regular class. Compare α1 to β1. (3 marks)
(v) How does the OLS estimator for β1 change as we additionally include parental characteristics as Controls? (3 marks)
1.2 Effect of Education on Wages (12)
Assume A1-A5 from the lecture hold and the true model is given by (1):
ln(wage) = β0 + β1 educ + β2 exper + β3 IQ + u (1)
OLS estimation results (standard errors in parentheses) are provided as follows:
ln(wage) = 5.20 + .06 educ + .02 exper + .006 IQ
          (.12)  (.007)     (.003)      (.001)
where ln(wage) is the dependent variable (y) and educ, exper, and IQ are the independent variables.
The following covariance table among educ, exper, and IQ can be obtained from Stata:
. corr educ exper IQ, covariance
(obs=935)

             |     educ    exper       IQ
-------------+---------------------------
        educ |     4.83
       exper |    -4.38    19.14
          IQ |    17.05   -14.81   226.58
However, you estimate the following model instead:
ln(wage) = b0 + b1 educ + e
(i) Provide the OLS estimator for b1. (3 marks)
(ii) Provide the bias of OLS estimator for b1. (3 marks)
(iii) What does the bias of OLS for b1 depend upon? (3 marks)
(iv) Can you determine the direction of the bias for b1? (3 marks)
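As a rough illustration of the arithmetic behind parts (ii) to (iv), the sketch below plugs the numbers from the covariance table into the standard omitted-variable formula, bias(b1) = β2·Cov(educ, exper)/Var(educ) + β3·Cov(educ, IQ)/Var(educ). It assumes that the reported estimates (.02 and .006) can stand in for the unknown β2 and β3 and that the sample covariances stand in for their population counterparts; the scalar names are made up for this example.

```stata
* Sketch only: sample analogue of the omitted-variable bias in b1.
* Assumption: the reported estimates stand in for the true beta2 and beta3.
scalar var_educ  = 4.83      // Var(educ) from the covariance table
scalar cov_ed_ex = -4.38     // Cov(educ, exper)
scalar cov_ed_iq = 17.05     // Cov(educ, IQ)
scalar b_exper   = .02       // estimate used in place of beta2
scalar b_iq      = .006      // estimate used in place of beta3

* Slopes from regressing each omitted variable on educ
scalar delta_exper = cov_ed_ex / var_educ
scalar delta_iq    = cov_ed_iq / var_educ

* The two bias components and their sum
display "bias term from exper: " b_exper*delta_exper
display "bias term from IQ:    " b_iq*delta_iq
display "implied bias in b1:   " b_exper*delta_exper + b_iq*delta_iq
```

The two components have opposite signs here, so the overall direction depends on their relative magnitudes.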
2 Using IV to overcome problems in linear regression
2.1 Effect of Education on Wages (20)
We consider the following regression model:
ln(wage) = β0 + β1 educ + β2 exper + β3 IQ + u
Suppose IQ is unobserved. Because educ is endogenous, we instrument it with z, a dummy variable constructed from information on quarter of birth, where z is 0 if born in the 1st quarter and 1 otherwise. The goal is to use this instrument to estimate β1.
(i) What is the rationale for using z as an instrument? State the conditions z must satisfy for the IV estimator of β1 to be consistent. (5 marks)
(ii) What is the Wald estimator for β1 where we use z as an IV for educ? (5 marks)
(iii) Card (1995) instead uses near4 (distance from the student’s home to the nearest 4-year college) and near2 (distance from the student’s home to the nearest 2-year college) as IVs for educ. What is the rationale for using near4 (or near2) as instruments? State the conditions near4 (or near2) must satisfy for the IV estimator of β1 to be consistent. (5 marks)
(iv) Numerous previous studies have reported that the IV estimate of β1 is greater than the OLS estimate of β1. Provide some potential reasons why the IV estimates are bigger than the OLS estimates. (5 marks)
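For part (ii), a minimal sketch of how a Wald estimate could be computed in Stata: with a binary instrument and no covariates, it is the difference in mean ln(wage) between the z = 1 and z = 0 groups divided by the corresponding difference in mean educ. The variable names lwage, educ, and z are hypothetical, since the problem does not name a dataset.

```stata
* Sketch only. Hypothetical variable names: lwage, educ, z (0/1 instrument).
quietly summarize lwage if z == 1
scalar ybar1 = r(mean)
quietly summarize lwage if z == 0
scalar ybar0 = r(mean)
quietly summarize educ if z == 1
scalar xbar1 = r(mean)
quietly summarize educ if z == 0
scalar xbar0 = r(mean)

* Wald estimate: ratio of the two differences in group means
display "Wald estimate of beta1: " (ybar1 - ybar0)/(xbar1 - xbar0)

* Just-identified 2SLS without covariates gives the same point estimate
ivregress 2sls lwage (educ = z)
```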
2.2 Choice program and family income (12)
The following is a simple model to measure the effect of a school choice program on standardized test performance [see Rouse (1998) for motivation]:
score = β0 + β1 choice + β2 faminc + u1
where score is the score on a statewide test, choice is a binary variable indicating whether a student attended a choice school in the last year, and faminc is family income. The IV for choice is grant, the dollar amount granted to students to use for tuition at choice schools. The grant amount differed by family income level, which is why we control for faminc in the equation.
(i) Even with faminc in the equation, why might choice be correlated with u1? (3 marks)
(ii) If within each income class, the grant amounts were assigned randomly, is grant uncorrelated with u1? (3 marks)
(iii) Write the reduced form equation for score (standard reduced form equation). Explain why this is useful. (Hint: How do you interpret the coefficient on grant?) (3 marks)
(iv) Write the reduced form equation for choice (replace the outcome variable with the variable choice in the standard reduced form equation). What is needed for grant to be partially correlated with choice? (3 marks)
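The two reduced forms in parts (iii) and (iv), and the IV estimate they support, could be run along the following lines. This is a sketch that assumes a dataset whose variables carry the same names as in the model (score, choice, grant, faminc); no such dataset is supplied with this question.

```stata
* Sketch only. Assumed variable names: score, choice, grant, faminc.

* Reduced form for score: regress the outcome on the instrument and the control
regress score grant faminc

* Reduced form for choice (first stage): grant needs a nonzero
* partial effect on choice once faminc is held fixed
regress choice grant faminc
test grant

* 2SLS with grant as the instrument for choice, controlling for faminc
ivregress 2sls score faminc (choice = grant)
estat firststage
```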
2.3 Gender Peer Effects (15)
Suppose you want to test whether girls who attend a girls’ high school do better in math than girls who attend coeducational (mixed gender) schools. You have a random sample of senior high school girls from a state in the United States, and score is the score on a standardized math test. Let girlhs be a dummy variable indicating whether a student attends a girls’ high school.
(i) Which regression would you run in Stata to express the relationship between score and the type of school that students attend (i.e., whether or not it is a girls’ high school)? (3 marks)
(ii) What other factors would you control for in the equation? (You should be able to reasonably collect data on these factors.) (3 marks)
(iii) Write an equation relating score to girlhs and the other factors you listed in part (ii). (3 marks)
(iv) Suppose that parental support and motivation are unmeasured factors in the error term in part (iii). Are these likely to be correlated with girlhs? Explain. (3 marks)
(v) Discuss the assumptions needed for the number of girls’ high schools within a twenty-mile radius of a girl’s home to be a valid IV for girlhs. (3 marks)
2.4 Catholic School and College Attendance (18)
In an article, Evans and Schwab (1995) studied the effects of attending a Catholic high school on the probability of attending college. For concreteness, let college be a binary variable equal to unity if a student attends college, and zero otherwise. Let CathHS be a binary variable equal to one if the student attends a Catholic high school. A regression model is:
college = β0 + β1CathHS + otherfactors + ut
where the other factors include gender, race, family income, and parental education.
(i) Why might CathHS be correlated with ut? (3 marks)
(ii) Evans and Schwab have data on a standardized test score taken when each student was a sophomore. What can be done with this variable to improve the ceteris paribus estimate of the effect of attending a Catholic high school? (3 marks)
(iii) Let CathRel be a binary variable equal to one if the student is Catholic. Discuss the two requirements needed for this to be a valid IV for CathHS in the preceding equation. Which of these can be tested? (3 marks)
(iv) Not surprisingly, being Catholic has a significant effect on attending a Catholic high school. Do you think CathRel is a convincing instrument for CathHS? (3 marks)
(v) Give an example of two variables that you would include in the variable otherfactors. (3 marks)
(vi) Which test would you implement in Stata to test if these two variables (that you specified in part (v)) affect college? (3 marks)
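One possible way to set up part (vi) is a joint significance test after estimating the linear probability model. The sketch below is illustrative only: the variable names (college, CathHS, female, black, faminc, pareduc) are hypothetical, and the choice of faminc and pareduc as the two tested variables is just an example.

```stata
* Sketch only. Hypothetical variable names throughout.
* Linear probability model with heteroskedasticity-robust standard errors
regress college CathHS female black faminc pareduc, vce(robust)

* Joint test that the two chosen controls have no effect on college
test faminc pareduc
```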
2.5 Effect of Prison Population on Violent Crime Rates (20)
This exercise addresses simultaneity between crime rates and prison population in the US. To estimate the effect of prison population increases on crime rates at the state level, Levitt (1996) used instances of prison overcrowding litigation as instruments for the growth in the prison population.
Use the data in CRIME.DTA to answer the following questions.
Notes on the data
• Variables beginning with “g” are growth rates from one year to the next, obtained as the changes in the natural log. For example, gcriv_it = log(criv_it) − log(criv_i,t−1).
• Variables beginning with “c” are changes in levels from one year to the next, for example, cunem_it = unem_it − unem_i,t−1.
• Full variable labels are provided below.
                storage   display
variable name   type      format     variable label
-----------------------------------------------------------------------------
state           byte      %9.0g      alphabetical; DC = 9
year            byte      %9.0g      80 to 93
govelec         byte      %9.0g      =1 if gubernatorial election
black           float     %9.0g      proportion black
metro           float     %9.0g      proportion in metropolitan areas
unem            float     %9.0g      proportion unemployed
criv            float     %9.0g      violent crimes per 100,000
crip            float     %9.0g      property crimes per 100,000
lcriv           float     %9.0g      log(criv)
lcrip           float     %9.0g      log(crip)
gcriv           float     %9.0g      lcriv - lcriv_1
gcrip           float     %9.0g      lcrip - lcrip_1
y81             byte      %9.0g      =1 if year == 81
y82             byte      %9.0g
y83             byte      %9.0g
y84             byte      %9.0g
y85             byte      %9.0g
y86             byte      %9.0g
y87             byte      %9.0g
y88             byte      %9.0g
y89             byte      %9.0g
y90             byte      %9.0g
y91             byte      %9.0g
y92             byte      %9.0g
y93             byte      %9.0g
ag0_14          float     %9.0g      proportion of population aged 0 to 14 yrs
ag15_17         float     %9.0g      proportion of population aged 15 to 17 yrs
ag18_24         float     %9.0g      proportion of population aged 18 to 24 yrs
ag25_34         float     %9.0g      proportion of population aged 25 to 34 yrs
incpc           float     %9.0g      per capita income, nominal
polpc           float     %9.0g      police per 100,000 residents
gincpc          float     %9.0g      log(incpc) - log(incpc_1)
gpolpc          float     %9.0g
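To see how the “g” and “c” variables relate to the underlying levels, the definitions in the notes can be reproduced with Stata's lag operator. This is a minimal sketch that assumes CRIME.DTA sits in the current working directory; the names gcriv_check, cunem_check, and gap are introduced here only for illustration.

```stata
* Sketch only: assumes CRIME.DTA is in the working directory.
use CRIME.DTA, clear

* Declare the state-year panel so that the lag operator L. is available
xtset state year

* Growth rate as the change in logs; missing in each state's first year
gen double gcriv_check = lcriv - L.lcriv

* Change in levels, following the "c" naming convention in the notes
gen double cunem_check = unem - L.unem

* Compare the reconstructed growth rate with the stored variable
gen double gap = gcriv - gcriv_check
summarize gap
```

Small nonzero values of gap would reflect float rounding in the stored variables rather than a different definition.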