ECMT: Econometric Applications Problem Set 4 Solutions
Semester 2 2022
Question 1. Computer Exercise: Student Literacy and School Resources
(i) Download the dataset student_literacy.dta from the Course Canvas site. Generate the variables lnSchoolExpendPP and lnEnroll and report the sample mean, minimum and maximum values for the variables Read5YR, lnSchoolExpendPP, lnEnroll and Poverty.
Version 1: Sample size n is 874 observations.
Table 1: Summary Statistics
Variable            mean        min         max
Read5YR             71.80435    0           100
lnSchoolExpendPP    8.529159    7.345164    9.35424
lnEnroll            5.927453    4.127134    7.183112
Poverty             39.25483    0           100
Version 2: Sample size n is 871 observations.
Table 2: Summary Statistics
Variable            mean        min         max
Read5YR             71.99828    0           100
lnSchoolExpendPP    8.527641    7.345164    9.35424
lnEnroll            5.936884    4.127134    7.183112
Poverty             39.15592    0           100
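The course carries out these steps in STATA. As a rough illustration of the same logic outside STATA, the Python sketch below takes natural logs and reports the mean, minimum and maximum of each variable. It assumes the raw levels are stored under the keys SchoolExpendPP and Enroll; those names are hypothetical, and the two toy rows are made up, not taken from student_literacy.dta.

```python
import math
from statistics import mean


def add_logs(rows):
    """Add lnSchoolExpendPP and lnEnroll to each row (a dict) in place.

    Assumption: the raw levels are stored under the hypothetical keys
    "SchoolExpendPP" and "Enroll".
    """
    for row in rows:
        row["lnSchoolExpendPP"] = math.log(row["SchoolExpendPP"])
        row["lnEnroll"] = math.log(row["Enroll"])
    return rows


def summarize(rows, variables):
    """Return {variable: (mean, min, max)} for the listed variables."""
    table = {}
    for name in variables:
        values = [row[name] for row in rows]
        table[name] = (mean(values), min(values), max(values))
    return table


# Two made-up schools, for illustration only (not the course data).
rows = add_logs([
    {"Read5YR": 80.0, "SchoolExpendPP": 5000.0, "Enroll": 400.0, "Poverty": 20.0},
    {"Read5YR": 60.0, "SchoolExpendPP": 6000.0, "Enroll": 200.0, "Poverty": 50.0},
])
stats = summarize(rows, ["Read5YR", "lnSchoolExpendPP", "lnEnroll", "Poverty"])
for name, (avg, lo, hi) in stats.items():
    print(f"{name:18s}{avg:12.6f}{lo:12.6f}{hi:12.6f}")
```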
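(ii) Estimate the equation
$$\text{Read5YR} = \beta_0 + \beta_1\,\text{lnSchoolExpendPP} + \beta_2\,\text{lnEnroll} + u \qquad (1)$$
and report the results in the standard way.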
Version 1:
$$\widehat{\text{Read5YR}} = \underset{(32.2423)}{148.4024} - \underset{(3.260626)}{5.476753}\,\text{lnSchoolExpendPP} - \underset{(1.717343)}{5.04196}\,\text{lnEnroll}$$
$$n = 874, \quad R^2 = 0.0107, \quad \bar{R}^2 = 0.0084$$
Version 2:
$$\widehat{\text{Read5YR}} = \underset{(32.56946)}{153.6222} - \underset{(3.289027)}{5.789918}\,\text{lnSchoolExpendPP} - \underset{(1.735956)}{5.432064}\,\text{lnEnroll}$$
$$n = 871, \quad R^2 = 0.0121, \quad \bar{R}^2 = 0.0098$$
(iii) What is the interpretation of β1? What is the expected sign of β1? Explain.
Both versions:
The coefficient β1 measures the change in the expected value of Read5YR (the percentage of students at a school who pass a common reading test) resulting from a one-unit increase in the log of average school spending per student, holding the log of total school enrolment constant. Because the model is level-log, a 1% increase in spending per student changes expected Read5YR by approximately β1/100 percentage points. We expect β1 to be positive: all else equal, extra school resources and spending may be expected to improve school outcomes, such as the literacy rate and students' performance on exams.
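In a level-log model a one-unit change in the log regressor is a very large change in the underlying variable (roughly a 172% increase, since e − 1 ≈ 1.718). The short Python sketch below converts the Version 1 estimate of β1 from model (1) into the more natural effect of a 1% increase in spending, both exactly (β1 × ln 1.01) and with the usual β1/100 approximation.

```python
import math

# Version 1 estimate of beta1 in model (1), taken from the output in (ii).
beta1 = -5.476753

# A one-unit rise in the log is a (e - 1) * 100, roughly 172%, rise in spending.
pct_for_one_log_unit = (math.e - 1) * 100

# A 1% rise in SchoolExpendPP raises lnSchoolExpendPP by ln(1.01).
exact_effect = beta1 * math.log(1.01)   # percentage points of Read5YR
approx_effect = beta1 / 100             # common approximation, ln(1.01) ~ 0.01

print(f"one log unit ~ {pct_for_one_log_unit:.1f}% more spending")
print(f"1% more spending: exact {exact_effect:.5f}, approx {approx_effect:.5f}")
```

The approximation is accurate here because ln(1.01) differs from 0.01 by less than 0.00005.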
(iv) Test H0: β1 = 0 against H1: β1 ≠ 0 using a 10% significance level. What do you conclude?
Does your conclusion change if you use a 1% significance level?
Version 1:
Hypothesis Test:
H0: β1 = 0
H1: β1 ≠ 0
Test Statistic:
$$t = \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)} = \frac{-5.476753}{3.260626} = -1.68$$
Rejection Rule: Reject H0 in favour of H1 if |t| > c, where t is the t-statistic and c is the critical value for the t-distribution with df = 874 − 2 − 1 = 871 and a 10% significance level for a two-sided alternative.
Now |t| = 1.68 and c = 1.645.
Decision: Since |t| > c we reject the null at the 10% significance level (but only just).
Conclusion: The logarithm of school spending per student has a statistically significant effect, at the 10% level, on the percentage of students at the school who pass the reading test, holding enrollment constant (although the effect is in the opposite direction to that expected!).
Note: In this case we might say that our result is ‘borderline’ given the test statistic is so close to the critical value.
At the 1% significance level there is insufficient evidence to reject the null hypothesis, since |t| = 1.68 is well below the critical value of about 2.58. The p-value reported by STATA, 0.093, confirms this: it is below 0.10 but above 0.01.
Version 2:
Hypothesis Test:
H0: β1 = 0
H1: β1 ≠ 0
Test Statistic:
$$t = \frac{\hat{\beta}_1}{\operatorname{se}(\hat{\beta}_1)} = \frac{-5.789918}{3.289027} = -1.76$$
Rejection Rule: Reject H0 in favour of H1 if |t| > c, where t is the t-statistic and c is the critical value for the t-distribution with df = 871 − 2 − 1 = 868 and a 10% significance level for a two-sided alternative.
Now |t| = 1.76 and c = 1.645.
Decision: Since |t| > c we reject the null at the 10% significance level.
Conclusion: The logarithm of school spending per student has a statistically significant effect, at the 10% level, on the percentage of students at the school who pass the reading test, holding enrollment constant (although the effect is in the opposite direction to that expected!).
At the 1% significance level there is insufficient evidence to reject the null hypothesis, since |t| = 1.76 is well below the critical value of about 2.58. The p-value reported by STATA, 0.079, confirms this: it is below 0.10 but above 0.01.
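The test arithmetic for both versions can be reproduced with a few lines of Python. This sketch uses the standard normal distribution from Python's statistics module as an approximation to the t distribution; with roughly 870 degrees of freedom the two are very close, but the p-values may differ from STATA's in the third decimal place.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # normal approximation to t with ~870 df


def two_sided_test(estimate, se):
    """Return the t-statistic and two-sided p-value for H0: beta = 0."""
    t = estimate / se
    p = 2 * (1 - STD_NORMAL.cdf(abs(t)))
    return t, p


# Two-sided critical values: about 1.645 (10% level) and 2.576 (1% level).
c10 = STD_NORMAL.inv_cdf(0.95)
c01 = STD_NORMAL.inv_cdf(0.995)

# (estimate, standard error) of beta1 in model (1), from the output in (ii).
results = {"Version 1": (-5.476753, 3.260626), "Version 2": (-5.789918, 3.289027)}
for version, (b, se) in results.items():
    t, p = two_sided_test(b, se)
    print(f"{version}: t = {t:.2f}, p = {p:.3f}, "
          f"reject at 10%: {abs(t) > c10}, reject at 1%: {abs(t) > c01}")
```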
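(v) Estimate the equation
$$\text{Read5YR} = \beta_0 + \beta_1\,\text{lnSchoolExpendPP} + \beta_2\,\text{lnEnroll} + \beta_3\,\text{Poverty} + u \qquad (2)$$
and report the results in the standard way.
Version 1:
$$\widehat{\text{Read5YR}} = \underset{(26.47635)}{42.69087} + \underset{(2.708771)}{8.413534}\,\text{lnSchoolExpendPP} - \underset{(1.386574)}{4.153264}\,\text{lnEnroll} - \underset{(0.0212458)}{0.4592712}\,\text{Poverty}$$
$$n = 874, \quad R^2 = 0.3564, \quad \bar{R}^2 = 0.3542$$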
Version 2:
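$$\widehat{\text{Read5YR}} = \underset{(26.5334)}{45.39149} + \underset{(2.710253)}{8.361389}\,\text{lnSchoolExpendPP} - \underset{(1.390563)}{4.462578}\,\text{lnEnroll} - \underset{(0.0210631)}{0.4648677}\,\text{Poverty}$$
$$n = 871, \quad R^2 = 0.3674, \quad \bar{R}^2 = 0.3652$$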
(vi) What happens to the coefficient on lnSchoolExpendPP when the additional explanatory variable Poverty is added to the model? Explain the reason for the difference in the estimates of β1 in models (1) and (2).
Both versions:
Including Poverty as an explanatory variable in the model for Read5YR causes the sign of the estimate of β1 to change from negative to positive. This reflects the fact that lnSchoolExpendPP and Poverty are correlated: these variables have a positive correlation (0.23885 in Version 1 and 0.2377 in Version 2), and Poverty has a strong negative effect on Read5YR. As a result, the estimator of β1 in model (1) is subject to omitted variable bias. Given the positive correlation between lnSchoolExpendPP and Poverty, and the negative partial effect of Poverty on Read5YR, β̂1 in model (1) is downward biased (i.e. E(β̂1) < β1).
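The omitted variable argument has an exact in-sample counterpart: the slope on the included regressor from the short regression equals the long-regression slope plus the coefficient on the omitted variable times the slope from an auxiliary regression of the omitted variable on the included regressors. The NumPy sketch below checks this on simulated data; the coefficients and the correlation are invented for illustration, not estimated from the course dataset, and x1, x2 and x3 merely stand in for lnSchoolExpendPP, lnEnroll and Poverty.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulated data (illustrative only): x1 ~ lnSchoolExpendPP, x2 ~ lnEnroll,
# x3 ~ Poverty. x3 is positively related to x1 and lowers y, mimicking (vi).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 - 3.0 * x3 + rng.normal(size=n)


def ols(dep, *regressors):
    """OLS coefficients, intercept first, via least squares."""
    X = np.column_stack([np.ones(len(dep)), *regressors])
    return np.linalg.lstsq(X, dep, rcond=None)[0]


b_long = ols(y, x1, x2, x3)   # analogue of model (2)
b_short = ols(y, x1, x2)      # analogue of model (1): omits x3
delta = ols(x3, x1, x2)       # auxiliary regression of x3 on x1 and x2

# In-sample identity: short slope = long slope + (coef on x3) * (delta on x1)
implied_short = b_long[1] + b_long[3] * delta[1]
print(f"short-regression slope: {b_short[1]:.6f}")
print(f"long slope + bias term: {implied_short:.6f}")
print(f"long-regression slope:  {b_long[1]:.6f}")
```

With a negative coefficient on x3 and a positive auxiliary slope, the bias term is negative, so the short-regression slope sits below the long-regression slope, as argued above for model (1).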
(vii) From model (2), obtain the predicted Read5YR when lnSchoolExpendPP = 8.5, lnEnroll = 5.9 and Poverty = 39. Estimate a regression which allows you to put a 95% confidence interval around the predicted value (this is a ‘conditional’ or ‘within sample’ prediction). Report the confidence interval.
Version 1:
$$\begin{aligned}
\widehat{E}[\text{Read5YR} \mid \text{lnSchoolExpendPP} = 8.5,\ \text{lnEnroll} = 5.9,\ \text{Poverty} = 39]
&= 42.69087 + 8.413534 \times 8.5 - 4.153264 \times 5.9 - 0.4592712 \times 39 \\
&= 71.790075
\end{aligned}$$
Version 2:
$$\begin{aligned}
\widehat{E}[\text{Read5YR} \mid \text{lnSchoolExpendPP} = 8.5,\ \text{lnEnroll} = 5.9,\ \text{Poverty} = 39]
&= 45.39149 + 8.361389 \times 8.5 - 4.462578 \times 5.9 - 0.4648677 \times 39 \\
&= 72.004246
\end{aligned}$$
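The point predictions are simple linear combinations of the model (2) coefficients. The following Python sketch reproduces them for both versions.

```python
# Model (2) estimates: (intercept, lnSchoolExpendPP, lnEnroll, Poverty).
coefs = {
    "Version 1": (42.69087, 8.413534, -4.153264, -0.4592712),
    "Version 2": (45.39149, 8.361389, -4.462578, -0.4648677),
}
# Leading 1 multiplies the intercept; then lnSchoolExpendPP, lnEnroll, Poverty.
x0 = (1.0, 8.5, 5.9, 39.0)


def predict(b, x):
    """Linear prediction b0*1 + b1*x1 + b2*x2 + b3*x3."""
    return sum(bj * xj for bj, xj in zip(b, x))


for version, b in coefs.items():
    print(f"{version}: predicted Read5YR = {predict(b, x0):.6f}")
```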
To construct a confidence interval we need to estimate the standard error of the prediction. We can obtain this from estimating a transformed model:
θ = β0 + β1 × 8.5 + β2 × 5.9 + β3 × 39
= β0 + 8.5β1 + 5.9β2 + 39β3
β0 = θ − 8.5β1 − 5.9β2 − 39β3
Substituting this into the population model gives:
Read5YR = θ − 8.5β1 − 5.9β2 − 39β3 + β1 lnSchoolExpendPP + β2 lnEnroll + β3 Poverty + u
= θ + β1 (lnSchoolExpendPP − 8.5) + β2 (lnEnroll − 5.9) + β3 (Poverty − 39) + u
Estimate this model, i.e. regress Read5YR on (lnSchoolExpendPP − 8.5), (lnEnroll − 5.9) and (Poverty − 39). The intercept is the predicted value θ̂, and its standard error is the extra piece of information we need to compute the confidence interval.
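To see why the intercept of the transformed model delivers both the prediction and its standard error, the NumPy sketch below fits the same simulated data two ways: directly, computing se(θ̂) from the full coefficient covariance matrix V as (z0' V z0)^(1/2), and via the regression on centred regressors. The data-generating values are invented for illustration, and ols_with_se is a hypothetical helper that assumes the usual homoskedastic (non-robust) standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data (illustrative values, not the course dataset).
X = np.column_stack([
    rng.normal(8.5, 0.3, n),   # stands in for lnSchoolExpendPP
    rng.normal(5.9, 0.6, n),   # stands in for lnEnroll
    rng.uniform(0, 100, n),    # stands in for Poverty
])
y = 40 + 8 * X[:, 0] - 4 * X[:, 1] - 0.45 * X[:, 2] + rng.normal(0, 16, n)
x0 = np.array([8.5, 5.9, 39.0])  # values at which we predict


def ols_with_se(dep, regressors):
    """Hypothetical helper: OLS with intercept; returns (coefs, covariance).

    Uses the homoskedastic formula sigma2_hat * (Z'Z)^{-1}.
    """
    Z = np.column_stack([np.ones(len(dep)), regressors])
    b, *_ = np.linalg.lstsq(Z, dep, rcond=None)
    resid = dep - Z @ b
    sigma2_hat = resid @ resid / (len(dep) - Z.shape[1])  # SSR / (n - k - 1)
    return b, sigma2_hat * np.linalg.inv(Z.T @ Z)


# Route 1: predict directly; se from the full covariance matrix.
b, V = ols_with_se(y, X)
z0 = np.concatenate([[1.0], x0])
theta_direct = z0 @ b
se_direct = np.sqrt(z0 @ V @ z0)

# Route 2: regress y on the centred regressors (X - x0); read off the intercept.
b_c, V_c = ols_with_se(y, X - x0)
theta_centred, se_centred = b_c[0], np.sqrt(V_c[0, 0])

print(f"prediction: {theta_direct:.6f} vs {theta_centred:.6f}")
print(f"std. error: {se_direct:.6f} vs {se_centred:.6f}")
```

Centring leaves the fitted values and residuals unchanged, so σ̂² is the same in both fits; only the meaning of the intercept changes.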
The 95% confidence interval for the conditional prediction is given by:
$$\hat{\theta} \pm c \times \operatorname{se}(\hat{\theta})$$
with c = 1.96.
Version 1:
95% C.I. = 71.79008 ± 1.96 × 0.5546288
= 71.79008 ± 1.087072
= [70.703008, 72.877152]
Version 2:
95% C.I. = 72.004246 ± 1.96 × 0.5560584
= 72.004246 ± 1.089874
= [70.914372, 73.094120]
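The interval arithmetic can be checked with a small Python helper (conf_int is a name introduced here for illustration), using c = 1.96 as in the calculations above.

```python
def conf_int(estimate, se, c=1.96):
    """Hypothetical helper: return (lower, upper) for estimate +/- c * se."""
    half_width = c * se
    return estimate - half_width, estimate + half_width


lo1, hi1 = conf_int(71.79008, 0.5546288)    # Version 1
lo2, hi2 = conf_int(72.004246, 0.5560584)   # Version 2
print(f"Version 1: [{lo1:.6f}, {hi1:.6f}]")
print(f"Version 2: [{lo2:.6f}, {hi2:.6f}]")
```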
(viii) Now consider a prediction for an individual school. Construct the 95% confidence interval for the predicted Read5YR for a school where lnSchoolExpendPP = 8.5, lnEnroll = 5.9 and Poverty = 39 (this is an ‘unconditional’ prediction). Comment on the width of this confidence interval compared to that in (vii).
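For the unconditional prediction:
$$\operatorname{se}(\hat{\theta}_u) = \left([\operatorname{se}(\hat{\theta})]^2 + \hat{\sigma}^2\right)^{1/2}$$
where se(θ̂) is the standard error of the conditional prediction found in (vii) and σ̂² is the estimated variance of the error term u.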
Version 1:
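$$\begin{aligned}
\operatorname{se}(\hat{\theta}_u) &= \left(0.5546288^2 + \hat{\sigma}^2\right)^{1/2} \\
&= \sqrt{0.5546288^2 + 260.799468} \\
&= \sqrt{261.107081} \\
&= 16.158808
\end{aligned}$$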
Version 2:
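$$\begin{aligned}
\operatorname{se}(\hat{\theta}_u) &= \left(0.5560584^2 + \hat{\sigma}^2\right)^{1/2} \\
&= \sqrt{0.5560584^2 + 260.303599} \\
&= \sqrt{260.612800} \\
&= 16.143506
\end{aligned}$$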
Therefore, the 95% confidence interval for the unconditional prediction is given by:
$$\hat{\theta} \pm c \times \operatorname{se}(\hat{\theta}_u)$$
Version 1:
95% C.I. = 71.79008 ± 1.96 × 16.158808
= 71.79008 ± 31.671264
= [40.118816, 103.461344]
Version 2:
95% C.I. = 72.004246 ± 1.96 × 16.143506
= 72.004246 ± 31.641272
= [40.362974, 103.645518]
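The 95% confidence interval for the unconditional prediction is much wider than that for the conditional prediction in (vii), because it must also account for the variance of the unobserved error term for the individual school. Here σ̂² (about 260) is far larger than [se(θ̂)]² (about 0.31), so the error variance dominates se(θ̂u). Unlike se(θ̂), this component does not shrink as the sample size grows.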
Note: σ̂² is taken from the STATA output for model (2); it is the residual mean square, i.e. the square of the reported Root MSE.
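A Python sketch of the unconditional calculation for Version 1 follows. It combines se(θ̂) and σ̂² as above and compares the resulting interval width with the conditional one; the function name unconditional_interval is hypothetical.

```python
import math


def unconditional_interval(theta_hat, se_theta, sigma2_hat, c=1.96):
    """Hypothetical helper: interval for one school's outcome at x0.

    se_u = sqrt(se(theta_hat)^2 + sigma2_hat); interval = theta_hat +/- c * se_u.
    """
    se_u = math.sqrt(se_theta ** 2 + sigma2_hat)
    return se_u, (theta_hat - c * se_u, theta_hat + c * se_u)


# Version 1 inputs from (vii) and the STATA output.
theta_hat, se_theta, sigma2_hat = 71.79008, 0.5546288, 260.799468
se_u, (lo, hi) = unconditional_interval(theta_hat, se_theta, sigma2_hat)

width_conditional = 2 * 1.96 * se_theta
width_unconditional = hi - lo
print(f"se = {se_u:.6f}, 95% interval = [{lo:.6f}, {hi:.6f}]")
print(f"unconditional / conditional width: "
      f"{width_unconditional / width_conditional:.1f}")
```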
(ix) Generate a new variable, lnEnroll2, equal to the square of lnEnroll, and add it to the model in (v). Estimate the model and compute the turning point of the quadratic in lnEnroll.
Version 1:
The estimated sample regression function is:
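$$\widehat{\text{Read5YR}} = -116.2954 + 8.343561\,\text{lnSchoolExpendPP} + 50.33022\,\text{lnEnroll} - 4.636574\,\text{lnEnroll}^2 - 0.4510175\,\text{Poverty}$$
Since the coefficient on lnEnroll2 is negative, the quadratic has a maximum, and the predicted Read5YR is maximised at the value lnEnroll* where:
$$\text{lnEnroll}^* = \frac{-\hat{\beta}_{\text{lnEnroll}}}{2 \times \hat{\beta}_{\text{lnEnroll}^2}} = \frac{-50.33022}{2 \times (-4.636574)} = 5.427523$$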
Version 2:
The estimated sample regression function is:
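$$\widehat{\text{Read5YR}} = -126.8176 + 8.221992\,\text{lnSchoolExpendPP} + 54.6259\,\text{lnEnroll} - 5.019177\,\text{lnEnroll}^2 - 0.455856\,\text{Poverty}$$
Since the coefficient on lnEnroll2 is negative, the quadratic has a maximum, and the predicted Read5YR is maximised at the value lnEnroll* where:
$$\text{lnEnroll}^* = \frac{-\hat{\beta}_{\text{lnEnroll}}}{2 \times \hat{\beta}_{\text{lnEnroll}^2}} = \frac{-54.6259}{2 \times (-5.019177)} = 5.441719$$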
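The turning-point formula is easy to verify numerically. The Python sketch below also back-transforms lnEnroll* to an enrolment level with exp(), assuming lnEnroll is the natural log of total school enrolment as described in (iii).

```python
import math


def turning_point(b_linear, b_quad):
    """x* = -b_linear / (2 * b_quad); a maximum when b_quad < 0."""
    return -b_linear / (2 * b_quad)


# (coef on lnEnroll, coef on lnEnroll^2) from the estimates in (ix).
estimates = {"Version 1": (50.33022, -4.636574), "Version 2": (54.6259, -5.019177)}
for version, (b_lin, b_sq) in estimates.items():
    x_star = turning_point(b_lin, b_sq)
    # Assumes lnEnroll = ln(total enrolment), so exp() recovers the level.
    print(f"{version}: lnEnroll* = {x_star:.6f}, "
          f"enrolment ~ {math.exp(x_star):.0f} students")
```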
(x) In your assessment, does the model in (v) measure the causal effects of lnSchoolExpendPP on Read5YR? Explain your reasoning.
Both versions:
The most important part of the answer is providing a reasoned discussion to support your assessment of the model in (v). For full marks it is essential that the zero conditional mean assumption, E(u | lnSchoolExpendPP, lnEnroll, Poverty) = 0, be discussed (and assessed as to whether it is likely to hold here). The value of the R² statistic is not relevant to this discussion.
2022-09-08