ECOM30002/90002 ECONOMETRICS 2 SEMESTER 2, 2022 ASSIGNMENT 4
Published: 2022-10-18
Section 1
In Australia, the government uses a number of financial incentives to bolster the uptake of private health insurance (PHI). High-income individuals who do not purchase PHI face a negative incentive payment (a tax penalty) called the Medicare Levy Surcharge (MLS). Eligibility for the MLS is a function of the individual's taxable income, and the income threshold has changed several times in the past. Consider the following panel data model for the demand for PHI:
$$PHI_{it} = \rho\, PHI_{i,t-1} + \beta_{MLS}\, MLS_{it} + X_{it}'\beta + \alpha_i + \gamma_t + v_{it}, \qquad (1)$$
where $PHI_{it}$ is a dummy indicating that individual $i$ had purchased PHI in year $t$, and $MLS_{it}$ is a dummy indicating that $i$ was eligible for the MLS in year $t$. $X_{it}$ is a vector of further covariates. (For background and reference, see TC Buchmueller, TC Cheng, NTA Pham and KE Staub (2021), "The effect of income-based mandates on the demand for private hospital insurance and its dynamics," Journal of Health Economics, Vol. 75, 102406.)
1. Explain which parts of Equation (1) can reflect persistence in the demand for PHI beyond that explained by $X_{it}$, and how the kinds of persistence induced by these parts differ from each other.
2. A group of individuals becomes temporarily eligible for the MLS in year $t$ (they are ineligible in all other periods). What is the effect of this temporary eligibility on the contemporaneous demand for PHI in years $t$, $t+1$ and $t+2$ if (i) $\rho = 0$; (ii) $\rho \neq 0$?
The file phi.csv contains an unbalanced panel of 2,531 individuals observed for up to ten years, for a total of 17,046 observations. The variables in the data are:
| Variable | Description |
|---|---|
| pid | individual identifier $i$ |
| yr | survey year $t$ |
| phi | =1 if $i$ was covered by PHI in year $t$ |
| mls | =1 if $i$ was eligible for MLS in year $t$ |
| agesq | squared age: $age_{it}^2$ |
| hhinc | total household income |
| hcond | =1 if the individual suffers from a long-term health condition |
3. Use the data in the file phi.csv to estimate four models related to equation (1). For all models, include the following variables in $X_{it}$: agesq/10000, hhinc, hhinc$^2$, hcond. The four models are:
(i) pooled OLS of $PHI_{it}$ on $MLS_{it}$ and $X_{it}$;
(ii) fixed effects regression of $PHI_{it}$ on $MLS_{it}$ and $X_{it}$ with individual-specific effects;
(iii) two-way fixed effects regression of $PHI_{it}$ on $MLS_{it}$ and $X_{it}$ (individual and year fixed effects);
(iv) Anderson-Hsiao instrumental variable regression of $PHI_{it}$ on $PHI_{i,t-1}$, $MLS_{it}$ and $X_{it}$ (with individual and year fixed effects) using $PHI_{i,t-2}$ as an instrument.
Note: For the dynamic model, you can model your R code after the code in Lecture 19. The basic specification for plm() should be
$$\Delta PHI_{it} \sim \Delta PHI_{i,t-1} + \Delta MLS_{it} + \Delta X_{it} + \Delta D_t \;\mid\; PHI_{i,t-2} + \Delta MLS_{it} + \Delta X_{it} + \Delta D_t$$
where $D_t$ is a set of year dummy variables. Because of multicollinearity, three year dummies need to be excluded (if you do not do this, R will do it for you).
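The specification above follows from first-differencing equation (1). The standard argument for why $PHI_{i,t-2}$ serves as the instrument can be sketched as follows. It assumes that $v_{it}$ is serially uncorrelated and uncorrelated with $\alpha_i$, and that the covariates are strictly exogenous:

$$
\begin{aligned}
\Delta PHI_{it} &= \rho\,\Delta PHI_{i,t-1} + \beta_{MLS}\,\Delta MLS_{it} + \Delta X_{it}'\beta + (\gamma_t - \gamma_{t-1}) + \Delta v_{it}, \\
\Delta v_{it} &= v_{it} - v_{i,t-1}, \qquad \Delta PHI_{i,t-1} = PHI_{i,t-1} - PHI_{i,t-2}.
\end{aligned}
$$

Differencing removes $\alpha_i$. However, $\Delta PHI_{i,t-1}$ depends on $v_{i,t-1}$ through $PHI_{i,t-1}$, and $v_{i,t-1}$ also enters $\Delta v_{it}$, so the differenced lag is endogenous. $PHI_{i,t-2}$ is determined before $v_{i,t-1}$ and $v_{it}$. Under the stated assumptions it is therefore uncorrelated with $\Delta v_{it}$, while it remains correlated with $\Delta PHI_{i,t-1}$.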
Report your results in a table showing the estimated coefficients, standard errors, and the number of observations in each regression. Omit the estimated coefficients of constants and year dummies, if any. Report the coefficients of the differenced variables from (iv) in the same rows as the corresponding undifferenced variables of (i)-(iii). Round non-integer values to three decimal places.
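The Anderson-Hsiao idea can be checked on simulated data. The sketch below is not the plm() specification used in this assignment. It uses base R and a hypothetical pure AR(1) panel with individual effects, with no covariates or year effects and made-up parameter values. It computes the just-identified IV estimator by hand and, for comparison, a within (FE) estimator of the same autoregressive coefficient.

```r
# Hypothetical simulated panel: y_it = rho * y_i,t-1 + alpha_i + e_it
# (no covariates or year effects; all parameter values are illustrative)
set.seed(123)
N <- 2000     # individuals
nT <- 10      # periods
rho <- 0.5
alpha <- rnorm(N, sd = 0.5)
# start each individual's process from its stationary distribution
y0 <- alpha / (1 - rho) + rnorm(N, sd = sqrt(1 / (1 - rho^2)))
y <- matrix(NA_real_, nrow = N, ncol = nT)
y[, 1] <- rho * y0 + alpha + rnorm(N)
for (t in 2:nT) y[, t] <- rho * y[, t - 1] + alpha + rnorm(N)

# Anderson-Hsiao: first differences remove alpha_i; the lagged difference
# (y_t-1 - y_t-2) is instrumented with the level y_t-2
dy    <- as.vector(y[, 3:nT] - y[, 2:(nT - 1)])
dylag <- as.vector(y[, 2:(nT - 1)] - y[, 1:(nT - 2)])
z     <- as.vector(y[, 1:(nT - 2)])
rho.ah <- sum(z * dy) / sum(z * dylag)   # just-identified IV, no intercept

# Within (FE) estimator: demean y_t and y_t-1 within each individual
ycur <- y[, 2:nT]
ylag <- y[, 1:(nT - 1)]
ycur.d <- ycur - rowMeans(ycur)
ylag.d <- ylag - rowMeans(ylag)
rho.fe <- sum(ycur.d * ylag.d) / sum(ylag.d^2)

round(c(AH = rho.ah, FE = rho.fe), 3)
```

Comparing `rho.ah` and `rho.fe` with the true value `rho = 0.5` shows how each estimator behaves in a short panel with a lagged dependent variable.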
4. Consider only the results for models (i) and (ii). What does the difference between the Pooled OLS and FE estimates of the coefficient on MLS reveal about the effect of omitted individual-specific variables on the demand for PHI and their relation to MLS?
5. Now consider only the results for models (ii) and (iii). Is further controlling for omitted year-specific unobservables important for the estimate of the effect of MLS on the demand for PHI? Interpret the estimated coefficient on MLS.
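The mechanism behind questions 4 and 5 can be illustrated on simulated data: unobserved individual- and year-specific effects that are correlated with the regressor. This base-R sketch uses a hypothetical balanced panel with made-up parameters and one continuous regressor `x`, not the PHI data or the MLS dummy. It estimates the fixed effects models by least-squares dummy variables, which yields the same slope as the within transformation.

```r
# Hypothetical balanced panel: y_it = 0.5 * x_it + alpha_i + gamma_t + e_it,
# where x_it is correlated with both alpha_i and gamma_t (illustrative values)
set.seed(42)
N <- 500
nT <- 8
id <- rep(1:N, each = nT)
yr <- rep(1:nT, times = N)
alpha <- rnorm(N)[id]          # individual effects
gamma <- (0.3 * (1:nT))[yr]    # year effects that trend upward
x <- alpha + gamma + rnorm(N * nT)
y <- 0.5 * x + alpha + gamma + rnorm(N * nT)

b.pooled <- coef(lm(y ~ x))[["x"]]                              # pooled OLS
b.fe     <- coef(lm(y ~ x + factor(id)))[["x"]]                 # individual FE (LSDV)
b.twfe   <- coef(lm(y ~ x + factor(id) + factor(yr)))[["x"]]    # two-way FE (LSDV)
round(c(pooled = b.pooled, fe = b.fe, twfe = b.twfe), 3)
```

Only the two-way model controls for both omitted components here, so the three slopes can be compared with the true value of 0.5.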
6. Why might the specification of $X_{it}$ in this section omit the variable $age_{it}$ and only include $age_{it}^2$? Respond by referencing the models with two-way fixed effects ((iii) and (iv)). Further, what might be the reason for including $age_{it}^2/10^4$ instead of just $age_{it}^2$?
7. Are the FE estimators (ii) and (iii) consistent if $\rho \neq 0$? Explain, and discuss which of the four estimators in the table is your preferred estimator based on the reported results, and why.
8. The estimated coefficient on MLS for model (iv) is smaller than the estimated coefficient on MLS for model (iii). Does this mean that model (iv) predicts that MLS has smaller effects on PHI than model (iii)? Discuss.
Section 2
There are two important meanings of “spurious regression” in econometrics. First, a large/significant regression coefficient may not measure a causal effect. We have dealt with this topic extensively in this subject.
The second meaning of “spurious” often arises in the context of time series regression: A large/significant regression coefficient may be a type I error; i.e., there is no correlation in the population, but the sample statistic indicates there is. In this section, you will use simulations to explore this type of spurious regression.
The section is designed so that the information in the questions and your progressive answers is sufficient to learn and understand the concept, causes, and some solutions of spurious regression in time series. If you want additional information, you can also consult the following textbook sections as background reading: Wooldridge 18.3 ("Spurious regression"), Stock & Watson 15.7 ("Nonstationarity I: Trends"; in particular, the subsection "Spurious regression").
Consider a simple regression of the time series $Y_t$ on the time series $X_t$ with 400 periods ($T = 400$):
$$Y_t = \beta_0 + \beta_1 X_t + U_t, \qquad t = 1, \ldots, 400.$$
Throughout the section, $Y_t$ and $X_t$ are not related to each other, so that $\beta_1 = 0$. The interest lies in the estimate $\hat\beta_1$ and the result of testing $H_0: \beta_1 = 0$.
9. To begin, consider a baseline scenario with no spurious regression problem. Let $Y_t \sim N(0, 1)$ and $X_t \sim N(0, 1)$. Simulate 5,000 repeated samples (replications) from this DGP. In each replication, regress $Y_t$ on $X_t$, and save $\hat\beta_1$ and the p-value of the significance test of $\hat\beta_1$. You can use the following R code:
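```r
rm(list = ls())
set.seed(2022)
n <- 400        # number of periods T
reps <- 5000    # number of replications
beta1.hat <- matrix(nrow = reps, ncol = 1)
pval.beta1 <- matrix(nrow = reps, ncol = 1)
for (j in 1:reps) {
  y <- rnorm(n)                            # Y_t ~ N(0, 1)
  x <- rnorm(n)                            # X_t ~ N(0, 1), independent of Y_t
  eq <- summary(lm(y ~ x))
  beta1.hat[j] <- eq$coefficients[2, 1]    # estimated slope
  pval.beta1[j] <- eq$coefficients[2, 4]   # p-value for H0: beta1 = 0
}
```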
– Show a figure of the histogram of $\hat\beta_1$.
– Report the estimated rejection frequency for the test (to four decimal places).
– Show a time series plot with both $Y_t$ and $X_t$ for the last simulated sample (replication).
Do the histogram and the rejection frequency conform to your expectations? Explain.
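One way to produce the three outputs for part 9 is sketched below. It repeats the simulation so that it runs on its own. It assumes a 5% significance level for the test, which the question does not state.

```r
set.seed(2022)
n <- 400
reps <- 5000
beta1.hat <- pval.beta1 <- numeric(reps)
for (j in 1:reps) {
  y <- rnorm(n)                              # Y_t ~ N(0, 1)
  x <- rnorm(n)                              # X_t ~ N(0, 1)
  est <- coef(summary(lm(y ~ x)))
  beta1.hat[j] <- est["x", "Estimate"]
  pval.beta1[j] <- est["x", "Pr(>|t|)"]
}
# Histogram of the slope estimates
hist(beta1.hat, breaks = 50, main = "Histogram of beta1.hat", xlab = "beta1.hat")
# Rejection frequency at the assumed 5% level
rej.freq <- mean(pval.beta1 < 0.05)
round(rej.freq, 4)
# Time series plot of the last replication (y and x remain from the loop)
matplot(cbind(y, x), type = "l", lty = 1, col = c("black", "red"),
        xlab = "t", ylab = "value")
legend("topleft", legend = c("Y", "X"), col = c("black", "red"), lty = 1)
```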
10. Now modify the DGP so that the two time series variables have linear deterministic trends: $Y_t \sim N(0.01t, 1)$ and $X_t \sim N(0.015t, 1)$. Run a new simulation of 5,000 replications. Again,
– show a figure of the histogram of $\hat\beta_1$;
– report the estimated rejection frequency for the test (to four decimal places); and
– show a time series plot of $Y_t$ against $X_t$ for the last simulated sample (replication).
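A possible implementation of the trending DGP in part 10 is sketched below. It assumes $t = 1, \ldots, 400$ and a 5% significance level. The only change from the baseline is that each draw has a time-varying mean.

```r
set.seed(2022)
n <- 400
reps <- 5000
tt <- 1:n                                   # time index t
beta1.hat <- pval.beta1 <- numeric(reps)
for (j in 1:reps) {
  y <- rnorm(n, mean = 0.01 * tt)           # Y_t ~ N(0.01 t, 1)
  x <- rnorm(n, mean = 0.015 * tt)          # X_t ~ N(0.015 t, 1)
  est <- coef(summary(lm(y ~ x)))
  beta1.hat[j] <- est["x", "Estimate"]
  pval.beta1[j] <- est["x", "Pr(>|t|)"]
}
hist(beta1.hat, breaks = 50, main = "Histogram of beta1.hat", xlab = "beta1.hat")
rej.freq <- mean(pval.beta1 < 0.05)         # assumed 5% level
round(rej.freq, 4)
# Time series plot of the last replication
matplot(cbind(y, x), type = "l", lty = 1, col = c("black", "red"),
        xlab = "t", ylab = "value")
legend("topleft", legend = c("Y", "X"), col = c("black", "red"), lty = 1)
```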
11. Using the same DGP as in part 10, modify the regression: instead of just regressing $Y_t$ on $X_t$, regress $Y_t$ on $X_t$ and $t$ (that is, add a deterministic linear trend to the regression). Run a new simulation of 5,000 replications.
– Show a figure of the histogram of $\hat\beta_1$ and
– report the estimated rejection frequency for the test (to four decimal places).
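A sketch for part 11 follows, again assuming $t = 1, \ldots, 400$ and a 5% level. The DGP is the trending one from part 10. The only change is that the regression includes the time index `tt` as an additional regressor.

```r
set.seed(2022)
n <- 400
reps <- 5000
tt <- 1:n                                   # time index t
beta1.hat <- pval.beta1 <- numeric(reps)
for (j in 1:reps) {
  y <- rnorm(n, mean = 0.01 * tt)           # Y_t ~ N(0.01 t, 1)
  x <- rnorm(n, mean = 0.015 * tt)          # X_t ~ N(0.015 t, 1)
  est <- coef(summary(lm(y ~ x + tt)))      # regression with a linear trend
  beta1.hat[j] <- est["x", "Estimate"]
  pval.beta1[j] <- est["x", "Pr(>|t|)"]
}
hist(beta1.hat, breaks = 50, main = "Histogram of beta1.hat", xlab = "beta1.hat")
rej.freq <- mean(pval.beta1 < 0.05)         # assumed 5% level
round(rej.freq, 4)
```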
12. What do you conclude from the results you produced in parts 10 and 11 for the DGP with deterministic trends?
13. Now consider a modification to the DGP where the two time series variables have so-called stochastic trends. Specifically, let both time series be AR(1) processes with $\rho = 1$ (such processes are called unit root processes):
$$Y_t = Y_{t-1} + \varepsilon_t, \qquad X_t = X_{t-1} + \upsilon_t,$$
where $\varepsilon_t$ and $\upsilon_t$ are IID standard normal and independent of each other. We say $Y_t$ has a stochastic trend because its first difference, $\Delta Y_t$, is equal to a random variable ($\varepsilon_t$). One can show that a unit root process can be represented as its initial value plus the sum of all past errors (formally, $Y_t = Y_1 + \sum_{j=2}^{t} \varepsilon_j$). This can be implemented in R by drawing $Y_t$ and $X_t$ as
```r
y <- cumsum(rnorm(n))
x <- cumsum(rnorm(n))
```
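The representation of a unit root process as a sum of past errors follows from substituting the recursion into itself repeatedly:

$$
\begin{aligned}
Y_t &= Y_{t-1} + \varepsilon_t \\
    &= Y_{t-2} + \varepsilon_{t-1} + \varepsilon_t \\
    &\;\;\vdots \\
    &= Y_1 + \sum_{j=2}^{t} \varepsilon_j .
\end{aligned}
$$

In the R code, `cumsum(rnorm(n))` sets $Y_1 = \varepsilon_1$, so the first observation is itself a standard normal draw.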
Run a new simulation of 5,000 replications and, once more,
– show a figure of the histogram of $\hat\beta_1$;
– report the estimated rejection frequency for the test; and
– show a time series plot of $Y_t$ against $X_t$ for the last simulated sample (replication).
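A sketch for the random-walk DGP in part 13 is shown below. It assumes a 5% significance level. The two independent random walks are drawn with `cumsum(rnorm(n))` as in the text, and the levels are regressed on each other.

```r
set.seed(2022)
n <- 400
reps <- 5000
beta1.hat <- pval.beta1 <- numeric(reps)
for (j in 1:reps) {
  y <- cumsum(rnorm(n))                     # Y_t = Y_t-1 + eps_t
  x <- cumsum(rnorm(n))                     # X_t = X_t-1 + upsilon_t, independent of Y
  est <- coef(summary(lm(y ~ x)))
  beta1.hat[j] <- est["x", "Estimate"]
  pval.beta1[j] <- est["x", "Pr(>|t|)"]
}
hist(beta1.hat, breaks = 50, main = "Histogram of beta1.hat", xlab = "beta1.hat")
rej.freq <- mean(pval.beta1 < 0.05)         # assumed 5% level
round(rej.freq, 4)
# Time series plot of the last replication
matplot(cbind(y, x), type = "l", lty = 1, col = c("black", "red"),
        xlab = "t", ylab = "value")
legend("topleft", legend = c("Y", "X"), col = c("black", "red"), lty = 1)
```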
14. Using the same DGP as in part 13, modify the regression: instead of regressing $Y_t$ on $X_t$, regress $\Delta Y_t$ on $\Delta X_t$. Then,
– show a figure of the histogram of $\hat\beta_1$; and
– report the estimated rejection frequency for the test (to four decimal places).
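A sketch for part 14 follows, assuming a 5% level. The series are the same random walks as in part 13. The regression uses first differences computed with `diff()`, which leaves $n - 1 = 399$ observations per replication.

```r
set.seed(2022)
n <- 400
reps <- 5000
beta1.hat <- pval.beta1 <- numeric(reps)
for (j in 1:reps) {
  y <- cumsum(rnorm(n))                     # random walk Y_t
  x <- cumsum(rnorm(n))                     # independent random walk X_t
  dy <- diff(y)                             # Delta Y_t
  dx <- diff(x)                             # Delta X_t
  est <- coef(summary(lm(dy ~ dx)))
  beta1.hat[j] <- est["dx", "Estimate"]
  pval.beta1[j] <- est["dx", "Pr(>|t|)"]
}
hist(beta1.hat, breaks = 50, main = "Histogram of beta1.hat", xlab = "beta1.hat")
rej.freq <- mean(pval.beta1 < 0.05)         # assumed 5% level
round(rej.freq, 4)
```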
15. What do you conclude from the results you produced in parts 13 and 14 for the DGP with stochastic trends?