ECON60622   Further Econometrics

2021/22

Semester 2 Final Exam

Section A

1.   (a)  (7 points)  Suppose that an individual’s decision about whether or not to participate in the labour market is based on a latent variable $y_i^*$ which is generated as follows:

$$y_i^* = z_i \gamma + v_i \tag{1}$$

where $z_i$ is a vector of characteristics relating to the $i$th individual, $\gamma$ is a vector of constants and $v_i$ is an error term with a normal distribution with variance $\sigma^2$, that is, $v_i \sim N(0, \sigma^2)$.

Suppose that the individual participates in the labour market if $y_i^* > 0$. Show that this decision-making process implies that whether or not an individual participates in the labour force can be captured via a probit model.
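As an informal check on the setup in part (a), the following sketch simulates the latent-variable rule and compares the simulated participation rate with the standard normal CDF evaluated at the scaled index. The index value 0.5 and $\sigma = 2$ are hypothetical numbers chosen for illustration only; they do not come from the exam.

```python
import random
from statistics import NormalDist


def simulate_participation_rate(index, sigma, n_draws=200_000, seed=0):
    """Share of draws with y* = index + v > 0, where v ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if index + rng.gauss(0.0, sigma) > 0)
    return hits / n_draws


def probit_probability(index, sigma):
    """Standard normal CDF evaluated at index / sigma."""
    return NormalDist().cdf(index / sigma)


# Hypothetical values of z_i * gamma and sigma (assumptions, not exam data).
index, sigma = 0.5, 2.0
simulated = simulate_participation_rate(index, sigma)
implied = probit_probability(index, sigma)
print(round(simulated, 3), round(implied, 3))
```

The two numbers should agree up to simulation noise, which also illustrates that only the ratio of the index to $\sigma$ matters for the participation probability.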

(b) A researcher wishes to estimate a wage offer equation for married women based on the model

$$y_i = x_i \beta + u_i \tag{2}$$

where $y_i = \ln(\text{wage}_i)$, $x_i = (1, \text{educ}_i, \text{exper}_i, \text{exper2}_i)$, $\text{wage}_i$ is the hourly wage rate, $\text{educ}_i$ is the number of years of education, $\text{exper}_i$ is a measure of work experience, $\text{exper2}_i = (\text{exper}_i)^2$, and $u_i$ is the error term relating to the $i$th individual. Since not all married women are in the labour force, the researcher estimates (2) via Heckman’s two-step procedure, in which the woman’s labour force participation decision is modelled using the probit model in part (a).

i.  (5 points) The researcher plans to implement Heckman’s two-step procedure with $z_i = x_i$. Would you recommend this approach? If not, why not? And what would you recommend?

ii.  (7 points)  On the basis of your advice in part (i), the researcher estimates the model (2) via Heckman’s two-step procedure and obtains the output (Figure 1). Using this output, test whether there is sample (self-) selection bias in the wage offer equation for married women in this sample. Be sure to explain the null and alternative hypotheses and the decision rule.

 

Figure 1:   STATA Output for Question 1. Certain parts have been deliberately omitted.

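iii.  (6 points) The variance-covariance matrix of the errors is defined as

$$\operatorname{Var}\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \begin{pmatrix} \operatorname{Var}(u_i) & \operatorname{Cov}(u_i, v_i) \\ \operatorname{Cov}(u_i, v_i) & \operatorname{Var}(v_i) \end{pmatrix} = \Sigma.$$

Use the output and your knowledge of the model to provide $\hat{\Sigma}$, a consistent estimator of $\Sigma$ based on this sample.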
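For orientation on part (b), the second step of Heckman’s procedure adds the inverse Mills ratio, evaluated at the first-stage probit index, as an extra regressor in the wage equation. The sketch below only computes that ratio; the function name and the example index values are illustrative assumptions and are unrelated to the Stata output in Figure 1.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def inverse_mills_ratio(index):
    """lambda(c) = phi(c) / Phi(c) for a probit index c."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical index values, for illustration only.
lambda_at_zero = inverse_mills_ratio(0.0)
lambda_low = inverse_mills_ratio(-1.0)
lambda_high = inverse_mills_ratio(1.0)
print(round(lambda_at_zero, 4), round(lambda_low, 4), round(lambda_high, 4))
```

The ratio falls as the index rises, so women with a high predicted participation probability receive a small selection correction.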

Total for Question 1: 25

2.   (a)  Consider the panel data model

$$y_{i,t} = \beta_0 + x_{i,t}\beta_1 + \alpha_i + u_{i,t}, \qquad i = 1, 2, \ldots, n, \quad t = 1, 2 \tag{3}$$

where $y_{i,t}$ is the dependent variable, $x_{i,t}$ is the observed explanatory variable, $\alpha_i$ represents the unobserved heterogeneity, and $u_{i,t}$ is the idiosyncratic error. Let $\hat{\beta}_1^{(FD)}$ be the first difference estimator of $\beta_1$, that is, the OLS estimator of $\beta_1$ from the model

$$\Delta y_i = \Delta x_i \beta_1 + \Delta u_i, \qquad i = 1, 2, \ldots, n,$$

where $\Delta y_i = y_{i,2} - y_{i,1}$, $\Delta x_i = x_{i,2} - x_{i,1}$ and $\Delta u_i = u_{i,2} - u_{i,1}$.

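Let $\hat{\beta}_1^{(W)}$ be the within (or fixed effects) estimator of $\beta_1$, that is, the OLS estimator of $\beta_1$ based on the model

$$\ddot{y}_{i,t} = \ddot{x}_{i,t}\beta_1 + \ddot{u}_{i,t}, \qquad i = 1, 2, \ldots, n, \quad t = 1, 2$$

where $\ddot{y}_{i,t} = y_{i,t} - \bar{y}_i$, $\ddot{x}_{i,t} = x_{i,t} - \bar{x}_i$ and $\ddot{u}_{i,t} = u_{i,t} - \bar{u}_i$. Here $\bar{y}_i = \frac{1}{2}\sum_{t=1}^{2} y_{i,t}$, with similar definitions for $\bar{x}_i$ and $\bar{u}_i$.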

i.  (5 points)  Show that $\hat{\beta}_1^{(W)} = \hat{\beta}_1^{(FD)}$.

ii.  (2 points)  Explain briefly why the fixed effects estimator coincides with the first difference estimator even though the sample size for the first is $2N$, while for the second it is $N$.

iii.  (7 points) Assume that $t = 1, 2, 3$. Compute the variances and pairwise covariances of the fixed effects-transformed error terms $\ddot{u}_{i,1}, \ddot{u}_{i,2}, \ddot{u}_{i,3}$. Compare them with the variances and pairwise covariances of the first difference-transformed error terms $\Delta u_{i,1}, \Delta u_{i,2}$. Conclude whether OLS is an efficient way to estimate these models.
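The claim in part (a)(i) can be checked numerically before it is proved. The sketch below simulates a two-period panel, applies both transformations and compares the two OLS slopes. The data-generating values (500 units, slope 1.5, intercept 2.0, unit-variance normal errors) are assumptions made for illustration; they are not part of the exam.

```python
import random


def ols_slope_no_intercept(x, y):
    """OLS slope in a regression of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


rng = random.Random(1)
n, beta1 = 500, 1.5  # hypothetical sample size and true slope
fd_x, fd_y, w_x, w_y = [], [], [], []
for _ in range(n):
    alpha = rng.gauss(0.0, 1.0)  # unobserved heterogeneity
    x1 = alpha + rng.gauss(0.0, 1.0)  # x is correlated with alpha
    x2 = alpha + rng.gauss(0.0, 1.0)
    y1 = 2.0 + beta1 * x1 + alpha + rng.gauss(0.0, 1.0)
    y2 = 2.0 + beta1 * x2 + alpha + rng.gauss(0.0, 1.0)
    # First difference: one observation per unit.
    fd_x.append(x2 - x1)
    fd_y.append(y2 - y1)
    # Within transformation: two demeaned observations per unit.
    x_bar, y_bar = (x1 + x2) / 2, (y1 + y2) / 2
    w_x += [x1 - x_bar, x2 - x_bar]
    w_y += [y1 - y_bar, y2 - y_bar]

beta_fd = ols_slope_no_intercept(fd_x, fd_y)
beta_w = ols_slope_no_intercept(w_x, w_y)
print(beta_fd, beta_w)
```

The two estimates agree up to floating-point rounding, even though the within regression uses twice as many rows.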

(b) A researcher wishes to investigate whether a stronger presence of students affects the rental prices for housing in college towns. To this end, she collects a panel data set on average rental prices and certain other variables for college towns in the US for the years 1980 and 1990. She analyses this question using the following unobserved effects model

$$\log(\text{rent}_{i,t}) = \beta_0 + \delta_0\, \text{y90}_t + \beta_1 \log(\text{pop}_{i,t}) + \beta_2 \log(\text{avginc}_{i,t}) + \beta_3\, \text{pctstu}_{i,t} + \alpha_i + u_{i,t} \tag{4}$$

where $\text{rent}_{i,t}$ is the average rent in town $i$ at time $t$, $\text{y90}_t$ is a dummy variable that takes the value 1 if $t = 1990$, $\text{pop}_{i,t}$ is the population in town $i$ in year $t$, $\text{avginc}_{i,t}$ is the average income in town $i$ at time $t$, $\text{pctstu}_{i,t}$ is the student population as a percentage of the town population during the school year, $\alpha_i$ is unobservable heterogeneity, and $u_{i,t}$ is idiosyncratic noise. Figure 2 shows the Stata output from the first difference estimation of this model.

 

Figure 2: STATA Output for Question 2. Certain parts have been deliberately omitted. dlrent denotes the first difference of $\log(\text{rent}_{i,t})$, dlpop denotes the first difference of $\log(\text{pop}_{i,t})$, dlavginc denotes the first difference of $\log(\text{avginc}_{i,t})$ and dpctstu denotes the first difference of $\text{pctstu}_{i,t}$.

i.  (2 points) What is the interpretation of $\beta_2$ in this model?

ii.  (3 points)  Explain why the researcher has used the first difference estimator to estimate this model.

iii.  (4 points)  Using a 5% significance level, test whether the results are consistent with the view that rental rates increase with the proportion of the city’s population who are students. Be sure to specify the null and alternative hypotheses and the decision rule.

iv.  (2 points) A colleague recommends that the researcher includes in the model a dummy variable that indicates whether the town is located within 50 miles of the ocean shore in the southern half of the US as it will likely increase the rental price. What impact on the estimation results would you expect the inclusion of this additional variable to have?

Total for Question 2: 25

Section B

3.   (a)  (10 points)  Consider the AR(1) model:

$$y_t = \theta y_{t-1} + u_t \tag{5}$$

where $u_t$ is a white noise process with variance $\sigma^2$ and $|\theta| < 1$.

Derive the mean and variance of $y_t$.
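A simulation can serve as a sanity check on whatever expressions the derivation produces. The sketch below generates a long AR(1) path and compares its sample mean and variance with the standard stationary values $0$ and $\sigma^2 / (1 - \theta^2)$. The parameter values $\theta = 0.6$ and $\sigma = 1$, the Gaussian shocks and the burn-in length are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def simulate_ar1(theta, sigma, n_obs, burn_in=1_000, seed=0):
    """Simulate y_t = theta * y_{t-1} + u_t with Gaussian white noise u_t."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for t in range(burn_in + n_obs):
        y = theta * y + rng.gauss(0.0, sigma)
        if t >= burn_in:  # discard the start-up period
            path.append(y)
    return path


theta, sigma = 0.6, 1.0  # hypothetical parameter values
path = simulate_ar1(theta, sigma, n_obs=200_000)
sample_mean = fmean(path)
sample_var = fmean((value - sample_mean) ** 2 for value in path)
stationary_var = sigma**2 / (1 - theta**2)
print(round(sample_mean, 3), round(sample_var, 3), round(stationary_var, 3))
```

The check relies on $|\theta| < 1$: for $|\theta| \geq 1$ the simulated variance would not settle down as the sample grows.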

(b)  (5 points)  Suppose a researcher is working with time-series data on a variable y, with a sample size of 1000. She plans to estimate an ARMA(p,q) model for y, so she computes the autocorrelation and partial autocorrelation functions of her time series. The outputs are given in Figure (3). Identify plausible values for p and q using the autocorrelation and partial autocorrelation functions. Be sure to justify your answer clearly.  [max 50 words]

(c)  (5 points) A researcher believes that the time series she is working on follows an MA(q) model where q could be 2, 3 or 4, but she is unsure what the correct choice of q is. She therefore estimates a model in R for all three choices: model 1 for MA(4), model 2 for MA(3) and model 3 for MA(2). The outputs from these estimations are given in Figure (4). Use the general-to-specific strategy to select q. Be sure to justify your answer clearly.  [max 100 words]

(d)  (5 points) The researcher then calculates the Akaike and the Schwarz information criteria for the three estimated models in question (c). The outputs are given in Figure (5). Use both information criteria to select q and explain the difference between the two. Be sure to justify your answer clearly.  [max 100 words]
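To make the comparison in part (d) concrete, the sketch below computes both criteria from a log-likelihood using the textbook definitions $\text{AIC} = 2k - 2\ln L$ and $\text{BIC} = k \ln n - 2\ln L$. The log-likelihood values, the parameter counts (MA coefficients plus the error variance) and the sample size are invented for illustration; they are not the values in Figure (5), and software packages may count parameters differently.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Schwarz (Bayesian) information criterion."""
    return n_params * math.log(n_obs) - 2 * loglik


# Hypothetical fits: q -> (log-likelihood, number of estimated parameters).
fits = {2: (-1420.0, 3), 3: (-1417.5, 4), 4: (-1417.2, 5)}
n_obs = 1000  # hypothetical sample size

best_aic = min(fits, key=lambda q: aic(*fits[q]))
best_bic = min(fits, key=lambda q: bic(*fits[q], n_obs))
print(best_aic, best_bic)
```

With these made-up numbers the two criteria disagree, because the BIC penalty per parameter, $\ln n$, exceeds the AIC penalty of 2 whenever $n > e^2 \approx 7.4$.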

Total for Question 3: 25

Figure 5:   Outputs for question (d)

 

4.   (a)  (6 points)  Suppose you are working at the central bank and want to employ a VAR model for the purpose of monetary policy analysis, using quarterly observations on inflation, the output gap and the short term interest rate.  All data are seasonally adjusted and available over a sample period t = 1, ..., T. Explicitly write down the form of the three equations to be estimated if a VAR(1) model is used, carefully defining all the notation you use.  [max 100 words]

(b)  (7 points) The head of the research department asks you to provide an orthogonalised impulse response analysis employing the variable ordering: output gap, inflation, interest rates. The resulting orthogonalised impulse responses to one standard deviation shocks in each variable are in Figure (6), where the variables (in Cholesky ordering) are denoted GAP, INFLAT and INTRATE. Using this output, interpret the impulse responses in terms of the effects of the output gap and inflation on both inflation and interest rates. You are not expected to go into macroeconomic details. [max 150 words]

(c)  (6 points) Your colleague wishes to perform the augmented  Dickey-Fuller (ADF) test to assess whether a macroeconomic time series is better modelled by a unit root process or a trend stationary process. Provide an explanation of how to perform the test being sure to explain all relevant details.  [max 100 words]

(d)  (6 points) Your colleague runs the test and obtains a value of -2.149, rounded to the third decimal place (3dp), for the test statistic. You know that the critical values of the ADF test (to 3dp) are -4.003, -3.432 and -3.139 at the 1%, 5% and 10% significance levels, respectively. She also shows a plot of her variable in Figure (7). Based on her plot and the results of the ADF test, explain whether or not she should include a time trend in her model.  [max 50 words]
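The mechanical part of the decision in part (d) is a left-tailed comparison of the test statistic with each critical value. The sketch below applies that rule to the numbers quoted in the question; the function name is a hypothetical choice, and the code says nothing about the plot in Figure (7), which still has to be interpreted separately.

```python
def adf_rejections(test_stat, critical_values):
    """For each level, report whether the unit-root null is rejected.

    The ADF test is left-tailed: reject when the statistic lies below
    the critical value.
    """
    return {level: test_stat < cv for level, cv in critical_values.items()}


critical_values = {"1%": -4.003, "5%": -3.432, "10%": -3.139}
decisions = adf_rejections(-2.149, critical_values)
print(decisions)  # {'1%': False, '5%': False, '10%': False}
```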

Total for Question 4: 25

Figure 6:   Impulse Response Functions for GDP gap, Inflation, and Interest Rate.  Question

Section C

5. The authors of the paper  Traffic accidents  and  the  London  congestion  charge  wish to investigate whether the introduction of a congestion charge in London, in February 2003, caused a reduction in accidents in the congestion charge zone (CCZ). The relevant tables are located in Appendix (A).

(a)  (5 points)  Briefly argue why you would (or would not) expect the introduction of a congestion charge to reduce the number of accidents inside the congestion zone. [max 100 words]

(b)  (5 points)  Can the dataset used in the paper be defined as cross-sectional data, balanced panel data, unbalanced panel data or repeated (or pooled) cross-sectional data? Explain your answer.  [max 50 words]

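(c)

$$\text{Acc}_{it} = \phi + \delta\, \text{CCZ}_i + \alpha\, \text{Policy}_t + \beta (\text{CCZ}_i \times \text{Policy}_t) + \gamma X_{it} + TT_t + \epsilon_{it} \tag{23}$$

Explain how the variables $\text{Acc}_{it}$, $\text{CCZ}_i$ and $\text{Policy}_t$ are defined.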

(d)  Referring to the results in Table 1, Appendix (A):

i.  (3 points)  Interpret the estimated effect of the policy.

ii.  (3 points)  Calculate a 95% confidence interval for the policy coefficient in the base model.

iii.  (4 points)  Comment on the statistical and economic importance of the estimated effect.
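The interval in part (d)(ii) follows the usual large-sample form, estimate plus or minus a normal critical value times the standard error. The sketch below implements that formula. The estimate and standard error passed to it are placeholders and are not taken from Table 1, which is not reproduced here.

```python
from statistics import NormalDist


def normal_ci(estimate, std_error, level=0.95):
    """Two-sided confidence interval based on the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return estimate - z * std_error, estimate + z * std_error


# Placeholder inputs, NOT the values reported in Table 1.
lower, upper = normal_ci(estimate=-1.0, std_error=0.25)
print(round(lower, 3), round(upper, 3))
```

If the resulting interval excludes zero, the coefficient is statistically significant at the corresponding level, which connects parts (ii) and (iii).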

(e)  (20 points)  Using what you learned from reading the Green et al. paper, how would you design a difference-in-differences analysis to investigate whether the introduction of a low-emission zone in London (from February 2008 only cars with low emissions are allowed to enter Greater London, see https://tfl.gov.uk/modes/driving/low-emission-zone for some details) has achieved its aim of reducing air pollution? Your answer should, at a minimum, address the following issues:  [max 400 words]

What would be a sensible outcome variable? And how would this be measured (no need to check whether these data are actually available, just say what you would ideally have)?

Define the treatment and control group.

Specify the basic regression model that you would use to estimate the effect of the low emission zone. Be sure to define all terms.

How would you go about establishing whether the parallel trends assumption can be defended?
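As a reminder of the mechanics behind a basic difference-in-differences design, the sketch below computes the estimate from four cell means on simulated data. In a regression of the outcome on a treatment dummy, a post-period dummy and their interaction, with no other covariates, the OLS coefficient on the interaction equals this quantity. Everything in the simulation (the outcome scale, the group and period effects, the true effect of -2.0 and the sample size) is a made-up assumption for illustration and is not a description of any actual London data.

```python
import random
from statistics import fmean


def did_estimate(rows):
    """Difference-in-differences from (treated, post, outcome) rows."""
    cells = {(g, p): [] for g in (0, 1) for p in (0, 1)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    means = {key: fmean(values) for key, values in cells.items()}
    change_treated = means[(1, 1)] - means[(1, 0)]
    change_control = means[(0, 1)] - means[(0, 0)]
    return change_treated - change_control


rng = random.Random(42)
true_effect = -2.0  # hypothetical policy effect on the outcome
rows = []
for _ in range(20_000):
    treated = rng.randint(0, 1)  # 1 = unit inside the zone
    post = rng.randint(0, 1)  # 1 = observed after the policy starts
    outcome = (40.0 + 5.0 * treated - 3.0 * post
               + true_effect * treated * post + rng.gauss(0.0, 1.0))
    rows.append((treated, post, outcome))

estimate = did_estimate(rows)
print(round(estimate, 2))
```

The simulation builds in a common time effect for both groups, which is the parallel trends assumption; with real data that assumption has to be argued for, for example by comparing pre-policy trends.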

Total for Question 5: 50