ECON3018J – Advanced Econometrics – final assignment


PART A  Long answers (30 points)

Choose two out of the three topics below

Each topic is worth 15 points

Length restriction: maximum 0.5 page per topic


Topic 1: You are interested in estimating the causal effect of education on income. One of your colleagues suggests that you should use “parents’ education” as an instrumental variable (IV) for education. Explain why you think this is a good or a bad IV for estimating the causal effect you are interested in. Suggest another IV that you believe is better suited for that purpose and explain the reasons for your choice.

Topic 2: Why are randomized experiments relatively rare in the social sciences, and why do economists usually talk about quasi-experiments in their quest to estimate causal effects?

Pick one of the causal methods we have seen in class, briefly describe it, and explain the crucial underlying assumption(s) that allow(s) it to estimate causal effects.

Topic 3: We have talked about several threats to causal identification in the course. Pick one of them, describe it, and explain the conditions under which it would result in bias.

PART B  Ordered response model (30 points)

The ministry of education asks you to study the effect of education on the income of workers. However, the data only contain information on whether a worker belongs to one of three income categories, “low income”, “middle income” or “high income”, and information on education.

1)  (10 points) Write down an ordered probit model for the statistical relationship between income category on the one hand and education on the other. Give an expression for the probability of a person falling into each of the three income categories. Also, make sure that your model is identified, i.e., state clearly which parameters of the model are unknown and to be estimated, and which parameters are fixed. What are the distributional assumptions about the error term?

Length restriction: maximum 0.5 page

2)  (10 points) After your first modelling attempts, you obtain some additional information from the ministry, namely that “low income” corresponds to the income range “less than $40,000”, that “middle income” corresponds to the income range “$40,000-$80,000” and that “high income” corresponds to the range “more than $80,000”. How can you incorporate this additional information in your econometric model for income? Write down the corresponding ordered probit model with the new information at hand. Which parameters can be identified with this additional information?

Comment briefly (2-3 sentences) on the main advantages of having this additional information.

Length restriction: maximum 0.5 page

3)  (8 points) Write down the maximum likelihood function of the model you defined in 2).

Length restriction: maximum 0.5 page

4)  (2 points) Explain briefly why we might consider transforming the maximum likelihood function above by taking the log of it?

Length restriction: maximum 3 sentences

PART C  Censored regressions (20 points)

The ministry of health asks you to explore the association between age and general cognitive functioning for persons aged 50 and older. In order to complete this task, the ministry supplies you with data from a representative sample of persons aged 50 and older that contain each person’s age as well as each person i’s score on a clinical cognitive performance test. The cognitive performance test measures cognitive function on a continuous scale and ranges from 0 to 100, with higher values indicating higher levels of cognitive functioning. However, one issue with the clinical cognitive performance measure is that it is specifically designed to detect poor cognitive functioning and cognitive impairment, but does not provide any discrimination among people with comparatively high levels of cognition. As a result, about 40% of the people achieve the maximum attainable score of 100 on the clinical cognitive performance test. In addition, even persons with severe cognitive disability cannot score less than the minimum score on the performance test, i.e., a score of 0, which occurs for roughly 1% of the sample. Statistically speaking, the outcome data are both “left-censored” and “right-censored” and can be seen as a kind of corner solution outcome.

1)  (10 points) Write down a statistical model for cognitive functioning among persons aged 50 and older that takes into account the specific nature of the data described above (left- and right-censoring) and in which cognitive functioning depends only on age. HINT: Use the idea of a latent (unobserved) variable measuring general cognitive functioning and specify how this variable maps into the observed clinical performance score. Also, be specific about the distributional assumptions of the error term that you make in your model.

Length restriction: maximum 0.5 page

2)  (10 points) Briefly describe the steps that you would put in place to estimate this model and the estimation technique that you would use. You do not need to provide any formulas.

Length restriction: maximum 0.5 page

PART D  Binary response model (20 points)

A marketing company asks you to estimate the effect of education on internet access. Using a nationally representative random sample of adults, you model the binary outcome variable internet access, which takes the value 1 if person i has access to the internet and 0 otherwise, as a function of person i’s education (in years). Specifically, you estimate a linear regression model (OLS) and a probit model with internet access as the dependent variable and education as the explanatory variable.

1)  (5 points) What is the main conceptual disadvantage of the linear regression model (OLS) relative to the probit model in this setup?

Length restriction: maximum 0.5 page

2)  (5 points) Are the (“slope”) coefficients that you derive from the linear regression model and the probit model directly comparable? Why or why not? What are the marginal effects in these two model specifications?

Length restriction: maximum 0.5 page

3)  (6 points) Why is it generally more challenging to report marginal effects for the probit model than for the linear regression model?

Length restriction: maximum 0.5 page

4)  (2 points) After having run the OLS and the probit regressions, you discover that there is a positive effect of education on the probability of having internet access. Using a graph similar to the one below, with years of education on the x-axis and internet access on the y-axis, sketch what the predicted values of internet access would look like when (i) using a linear regression model and (ii) using a probit model specification.

5)  (2 points) A colleague of yours suggests that you could perhaps use a logit specification instead of a probit specification. What is the main difference between the two?

Length restriction: maximum 3 sentences