
Examination 2022/23

5SSMN932: Introduction to Econometrics

Exam Period 1, January 2023

SECTION A

Answer ANY FOUR questions from this section. Section A carries 20 marks.

1. (5 marks) To understand what affects UK rental rates on AirBnB (an online platform for short-term apartment lets), a KCL student estimates the following regression:

log(AirBnB) = β0 + β1 log(size) + β2 log(avginc) + β3tourist + u,

where AirBnB is the average monthly rent paid on AirBnB rental units in a UK city, size denotes the room size in m², avginc the average city income in thousands of pounds (GBP), and tourist the number of tourists as a percentage of the city’s total population.

Using data from 124 random cities in the UK, she gets the following results, with standard errors reported in parentheses:

log(AirBnB) = .013  + .011 log(size) + .507 log(avginc) + .0016 tourist
             (.234)  (.009)           (.081)             (.0007)

n = 124, R² = .658.

According to these results, what is the effect of a 10% increase in average income in a city on the rental price of an AirBnB unit in that city?
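As an illustration of the arithmetic behind this kind of log-log question (a sketch, not a model answer), the snippet below takes the reported coefficient on log(avginc), .507, and compares the usual elasticity approximation with the exact change implied by the log-log functional form. The function name is made up for this example.

```python
import math

def loglog_effect(elasticity, pct_change_x):
    """Return (approximate, exact) percentage change in y in a log-log model."""
    # Approximation: %change in y is roughly elasticity * %change in x.
    approx = elasticity * pct_change_x
    # Exact: y scales by (1 + %change in x / 100) ** elasticity.
    exact = (math.exp(elasticity * math.log(1 + pct_change_x / 100)) - 1) * 100
    return approx, exact

# Coefficient on log(avginc) as reported in the question; 10% rise in income.
approx, exact = loglog_effect(0.507, 10)
print(f"approximate: {approx:.2f}%, exact: {exact:.2f}%")
```

The two figures differ slightly because the elasticity reading is a first-order approximation that is most accurate for small changes.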

2. (5 marks) Based on the regression from question 1, a policymaker wants to know whether avginc and tourist have a jointly significant effect on average monthly rents. Explain in detail the appropriate test and the steps needed to answer the policymaker’s question. Explain whether you require any additional information.
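One common way to carry out such a joint test is the R-squared form of the F statistic. The sketch below assumes that approach; the restricted R-squared (from a regression of log(AirBnB) on log(size) only) is NOT given in the question, so the value used here is purely hypothetical.

```python
def f_stat_r2(r2_unrestricted, r2_restricted, n, k, q):
    """R-squared form of the F statistic for q exclusion restrictions.

    n is the sample size and k the number of slope coefficients in the
    unrestricted model, so the denominator degrees of freedom are n - k - 1.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator

# Reported in the question: n = 124, R2 = .658, three slope coefficients.
# H0: the coefficients on log(avginc) and tourist are both zero, so q = 2.
R2_RESTRICTED = 0.40  # hypothetical value, for illustration only

f = f_stat_r2(0.658, R2_RESTRICTED, n=124, k=3, q=2)
print(f"F(2, 120) = {f:.2f}")
# Compare with the 5% critical value of F(2, 120), roughly 3.07 in standard tables.
```

The same statistic can be written in terms of the restricted and unrestricted sums of squared residuals; both forms require re-estimating the model without the two variables.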

3. (5 marks) Explain the concept of heteroskedasticity. Distinguish this concept from the zero conditional mean assumption. What are the implications of heteroskedasticity for an OLS regression?

4. (5 marks) Explain the concept of selection bias.  Explain the concept of omitted variable bias.  Why do we tend to talk about selection bias and omitted variable bias interchangeably?

5. (5 marks) Explain the concept of perfect multicollinearity. Give an example. What are the implications of multicollinearity for an OLS regression? Explain.

6. (5 marks) Explain the intuition behind an Instrumental Variable (IV) approach. Discuss the two conditions for a valid IV.

SECTION B

Answer ANY TWO questions from this section. Section B carries 60 marks.

1. A researcher investigates the determinants of the growth rate of house prices in 123 randomly selected towns in England. She uses panel data with annual observations from 1992 to 2022. The following regression is estimated:

HousePriceGrowth_it = β0 + β1 log(townGDP_it) + β2 townEmployment_it + ϕ_t + λ_i + ε_it,    (B.1)

where HousePriceGrowth_it is the annual growth rate of the average house price in town i and year t, measured in percent; townGDP_it is the GDP of town i in year t, measured in thousands of pounds; and townEmployment_it is the percentage of residents of town i who are in full-time employment in year t. She includes the logarithm of the town’s GDP in the regression. ϕ_t and λ_i are the time (year) and town fixed effects, respectively.

(a) (10 marks) Explain why it is necessary to include the town and time fixed effects. In your answer, clearly give examples of factors that the time and town fixed effects are capturing (two examples each).

(b) (10 marks) Clearly interpret the coefficient β1. Can we interpret an estimate of β1 as the causal effect of a town’s GDP on its house price growth rate? Explain. In your answer, clearly define what unbiasedness is.

(c) (5 marks) House prices can be affected by the mortgage rate, which depends on the interest rate set by the Bank of England. Another researcher suggests including the current interest rate in Equation (B.1). Explain whether or not this idea is sound and how it would affect regression (B.1).

(d) (5 marks) What does autocorrelation mean? Explain why ε_it is likely to be autocorrelated.

2.  Let us consider a model where we want to explain whether a married woman works for wages or not. We may define the dependent variable paid to equal 1 when the woman works for wages, and zero otherwise, and consider the following estimation of the model:

Pr(paid=1|x) = G(−0.091 − 0.143nwifeinc + 0.562educ + 1.01exper) +  ,    (B.2)

where G is the logistic cumulative distribution function, G(z) = exp(z)/(1 + exp(z)). The explanatory variables are nwifeinc, denoting other sources of income in the household (in thousands of dollars), educ, the years of education, and exper, the years of past labour market experience.  is the regression residual.

(a) (15 marks) Discuss the advantages and drawbacks of using the Logit model instead of a Linear Probability Model (LPM) to explain whether a married woman works for wages or not. In your answer you are expected to explain what an LPM is and write down the regression equation.

(b) (15 marks) Briefly explain how the coefficients in (B.2) can be estimated. What is the probability that a married woman will obtain a paid job if she has 10 years of education, 2 years of past labour market experience and is without any other sources of income?
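The probability asked for in part (b) comes from plugging the stated characteristics into the estimated index in (B.2) and applying G. A minimal sketch, using only the coefficients reported in (B.2); the dictionary layout and function name are just one convenient way to organise the calculation.

```python
import math

def logistic(z):
    """Logistic function G(z) = exp(z) / (1 + exp(z)), written in a stable form."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficient estimates as reported in (B.2).
coef = {"const": -0.091, "nwifeinc": -0.143, "educ": 0.562, "exper": 1.01}

# Characteristics from part (b): no other income, 10 years of education,
# 2 years of past labour market experience.
x = {"nwifeinc": 0.0, "educ": 10.0, "exper": 2.0}

z = coef["const"] + sum(coef[name] * value for name, value in x.items())
p = logistic(z)
print(f"index z = {z:.3f}, Pr(paid = 1 | x) = {p:.4f}")
```

Because G is nonlinear, the effect of one more year of education or experience on this probability depends on where the index z sits, unlike in an LPM.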

3. A team examines the determinants of movie ratings on IMDb, the world’s most popular and authoritative source for movie, TV, and celebrity content. They have data on all 455 movies released in 2021, with information on the number of positive ratings in the first week after release, Ratings; the total cost of production in millions of dollars, cost; the number of super-stars in the cast, SuperStars; and whether the director has ever won an Oscar (regarded by many as the most prestigious and significant award in the entertainment industry worldwide), Oscars.

They obtain the following estimation, using the logarithms of Ratings and cost:

log(Ratings) = .123  + 1.51 log(cost) + .507 SuperStars + .501 Oscars
             (0.114)  (.231)           (.156)            (.123)

n = 455, R² = .458.

(a) (10 marks) Clearly interpret the coefficient estimate on log(cost). Can we interpret this estimate as a causal effect on the movie rating? If yes, explain. If no, clearly discuss the direction of a potential bias. Explain the necessary assumptions required for causality.

(b) (10 marks) Explain how you can test whether the effect of one additional super-star in the cast on the rating is similar to the effect of having an Oscar-winning director. Clearly state any additional information you may require.

(c) (10 marks) A colleague suggests that the effect of one additional super-star on the rating is more pronounced for movies with an Oscar-winning director. Discuss how you can test this claim. Clearly provide any extra regression estimation or extra information needed.
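For part (b), one standard route is a t test of the single linear restriction that the SuperStars and Oscars coefficients are equal. The sketch below assumes that approach; the covariance between the two coefficient estimates is NOT reported in the question, so the value used here is purely hypothetical.

```python
import math

def t_equal_coefficients(b1, b2, se1, se2, cov12):
    """t statistic for H0: beta1 = beta2.

    Var(b1 - b2) = se1^2 + se2^2 - 2 * Cov(b1, b2), so the covariance of the
    two estimates is needed in addition to their standard errors.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * cov12)
    return (b1 - b2) / se_diff

# Estimates and standard errors as reported in the question.
b_superstars, se_superstars = 0.507, 0.156
b_oscars, se_oscars = 0.501, 0.123

COV_HYPOTHETICAL = 0.002  # hypothetical covariance, for illustration only

t = t_equal_coefficients(b_superstars, b_oscars,
                         se_superstars, se_oscars, COV_HYPOTHETICAL)
print(f"t statistic for equal coefficients: {t:.3f}")
```

An alternative that avoids the covariance is to re-parameterise the regression so that the difference between the two coefficients appears as a single coefficient with its own standard error.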

SECTION C

Answer BOTH questions from this section. Section C carries 20 marks.

1. Consider the following OLS estimation of a model to explain the labour demand of 500 large UK companies after the Covid pandemic, using financial data for 2022. All variables are calculated at the end of the year:

hires_t = 0.96 + 0.74 log(profit_t) − 0.79 log(research_t) + 0.34 news_t,

n = 500, R² = 0.39.

where hires_t is the number of newly recruited employees in thousands (treated as a continuous variable), profit_t is the profit before tax in millions of GBP, research_t is the expenditure on research and development in millions of GBP, and news_t is the number of newsletters issued by the company. The robust standard errors are reported in parentheses.

(a) (5 marks) What is the interpretation of the coefficient on log(profit_t)? Is the sign of the coefficient similar to what you would expect? Explain your answer.

(b) (5 marks) Suppose profit_t is now measured in thousands of GBP. Explain briefly how this change in units affects the estimates and your interpretation of the coefficient in (a). Hint: a million is a thousand times a thousand.
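The mechanics behind parts (a) and (b) can be checked numerically. The sketch below is an illustration rather than a model answer: it uses the reported intercept (0.96) and coefficient on log(profit) (0.74), and a made-up profit figure to confirm that rescaling profit leaves the fitted value unchanged.

```python
import math

b0, b_profit = 0.96, 0.74  # reported intercept and coefficient on log(profit)

# Part (a): level-log model, so a 1% rise in profit changes hires by about
# b_profit / 100. hires is in thousands, so multiply by 1000 for employees.
hires_change = b_profit / 100 * 1000

# Part (b): profit in thousands = 1000 * profit in millions, hence
# log(profit_thousands) = log(1000) + log(profit_millions).
# The slope is unchanged; the intercept absorbs -b_profit * log(1000).
new_slope = b_profit
new_intercept = b0 - b_profit * math.log(1000)

def partial_fit(intercept, slope, profit):
    """Intercept plus the profit term only; other regressors are held fixed."""
    return intercept + slope * math.log(profit)

profit_millions = 250.0  # hypothetical profit figure, for illustration only
old_fit = partial_fit(b0, b_profit, profit_millions)
new_fit = partial_fit(new_intercept, new_slope, profit_millions * 1000)

print(f"1% more profit: about {hires_change:.1f} additional hires")
print(f"new intercept: {new_intercept:.4f}, slope still {new_slope}")
print(f"fitted values agree: {math.isclose(old_fit, new_fit)}")
```

The check makes the general point that rescaling a regressor that enters in logs shifts only the intercept, while rescaling a regressor that enters in levels would rescale its slope.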

2. A researcher estimated the following regression of how parents’ income affects their child’s hourly wage:

hourlywage_i = 0.29 + 1.14 log(parentalincome_i) + 1.79 log(experience_i),

n = 1500, R² = 0.49.

where hourlywage is the hourly wage, measured in GBP per hour, log(parentalincome) is the logarithm of the worker’s parental income, and log(experience) is the logarithm of the number of years of experience. The robust standard errors are reported in parentheses. The regression uses a representative sample from the UK 2000 census.

(a) (5 marks) Can we interpret the coefficient on log(parentalincome) as the causal effect of parental income on hourly wage? Explain why, clearly indicating the necessary condition for such a causal interpretation.

(b) (5 marks) Suppose you obtain the residuals from the above regression, residuals, and then run the following regression:

residuals_i = γ0 + γ1 log(parentalincome_i) + γ2 log(experience_i) + u_i

What do you expect will be the R-squared of this second regression? Explain if you need any further information.
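The algebraic property behind part (b), that OLS residuals are orthogonal to the regressors that produced them, can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the data are randomly generated (not the census sample), and the tiny OLS routine is a hand-rolled normal-equations solver rather than any particular library.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Return (coefficients, residuals, R-squared); rows must include a constant."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum(e ** 2 for e in resid)
    return beta, resid, 1 - ssr / sst

# Simulated (hypothetical) data loosely shaped like the regression in the question.
random.seed(0)
rows, wage = [], []
for _ in range(500):
    log_inc = math.log(random.uniform(10, 100))
    log_exp = math.log(random.uniform(1, 40))
    rows.append([1.0, log_inc, log_exp])
    wage.append(0.29 + 1.14 * log_inc + 1.79 * log_exp + random.gauss(0, 1))

_, resid, r2_first = ols(rows, wage)    # original regression
_, _, r2_second = ols(rows, resid)      # residuals on the same regressors
print(f"first R2 = {r2_first:.3f}, residual-regression R2 = {r2_second:.2e}")
```

The second R-squared is zero up to floating-point error because the regressors are exactly those of the first regression; adding a regressor that was not in the original model would generally change this.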