
School of Economics

Applied Econometric Methods

ECON3208

Tutorial Program

Term 1, 2022

Week 2 Tutorial Exercises (for the content of the Week 1 lecture)

Readings

•    Review your ECON2206 (Introductory Econometrics) material.

•    Make sure that you know the meanings of the econometrics terms covered in ECON2206.

•    Read Sections 15.1-15.2 thoroughly.

•    Make sure that you know the meanings of the Key Terms at the end of the chapter.

•    Answer the summary questions in the lecture slides.

Question Set (these will be discussed in tutorial classes)

Q1. [smoke.dta, smoke-w2.do] To better understand the determinants of tobacco demand, consider the following specification of a demand function for daily cigarette consumption:

cigs = β0 + β1 lincome + β2 lcigpric + β3 educ + β4 age + β5 agesq + u.

This model is estimated using a sample of 807 individuals. Variable definitions and associated summary statistics, together with the OLS estimation results, are given in the tables below. Please answer the following questions using the information in the tables.

(a) Carefully interpret all of the parameters in the regression model, including their magnitudes and their expected and actual signs.

(b) Comment on the statistical significance of each of the estimated coefficients.

(c)  On the basis of the estimation results, comment on the role of income and price as determinants of cigarette demand.

(d) Suppose the regression model was re-estimated under the hypothesis that β1 = β2 = 0, yielding a residual sum of squares of 145,139. If the original model had a residual sum of squares of 144,910, test the null hypothesis that β1 = β2 = 0. Does the result of this test affect your conclusion in (c)?

(e)  On the basis of the estimation results, comment on the role of age as a determinant of cigarette demand.

(f) Notice that there are no variables relating to government policy interventions, such as a prohibition of tobacco advertising or the inclusion of health warnings on cigarette packets. Provide one justification for why this omission could have been a reasonable assumption for these data.
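For part (d), the F statistic for q exclusion restrictions is F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)]. The minimal Python sketch below plugs in the numbers given in (d) (n = 807, k = 5 slope coefficients, q = 2). To stay within the standard library, the p-value uses the closed-form upper tail of an F(2, d) distribution, P(F > f) = (1 + 2f/d)^(−d/2), which is valid only when the numerator has exactly 2 degrees of freedom.

```python
def f_stat(ssr_r: float, ssr_ur: float, q: int, df_resid: int) -> float:
    """F statistic for q exclusion restrictions."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_resid)


def f2_pvalue(f: float, d: int) -> float:
    """Upper-tail p-value of an F(2, d) variate (closed form valid only for 2 numerator df)."""
    return (1.0 + 2.0 * f / d) ** (-d / 2.0)


n, k, q = 807, 5, 2
ssr_restricted, ssr_unrestricted = 145_139.0, 144_910.0
df_resid = n - k - 1  # 801 residual degrees of freedom in the unrestricted model

F = f_stat(ssr_restricted, ssr_unrestricted, q, df_resid)
p = f2_pvalue(F, df_resid)
print(f"F = {F:.3f}, p-value = {p:.3f}")  # F = 0.633, p-value = 0.531
```

The same numbers can be obtained in STATA with `test lincome lcigpric` after `regress`, which reports the F statistic and its p-value directly.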

 

[Run smoke-w2.do in STATA. Try to understand the commands in smoke-w2.do.]
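Parts (a), (c) and (e) rest on two standard calculations. Because cigs is in levels and income and price are in logs, a 1% increase in income changes cigs by about β1/100 cigarettes per day (likewise β2/100 for price). With a quadratic in age, the marginal effect of age is β4 + 2β5·age, which changes sign at age* = −β4/(2β5). A minimal sketch; the coefficient values below are HYPOTHETICAL placeholders, not the estimates in the tables, so substitute the numbers from your smoke-w2.do output.

```python
def effect_of_one_percent(beta_log: float) -> float:
    """Approximate change in the level of y when a log regressor rises by 1%."""
    return beta_log / 100.0


def age_marginal_effect(b_age: float, b_agesq: float, age: float) -> float:
    """d(cigs)/d(age) implied by cigs = ... + b_age*age + b_agesq*age^2."""
    return b_age + 2.0 * b_agesq * age


def age_turning_point(b_age: float, b_agesq: float) -> float:
    """Age at which the quadratic in age reaches its maximum (b_agesq < 0) or minimum."""
    return -b_age / (2.0 * b_agesq)


# HYPOTHETICAL coefficients for illustration only; replace with your estimates.
b_lincome, b_age, b_agesq = 0.9, 0.8, -0.01

print(effect_of_one_percent(b_lincome))         # cigarettes/day per 1% more income
print(age_marginal_effect(b_age, b_agesq, 30))  # slope of the age profile at age 30
print(age_turning_point(b_age, b_agesq))        # age at which predicted cigs peak
```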

Q2. [Continue with Q1] Suppose the following diagnostics are reported for this model:

R² = 0.0451,   RESET = 2.03 (p-value = 0.132),   BP = 25.81 (p-value = 0.0001)

where the RESET test adds both the squared and the cubed fitted values as additional regressors, and BP is the LM version of the Breusch-Pagan test (see Section 8.3) in which heteroskedasticity is specified as a function of lincome, lcigpric, educ, age, and agesq.

(a) What null hypothesis does RESET test? What null hypothesis does BP test? Interpret each of the diagnostics.

(b) Do the values of the diagnostics lead you to modify any of your answers in Q1?

(c)  Comment on the overall adequacy of the model in terms of the reported diagnostics and the results discussed in Q1.
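The reported p-values can be reproduced from the statistics, assuming (as the description above suggests) that RESET is an F test of 2 restrictions with n − (k + 2) − 1 = 807 − 8 = 799 denominator degrees of freedom, and that the LM version of BP is asymptotically χ² with 5 degrees of freedom (one per variable in the variance function). The sketch uses closed-form tail formulas that hold only for these particular degrees of freedom (F with 2 numerator df; χ² with 5 df), so it is not a general replacement for STATA's `Ftail()` and `chi2tail()` functions.

```python
import math


def f2_pvalue(f: float, d: int) -> float:
    """Upper tail of F(2, d): P(F > f) = (1 + 2f/d)^(-d/2)."""
    return (1.0 + 2.0 * f / d) ** (-d / 2.0)


def chi2_5_pvalue(x: float) -> float:
    """Upper tail of a chi-square with 5 df (closed form for odd df)."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * (1.0 + x / 3.0))


n, k = 807, 5
reset_p = f2_pvalue(2.03, n - (k + 2) - 1)  # RESET adds yhat^2 and yhat^3 to the model
bp_p = chi2_5_pvalue(25.81)                 # LM = n * R^2 from the regression of uhat^2 on 5 variables
print(f"RESET p = {reset_p:.3f}, BP p = {bp_p:.4f}")  # RESET p = 0.132, BP p = 0.0001
```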

Q3. Wooldridge 3.10. (omitted variable bias)

Suppose that you are interested in estimating the ceteris paribus relationship between y and x1. For this purpose, you can collect data on two control variables, x2 and x3. (For concreteness, you might think of y as final exam score, x1 as class attendance, x2 as GPA up through the previous semester, and x3 as …) Let $\tilde{\beta}_1$ be the simple regression estimate from y on x1 and let $\hat{\beta}_1$ be the multiple regression estimate from y on x1, x2, x3.

(i) If x1 is highly correlated with x2 and x3 in the sample, and x2 and x3 have large partial effects on y, would you expect $\tilde{\beta}_1$ and $\hat{\beta}_1$ to be similar or very different? Explain.

(ii) If x1 is almost uncorrelated with x2 and x3, but x2 and x3 are highly correlated, will $\tilde{\beta}_1$ and $\hat{\beta}_1$ tend to be similar or very different? Explain.

(iii) If x1 is highly correlated with x2 and x3, and x2 and x3 have small partial effects on y, would you expect $\mathrm{se}(\tilde{\beta}_1)$ or $\mathrm{se}(\hat{\beta}_1)$ to be smaller? Explain.

(iv) If x1 is almost uncorrelated with x2 and x3, x2 and x3 have large partial effects on y, and x2 and x3 are highly correlated, would you expect $\mathrm{se}(\tilde{\beta}_1)$ or $\mathrm{se}(\hat{\beta}_1)$ to be smaller? Explain.
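The intuition behind parts (i) and (ii) is omitted-variable algebra. With a single control x2 (a simplification of the three-variable setting above), the simple and multiple regression slopes satisfy the exact in-sample identity $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2\tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing x2 on x1. The Python sketch below checks this on simulated data; all parameter values are arbitrary assumptions chosen so that x1 and x2 are strongly correlated and x2 has a large partial effect, which makes the two estimates far apart.

```python
import random


def sxy(a, b):
    """Sum of cross-deviations: sum (a_i - mean_a)(b_i - mean_b)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def simple_slope(y, x):
    """OLS slope from regressing y on x (with an intercept)."""
    return sxy(x, y) / sxy(x, x)


def two_regressor_slopes(y, x1, x2):
    """OLS slopes on x1 and x2 from regressing y on (1, x1, x2)."""
    s11, s22, s12 = sxy(x1, x1), sxy(x2, x2), sxy(x1, x2)
    s1y, s2y = sxy(x1, y), sxy(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(1)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 0.6) for a in x1]                     # x2 strongly correlated with x1
y = [1.0 + 0.5 * a + 2.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]  # large partial effect of x2

beta_tilde = simple_slope(y, x1)                      # short regression: y on x1 only
beta_hat1, beta_hat2 = two_regressor_slopes(y, x1, x2)
delta_tilde = simple_slope(x2, x1)                    # auxiliary regression: x2 on x1

print(beta_tilde, beta_hat1 + beta_hat2 * delta_tilde)  # identical up to rounding
```

Setting the 0.8 in the x2 line to 0 (x1 unrelated to the control) makes $\tilde{\delta}_1$ close to zero, and the two slope estimates then become close, as in part (ii).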

Q4. Wooldridge C3.9. (charity.dta, charity-w2.do)

Use the data in CHARITY to answer the following questions:

(i)  Estimate the equation

gift = β0 + β1 mailsyear + β2 giftlast + β3 propresp + u

by OLS and report the results in the usual way, including the sample size and R-squared. How does the R-squared compare with that from the simple regression that omits giftlast and propresp?

(ii) Interpret the coefficient on mailsyear. Is it bigger or smaller than the corresponding simple regression coefficient?

(iii) Interpret the coefficient on propresp. Be careful to notice the units of measurement of propresp.

(iv) Now add the variable avggift to the equation. What happens to the estimated effect of mailsyear?

(v) In the equation from part (iv), what has happened to the coefficient on giftlast? What do you think is happening?

Q5. Wooldridge 2.8. (OLS algebra)

Consider the standard simple regression model y = β0 + β1x + u under the Gauss-Markov Assumptions SLR.1, SLR.2, SLR.3, SLR.4 and SLR.5. The usual OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased for their respective population parameters. Let $\tilde{\beta}_1$ be the estimator of β1 obtained by assuming the intercept is zero (see Section 2-6).

(i) … population intercept β0 … $\tilde{\beta}_1$ is unbiased?

(ii) Find the variance of $\tilde{\beta}_1$. (Hint: The variance does not depend on β0.)

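(iii) Show that $\mathrm{Var}(\tilde{\beta}_1) \le \mathrm{Var}(\hat{\beta}_1)$. [Hint: For any sample of data, $\sum_{i=1}^{n} x_i^2 \ge \sum_{i=1}^{n} (x_i - \bar{x})^2$, with strict inequality unless $\bar{x} = 0$.]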

(iv) Comment on the tradeoff between bias and variance when choosing between $\hat{\beta}_1$ and $\tilde{\beta}_1$.
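A small Monte Carlo experiment makes the bias-variance tradeoff in part (iv) concrete. The sketch below holds x fixed across replications (the SLR analysis is conditional on the sample values of x), uses a nonzero intercept, and compares the OLS slope with the regression-through-the-origin slope $\tilde{\beta}_1 = \sum x_i y_i / \sum x_i^2$. The design values (x = 1, 2, ..., 20, β0 = 2, β1 = 0.5, σ = 1, 2,000 replications) are arbitrary assumptions. It also checks the identity behind the hint, $\sum x_i^2 = \sum (x_i - \bar{x})^2 + n\bar{x}^2$.

```python
import random
import statistics


def ols_slope(x, y):
    """Usual OLS slope (model includes an intercept)."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)


def origin_slope(x, y):
    """Slope from regression through the origin (intercept forced to zero)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


random.seed(0)
x = [float(i) for i in range(1, 21)]
beta0, beta1, sigma, reps = 2.0, 0.5, 1.0, 2000

# Hint identity: sum x^2 = SST_x + n * xbar^2, hence sum x^2 >= SST_x
n, xbar = len(x), statistics.mean(x)
sst_x = sum((a - xbar) ** 2 for a in x)
assert abs(sum(a * a for a in x) - (sst_x + n * xbar ** 2)) < 1e-9

hat, tilde = [], []
for _ in range(reps):
    y = [beta0 + beta1 * a + random.gauss(0.0, sigma) for a in x]
    hat.append(ols_slope(x, y))
    tilde.append(origin_slope(x, y))

print("OLS:    mean %.4f  var %.6f" % (statistics.mean(hat), statistics.variance(hat)))
print("origin: mean %.4f  var %.6f" % (statistics.mean(tilde), statistics.variance(tilde)))
```

In this design $\sum x_i^2 = 2870$ while $\mathrm{SST}_x = 665$, so the origin estimator has well under a quarter of the OLS variance, but because β0 ≠ 0 its simulated mean sits noticeably above β1 = 0.5.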

Q6. Wooldridge 2.10. (OLS algebra)

Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the OLS intercept and slope estimators, respectively, and let $\bar{u}$ be the sample average of the errors (not the residuals!).

(i) Show that $\hat{\beta}_1$ can be written as $\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} w_i u_i$, where $w_i = (x_i - \bar{x})/\mathrm{SST}_x$.

(ii) Use part (i), along with … [Hint: You are being asked to show that $\mathrm{E}[(\hat{\beta}_1 - \beta_1)\bar{u}] = 0$.]

(iii) Show that $\hat{\beta}_0$ can be written as $\hat{\beta}_0 = \beta_0 + \bar{u} - (\hat{\beta}_1 - \beta_1)\bar{x}$.

(iv) Use parts (ii) and (iii) to show that $\mathrm{Var}(\hat{\beta}_0) = \sigma^2/n + \sigma^2\bar{x}^2/\mathrm{SST}_x$.

(v) Do the algebra to simplify the expression in part (iv) to equation (2.58). [Hint: $\mathrm{SST}_x/n = n^{-1}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$.]
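The decompositions in parts (i), (iii) and (v) hold exactly in every sample, so they can be checked numerically whenever the true errors are known, as in a simulation. A minimal sketch with arbitrary simulated values (n = 50, β0 = 1, β1 = 0.7, error standard deviation 2, so σ² = 4):

```python
import random

random.seed(2)
n, beta0, beta1 = 50, 1.0, 0.7
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 2) for _ in range(n)]          # true errors (observable only in a simulation)
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

xbar, ybar, ubar = sum(x) / n, sum(y) / n, sum(u) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)

# OLS estimates
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0_hat = ybar - b1_hat * xbar

# (i) b1_hat = beta1 + sum w_i u_i, with w_i = (x_i - xbar)/SST_x (and sum w_i = 0)
w = [(xi - xbar) / sst_x for xi in x]
print(b1_hat, beta1 + sum(wi * ui for wi, ui in zip(w, u)))

# (iii) b0_hat = beta0 + ubar - (b1_hat - beta1) * xbar
print(b0_hat, beta0 + ubar - (b1_hat - beta1) * xbar)

# (v) sigma^2/n + sigma^2 xbar^2/SST_x equals sigma^2 (n^-1 sum x_i^2)/SST_x
sigma2 = 4.0
lhs = sigma2 / n + sigma2 * xbar ** 2 / sst_x
rhs = sigma2 * (sum(xi ** 2 for xi in x) / n) / sst_x
print(lhs, rhs)
```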

Q7. Wooldridge 15.1. (endogeneity & IV)

Consider a simple model to estimate the effect of personal computer (PC) ownership on college grade point average (GPA) for graduating seniors at a large public university:

GPA = β0  + β1PC + u,

where PC is a binary variable indicating PC ownership.

(i)  Why might PC ownership be correlated with u?

(ii) Explain why PC is likely to be related to parents’ annual income. Does this mean parental income is a good IV for PC? Why or why not?

(iii) Suppose that, four years ago, the university gave grants to buy computers to roughly one-half of the incoming students, and the students who received grants were randomly chosen. Carefully explain how you would use this information to construct an instrumental variable for PC.
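To see why a randomly assigned grant can serve as an instrument, here is a minimal simulation sketch. Everything in it is an assumption for illustration: an unobserved "ability" term raises both PC ownership and GPA (so PC is correlated with u), the grant is randomly assigned to about half the students and raises the chance of owning a PC, and the true effect of PC on GPA is set to 0.1. With one instrument z, the IV slope is Cov(z, y)/Cov(z, x).

```python
import random


def cov(a, b):
    """Sample covariance dividing by n (the scale cancels in the ratios below)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


random.seed(3)
n, true_effect = 50_000, 0.1

ability = [random.gauss(0, 1) for _ in range(n)]               # unobserved, ends up in u
grant = [1 if random.random() < 0.5 else 0 for _ in range(n)]  # randomly assigned to ~half
pc = [1 if g + a + random.gauss(0, 1) > 0.5 else 0 for g, a in zip(grant, ability)]
gpa = [2.8 + true_effect * p + 0.3 * a + random.gauss(0, 0.3) for p, a in zip(pc, ability)]

ols = cov(pc, gpa) / cov(pc, pc)             # biased upward: PC is correlated with ability
iv = cov(grant, gpa) / cov(grant, pc)        # IV slope using the grant as instrument
first_stage = cov(grant, pc) / cov(grant, grant)

print(f"OLS {ols:.3f}  IV {iv:.3f}  first stage {first_stage:.3f}")
```

Random assignment is what makes the grant uncorrelated with ability; the first-stage coefficient shows that it still shifts PC ownership, which is the relevance condition.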

Q8. Wooldridge 15.3. (IV algebra)

Consider the simple regression model

y = β0  + β1x + u

and let z be a binary instrumental variable for x. Use (15.10) to show that the IV estimator can be written as

$\hat{\beta}_1 = (\bar{y}_1 - \bar{y}_0)/(\bar{x}_1 - \bar{x}_0)$,

where $\bar{y}_0$ and $\bar{x}_0$ are the sample averages of $y_i$ and $x_i$ over the part of the sample with $z_i = 0$, and $\bar{y}_1$ and $\bar{x}_1$ are the sample averages of $y_i$ and $x_i$ over the part of the sample with $z_i = 1$.

This estimator, known as a grouping estimator, was first suggested by Wald (1940).
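Because the result is an exact sample identity, it can be checked numerically: for a binary z, the simple IV slope $\sum (z_i - \bar{z})(y_i - \bar{y}) / \sum (z_i - \bar{z})(x_i - \bar{x})$ coincides with the grouping (Wald) estimator. A minimal Python sketch on arbitrary simulated data:

```python
import random


def iv_slope(y, x, z):
    """Simple IV estimator: sum (z - zbar)(y - ybar) / sum (z - zbar)(x - xbar)."""
    n = len(y)
    yb, xb, zb = sum(y) / n, sum(x) / n, sum(z) / n
    num = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y))
    den = sum((zi - zb) * (xi - xb) for zi, xi in zip(z, x))
    return num / den


def wald_estimator(y, x, z):
    """Grouping estimator: (ybar_1 - ybar_0) / (xbar_1 - xbar_0)."""
    def group_mean(v, g):
        vals = [vi for vi, zi in zip(v, z) if zi == g]
        return sum(vals) / len(vals)
    return (group_mean(y, 1) - group_mean(y, 0)) / (group_mean(x, 1) - group_mean(x, 0))


random.seed(4)
z = [random.randint(0, 1) for _ in range(200)]
x = [1.5 * zi + random.gauss(0, 1) for zi in z]
y = [0.5 + 2.0 * xi + random.gauss(0, 1) for xi in x]

print(iv_slope(y, x, z), wald_estimator(y, x, z))  # equal up to floating-point rounding
```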

(The above are selected from the end-of-chapter Problems and Computer Exercises.)

Computer Exercise

•    All data files are on the course website, with the suffix “.dta” (STATA data file format). A description of each data set is available with the STATA command “describe”.

•    Example STATA do-files are also posted on the course website, with the suffix “.do”.

•    To carry out the computations for Q1-2 and Q4, read the file “Guide4STATA.pdf” if you are not already familiar with STATA.

•    If you want to access STATA via myAccess, please follow the instructions in the file “Stata_via_myAccess.pdf”.

Week 3 Tutorial Exercises (Instrumental Variables)

Readings

•    Read Chapter 15 thoroughly.

•    Make sure that you know the meanings of the Key Terms at the end of the chapter.

•    Answer the summary questions in the lecture slides.

Question Set

Q1. Wooldridge 15.8. (endogeneity & IV)

Suppose you want to test whether girls who attend a girls’ high school do better in math than girls who attend coed schools. You have a random sample of senior high school girls from a state in the United States, and score is the score on a standardized math test. Let girlhs be a dummy variable indicating whether a student attends a girls’ high school.

(i)  What other factors would you control for in the equation? (You should be able to reasonably collect data on these factors.)

(ii) Write an equation relating score to girlhs and the other factors you listed in part (i).

(iii) Suppose that parental support and motivation are unmeasured factors in the error term in part (ii). Are these likely to be correlated with girlhs? Explain.

(iv) Discuss the assumptions needed for the number of girls’ high schools within a 20-mile radius of a girl’s home to be a valid IV for girlhs.

(v) Suppose that, when you estimate the reduced form for girlhs, you find that the coefficient on numghs (the number of girls’ high schools within a 20-mile radius) is negative and statistically significant. Would you feel comfortable proceeding with IV estimation where numghs is used as an IV for girlhs? Explain.

Q2. Wooldridge 15.C2. (FERTIL2.dta, FERTIL2-w3.do)

The data in FERTIL2 include, for women in Botswana during 1988, information on number of children, years of education, age, and religious and economic status variables.

(i)  Estimate the model

children = β0 + β1educ + β2age + β3age² + u

by OLS and interpret the estimates. In particular, holding age fixed, what is the estimated effect of another year of education on fertility? If 100 women receive another year of education, how many fewer children are they expected to have?

(ii) The variable frsthalf is a dummy variable equal to one if the woman was born during the first six months of the year. Assuming that frsthalf is uncorrelated with the error term from part (i), show that frsthalf is a reasonable IV candidate for educ. (Hint: You need to do a regression.)

(iii) Estimate the model from part (i) by using frsthalf as an IV for educ. Compare the estimated effect of education with the OLS estimate from part (i).

(iv) Add the binary variables electric, tv, and bicycle to the model and assume these are exogenous. Estimate the equation by OLS and 2SLS and compare the estimated coefficients on educ. Interpret the coefficient on tv and explain why television ownership has a negative effect on fertility.
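STATA's `ivregress 2sls` does this in one step, but the mechanics behind parts (iii) and (iv) can be sketched as two OLS stages: regress educ on the instrument plus all exogenous regressors, then replace educ by its fitted values in the structural equation. The sketch below uses simulated data with HYPOTHETICAL parameter values (it does not read FERTIL2) and a small normal-equations solver so that it needs only the standard library. The second-stage coefficients equal the 2SLS estimates, but standard errors from a manual second stage are not valid, which is why a packaged IV command should be used for inference.

```python
import random


def ols(y, X):
    """OLS coefficients via the normal equations X'X b = X'y (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [arj - f * acj for arj, acj in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]


def predict(X, beta):
    """Fitted values X @ beta."""
    return [sum(xi * bi for xi, bi in zip(row, beta)) for row in X]


random.seed(5)
n = 20_000
true_educ_effect = -0.1  # hypothetical

age = [random.uniform(15, 49) for _ in range(n)]
z = [1 if random.random() < 0.5 else 0 for _ in range(n)]   # binary instrument
ability = [random.gauss(0, 1) for _ in range(n)]             # unobserved confounder
educ = [6 + 1.0 * zi + 1.5 * a + random.gauss(0, 2) for zi, a in zip(z, ability)]
y = [0.2 * ag - 0.002 * ag ** 2 + true_educ_effect * e + 0.5 * a + random.gauss(0, 1)
     for ag, e, a in zip(age, educ, ability)]

# Exogenous regressors: intercept, age, age^2 (age centered at 30 for numerical stability;
# this spans the same columns as 1, age, age^2, so the educ coefficient is unchanged)
X_exog = [[1.0, ag - 30.0, (ag - 30.0) ** 2] for ag in age]
Z_full = [row + [zi] for row, zi in zip(X_exog, z)]
# Stage 1: educ on the instrument and all exogenous regressors
educ_hat = predict(Z_full, ols(educ, Z_full))
# Stage 2: y on fitted educ and the exogenous regressors
b_2sls = ols(y, [row + [eh] for row, eh in zip(X_exog, educ_hat)])
b_ols = ols(y, [row + [e] for row, e in zip(X_exog, educ)])
print("OLS educ:", round(b_ols[-1], 3), " 2SLS educ:", round(b_2sls[-1], 3))
```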

Q3. Wooldridge 15.C3. (CARD.dta, CARD-w3.do)

Use the data in CARD for this exercise.

(i)  The equation we estimated in Example 15.4 can be written as

log(wage) = β0  + β1educ + β2exper + … + u,

where the other explanatory variables are listed in Table 15.1. In order for IV to be consistent, the IV for educ, nearc4, must be uncorrelated with u. Could nearc4 be correlated with things in the error term, such as unobserved ability? Explain.

(ii) For a subsample of the men in the data set, an IQ score is available. Regress IQ on nearc4 to check whether average IQ scores vary by whether the man grew up near a four-year college. What do you conclude?

(iii) Now, regress IQ on nearc4, smsa66, and the 1966 regional dummy variables reg662, …, reg669. Are IQ and nearc4 related after the geographic dummy variables have been partialled out? Reconcile this with your findings from part (ii).

(iv) From parts (ii) and (iii), what do you conclude about the importance of controlling for smsa66 and the 1966 regional dummies in the log(wage) equation?

Q4. Wooldridge 15.2. (endogeneity & IV)

Suppose that you wish to estimate the effect of class attendance on student performance, as in Example 6.3. A basic model is

stndfnl = β0 + β1atndrte + β2priGPA + β3ACT + u,

where the variables are defined as in Chapter 6.

(i)  Let dist be the distance from the students’ living quarters to the lecture hall. Do you think dist is uncorrelated with u?

(ii) Assuming that dist and u are uncorrelated, what other assumption must dist satisfy to be a valid IV for atndrte?

(iii) Suppose, as in equation (6.18), we add the interaction term priGPA·atndrte:

stndfnl = β0 + β1atndrte + β2priGPA + β3ACT + β4priGPA·atndrte + u.

If atndrte is correlated with u, then, in general, so is priGPA·atndrte. What might be a good IV for priGPA·atndrte? [Hint: If $\mathrm{E}(u \mid priGPA, ACT, dist) = 0$, as happens when priGPA, ACT, and dist are all exogenous, then any function of priGPA and dist is uncorrelated with u.]

Q5. Wooldridge 15.11. (measurement error, time series, lags as IV)

Consider a simple time series model where the explanatory variable has classical measurement error:

$y_t = \beta_0 + \beta_1 x_t^* + u_t$    (15.58)

$x_t = x_t^* + e_t$,

where $u_t$ has zero mean and is uncorrelated with $x_t^*$ …

Assume that $e_t$ has zero mean and is uncorrelated with $x_t^*$ … (this last assumption is only to simplify the algebra).

(i) Write $x_t^* = x_t - e_t$ and plug this into (15.58). Show that the error term in the new equation, say, $v_t$, is negatively correlated with $x_t$ if $\beta_1 > 0$. What does this imply about the OLS estimator of $\beta_1$ from the regression of $y_t$ on $x_t$?

(ii) In addition to the previous assumptions, assume that $u_t$ and $e_t$ are uncorrelated with all past values of $x_t^*$ and $e_t$; in particular, with $x_{t-1}^*$ and $e_{t-1}$. Show that $\mathrm{E}(x_{t-1}v_t) = 0$, where $v_t$ is the error term in the model from part (i).

(iii) Are $x_t$ and $x_{t-1}$ likely to be correlated? Explain.

(iv) What do parts (ii) and (iii) suggest as a useful strategy for consistently estimating β0 and β1?
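Parts (i) to (iv) can be illustrated with a simulation. In the sketch below (all values are assumptions), the true regressor $x_t^*$ follows a stationary AR(1) process with mean zero, so $x_t$ and $x_{t-1}$ are correlated; the measurement errors $e_t$ and the equation errors $u_t$ are i.i.d. and independent of everything else, which satisfies the assumptions above. OLS of $y_t$ on $x_t$ is attenuated toward zero, while using $x_{t-1}$ as an instrument for $x_t$ recovers $\beta_1$.

```python
import math
import random


def cov(a, b):
    """Sample covariance dividing by n."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


random.seed(6)
T, beta0, beta1, rho = 20_000, 0.5, 1.0, 0.8

# True regressor: stationary AR(1) with mean zero, started from its stationary distribution
x_star = [random.gauss(0, 1 / math.sqrt(1 - rho ** 2))]
for _ in range(T - 1):
    x_star.append(rho * x_star[-1] + random.gauss(0, 1))

e = [random.gauss(0, 1) for _ in range(T)]   # classical measurement error
u = [random.gauss(0, 1) for _ in range(T)]   # structural error
x = [xs + et for xs, et in zip(x_star, e)]   # observed, mismeasured regressor
y = [beta0 + beta1 * xs + ut for xs, ut in zip(x_star, u)]

# OLS of y_t on x_t: attenuated toward zero
ols = cov(x, y) / cov(x, x)
# IV: instrument x_t with x_{t-1}, losing the first observation
y1, x1, z = y[1:], x[1:], x[:-1]
iv = cov(z, y1) / cov(z, x1)
intercept_iv = sum(y1) / len(y1) - iv * sum(x1) / len(x1)
print(f"OLS slope {ols:.3f}  IV slope {iv:.3f}  IV intercept {intercept_iv:.3f}")
```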

Q6. Wooldridge 15.C8. (401KSUBS.dta, 401KSUBS-w3.do)

Use the data in 401KSUBS for this exercise. The equation of interest is a linear probability model:

pira = β0 + β1p401k + β2inc + β3inc² + β4age + β5age² + u.

The goal is to test whether there is a tradeoff between participating in a 401(k) plan and having an individual retirement account (IRA). Therefore, we want to estimate β1.

(i)  Estimate the equation by OLS and discuss the estimated effect of p401k.

(ii) For the purposes of estimating the ceteris paribus tradeoff between participation in two different types of retirement savings plans, what might be a problem with ordinary least squares?

(iii) The variable e401k is a binary variable equal to one if a worker is eligible to participate in a 401(k) plan. Explain what is required for e401k to be a valid IV for p401k. Do these assumptions seem reasonable?

(iv) Estimate the reduced form for p401k and verify that e401k has significant partial correlation with p401k. Since the reduced form is also a linear probability model, use a heteroskedasticity-robust standard error.

(v) Now, estimate the structural equation by IV and compare the estimate of β1 with the OLS estimate. Again, you should obtain heteroskedasticity-robust standard errors.

(vi) Test the null hypothesis that p401k is in fact exogenous, using a heteroskedasticity-robust test.
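Part (vi) is commonly done with the regression-based (control function) version of the Hausman test: save the residuals v̂ from the reduced form for p401k, add them to the structural equation estimated by OLS, and use a heteroskedasticity-robust t statistic on v̂. A minimal sketch on simulated data; the variables only mimic the names in 401KSUBS, a single income regressor stands in for inc, inc², age and age², the data-generating values are assumptions, and the robust standard errors are the basic HC0 (White) form, whereas STATA's `vce(robust)` applies an extra n/(n − k) degrees-of-freedom scaling.

```python
import math
import random


def inverse(M):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(M)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[k:] for row in A]


def ols_robust(y, X):
    """OLS coefficients, HC0 robust standard errors, and residuals."""
    k = len(X[0])
    XtX_inv = inverse([[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)])
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(XtX_inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    meat = [[sum(e * e * r[i] * r[j] for r, e in zip(X, resid)) for j in range(k)] for i in range(k)]
    # Sandwich: (X'X)^-1 * meat * (X'X)^-1
    tmp = [[sum(XtX_inv[i][m] * meat[m][j] for m in range(k)) for j in range(k)] for i in range(k)]
    V = [[sum(tmp[i][m] * XtX_inv[m][j] for m in range(k)) for j in range(k)] for i in range(k)]
    return beta, [math.sqrt(V[i][i]) for i in range(k)], resid


random.seed(7)
n = 10_000
inc = [random.uniform(1.0, 10.0) for _ in range(n)]             # income, hypothetical units
e401k = [1 if random.random() < 0.4 else 0 for _ in range(n)]   # eligibility (instrument)
taste = [random.gauss(0, 1) for _ in range(n)]                  # unobserved taste for saving
# Only eligible workers can participate; participation depends on taste
p401k = [el * (1 if 0.5 + 0.8 * t + random.gauss(0, 1) > 0 else 0) for el, t in zip(e401k, taste)]
pira = [1 if -1.0 + 0.1 * i - 0.2 * p + 0.6 * t + random.gauss(0, 1) > 0 else 0
        for i, p, t in zip(inc, p401k, taste)]

# Step 1: reduced form for p401k; keep the residuals v_hat
_, _, v_hat = ols_robust(p401k, [[1.0, i, el] for i, el in zip(inc, e401k)])
# Step 2: add v_hat to the structural equation; robust t test on its coefficient
beta, se, _ = ols_robust(pira, [[1.0, i, p, v] for i, p, v in zip(inc, p401k, v_hat)])
t_stat = beta[3] / se[3]
print(f"coef on v_hat {beta[3]:.3f}, robust t = {t_stat:.2f}")
```

A large robust t statistic on v̂ rejects exogeneity of p401k; in this simulated design the unobserved taste variable drives both participation and IRA ownership, so rejection is expected.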

Q7. Wooldridge 15.C10. (HTV.dta, HTV-w3.do)

Use the data in HTV for this exercise.

(i) Run a simple OLS regression of log(wage) on educ. Without controlling for other factors, what is the 95% confidence interval for the return to another year of education?

(ii) The variable ctuit, in thousands of dollars, is the change in college tuition facing students from age 17 to age 18. Show that educ and ctuit are essentially uncorrelated. What does this say about ctuit as a possible IV for educ in a simple regression analysis?

(iii) Now, add to the simple regression model in part (i) a quadratic in experience and a full set of regional dummy variables for current residence and residence at age 18. Also include the urban indicators for current and age 18 residences. What is the estimated return to a year of education?

(iv) Again using ctuit as a potential IV for educ, estimate the reduced form for educ. [Naturally, the reduced form for educ now includes the explanatory variables in part (iii).] Show that ctuit is now statistically significant in the reduced form for educ.

(v) Estimate the model from part (iii) by IV, using ctuit as an IV for educ. How does the confidence interval for the return to education compare with the OLS CI from part (iii)?

(vi) Do you think the IV procedure from part (v) is convincing?
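For parts (i), (iii) and (v), a large-sample 95% confidence interval is the estimate plus or minus about 1.96 standard errors (STATA reports t-based intervals, which are essentially the same at this sample size). A minimal sketch with HYPOTHETICAL estimates and standard errors; replace them with the numbers from HTV-w3.do. It only shows that, for the same point estimate, the larger standard error that IV typically produces (especially with a weak instrument) translates directly into a wider interval.

```python
from statistics import NormalDist


def conf_int(estimate: float, std_err: float, level: float = 0.95):
    """Large-sample confidence interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_err, estimate + z * std_err


# HYPOTHETICAL estimates: substitute the OLS and IV results from HTV-w3.do
ols_ci = conf_int(0.10, 0.01)
iv_ci = conf_int(0.10, 0.05)
print("OLS CI:", ols_ci, "width", ols_ci[1] - ols_ci[0])
print("IV CI: ", iv_ci, "width", iv_ci[1] - iv_ci[0])
```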