ECON7310: Elements of Econometrics

Final Problem Set

November 14, 2022

Instruction

Answer all questions in a format similar to the answers to your tutorial questions. When you use R to conduct empirical analysis, you should show your R script(s) and outputs (e.g., screenshots of commands, tables, and figures). You will lose 2 points whenever you fail to provide R commands and outputs. When you are asked to explain or discuss something, your response should be brief and compact. To facilitate tutors’ grading work, please clearly label all your answers. You should upload your answers (in PDF or Word format) via the “Turnitin” submission link (in the “Final Problem Set” folder under “Assessment”) by 11:59 AM on the due date, November 17, 2022. Do not hand in a hard copy. You are allowed to work on this assignment in groups; that is, you can discuss how to answer these questions with your group members. However, this is not a group assignment, which means that you must answer all the questions in your own words and submit your report separately. The marking system will check the similarity, and UQ’s student integrity and misconduct policies on plagiarism apply.

1.  Panel Data Regression (20 points)

You investigate the deterrent effects of execution on murder using the panel dataset murder.csv, which includes U.S. state-level data on murder rates and executions.

(a)  (4 points) Consider the following model with unobserved effects:

$$\text{mrdrte}_{it} = \lambda_t + \beta_1 \text{exec}_{it} + \beta_2 \text{unem}_{it} + \alpha_i + u_{it}, \tag{1}$$

where $\text{exec}_{it}$ denotes the number of past executions in state $i$ by year $t$, and $\text{mrdrte}_{it}$ and $\text{unem}_{it}$ denote the murder rate and the unemployment rate of state $i$ in year $t$, respectively. Which term in model (1) represents the unobserved state fixed effect (1 point)? The year fixed effect (1 point)? If past executions of convicted murderers have a deterrent effect, what should be the sign of $\beta_1$ (1 point)? What sign do you think $\beta_2$ should have (1 point)?

(b)  (8 points) Using the data for all three years (1987, 1990, and 1993), estimate equation (1) by OLS and report the estimation results (2 points). Compute cluster-robust standard errors (SE). Why not simply compute the heteroskedasticity-robust SE (2 points)? How many “clusters” do you have in the data (2 points)? Do you find any evidence for deterrent effects (2 points)? Hint: Use time dummies to estimate time effects.

(c)  (8 points) Now, using the data for all three years, estimate equation (1) by fixed effects (FE) regression and report the estimation results (2 points). Again, compute cluster-robust SE. Is there any evidence of deterrent effects (2 points)? Is there any evidence of time effects (2 points)? Compare the estimation results obtained in (b) and (c). Comment on your findings (2 points).
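For orientation, the mechanics behind parts (b) and (c) can be sketched on simulated data rather than murder.csv. Everything named here is an assumption for illustration: the column names (state, year, mrdrte, exec, unem), the helper functions cluster_se() and demean(), and the finite-sample correction, which is one common choice and may not match what a packaged routine reports. Only base R is used.

```r
# Simulated balanced panel: 50 hypothetical states observed in 3 years.
set.seed(1)
n_state <- 50
years   <- c(1987, 1990, 1993)
d <- expand.grid(state = seq_len(n_state), year = years)
alpha    <- rnorm(n_state)                       # unobserved state effect
d$exec   <- rpois(nrow(d), lambda = 1)
d$unem   <- 6 + alpha[d$state] + rnorm(nrow(d))  # correlated with the state effect
d$mrdrte <- 8 + 2 * alpha[d$state] - 0.1 * d$exec + 0.3 * d$unem +
  0.5 * (d$year == 1993) + rnorm(nrow(d))

# Cluster-robust standard errors (sandwich form) for an lm fit, clustered on cl.
cluster_se <- function(fit, cl) {
  X <- model.matrix(fit)
  u <- resid(fit)
  G <- length(unique(cl)); N <- nrow(X); K <- ncol(X)
  bread  <- solve(crossprod(X))
  scores <- rowsum(X * u, group = cl)            # sum of x_it * u_it within each cluster
  meat   <- crossprod(scores)
  adj    <- (G / (G - 1)) * ((N - 1) / (N - K))  # finite-sample correction (assumed)
  sqrt(diag(adj * bread %*% meat %*% bread))
}

# (b) Pooled OLS with year dummies.
pooled    <- lm(mrdrte ~ exec + unem + factor(year), data = d)
se_pooled <- cluster_se(pooled, d$state)

# (c) Fixed effects via the within transformation: demean every variable by state.
demean <- function(x, g) x - ave(x, g)
w <- data.frame(
  mrdrte = demean(d$mrdrte, d$state),
  exec   = demean(d$exec,   d$state),
  unem   = demean(d$unem,   d$state),
  y90    = demean(as.numeric(d$year == 1990), d$state),
  y93    = demean(as.numeric(d$year == 1993), d$state)
)
fe    <- lm(mrdrte ~ 0 + exec + unem + y90 + y93, data = w)
se_fe <- cluster_se(fe, d$state)

n_clusters <- length(unique(d$state))            # one cluster per state
round(cbind(pooled = coef(pooled)[c("exec", "unem")],
            fe     = coef(fe)[c("exec", "unem")]), 3)
```

In this simulation unem is built to be correlated with the state effect, so the pooled and FE coefficients on unem can differ noticeably; that contrast is the kind of comparison part (c) asks about.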

2.  Binary Choice Models (20 points)

You want to study female labor force participation using a sample of 872 women from Switzerland (swiss.csv). The dependent variable is participation (=1 if in labor force), which you regress on all further variables plus age squared; i.e., on income, education (years of schooling), age, $\text{age}^2$, numbers of younger and older children (youngkids and oldkids), and on the factor foreign, which indicates citizenship (=1 if not Swiss).

(a)  (7 points) Run this regression using a linear probability model (LPM) and report the regression results (3 points). Test if age is a statistically significant determinant of female labor force participation (2 points). Is there evidence of a nonlinear effect of age on the probability of being employed (2 points)?

(b)  (7 points) Repeat (a) using probit and logit regression models and report your results.

(c)  (6 points) Use the LPM and probit models to compute the predicted probability of being in the labor force for a Swiss woman with the sample median income and age, 12 years of schooling, one young child, and no older children.
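As a rough illustration of the workflow in this question, the sketch below fits the three models and computes a predicted probability on simulated data. The data-generating process, the scaling of income and age, and the object names (s, f, newx) are all hypothetical; only the variable names come from the question text, and the actual swiss.csv may be coded differently.

```r
# Simulated stand-in for swiss.csv (all values and scalings are hypothetical).
set.seed(2)
n <- 872
s <- data.frame(
  income    = rnorm(n, 10.7, 0.4),
  age       = runif(n, 2, 6.2),
  education = sample(5:17, n, replace = TRUE),
  youngkids = rpois(n, 0.3),
  oldkids   = rpois(n, 1),
  foreign   = rbinom(n, 1, 0.25)
)
index <- 3 - 0.5 * s$income + 2 * s$age - 0.3 * s$age^2 - s$youngkids + s$foreign
s$participation <- rbinom(n, 1, pnorm(index))

# Same right-hand side for all three models; I(age^2) adds the squared term.
f <- participation ~ income + education + age + I(age^2) +
  youngkids + oldkids + foreign

lpm    <- lm(f, data = s)                                     # linear probability model
probit <- glm(f, data = s, family = binomial(link = "probit"))
logit  <- glm(f, data = s, family = binomial(link = "logit"))

# Covariate profile: sample medians for income and age, the rest set by hand.
newx <- data.frame(income = median(s$income), age = median(s$age),
                   education = 12, youngkids = 1, oldkids = 0, foreign = 0)

p_lpm    <- predict(lpm, newdata = newx)                      # fitted value, not bounded to [0, 1]
p_probit <- predict(probit, newdata = newx, type = "response")  # Phi(x'b)
c(LPM = unname(p_lpm), probit = unname(p_probit))
```

Note that `predict()` on a `glm` object returns the linear index by default; `type = "response"` is what converts it to a probability.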

3.  IV Regression (40 points)

You use the following regression model and the dataset cigbwght.csv to estimate the effects of several variables, including cigarette smoking, on the weight of newborns:

$$\log(\text{bwght}) = \beta_0 + \beta_1 \text{male} + \beta_2 \text{parity} + \beta_3 \log(\text{faminc}) + \beta_4 \text{smoke} + u, \tag{2}$$

where male is a dummy variable equal to 1 if the child is male; parity is the birth order of this child; faminc is family income (in $1000); and smoke is a dummy variable equal to 1 if the mother smoked during pregnancy.

(a)  (4 points) Obtain OLS estimates of regression equation (2) and report the regression results.

(b)  (5 points) Interpret the estimated coefficient on smoke (3 points) and test whether the population coefficient $\beta_4$ is zero at the 1% significance level (2 points).

(c)  (10 points) Some studies suggest that smoking during pregnancy may have different impacts on male and female babies. Modify the specification of regression model (2) and test this hypothesis (5 points). In your modified model, does smoke still have a significant (at the 5% level) effect on the weight of newborns (2 points)? Explain your answer using test results (3 points). Hint: You don’t need to report regression results here, but writing out your modified regression model may be helpful.

(d)  (7 points) One of your classmates expresses her concern about the validity of your regression analysis and argues that there may be unobserved health factors correlated with smoking behavior that affect infant birth weight. For example, women who smoke during pregnancy may, on average, drink more coffee or alcohol, or eat less nutritious meals. If this is the case, do you think the OLS estimates you obtained in (a) are unbiased (consistent) (2 points)? Explain your answer (3 points). Is this a threat to your regression analysis’s internal or external validity (2 points)?

(e)  (4 points) Your classmate then proposes to use the cigarette tax (cigtax) in each woman’s state of residence as an instrumental variable (IV) for smoke and run a two-stage least squares (TSLS) regression. Take her suggestion and report your TSLS regression results. Hint: Use model (2) for your TSLS regression. You can use the iv_robust() function in the estimatr package to run the TSLS regression and calculate robust SE.

(f)  (10 points) Are the coefficients of model (2) exactly identified, overidentified, or underidentified (2 points)? Does this TSLS regression suffer from the weak IV problem (2 points)? Why or why not (2 points)? Is it possible to test the exogeneity of cigtax as an IV for smoke (2 points)? Explain your answer (2 points).
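To make the two stages of TSLS concrete, the sketch below runs them by hand on simulated data with one endogenous regressor and one instrument. The variables z, w, and h are hypothetical stand-ins (for cigtax, the exogenous controls, and an unobserved health factor), and the numbers have no connection to cigbwght.csv. The first-stage F statistic shown is the classical, homoskedasticity-only version.

```r
# Simulated data: smoke is endogenous because the unobserved h drives both
# smoking and the outcome; z shifts smoking but does not enter y directly.
set.seed(3)
n <- 2000
z <- runif(n, 0, 40)     # instrument (stand-in for cigtax)
w <- rnorm(n)            # included exogenous regressor (stand-in for the controls)
h <- rnorm(n)            # unobserved factor
smoke <- as.numeric(0.5 - 0.03 * z + h + rnorm(n) > 0)
y <- 4.7 + 0.02 * w - 0.10 * smoke - 0.05 * h + rnorm(n, sd = 0.2)

ols <- lm(y ~ w + smoke)                 # inconsistent here because of h

# Stage 1: endogenous regressor on the instrument and the exogenous controls.
stage1  <- lm(smoke ~ z + w)
f_first <- anova(lm(smoke ~ w), stage1)$F[2]   # F test on the excluded instrument

# Stage 2: replace smoke with its first-stage fitted values.
# The point estimates are TSLS, but the SEs printed by this lm() are not valid.
smoke_hat <- fitted(stage1)
stage2    <- lm(y ~ w + smoke_hat)

# With as many instruments as endogenous regressors, TSLS reduces to the
# simple IV formula (Z'X)^{-1} Z'y.
X <- cbind(1, w, smoke)
Z <- cbind(1, w, z)
b_iv <- solve(crossprod(Z, X), crossprod(Z, y))

cbind(OLS = coef(ols), TSLS = coef(stage2), IV = as.vector(b_iv))
```

For the assignment itself, the iv_robust() function suggested in the hint takes a two-part formula of the form outcome ~ regressors | instruments and reports valid robust SE directly, so the manual second stage above is only meant to show where the estimates come from.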

4.  Time Series (20 points)

The data file sp500d.csv contains daily data on the S&P 500 index from 2000 to 2015. The S&P 500 is usually regarded as a gauge of the large-cap U.S. equities market. Here you use this data to examine the historical “returns” of investing in the S&P 500.

(a)  (5 points) Let $Y_t$ denote the S&P 500 index. Draw a time series plot of $Y_t$ (2 points). Do you think $Y_t$ is a stationary time series (1 point)? Are $y_t = \log(Y_t)$ and $\Delta y_t = y_t - y_{t-1}$ stationary (2 points)?

(b)  (3 points) Use OLS to estimate the following AR(1) model:

$$\Delta y_t = \theta_0 + \theta_1 \Delta y_{t-1} + e_t.$$

Does the AR(1) model fit the data well (1 point)? Explain your answer (2 points).

(c)  (8 points) For the following AR(p) model

$$\Delta y_t = \theta_0 + \theta_1 \Delta y_{t-1} + \cdots + \theta_p \Delta y_{t-p} + e_t,$$

try $p = 1, 2, 3, 4$. You want to select the optimal number of lags (i.e., $p$) using AIC and BIC as criteria. Which model do you think is the best (1 point)? Justify your choice (3 points). For the AR($p$) model you select, compute the first five autocorrelations of the regression residuals, $\hat{e}_t$ (2 points). Do you think the errors $e_t$ of your selected model are serially correlated (2 points)? Hint: You can calculate $\hat{e}_t$ manually using its definition or alternatively apply the resid() function.

(d)  (4 points) Let $T$ denote the last trading day in the data. Forecast $y_{T+1}$ and $y_{T+2}$.
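The steps in this question (differencing the log index, fitting AR(p) models by OLS, comparing information criteria, checking residual autocorrelations, and forecasting) can be sketched as follows. The series is a simulated random walk in logs, not sp500d.csv, and the helper fit_ar() and all object names are assumptions for illustration. R's AIC() and BIC() are likelihood-based, so their values may differ from a textbook formula based on the sum of squared residuals, although on a common estimation sample they rank the models in the same order.

```r
# Simulated index level (hypothetical stand-in for the S&P 500 series).
set.seed(4)
n_obs <- 1000
Y  <- 1000 * exp(cumsum(rnorm(n_obs, mean = 0.0002, sd = 0.01)))
y  <- log(Y)
dy <- diff(y)                       # Delta y_t = y_t - y_{t-1}

p_max <- 4
# embed() puts Delta y_t in column 1 and lags 1..p_max in columns 2..(p_max + 1).
# Every model is fit on the same rows so that AIC and BIC are comparable.
E <- embed(dy, p_max + 1)
fit_ar <- function(p) lm(E[, 1] ~ E[, 2:(p + 1), drop = FALSE])
fits <- lapply(1:p_max, fit_ar)

ic <- data.frame(p = 1:p_max, AIC = sapply(fits, AIC), BIC = sapply(fits, BIC))
p_star <- ic$p[which.min(ic$BIC)]   # here: pick the lag length by BIC
best   <- fits[[p_star]]

# First five residual autocorrelations (lag 0 is dropped; it is always 1).
rho <- acf(resid(best), lag.max = 5, plot = FALSE)$acf[-1]

# Iterated forecasts of Delta y for T+1 and T+2, then cumulated to log levels.
theta <- coef(best)
path  <- dy
for (step in 1:2) {
  lags <- rev(tail(path, p_star))   # most recent value first (lag 1, lag 2, ...)
  path <- c(path, theta[1] + sum(theta[-1] * lags))
}
dy_fc <- unname(tail(path, 2))      # forecasts of Delta y_{T+1}, Delta y_{T+2}
y_fc  <- tail(y, 1) + cumsum(dy_fc) # implied forecasts of y_{T+1}, y_{T+2}
ic
```

The two-step forecast feeds the one-step forecast back in as if it were data, which is the usual iterated approach for AR models.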