MIEF Quantitative Methods I (Basic Econometrics)

Practice Final Exam (Exam date:  August 27, 2019)

Instructions.  This is an open book, open note exam.  Please refer to any books or notes on a computer or notebook that you wish.  Please do watch your time and write as legibly as possible.    For any calculations you are asked to carry out, please use 3 to 4 significant figures (not more, not less).    

1.  Data and question 1

What follows is hypothetical (i.e., made up) data on 25 SAIS students who graduated last year; the data are included in Appendix 0.  The variables are:

SAISGPA: Final cumulative grade point average (4 point scale)

Econ: 1 if an economics major as an undergraduate, 0 otherwise

IR: 1 if an international relations major as an undergraduate, 0 otherwise

Math_Science: 1 if a math or science major as an undergraduate, 0 otherwise

ColGPA: College grade point average (4 point scale)

Jogger: 1 if the SAIS student is a jogger, 0 if not a jogger

Internship: Number of hours per week on average spent on an internship over the time at SAIS

Appendix 1 uses Stata and sets out to explain SAISGPA in terms of whether the SAIS student was a jogger, several variables that indicate the undergraduate major, College GPA, and time spent on internships.

a) For the first function estimated, write out the model including the error term.  What assumptions were made when this regression was carried out?

b) Put in words each of the slope coefficients on the independent variables; start with a clear description of the constant.  Explain whether the slope coefficients have the sign that you might expect.  

c) Based on the p-values computed by Stata, which of these variables have coefficients that are statistically significantly different from zero at the .05 level of significance?  Be clear which numbers you are comparing.  What does it mean for a coefficient in multiple regression to be significantly different from zero?

d) Put in words (i.e., describe clearly) what the p-value for the IR variable tells us.

e) The second regression estimated drops the undergraduate major variables.  Compute the F statistic to test the null hypothesis that the three undergraduate major variables as a group do not add anything extra to the explanation of the variability of SAISGPA.  Please compute the F statistic using both the R Squared version and the SSR version.  The two numbers should be close.  Carry out that F test at the .05 level of significance, being clear what degrees of freedom you are using.
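For reference, the two forms of the F statistic asked for in part e) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reproduction of the Stata output: the function names and every numeric input are hypothetical placeholders, and the degrees of freedom assume n = 25 with six regressors in the unrestricted model (25 - 6 - 1 = 18) and q = 3 restrictions.

```python
def f_stat_r2(r2_ur, r2_r, q, df_ur):
    """F statistic from the R-squared of the unrestricted (ur) and restricted (r) models."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)


def f_stat_ssr(ssr_r, ssr_ur, q, df_ur):
    """F statistic from the sums of squared residuals (SSR) of the two models."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


# Hypothetical inputs (NOT taken from Appendix 1).
n, k_ur, q = 25, 6, 3          # sample size, regressors in the full model, restrictions
df_ur = n - k_ur - 1           # denominator degrees of freedom
sst = 10.0                     # total sum of squares (same in both models)
ssr_ur, ssr_r = 4.0, 6.0       # residual sums of squares
r2_ur = 1.0 - ssr_ur / sst     # R-squared implied by SSR and SST
r2_r = 1.0 - ssr_r / sst

f_from_r2 = f_stat_r2(r2_ur, r2_r, q, df_ur)
f_from_ssr = f_stat_ssr(ssr_r, ssr_ur, q, df_ur)
print(f_from_r2, f_from_ssr)
```

Because both models share the same dependent variable and SST, the two forms are algebraically identical; any gap between them when working from printed output comes from rounding. The computed F is compared with the critical value from an F table with (q, df_ur) degrees of freedom.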

2.   Appendix 2 contains an analysis that utilized a variable which is the product of Age and Height, the two independent variables.  This is a sample of eight young boys aged from 6 to 12.  Height is measured in inches and weight is measured in pounds.

a) Using the regression that contains the product variable (the second regression), write out the relationship that gives the change in Weight as Age changes, holding Height constant, i.e., the slope in the Age direction holding Height constant.  Compute the value of the slope for a child with a Height of 40, 50 and 60 inches.

b) What is the standard error of the slope on Age when Age=9 and Height=50?
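The arithmetic behind parts a) and b) can be sketched as follows, assuming the second regression has the form Weight = b0 + b1*Age + b2*Height + b3*(Age*Height) + u. Under that assumption the slope in the Age direction is b1 + b3*Height, and its variance is Var(b1) + Height^2 * Var(b3) + 2*Height*Cov(b1, b3). The function names and all numbers below are hypothetical and are not taken from Appendix 2.

```python
import math


def slope_on_age(b_age, b_inter, height):
    """Partial effect of Age on Weight at a given Height: b1 + b3 * Height."""
    return b_age + b_inter * height


def se_slope_on_age(var_age, var_inter, cov_age_inter, height):
    """Standard error of b1 + b3 * Height from the coefficient (co)variances."""
    variance = var_age + height ** 2 * var_inter + 2.0 * height * cov_age_inter
    return math.sqrt(variance)


# Hypothetical estimates (NOT from Appendix 2).
b_age, b_inter = 2.0, 0.1
var_age, var_inter, cov_age_inter = 4.0, 0.0016, -0.07

slopes = {h: slope_on_age(b_age, b_inter, h) for h in (40, 50, 60)}
se_at_50 = se_slope_on_age(var_age, var_inter, cov_age_inter, 50)
print(slopes, se_at_50)
```

Note that the slope depends on Height only, so the value Age=9 does not enter the calculation. If the covariance between the two coefficient estimates is not reported, a common alternative is to rerun the regression with the product built from (Height - 50); the standard error printed for Age is then the standard error of the slope at Height=50.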

3.  Question on Heteroskedasticity

Appendix 3 contains an analysis of hypothetical data for a sample of 33 households where data on savings, income, and family size was obtained.  Page 1 lists the data.

a) State in symbols and words the homoskedasticity assumption.

b) Regression 1 on page 2 sets out to estimate savings as a function of income and family size.  Two regressions are run on page 4 to test whether the assumption discussed in (a) was violated.  State clearly the names of these two tests and the assumptions involved.  From Regression #2, can you tell which variable is causing the problem?  What is your conclusion from these two tests?

c) Using Regression #1 and Regression #4 (page 5), write down the function estimated with the standard errors written appropriately below each slope coefficient.   Describe briefly why the standard errors from Regression #4 are more appropriate than the standard errors reported in Regression #1.

d) Discuss Regression #5 on page 6.  What are the assumptions made with this regression?

e) Discuss Regression #6 on page 7.  How is it related to Regression #5?

f) On page 7 there is also a scatter plot that was produced following Regression #1.  Describe briefly what information you see in this scatter plot.

g) On page 8 there are two functions estimated, Regression #7 and Regression #8.  Explain the model being estimated here.  How is this different from Regression #5?

h) Explain the connection between Regression #9 on page 9 and Regression #8 on page 8.

4.  Pooled Cross Section (plus review of dummy variables; quadratics; and logs)

The analysis in Appendix 4 uses hypothetical (made up) data on SAIS students drawn from a sample in 2008 and another sample in 2015.  The data are listed on pp. 1-4, and the variables are the same as in question 1, with Fellowship=1 if a student has a fellowship.

a. Regression #1 on Page 4 sets out to explain SAISGPA in terms of undergraduate GPA (ColGPA), time spent studying (StudyHrs), time spent on internships (InternshipHrs), and a special fellowship that was enhanced in 2012.  

1) Write out the function estimated with standard errors and t-values in parentheses.  

2) Put in words the constant and each of the coefficients on Y2015, Fellowship, and Y15Fellowship.

3) Put in words the coefficients on StudyHrs and StudyHrsSq.  What is the impact on SAISGPA when StudyHrs=20 and when StudyHrs=40?

4) Put in words the R squared.

5) What is the standard error of estimate?  What is it an estimate of?

6) Stata produces a 95 percent confidence interval for the population coefficient on ColGPA.  Compute an 80 percent confidence interval.  Explain why it is narrower or wider.

7) What hypothesis is being tested with the F statistic displayed?  Show the hypothesis test with a picture and critical F value.

b.   On page 5 are two regressions run for the separate time periods.  Also on page 7 is the same regression run with the pooled data.   Test using the SSR form of the F test whether there was a structural change in the function between the two time periods.  What is this test called?

c. On page 6 (Regression #4) and page 7 (Regression #5) are regressions that can be used to test if there was a structural change in the function between the two time periods.  Use the R squared form of this F test.  What is this test called?  Compare the result with that from part b.

d. Use Regression #5 on page 7 and Regression #6 on page 7 to test if undergraduate major is significant in determining SAISGPA.   

e. Regression #8 on page 8 and Regression #9 on page 9 examine a policy change for SAIS students with fellowships (this is apocryphal) in that they are offered a special quiet place to study to see if it would improve their grades.  Develop a difference-in-differences analysis from these two outputs.  Show that you obtain the same results from Regression #10 on page 9.  It is important that a difference-in-differences estimate (the average treatment effect) be statistically significant.  Test if the average treatment effect is statistically significant.  Please show a picture of the distribution and the critical t-value for α=.01.  Is your conclusion consistent with the p-value displayed?  Put in words what this p-value tells you.

f. Regression #11 (page 10) changes the functional form by having the dependent variable (SAISGPA) logged as well as ColGPA.  State clearly in words the meaning of the slope coefficient on ColGPA.  State clearly in words the meaning of the slope coefficient on InternshipHrs.  State clearly the meaning of the slope coefficient on Fellowship (compute in two different ways).

g. The calculations on page 11 are used to compare Regression #11 with Regression #5.  Discuss this comparison with some mention of the sample correlation between a dependent variable and the predicted value of the dependent variable.

h. Regression #12 on page 12 is designed to examine the multicollinearity issues in Regression #5.  Compute the VIF (Variance Inflation Factor) for StudyHrsSq.  What impact do you think this VIF has on the standard errors (and therefore the t-values) for the StudyHrs and StudyHrsSq variables in Regression #5 on page 7?
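The VIF calculation in part h can be sketched as below, assuming Regression #12 is the auxiliary regression of StudyHrsSq on the other independent variables, so that VIF = 1 / (1 - R squared of that auxiliary regression). The R squared value used here is a hypothetical placeholder, not the figure from page 12.

```python
import math


def vif(r2_aux):
    """Variance inflation factor from the R-squared of the auxiliary regression."""
    return 1.0 / (1.0 - r2_aux)


# Hypothetical auxiliary R-squared (NOT from Regression #12).
r2_aux = 0.95

vif_value = vif(r2_aux)
# The variance of the coefficient is inflated by VIF, so its standard error
# is inflated by the square root of VIF.
se_inflation = math.sqrt(vif_value)
print(vif_value, se_inflation)
```

With these placeholder numbers the standard error would be roughly four and a half times what it would be if StudyHrsSq were uncorrelated with the other regressors, which shrinks the t-value by the same factor.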

5.   Appendix 5 provides an analysis of two period panel data.

a. The data are listed on pages 2 and 3.  Describe how the data are arranged in the data set.

b. The first regression on page 1 treats the data as pooled cross section data.  Why would you think that the slope coefficient on unemployment rate (to explain the dependent variable crime rate) is statistically insignificant?

c. The second regression on page 1 takes the first difference of the two variables and we do end up with a statistically significant slope coefficient on the unemployment rate (change in the unemployment rate).  Discuss this in terms of how the fixed effects are handled.
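The idea behind part c can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Appendix 5 data: it assumes a two-period model y_it = intercept_t + beta*x_it + a_i, where a_i is a fixed effect specific to each unit and correlated with x, and there is no additional noise. All names and numbers are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical two-period panel with four units (NOT the Appendix 5 data).
beta = 2.0                      # true slope on x
a = [10.0, 5.0, 0.0, -5.0]      # unit fixed effects, negatively related to x
x1 = [1.0, 2.0, 3.0, 4.0]       # x in period 1
x2 = [2.0, 2.5, 4.5, 4.0]       # x in period 2
y1 = [1.0 + beta * x + ai for x, ai in zip(x1, a)]
y2 = [1.5 + beta * x + ai for x, ai in zip(x2, a)]  # intercept shifts by 0.5

# Pooled regression ignores a_i, so the slope absorbs the fixed effects.
pooled_slope = ols_slope(x1 + x2, y1 + y2)

# First differencing removes a_i because it is constant over time.
dx = [b - c for b, c in zip(x2, x1)]
dy = [b - c for b, c in zip(y2, y1)]
fd_slope = ols_slope(dx, dy)
print(pooled_slope, fd_slope)
```

In this constructed example the pooled slope even has the wrong sign, while the first-differenced slope recovers beta, because the fixed effect drops out of y2 - y1.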

6.   Appendix 6 uses data that is not seasonally adjusted.  The variable Women is the number of female wage and salary workers, 25 years old and over, who work part-time.  The variable Men is the number of male wage and salary workers, 25 years old and over, who work part-time.  The data is listed on Page 1 and the top of Page 2.

a) Write out Regression 1 in the usual fashion with standard errors and t statistics in parentheses.  Put in words the constant and each of the slope coefficients.  

b) Briefly compare the difference between Regression 2 and Regression 1 in terms of the values of the coefficients.

c) In Regression 3 and Regression 4 on Page 3, state clearly the meaning of the coefficient on Time and Q4.

d) On Page 4 and top of Page 5 are four regressions carried out.  Discuss the R Squared value.  What would happen if you tried to “detrend” Q1?

e) Regression 9 is found on Page 5 with a follow-up regression at the bottom of the page.  Discuss how the regression at the bottom of the page is related to Regression 1 on Page 2.

7. Appendix 7 contains some U.S. National Income data quarterly seasonally adjusted (billions of U.S. $) from 2005 to early 2016.  The regressions on Page 2 set out to explain consumption of durable goods in terms of GDP and either investment in fixed assets or investment in residential structures.

a) For Regression 1 on Page 2, put in words the coefficient on GDP.

b) Draw a diagram of the lag distribution for Regression 1.  Put into words each of these coefficients.  Briefly state if this is what you expected.

c) For Regression 2 on Page 2, answer the same questions as in a) and b) above.

8.  Appendix 8 uses the sample of eight young boys ages 6 to 12 with Height measured in inches and weight measured in pounds.  The regression at the top of Page 1 (not identified with a number) is the basic regression that sets out to explain the Weight of these children as a function of the Age and Height.

a) Regression 1 toward the bottom of Page 1 uses several additional variables.  What is being tested with this function?  What is the name of this test?  What is the conclusion?

b) Regression 2 on Page 2 uses a different functional form and then predicts based on that new functional form.  Regression 3 at the bottom of Page 2 uses that prediction as an additional independent variable.  What is being tested here?  What is the name of the test?  What is the conclusion?

c) Discuss Regression 4 on Page 3 and the tests that follow.  What is the name of this test?  What are the conclusions?

9.   Appendix 9 lists the data, now for 9 children, except that Child 9 appears not to be a child.

a) Regression 1 runs the regression that sets out to predict Weight as a function of Age and Height.  It generated two residuals.  Explain each of the residuals.

b) Regression 2 on Page 2 runs the regression again adding the variable Outlier.  Briefly describe the Outlier variable.  The Outlier variable has a coefficient, standard error, and t value.  Indicate where two of these numbers show up elsewhere in Appendix 9.

c) Regression 3 on the bottom of Page 2 and the calculations on Page 3 work to produce the final number on Page 3.  Discuss briefly what is being carried out with the regression and the calculations.