Take-home assignment #4

Econometrics

Spring 2022

1) Suppose you want to estimate the effect of GDP, population size, and political structure on a country’s success at the 1988 Olympic Games in Seoul.  Specifically, you want to find out whether a country’s GDP per capita has a larger effect on medals won for communist countries than for non-communist countries.  The variables are defined here:

total_medals =  total count of medals won by country

pop_millions =  country’s population size (in millions)

gdp_per_cap =  per capita GDP, in 1000s of 1988 US dollars

communist =  1 if the country was led by a communist party in 1988 (and 0 if not)

communist_gdp =  interaction of gdp_per_cap and communist

The results of the OLS regression are:

[Regression output table not reproduced in this copy.]

(a) What is the interpretation of the coefficient on pop_millions?

(b) What is the p-value on the coefficient for gdp_per_cap?

(c) What is the estimated effect of gdp_per_cap on total_medals for non-communist countries?

(d) What is the estimated effect of gdp_per_cap on total_medals for communist countries?

(e) Is the effect in (d) statistically different from the effect in (c)?

(f) How many “stars,” if any, should the coefficient estimate on “communist” receive?  Please explain.
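
For reference on parts (c) through (e), the regression described above can be written out as below.  This is only a sketch of the setup:  it assumes that all four listed regressors enter the regression, and the coefficient symbols are notation introduced here rather than taken from the assignment.

```latex
\begin{align*}
\text{total\_medals}_i &= \beta_0 + \beta_1\,\text{pop\_millions}_i + \beta_2\,\text{gdp\_per\_cap}_i
  + \beta_3\,\text{communist}_i + \beta_4\,\text{communist\_gdp}_i + u_i \\
\frac{\partial\,\text{total\_medals}_i}{\partial\,\text{gdp\_per\_cap}_i} &= \beta_2 + \beta_4\,\text{communist}_i
\end{align*}
```

Under this specification the slope on gdp_per_cap is \beta_2 when communist = 0 and \beta_2 + \beta_4 when communist = 1, so the two groups differ by \beta_4.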

2) (The monetary returns to education and the gender gap)  What are the determinants of a person’s earnings?  How do earnings depend on educational attainment?  Does gender play a role in determining earnings?  The following empirical exercise addresses these questions.

To gain some familiarity with the topic, please read the following three textboxes from the [SW = Stock/Watson] textbook:

· “The Gender Gap of Earnings of College Graduates in the United States”

· “The Economic Value of a Year of Education:  Homoskedasticity or Heteroskedasticity?”

· “The Return to Education and the Gender Gap”

The dataset “CPS_2008.xls” contains 4,733 person-level observations on hourly wage rate, years of education, age of a person, and other variables from the 2008 Current Population Survey (CPS).  The dataset is posted on the course website.  For a description of the CPS, see Appendix 3.1 of the [SW] textbook.  There is a “data dictionary” (posted on the course website) that accompanies the dataset.

Import data and create a Stata dataset

(a) Import the data into a statistical software package (other than Excel), properly label the variables, and save the data in your working directory in that software’s native format.

Descriptive statistics and preliminary regression

(b) Use the statistical software to calculate a table of descriptive statistics that effectively summarizes all the variables.  Then export these descriptive statistics from the software to a Word document and display them in a nicely formatted table.  Please do not just copy and paste information from the software into Word – the goal is for you to learn how to automate the production of publication-quality descriptive statistics tables in Word from the output of your statistical software.

Provide brief, to-the-point, and meaningful comments on your results.
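
As a rough illustration of the kind of summary that part (b) asks for, the sketch below computes N, mean, standard deviation, minimum, and maximum for each variable using only Python’s standard library.  The rows are made-up stand-ins for CPS records, the variable name “age” is an assumption (check the data dictionary), and the export to Word is not shown.

```python
import statistics

# Hypothetical rows standing in for CPS_2008 records; all values are made up.
rows = [
    {"wage": 12.5, "educ": 12, "age": 30},
    {"wage": 20.0, "educ": 16, "age": 41},
    {"wage": 15.0, "educ": 14, "age": 35},
    {"wage": 32.5, "educ": 18, "age": 52},
]

def describe(rows, variables):
    """Return one summary row (n, mean, sd, min, max) per variable."""
    table = []
    for name in variables:
        values = [r[name] for r in rows]
        table.append({
            "variable": name,
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
            "min": min(values),
            "max": max(values),
        })
    return table

summary = describe(rows, ["wage", "educ", "age"])

# Print a fixed-width table, one line per variable.
print(f"{'var':<6}{'n':>4}{'mean':>9}{'sd':>9}{'min':>8}{'max':>8}")
for s in summary:
    print(f"{s['variable']:<6}{s['n']:>4}{s['mean']:>9.2f}{s['sd']:>9.2f}"
          f"{s['min']:>8.1f}{s['max']:>8.1f}")
```

Keeping the summary as a list of rows, rather than printing it directly, makes it easier to pass the same numbers on to whatever table-export tool your software provides.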

(c) Run a regression of wage on educ, and plot the fitted regression line over the scatter plot of wage vs. educ (please put educ on the horizontal axis and wage on the vertical axis).  Carefully interpret the meaning of the coefficient estimates and the overall regression results.  Is educ statistically significant at the 5% level?  Additionally, based on the table of descriptive statistics that you built above, is the estimated value that you obtained for the coefficient on educ of practical significance?  Explain.
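
To make the mechanics of part (c) concrete, here is a minimal sketch of a bivariate OLS fit computed by hand.  The (educ, wage) pairs are invented for illustration and are not CPS data; in practice your statistical software’s regression command does this for you and also reports standard errors, which this sketch omits.

```python
# Made-up (educ, wage) pairs for illustration only; not CPS data.
educ = [10, 12, 12, 14, 16, 18]
wage = [9.0, 12.0, 14.0, 16.0, 21.0, 24.0]

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sample covariance of (x, y) divided by sample variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = simple_ols(educ, wage)

# Fitted values trace out the regression line drawn over the scatter plot.
fitted = [intercept + slope * xi for xi in educ]
residuals = [yi - fi for yi, fi in zip(wage, fitted)]
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

The slope is the estimated change in wage (in dollars per hour) associated with one additional year of education in this toy sample.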

Log regressions, omitted variable bias, and interaction variables

(d) Please construct a single graph that contains four scatter plots within it:  wage vs. educ; wage vs. ln_educ; ln_wage vs. educ; ln_wage vs. ln_educ.  You may find some of these scatter plots useful as you answer the next few questions.  (For example, if you are using Stata, the command “graph matrix” can produce such a single graph.)

(e) Estimate the following five models, and then display side-by-side the results of these five regressions in a single table of regression results.  (For example, if you are using Stata, the commands “eststo” and “esttab” may accomplish this goal.)  This side-by-side table should display the same set of statistics and the same specification for the “stars” as the table in the previous take-home assignment.

(Note:  “female_educ” is an “interaction” variable that you need to create by multiplying female by educ.)

· Model 1:  Regress ln_wage on ln_educ

· Model 2:  Regress ln_wage on educ

· Model 3:  Regress ln_wage on educ and exper

· Model 4:  Regress ln_wage on educ, exper, and female

· Model 5:  Regress ln_wage on educ, exper, female, and female_educ
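
The variables used in these five models can be constructed along the following lines.  This is a sketch with two invented records; it assumes that ln_wage and ln_educ are not already supplied in the dataset (check the data dictionary first), and that educ is strictly positive so that its logarithm is defined.

```python
import math

# Two invented records; real values come from CPS_2008.xls.
records = [
    {"wage": 12.0, "educ": 12, "exper": 10, "female": 1},
    {"wage": 25.0, "educ": 16, "exper": 8, "female": 0},
]

def add_derived(record):
    """Return a copy of the record with log and interaction variables added."""
    out = dict(record)
    out["ln_wage"] = math.log(record["wage"])
    out["ln_educ"] = math.log(record["educ"])  # fails if educ == 0
    # Interaction term: equals educ for women and 0 for men.
    out["female_educ"] = record["female"] * record["educ"]
    return out

prepared = [add_derived(r) for r in records]
for row in prepared:
    print(row)
```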

(f) In model 1, what is the interpretation of the coefficient estimate on ln_educ?  In model 2, what is the interpretation of the coefficient estimate on educ?  Besides being statistically significant (or not), are the estimates of the slope coefficients in each of these two models of practical relevance?  Explain.
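
As a reminder of the standard readings of the two functional forms in part (f), the two models can be written as below.  The coefficient symbols are notation introduced here, and the interpretations hold for small changes.

```latex
\begin{align*}
\text{Model 1 (log-log):}\quad & \ln(\text{wage}_i) = \gamma_0 + \gamma_1 \ln(\text{educ}_i) + u_i,
  && \text{a 1\% increase in educ goes with a } \gamma_1\% \text{ change in wage} \\
\text{Model 2 (log-linear):}\quad & \ln(\text{wage}_i) = \beta_0 + \beta_1\,\text{educ}_i + u_i,
  && \text{one more year of educ goes with about a } 100\,\beta_1\% \text{ change in wage}
\end{align*}
```

For larger coefficient values, the exact percentage change implied by model 2 is 100\,(e^{\beta_1} - 1)\%, which is close to 100\,\beta_1\% only when \beta_1 is small.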

(g) Comparing model 1 to model 2, which one provides a better “fit” for the data?  Explain.

(h) Interpret the results of model 3.  By comparing model 2 to model 3, does model 2 suffer from omitted variable bias?  Explain your answer in terms of the two conditions that need to be satisfied if omitted variable bias is to be present.  Also provide an intuitive/economic explanation for why there is (or there is not) omitted variable bias.

(Note:  As you provide an answer to this part, it may be useful to additionally think of the following question:  If there is omitted variable bias in model 2, is this bias “practically” relevant?)
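
The two conditions for omitted variable bias can be read off the usual large-sample expression, sketched here for models 2 and 3.  It assumes that model 3 is the correctly specified regression, with an error term uncorrelated with both educ and exper; the symbols are notation introduced here.

```latex
\begin{align*}
\text{Model 3:}\quad & \ln(\text{wage}_i) = \beta_0 + \beta_1\,\text{educ}_i + \beta_2\,\text{exper}_i + u_i \\
\text{Model 2 slope:}\quad & \hat{\beta}_1^{\,(2)} \xrightarrow{\;p\;} \beta_1
  + \beta_2\,\frac{\operatorname{Cov}(\text{educ}_i,\ \text{exper}_i)}{\operatorname{Var}(\text{educ}_i)}
\end{align*}
```

The bias term vanishes if either \beta_2 = 0 (exper does not help determine ln_wage) or the covariance is zero (exper is uncorrelated with educ), which is why both conditions must hold for bias to be present.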

(i) For model 4, interpret the coefficient estimate on female and discuss the overall results.

(j) By comparing model 3 to model 4, note that the coefficient on female in model 4 is statistically significant, but the coefficient estimate on educ changed very little from model 3 to model 4.  Why do you think this is the case?  If you were interested in understanding only the causal effect of education on earnings, without regard to potential gender discrimination, would model 3 be enough to address your question?

(Note:  Relate your answer to the notion of potential omitted variable bias, and the two conditions that need to simultaneously hold in order for omitted variable bias to exist.)

(k) In model 5, carefully interpret the meaning of the coefficient estimates on educ, female, and, in particular, the coefficient estimate on female_educ.  Additionally, discuss the overall results in terms of the “gender gap” and in terms of “returns to education.”

F-test, categorical dummies

(l) Next, extend model 5 from part (e) by incorporating geographic region dummies that capture the U.S. region where a person lives.  Run this regression and carefully interpret the coefficient estimates on the geographic region dummies.
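
If the dataset stores region as a single categorical variable rather than as ready-made dummies, the dummies can be built as sketched below.  The region labels and the “region_” prefix are hypothetical (consult the data dictionary), and one category is left out as the base group to avoid perfect multicollinearity with the intercept.

```python
def make_dummies(values, base):
    """Return {dummy_name: 0/1 list} for every category except the base."""
    categories = sorted(set(values) - {base})
    return {
        f"region_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }

# Hypothetical region labels for five people.
regions = ["northeast", "midwest", "south", "west", "south"]
dummies = make_dummies(regions, base="northeast")

for name, column in dummies.items():
    print(name, column)
```

Each dummy’s coefficient is then read relative to the omitted base region, holding the other regressors fixed.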

(m) Next, test the null hypothesis that geographic regions jointly do not matter for earnings.  Do you reject or fail to reject the null hypothesis at the 5% significance level?  Based on Table 4 on the back of the [SW] textbook, what is the 5% critical value for the F-test you conducted?  How about the 1% critical value?
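
For intuition about the joint test in part (m), the sketch below computes the homoskedasticity-only F-statistic from the R-squared values of the restricted regression (without region dummies) and the unrestricted regression (with them).  All inputs are hypothetical placeholders, not results from the CPS data.  The formula is valid only under homoskedastic errors; the heteroskedasticity-robust F-statistic reported by your software’s test command will generally differ.

```python
def homoskedastic_f(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic for q linear restrictions.

    n is the sample size and k_unrestricted is the number of regressors
    (excluding the intercept) in the unrestricted regression.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

# Hypothetical inputs chosen only to exercise the function.
f_stat = homoskedastic_f(
    r2_unrestricted=0.25,
    r2_restricted=0.24,
    q=3,                # number of restrictions being tested (placeholder)
    n=1000,             # sample size (placeholder)
    k_unrestricted=7,   # regressors in the unrestricted model (placeholder)
)
print(f"F = {f_stat:.3f}")
```

The statistic is compared with the critical value of the F distribution with q numerator degrees of freedom, where q is the number of region dummies whose coefficients are set to zero under the null.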