
ECON-120B-Fall-2022-HW3

November 30, 2022

0.0.1 Instructions

Preface: This assignment is an individual assignment. You may ask the Professor/TAs for guidance and help, but you may not copy code. You may discuss the assignment conceptually with your classmates, including bugs you ran into and how you fixed them. However, do not look at or copy anyone else's code; doing so constitutes an Academic Integrity Violation.

Getting Started

• 1 Go to your ECON120B-FA22 directory. Inside it, create a directory named hw3. Put the starter code and the dataset into hw3.

• 2 In the starter code hw3.do, write your Name and PID in the space provided. These two lines serve as your signature for an Academic Integrity Pledge (you acknowledge the terms in the Preface section). Failure to provide these two lines will result in an automatic 0 on this assignment.

• 3 In the starter code hw3.do, fill in your file path (review Stata session 2 if needed).

• 4 Start coding!

Submitting: Starting this quarter, we will use the Gradescope Autograder to run similarity checks for cheating detection. You will need to submit your files to Gradescope. The files to submit are hw3.do and hw3.log. You have unlimited submission attempts, but only your last submission will count toward your final grade.

0.0.2 Problems

For all Stata built-in commands you use, please use the FULL NAME of the command (e.g., use display instead of dis).

For all OLS regressions, unless otherwise specified, use the robust option.

For all questions asking for numerical values, provide exact values unless the question says to round.

DO NOT EDIT the comments in the starter code!

• Q1. The dataset MLB1.dta contains statistics for 353 baseball players, excluding pitchers. The salary data were obtained from the New York Times, April 11, 1993, and refer to the 1993 season. Salaries are in dollars. The baseball statistics are from The Baseball Encyclopedia, 9th edition, and the city population figures are from the Statistical Abstract of the United States. Let's start by regressing the players' salary on years (years in the major leagues), gamesyr (number of games per year in the league), and a set of binary variables that indicate the position each player plays: frstbase (first base), scndbase (second base), thrdbase (third base), shrtstop (shortstop), catcher, and outfield.

 Task (fill in the blank): When you include all six dummy variables for player positions, Stata is not able to estimate all the coefficients due to ____. Assign your choice to q1 (if your answer is A, simply do scalar q1 = "A").

* A. Imperfect Multicollinearity

* B. Perfect Multicollinearity

* C. Omitted Variable Bias

* D. Heteroskedasticity

• Q2. Now re-estimate the regression from question 1, but exclude the dummy variable for second base. Holding the experience variables (years and gamesyr) constant, what is the estimated difference in average salary between a catcher and a second base player? Save your answer, in thousands of dollars, to q2.

• Q3. Use the results from the regression in question 2 (the one where you excluded the dummy variable for second base). What is the estimated absolute difference in average salary between a third base player and a shortstop, assuming they have the same experience (same years in the league and same games per year)? Save your answer, in thousands of dollars, to q3.

• Q4. Test the hypothesis that shortstops and third base players earn the same amount on average, holding the other factors constant.

 Task (complete the sentence): The F-statistic for this test is equal to q4a, which is q4b ∈ {“higher”, “lower”} than the appropriate critical value of q4c (round to two decimal places, and assume a 5 percent significance level). Therefore, we q4d ∈ {“reject”,“fail to reject”} the null hypothesis that shortstop and third base players earn the same amount, on average, keeping all else constant.

• Q5. Use the results from the regression in question 2. What is the estimated average salary (in thousands of dollars) for an outfielder who has been in the league for 17 years and has played an average of 100 games per year? Assign your answer to q5.

• Q6. The actual 1993 salary of the great Tony Gwynn (a San Diego Padres player) was 4,333 thousand dollars. In 1993, Tony Gwynn was an outfielder with 12 years in the league who played an average of 132 games per year. Assuming that Tony Gwynn is one of the players in the dataset, what is his OLS residual? Save your answer, in thousands of dollars, to q6.

• Q7. In lecture, we showed you the results from the regression of lsalary (the log of salary) on years, gamesyr, and three measures of player performance: bavg (career batting average), hrunsyr (home runs per year), and rbisyr (runs batted in per year). Run that same regression. Note that none of the coefficients on the performance variables is individually statistically significant. Compute the correlation coefficients among the performance variables.

 Task (complete the sentence): The correlation coefficient between rbisyr and hrunsyr is q7a. These two performance statistics are q7b ∈ {“highly”, “not that highly”} correlated, which suggests a q7c ∈ {“perfect”, “imperfect”} multicollinearity problem.

• Q8. Continue with the model in question 7, but now drop the variable rbisyr. What happens to the statistical significance of the coefficient on hrunsyr?

 Task (complete the sentence): The coefficient on hrunsyr q8a ∈ {“becomes”, “remains”} statistically q8b ∈ {“significant”, “insignificant”} at a 5 percent significance level. The estimated coefficient on hrunsyr has increased by a factor of about q8c.

• Q9. Add the variables runsyr (runs per year), fldperc (fielding percentage), and sbasesyr (stolen bases per year) to the model in question 8. Which of the coefficients on these three new variables are individually statistically significant? (Use a 5 percent significance level.) Assign your answer to q9 (if your answer is A, simply do scalar q9 = "A").

 A. Only the coefficient on sbasesyr is statistically significant.

 B. None of the coefficients on these three new variables are statistically significant.

 C. Only the coefficient on runsyr is statistically significant.

 D. The coefficients on runsyr and sbasesyr are statistically significant.

• Q10. Look at the results of the regression in question 9. How do you interpret the estimated coefficient on runsyr? Assign your answer to q10 (if your answer is A, simply do scalar q10 = "A").

 A. Holding all else constant, an additional run per year is associated with a 1.74 percent increase in salary, on average.

 B. Holding all else constant, an additional run per year is associated with a 0.0174 percent increase in salary, on average.

 C. Holding all else constant, an additional run per year is associated with about 17.4 dollars (=0.0174 thousand dollars) increase in salary, on average.

 D. Holding all else constant, a 1 percent increase in the runs per year is associated with a 0.0174 percent increase in salary, on average.

• Q11.  In the model from question 9, test the joint significance of the coefficients on bavg, fldperc and sbasesyr. What do you conclude?

 Task (complete the sentence): The F-statistic is equal to q11a, which is q11b ∈ {“higher”, “lower”} than the appropriate critical value of q11c (round to two decimal places, and assume a 5 percent significance level). Therefore, we q11d ∈ {“reject”, “fail to reject”} the joint hypothesis that the three coefficients are zero.

• Q12. Now we want to check whether race plays a role in determining salaries. With that in mind, estimate a regression of lsalary on the experience variables years and gamesyr, on the performance variables bavg, hrunsyr, rbisyr, runsyr, fldperc, and allstar, and on the race variables black and hispan. Assume that the base group (the excluded group) is white players. Test the joint significance of the coefficients on black and hispan. What do you conclude?

 Task (complete the sentence): The F-statistic is equal to q12a, which is q12b ∈ {“higher”, “lower”} than the appropriate critical value of q12c (round to two decimal places, and assume a 5 percent significance level). Therefore, we q12d ∈ {“reject”, “fail to reject”} the joint hypothesis that the two coefficients are zero.

• Q13. Your dataset also contains data on the racial composition of cities. The variable percblck is the percentage of African Americans in the team's city population, and perchisp is the percentage of Hispanics in the team's city population. Add two interaction variables to the regression in question 12: blckpb (= black × percblck) and hispph (= hispan × perchisp). You can check that the coefficients on the four race variables are jointly statistically significant.

 Task (complete the sentence): When an African American player lives in a city that has no African American population (percblck = 0), our model predicts that, on average, an African American player earns about q13a ∈ [0, 100] percent q13b ∈ {“less”, “more”} than a comparable white player. Our model predicts that, on average, an African American player who lives in a city with a 20% African American population (percblck = 20) earns about q13c ∈ [0, 100] percent q13d ∈ {“less”, “more”} than a comparable white player.

• Q14. Use the results from the model of question 13. What is the value of perchisp that makes the salary differential between a Hispanic player and a comparable white player equal to zero? Save your answer to q14 ∈ [0, 100].

• Q15. Assign your name, email, and PID, as strings, to q15a, q15b, and q15c, respectively.