PROBLEM SET 3: REGRESSION

PREPARE STATA DO/LOG FILES AND ANSWERS TO QUESTIONS IN SEPARATE DOCUMENTS.

TURN IN THESE 3 DOCUMENTS (a Stata do-file, a Stata log file that should NOT be saved as a text file, and a PDF file for your answers) via CANVAS.

1. Prof. Saygin and her students disagree on the effect of longer deadlines on the probability of completing problem sets and on performance on them. Her students always ask for longer deadlines, arguing that having more time to complete a problem set increases their chance of finishing on time and obtaining a higher grade. Prof. Saygin worries about procrastination: a longer deadline often means that students postpone working on the problem set over and over until they eventually hit the last minute. You, a benevolent social planner, are collecting data from all UF Econ and Business professors for each course they teach to weigh in on this discussion. For each course i, you observe the average time given for an average homework, the length of the homework, and the average grade on problem sets. Each professor has their own pedagogical strategy for homework and grading, and professors also differ in the courses they teach and in their teaching style. Your job is to determine whether giving longer deadlines increases the chances of homework completion. Using this data, you have produced the following three regressions:

gradeij = −178.3 + 2.42 deadlineij

gradeij = −236.4 + 1.27 deadlineij + 1.25 lengthij

lengthij = 123.9 + 0.92 deadlineij

where gradeij is the average problem set grade in course i taught by professor j, deadlineij is the average time given until the deadline for problem sets in course i taught by professor j, and lengthij is the average length of the problem sets in course i taught by professor j.

(a) Why is the coefficient on deadline different in the first two regressions? Show how the coefficient in the second regression relates to the one in the first, using the results of these regressions.

(b) Assuming the coefficients are all statistically significant, how do you interpret these results on the effect of deadlines on grades? Is either of the regressions likely to provide a good indication of the causal effect of deadlines on grades? Why or why not? Give an example of a source of selection bias and the direction of the bias, if any. (Hint: Keep in mind that Prof. Saygin is probably right in her argument that long deadlines lead to procrastination. Because she is often right :))

2. Effect of sales rep training on sales:

(a) Suppose HR specialists at firm A use an algorithm to assign the company's sales reps to a mentoring and training treatment Ti solely on the basis of three factors: prior experience (exp), tenure at the company, and sales in the previous month. Can you estimate the following regression equation to get the causal effect of treatment on the outcome Yi (sales by sales rep i, in 100k, after the training assignment)? Why or why not?

Yi = α + ρTi + β1 expi + β2 (Tenure)i + β3 (PrevSales)i + ei

(b) HR specialists at firm B take a different approach. The overall productivity of a sales rep is measured based on multiple factors, which are collected in a score Si. Sales reps with higher scores are more productive, and they are less likely to receive the offer of training and mentoring. But HR managers decide to offer training somewhat differently from worker to worker (e.g. HR managers use varying thresholds to assign them to training). To understand the training assignment better, a researcher estimates the following equation, where standard errors are reported in parentheses:

Ti = 0(0)1(.)1(4)3(5) − ..0(0)0(3)9(5)Si − ..0(0)1(3)6(5)(Female)i − ..0(0)2(0)8(4)(exp)i + ..0(0)3(0)9(3)(exp)i(2)/100 + ei

The researcher then goes on to estimate:

Yi = ..2(0)2(0)5(6) + ..5(2)7(1)2(1)Ti + ..1(7)6(9)5(8)Si + ..1(2)2(6)1(2)(Female)i + ..1(7)9(6)1(6)(exp)i − ..1(4)3(8)8(1)(exp)/100 + ei

where Yi is sales (in 100k) by sales rep i in the month following the training assignments.

i. Interpret all coefficients obtained in both regression equations (e.g. in terms of both their statistical significance and their magnitude).

ii. What is the rationale for including the regressor for female in the sales outcome equation? (i.e. do you think it helps eliminate the OVB?)

iii. What is the rationale for including the regressors for experience and experience squared in the sales outcome equation? (i.e. do you think it helps eliminate the OVB?)

(c) Another researcher is worried that the outcome regression (Yi) in (b) may not identify the causal effect of the training treatment. That researcher notices a variable, customersi, which measures the number of new customers the sales rep made agreements with after the training assignment, and includes it in the regression:

Yi = 012(.)3(3)3(4)+ ..5(6)4(2)9(5)Ti+..2(4)7(2)7(3)Si+..0(1)2(1)3(1)(customers)i + ..1(2)1(3)8(8)(Female)i + ..1(8)9(2)7(3)(exp)i − ..1(4)3(4)1(3)(exp)i(2)/100+ei

Do you think the regression in (b) or (c) is more likely to estimate the causal eﬀect of treatment? Explain why.

(d) If you had to choose, would you use the data from firm A or firm B to estimate the causal effect of training on sales? Explain why.

3. STATA Exercise: Submit the do-file and log file you will work on for the in-class practice for questions 1 to 7. Also, for each question, submit your verbal/mathematical explanations as requested in a PDF document. The data is an extract of a 1992 survey of German workers. It comes from the paper by John DiNardo and Jörn-Steffen Pischke (1997), "The Returns to Computer Use Revisited: Have Pencils Changed the Wage Structure Too?". The data file is named restricted92.dta.

Question 1: Returns to Schooling

(a) Question 1a: Generate a scatter graph of log wages (variable: lnw) against education (variable: ed) to visualize the relationship between the two variables. Produce the graph with a fitted line. Briefly interpret the relationship you observe on the graph. Export your graph and include it in your answers (in the PDF file) as a picture, in addition to your verbal explanation of the graph.
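One possible way to produce and export such a graph is sketched below. It assumes restricted92.dta sits in the current working directory; the axis titles and the output file name are illustrative choices, not part of the assignment.

```stata
* Assumes restricted92.dta is in the current working directory.
use restricted92.dta, clear

* Scatter plot of log wages against education, overlaid with a linear fit.
twoway (scatter lnw ed) (lfit lnw ed), ///
    xtitle("Years of education") ytitle("Log wage")

* Export the graph as a picture (hypothetical file name).
graph export "lnw_ed_scatter.png", replace
```

The `///` line continuation only works inside a do-file; when typing interactively, the twoway command would go on a single line.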

(b) Question 1b:

i. Provide a summary statistics table for the following variables by gender: ed exp mar lnw computer pencil telefon calc hammer. Export your table and include it in your answers (PDF file) as a table.

ii. Compare in particular the mean values of log wages and years of education and explain how they differ by gender. Add your verbal explanations to your answers (PDF file) together with your table.
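A minimal sketch of a by-gender summary table follows. It assumes gender is coded in a variable named female, as in the areg command given in Question 6; the choice of statistics is illustrative, and exporting the table is left out.

```stata
* Means, standard deviations, and counts by gender.
* Assumes gender is stored in the variable female (as used in Question 6).
tabstat ed exp mar lnw computer pencil telefon calc hammer, ///
    by(female) statistics(mean sd n) columns(statistics) longstub
```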

Question 2: Bivariate Linear Regression Model for Log Wages and Education: Now run the regression of log wages on education and a constant.

(a) Question 2a: What are the assumptions we need to make for the coefficient of education to be the causal effect of education on earnings? How can these assumptions be violated? (Hint: Explain how this violation can create a bias)

(b) Question 2b:

i. Interpret the estimates of the constant and the slope in terms of their statistical significance and their magnitude.

ii. How can the slope coefficient on education be interpreted as the percentage change in wage that is associated with an additional year of schooling? (Explain it with the proof as we did in class.)

Question 3: Linearity of the relationship between education and log earnings: Now create dummy variables for each level of schooling. First, round the years of schooling:

gen edr=round(ed,1)

Then, generate dummy variables for each year:

tab edr, gen(ed_dum)

Now, regress log wages on the dummy variables:

regress lnw ed_dum*

(a) Explain why one of the 10 dummy variables drops out.

(b) Is the effect of education linear? Are there any non-linearities? (Hint: Look at the adjusted R-squared in this model and in the previous model. If there are non-linearities, the model with dummy variables should fit the data better.)

Question 4: Multivariate Model: Linear regression model of log wages: Dependent Variable (Y) and Independent Variables (X’): Education, Experience, Experience Squared, Gender, Marital Status, Computer indicators.

(a) Write your regression equation in mathematical notation.

(b) What assumption do we need to make for the coefficient of education to be interpreted as the “causal” effect of education on earnings?

(d) What do the coefficients of experience and experience squared imply about the life-cycle profile of earnings? Would including only a linear term for experience lead to a more appropriate regression? (Hint: How do we calculate the max point? FOC?)

(e) Now introduce exp3 and exp4. Do they improve the fit of the regression? (Hint: Compare the adjusted R-squared values.) Interpret the F-statistic for the joint significance of exp3 and exp4. Are they important variables to include in the model? (Hint: Stata code for the joint significance test: test exp3 exp4)
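The hinted joint test could be set up roughly as follows. This is a sketch under assumptions: it takes exp3 and exp4 to be the third and fourth powers of exp, and it borrows the variable names exp2 and female from the areg command in Question 6. If exp3 and exp4 already exist in the data, the two gen lines should be skipped.

```stata
* Higher-order experience terms (assumed to be powers of exp).
gen exp3 = exp^3
gen exp4 = exp^4

* Multivariate model from this question plus the two new terms.
regress lnw ed exp exp2 female mar computer exp3 exp4

* Adjusted R-squared of the model just estimated.
display e(r2_a)

* F-test of the joint null that the coefficients on exp3 and exp4 are both zero.
test exp3 exp4
```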

(f) Using the estimation results from part b (without exp3 and exp4): What is the predicted level of wages for a married woman who uses a computer and has 16 years of education and 7 years of experience?

Question 5: Multivariate Model with further controls: Now add pencil, phone, calculator, and hammer use to the regression you estimated in the previous question (without exp3 and exp4).

(a) Report the results from your estimation. Interpret the statistical significance and economic meaning of each coefficient.

(b) Are the estimated returns to education different from the previous estimation in Question 4 (without exp3 and exp4)? If so, why?

Question 6: Now we will add occupation dummies. The following Stata fixed-effects command avoids having to generate all the dummy variables for the occupations:

areg lnw ed exp exp2 female mar computer pencil telefon calc hammer, absorb(occ)

(a) Interpret all coefficients in terms of their statistical significance and economic meaning.

(b) Compare the coefficients on the use of tools to the previous estimation without occupation controls. (Hint: What are the implications for omitted variable bias in OLS estimates of the effect of computer use on log wages? Remember the paper by DiNardo and Pischke, 1997, QJE.)

(c) Compare the coefficient of education to the previous estimation without occupation controls. Explain why it is so different.

(d) Why could controlling for occupational indicators be a bad idea when we are interested in the causal effect of schooling on earnings?

Question 7: Make a regression table

Put all 5 regressions into one table, export it as a csv or tex file, and add it to your assignment document. Explain the table in a paragraph, in a manner similar to the papers we discussed in class.

First column: Bivariate model (Regression from Question 2)

Second column: Multivariate model (Regression from Question 4)

Third column: Multivariate model with exp3 and exp4 (Regression from Question 4d)

Fourth column: Multivariate model without exp3 and exp4 but with other tools (Regression from Question 5)

Fifth column: Multivariate model without exp3 and exp4 but with other tools and occupation fixed effects (dummies) (Regression from Question 6)
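One way to assemble such a table is with the community-contributed estout package (eststo and esttab), sketched below. The assignment does not prescribe a tool, so this is only one option. The model names m1 to m5 and the output file name are hypothetical, the regressor lists are assumptions based on the variable names used in Question 6, and exp3 and exp4 are assumed to exist already.

```stata
* One-time install of the community-contributed estout package.
ssc install estout, replace

* Store the five regressions under hypothetical names m1 to m5.
eststo clear
eststo m1: regress lnw ed
eststo m2: regress lnw ed exp exp2 female mar computer
eststo m3: regress lnw ed exp exp2 exp3 exp4 female mar computer
eststo m4: regress lnw ed exp exp2 female mar computer pencil telefon calc hammer
eststo m5: areg lnw ed exp exp2 female mar computer pencil telefon calc hammer, absorb(occ)

* One table with standard errors in parentheses and adjusted R-squared.
esttab m1 m2 m3 m4 m5 using "regressions.csv", csv se ar2 replace
```

Replacing the csv option and file suffix with tex would give a LaTeX table instead.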

2022-10-12
