
Individual Assignment 3

EC295 – Winter 2023

Due: Friday, April 7 at 9:00 PM

Assignment Description

In this assignment you are asked to manipulate data, estimate statistical relationships, and interpret the findings. The main goal behind the assignment is to help you get more comfortable applying statistical methods and using software, but also to think about a policy-relevant topic that economists actively research today.

The questions below guide you through the process of statistical estimation. You are given the relevant Stata commands you will need, some of which you will not have seen before. You should therefore use the “help” function in Stata, look up the command in the Stata reference manuals (which are available within Stata as PDFs), or search Google. You are also, as always, welcome to ask me for help.

I strongly suggest that you start this assignment early because it will not be possible (in my opinion) to do well if you start close to the due date. There are parts that you may find difficult; you will want to identify them and leave enough time to ask questions if necessary.

Assignment Instructions

Data analysis

In mylearningspace, you will find a datafile called “assign3.dta” that contains the data for this assignment. Download the dataset to your computer and make note of the folder where you save it.

I have also provided a template dofile that all students must use to write their assignment dofile (if you are using R, you will need to recreate something similar to this). Store it in the same folder where you put your data. You will need to manipulate that template in the following way:

- Rename the file from “assign3 template.do” to your last name followed by your student number (no spaces)

- After cd, replace INSERT THE PATH TO THE FOLDER WHERE YOU STORE THE DATA with the path to the folder where you stored the datasets. Do not remove the quotation marks.

- After log using, replace INSERT YOUR LAST NAME AND STUDENT NUMBER HERE with your last name and student number, with no space between the two. Do not remove the quotation marks.

- After set seed, replace INSERT YOUR STUDENT NUMBER HERE with your full student number.

Leave all other commands and comments untouched. You should type in your Stata commands below the line that says “Insert your stata commands below here”, but above “Insert your stata commands above here”.

Note that the set seed and sample commands will take a random 95% subsample of the data that is different for every student. For this reason, the numbers that you get with your output will not be the same for any two students. Be mindful of this if you are comparing your work with your peers.
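For orientation, a completed template might look roughly like the sketch below. This is a guess at the layout, not a copy of the provided file: the folder path, the name Smith, and the student number 123456789 are placeholders, and the `#delimit ;` line is an assumption based on the semicolon in the example command `tab y;` on this sheet. Wherever the actual template differs, follow the template.

```stata
#delimit ;
/* Hypothetical layout only; the provided template is authoritative. */
clear all;

/* Placeholder path: use the folder where you saved assign3.dta. */
cd "C:/Users/yourname/Documents/EC295";

/* Placeholder name and number: last name + student number, no space. */
log using "Smith123456789", replace;

/* Placeholder seed: your full student number. */
set seed 123456789;

use "assign3.dta", clear;

/* Keep a random 95% subsample; the draw depends on the seed above. */
sample 95;

/* Insert your stata commands below here */
describe;
/* Insert your stata commands above here */

log close;
```

Because the seed determines which observations `sample 95` keeps, rerunning the dofile from the top with the same seed should reproduce the same subsample.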

Submission

You are required to submit three documents:

a) A report containing your answers to all the questions. I outline below how I would like your report to look. The overall goal is that the answers to each question must be easily identifiable in a readable, professional-looking document. Submit to Gradescope. You will receive a 10% penalty if you fail to submit to Gradescope.

b) Stata dofile. Submit electronically using the dropbox in mylearningspace.

c) Stata log file. Submit electronically using the dropbox in mylearningspace.

In the report described in (a) above, please answer all questions in the same order as they are stated on the question sheet. For each question and sub-question, include the relevant Stata code (if any) that you used, the output generated by that command if there was any, and an interpretation if you are asked to provide it. For example, if you were answering the following hypothetical question, it might look like this:

************************************************************************************

1) Locate the variable y

a. Using the tab command, provide a frequency distribution for y

Stata commands:

tab y;

Output:

          y |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     23,844       10.05       10.05
          2 |    138,568       58.40       68.45
          3 |      9,049        3.81       72.26
          4 |     63,162       26.62       98.88
          5 |      2,651        1.12      100.00
------------+-----------------------------------
      Total |    237,274      100.00

************************************************************************************

You could also format your own output tables rather than copying and pasting Stata output if you find that easier. As long as the questions are answered in order, and the Stata commands and associated output for each sub-question are clear, your report will be fine.

A note on plagiarism: this is an independent assignment, which I expect you to complete on your own. It is plagiarism to copy someone else’s work verbatim, which includes Stata dofiles. Any work you submit should be yours only.

Thank you note: I am very grateful to Professor Justin Smith for sharing his class material. This assignment represents a modified version of a STATA homework developed by him.

Each sub-question is worth 5 points, for a total of 65.

1) (5 Points) Provide a table of basic summary statistics for your data and briefly describe your findings for each variable.
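One possible starting point is sketched below. It assumes `#delimit ;` is in effect (as the semicolon in the example `tab y;` suggests) and that the dependent variable is stored under the name `birthweight`; check the actual variable names with `describe` first.

```stata
/* Variable names, storage types, and labels for the whole dataset. */
describe;

/* Observations, mean, standard deviation, minimum, and maximum. */
summarize;

/* Adds percentiles, skewness, and kurtosis for one variable.
   birthweight is an assumed variable name. */
summarize birthweight, detail;
```

For categorical variables, a frequency table from `tab` is often more informative than a mean.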

2) Suppose that you are interested in exploring whether mother’s alcohol consumption while pregnant affects a child’s birthweight.

a) (5 Points) Estimate and interpret the parameters in the following baseline specification:

$$\text{birthweight}_i = \beta_0 + \beta_1\,\text{alcohol}_i + u_i$$

Assume heteroskedasticity when calculating the standard errors.
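A minimal sketch of the command follows, assuming `#delimit ;` is in effect and that the variables in the dataset are named `birthweight` and `alcohol` as in the equation; adjust the names if yours differ.

```stata
/* Baseline OLS with heteroskedasticity-robust standard errors.
   birthweight and alcohol are assumed variable names. */
regress birthweight alcohol, vce(robust);
```

The option `vce(robust)` can also be written as `robust`; both request heteroskedasticity-robust standard errors.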

b) (5 Points) Test whether the effect of consuming alcohol on birthweight is statistically significant at the 1% significance level.

c) Estimate an alternative specification that extends the baseline specification to include the variable smoker. Assume heteroskedasticity when calculating the standard errors.

i) (5 Points) Interpret all the estimates of the model.

ii) (5 Points) Compare the estimate of the coefficient on alcohol between specifications and comment on omitted variables bias.

iii) (5 Points) Using the measures of fit we have discussed in class, compare the fit of the model in 2a to that of 2c.

d) (5 Points) Estimate another alternative specification that extends the regression in (c) to include measures of education, age, and marital status. In your specification, use the “married” category as the reference category. Compare the estimate of the coefficient on alcohol to the previous specifications and comment further on omitted variables bias. [Hint: You will first need to generate the corresponding dummy variables for single and divorced categories. Since “married” is the reference category, you do not include a married dummy variable in the regression model.]
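The hint can be sketched as follows. Everything about the marital status variable here is hypothetical: the name `marital`, the numeric codes, and the control names `educ` and `age` are illustrations only, so inspect your data before adapting it. The sketch also assumes `#delimit ;` is in effect.

```stata
/* See how marital status is coded; marital is a hypothetical name. */
tabulate marital;
tabulate marital, nolabel;

/* Suppose the codes turn out to be 1 = married, 2 = single,
   3 = divorced. If the variable is a string, compare against the
   text values instead, e.g. marital == "single". */
generate single   = (marital == 2) if !missing(marital);
generate divorced = (marital == 3) if !missing(marital);

/* Married is the omitted (reference) category.
   educ and age are assumed variable names. */
regress birthweight alcohol smoker educ age single divorced, vce(robust);
```

The `if !missing(marital)` qualifier keeps observations with missing marital status from being coded as 0 in both dummies.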

e) (5 Points) In the previous regression, precisely interpret the coefficients on the single and divorced dummy variables. Test the hypothesis that the effect on birthweight is the same for single and divorced mothers at the 5% significance level. [Hint: You can use the “test” command in STATA to perform this test.]
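As a sketch of the hint, `test` is run immediately after the regression it refers to. The variable names other than `smoker` are assumptions, and `#delimit ;` is assumed to be in effect.

```stata
/* Regression from (d); variable names are assumed. */
regress birthweight alcohol smoker educ age single divorced, vce(robust);

/* H0: the coefficients on single and divorced are equal. */
test single = divorced;
```

Stata reports an F-statistic and its p-value (`Prob > F`), which is compared with the chosen significance level.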

f) (5 Points) Suppose you decided to use the “single” category as the reference category in question (d); derive by hand (i.e., do not use STATA) the intercept and all slope estimates (coefficients on alcohol, smoker, education, age, married and divorced dummy variables). [Hint: make sure to provide an explanation on how you derived your estimates; you can check your answer with STATA.]

g) (5 Points) Test whether drinking and smoking are jointly statistically significant at the 1% significance level. Use the results from the regression in 2.d. [Hint: You can use the “test” command in STATA to perform this test.]
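Listing several variables after `test` requests a joint test that all of their coefficients are zero. A minimal sketch, with assumed variable names and `#delimit ;` in effect:

```stata
/* Regression from 2.d; variable names are assumed. */
regress birthweight alcohol smoker educ age single divorced, vce(robust);

/* H0: the coefficients on alcohol and smoker are both zero. */
test alcohol smoker;
```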

h) (5 Points) Re-estimate the regression in 2.d. assuming HOMOSKEDASTICITY. Test whether ANY variable in the regression is significant. Derive the homoskedasticity-only F-statistic by hand (i.e., do not use the canned command in STATA to calculate it). Notice that you will have to run two regressions to be able to compute the F-statistic. [Hint: while you cannot use STATA to obtain the F-statistic, you will use it to run the corresponding regressions.]
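For reference, the homoskedasticity-only F-statistic is usually written in the textbook form below; check that the notation matches the lecture notes. Here $SSR_r$ and $SSR_u$ are the sums of squared residuals from the restricted and unrestricted regressions, $R^2_r$ and $R^2_u$ are the corresponding $R^2$ values, $q$ is the number of restrictions, $n$ is the sample size, and $k$ is the number of regressors in the unrestricted regression.

```latex
F = \frac{(SSR_{r} - SSR_{u})/q}{SSR_{u}/(n - k - 1)}
  = \frac{(R^2_{u} - R^2_{r})/q}{(1 - R^2_{u})/(n - k - 1)}
```

When the null hypothesis sets every slope coefficient to zero, the restricted regression contains only the intercept, so $R^2_r = 0$ and $q = k$.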

i) (5 Points) Using an F-test (approach 1 in the lecture notes), test whether the effect of smoking on birthweight is 6 times the effect of drinking on birthweight. Use the results from the regression in 2.d. [Hint: You can use the “test” command in STATA to perform this test.]

j) (5 Points) Using a t-test (approach 2 in the lecture notes), test whether the effect of smoking on birthweight is 6 times the effect of drinking on birthweight. Report the results of any auxiliary regressions you need to run to complete this test. Use the results from the regression in 2.d.
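Assuming that approach 2 in the lecture notes is the "transform the regression" method, the idea can be sketched as follows. The labels $\beta_1$ (alcohol) and $\beta_2$ (smoker) and the names $\gamma$ and $W_i$ are introduced here for illustration only. The null hypothesis $\beta_2 = 6\beta_1$ is rewritten as a restriction on a single coefficient:

```latex
\gamma \equiv \beta_2 - 6\beta_1
\quad\Longrightarrow\quad
\beta_1\,\text{alcohol}_i + \beta_2\,\text{smoker}_i
  = \beta_1\,(\text{alcohol}_i + 6\,\text{smoker}_i) + \gamma\,\text{smoker}_i
  = \beta_1 W_i + \gamma\,\text{smoker}_i ,
\qquad W_i \equiv \text{alcohol}_i + 6\,\text{smoker}_i
```

In the auxiliary regression that replaces alcohol with $W_i$ and keeps smoker and the other controls, the hypothesis $\beta_2 = 6\beta_1$ is equivalent to $\gamma = 0$, which can be checked with the usual t-statistic on the smoker coefficient.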