
Individual Assignment 2

EC295 – Winter 2023

Assignment Description

In this assignment you are asked to manipulate data, estimate statistical relationships, and interpret the findings. The main goal behind the assignment is to help you get more comfortable applying statistical methods and using software, but also to think about a policy-relevant topic that economists actively research today.

The questions below guide you through the process of statistical estimation. You are provided the relevant Stata commands you will need, some of which you will not have seen before. It will therefore be useful for you to use the “help” function in Stata, and/or to look up the command in the Stata reference manuals (which are available within Stata as PDFs), or Google. You are also, as always, welcome to ask me for help.

I strongly suggest that you start this assignment early because it will not be possible (in my opinion) to do well if you start close to the due date. There are parts that you may find difficult; you will want to identify them and leave enough time to ask questions if necessary.

Assignment Instructions

Data analysis

In mylearningspace, you will find a datafile called “EC295Assignment2.dta” that contains the data for this assignment. The data come from a study of serious adolescent offenders as they transition from adolescence into early adulthood: the Pathways to Desistance Study. Download the dataset onto your computer and make a note of the folder where you save it.

I have also provided a template dofile that all students must use to write their assignment dofile (if you are using R, you will need to recreate something similar to this). Store it in the same folder where you put your data. You will need to manipulate that template in the following way:

- Rename the file from “EC295_assign2TEMP.do” to your last name followed by your student number (no spaces).

- After cd, replace INSERT THE PATH TO THE FOLDER WHERE YOU STORE THE DATA with the path to the folder where you stored the “EC295Assignment2.dta” dataset. Do not remove the quotation marks.

- After log using, replace INSERT YOUR LAST NAME AND STUDENT NUMBER HERE with your last name and student number, with no space between the two. Do not remove the quotation marks.

- After set seed, replace INSERT YOUR STUDENT NUMBER HERE with your full student number.

Leave all other commands and comments untouched. You should type in your Stata commands below the line that says “Insert your stata commands below here”, but above “Insert your stata commands above here”.

Note that the set seed and sample commands will take a random 95% subsample of the data that is different for every student. For this reason, the numbers in your output will not be the same as those of any other student. Be mindful of this if you are comparing your work with that of your peers.
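To see how the seed and the subsample interact, here is a minimal sketch using Stata's bundled auto dataset as a stand-in for the assignment data. The seed value 123456789 is a made-up placeholder for a student number, and the template's actual commands may be laid out differently.

```stata
* Stand-in data shipped with Stata; 123456789 is a placeholder for a student number.
sysuse auto, clear
set seed 123456789
sample 95          // keep a random 95% of the observations, drop the rest
count              // number of observations remaining
```

Rerunning these lines with the same seed keeps the same observations, while a different seed will generally keep a different subset.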

Submission

You are required to submit three documents:

a) A report containing your answers to all the questions. I outline below how I would like your report to look. The overall goal is that the answers to each question must be easily identifiable in a readable, professional-looking document. Submit to Gradescope.

b) Stata dofile. Submit electronically using the dropbox in mylearningspace.

c) Stata log file. Submit electronically using the dropbox in mylearningspace.

In the report described in (a) above, please answer all questions in the same order as they are stated on the question sheet. For each question and sub-question, include the relevant Stata code (if any) that you used, the output generated by that command if there was any, and an interpretation if you are asked to provide it. For example, if you were answering the following hypothetical question, it might look like this:

************************************************************************************

1) Locate the variable y

a. Using the tab command, provide a frequency distribution for y

Stata commands:

tab y;

Output:

          y |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     23,844       10.05       10.05
          2 |    138,568       58.40       68.45
          3 |      9,049        3.81       72.26
          4 |     63,162       26.62       98.88
          5 |      2,651        1.12      100.00
------------+-----------------------------------
      Total |    237,274      100.00

************************************************************************************

You could also format your own output tables rather than copying and pasting Stata output if you find it easier. As long as the questions are answered in order, and the Stata commands and associated output for each subquestion are clear, it will be fine.

A note on plagiarism: this is an independent assignment, which I expect you to complete on your own. It is plagiarism to copy someone else’s work verbatim, which includes Stata dofiles. Any work you submit should be yours only.

Thank you note: I am very grateful to Professor Justin Smith for sharing his class material. This assignment represents a modified version of a STATA homework developed by him.

In mylearningspace you will find a dataset called “EC295Assignment2.dta”. Please use this datafile to answer the following questions. Each question is worth 5 points, for a total of 85.

1) Estimating the relationship between number of crimes and criminal experience.

a. (5 points) Suppose you are interested in knowing whether criminal experience increases the number of crimes. You propose the following linear regression model:

$$\mathit{ncrimes} = \beta_0 + \beta_1\,\mathit{years\_crime} + u$$

Estimate and interpret the two parameters of the model using the robust option (i.e., reg y x, robust). Does the intercept have a useful interpretation in this context?

b. (5 points) What is the predicted number of crimes for someone of average criminal experience? [Hint: use the summarize command to calculate average years of crime in the sample.]

c. (5 points) What is the difference in predicted number of crimes between the most experienced and the least experienced in the sample (i.e., no experience)? Show your work. [Hint: use the summarize command to calculate min and max years of crime in the sample.]

d. (5 points) Using the Standard Error of the Regression, describe how well the regression line fits the data.

e. (5 points) In Assignment 1, we found some outliers in the variable number of crimes. Re-estimate the model excluding observations in which the number of crimes is larger than 400, and explain whether the OLS estimates are sensitive to the presence of outliers. Justify your answer. [Hint: use the if qualifier to exclude outliers, e.g., “reg y x if y<=400”.]
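The commands hinted at in parts (a) to (e) can be sketched roughly as follows. This is only an illustration on Stata's bundled auto dataset, with price standing in for y and mpg for x. Those variables and the cutoff of 10000 are assumptions chosen for the example, not the assignment's data or the expected answers.

```stata
* Stand-in data shipped with Stata: price plays the role of y, mpg the role of x.
sysuse auto, clear

* (a) OLS with heteroskedasticity-robust standard errors.
regress price mpg, robust

* (d) The standard error of the regression is shown as "Root MSE" and saved in e(rmse).
display e(rmse)

* (b), (c) summarize saves the mean, minimum, and maximum of x in r(mean), r(min), r(max).
summarize mpg
scalar xbar = r(mean)
scalar xrange = r(max) - r(min)

* Predicted y at the average x: intercept + slope * mean of x.
display _b[_cons] + _b[mpg]*xbar

* Difference in predicted y between the largest and smallest x: slope * (max - min).
display _b[mpg]*xrange

* (e) Re-estimate on a subsample with the if qualifier (10000 is an arbitrary cutoff here).
regress price mpg if price <= 10000, robust
```

Comparing the coefficient tables from the first and last regress calls shows how much the estimates move when the largest values of y are dropped.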

2) (20 points) Testing hypotheses about the relationship between number of crimes and criminal experience. For all questions in this section, use the results from the regression performed in question 1.a (make sure you used the robust option in question 1.a).

a. (5 points) Based on the regression results from question 1.a, does years of crime have a statistically significant effect on number of crimes? Explain. [Hint: testing if a variable has a significant effect on another variable implies testing $H_0\colon \beta_1 = 0$ against $H_1\colon \beta_1 \neq 0$.]

b. (5 points) Compute the p-value for the null hypothesis that the slope equals 10 versus the alternative that it does not equal 10. Do you reject the null hypothesis at the 5% level? At the 10% level? [Hint: use the “test” command, immediately after using the “regress” command.]

c. (5 points) Manually compute the actual value of the t-statistic for testing the hypothesis that the slope is smaller than or equal to 10 against the alternative that it is larger than 10. Show your work.

d. (5 points) Combining the display command and the invttail function in Stata, compute the critical value for the hypothesis test in part (c) if the significance level is 5%. Compare the actual value of t from part (c) to the critical value and decide whether you reject or fail to reject the null hypothesis.

e. (5 points) Construct and interpret a 90% confidence interval for the effect of a one-year increase in criminal experience.

f. (5 points) Construct a 95% confidence interval for the effect of a 2-year increase in criminal experience.

g. (5 points) Compute the residual for each person in the sample using the regression results from 1a. Then, plot these residuals against criminal experience in a scatterplot. Using this scatterplot, comment on whether you think the errors are homoskedastic or heteroskedastic. [Note: While the residuals do not definitively answer this question, we can look at the spread of the residuals at each level of criminal experience for clues.]

h. (5 points) What problems are created when we assume the errors are homoskedastic when in fact they are heteroskedastic?
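The commands mentioned in parts (b) to (g) might be combined along the following lines. As a hedged sketch, it again uses Stata's bundled auto dataset, with price as y and mpg as x. The hypothesized slope of -200 and the variable name uhat are arbitrary choices for the illustration, not values from the assignment.

```stata
* Stand-in data shipped with Stata: price plays the role of y, mpg the role of x.
sysuse auto, clear
regress price mpg, robust

* (b) Two-sided test of H0: slope = -200 (an arbitrary value for this stand-in data).
test mpg = -200
display r(p)                       // p-value saved by test

* (c) t-statistic by hand: (estimate - hypothesized value) / standard error.
display (_b[mpg] - (-200)) / _se[mpg]

* (d) invttail(df, p) returns the value t such that P(T > t) = p.
*     e(df_r) is the residual degrees of freedom of the last regression.
display invttail(e(df_r), 0.05)

* (e) Replay the coefficient table with 90% confidence intervals.
regress, level(90)

* (f) lincom gives the estimate, standard error, and interval for a multiple of the slope.
lincom 2*mpg, level(95)

* (g) Store the residuals and plot them against x with a reference line at zero.
predict uhat, residuals
scatter uhat mpg, yline(0)
```

For an upper-tailed test such as the one in part (c), the null hypothesis is rejected when the t-statistic exceeds the critical value from invttail. In the scatterplot, a vertical spread of residuals that widens or narrows as x changes is a hint of heteroskedasticity.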

3) Dummy variables

a. (5 points) Produce a table of summary statistics (mean, standard deviation, min, max) for the variables “ncrimes” and “some experience”. For each variable, interpret the mean. Be precise.

b. (5 points) Suppose you model the relationship between crime and experience as follows

$$\mathit{ncrimes} = \beta_0 + \beta_1\,\mathit{some\ exp} + u$$

where some exp is a dummy variable that takes the value 1 when criminal experience is above zero and 0 otherwise.

Estimate and interpret the slope and intercept in this model using the robust option.

c. (5 points) Test the null hypothesis that individuals with some experience engage in more crimes than individuals with no experience, at the 5% significance level.

d. (5 points) Suppose you instead estimated the following model:

$$\mathit{ncrimes} = \beta_0 + \beta_1\,\mathit{no\ experience} + u$$

where no experience is a dummy variable that takes the value 1 when criminal experience is zero and 0 otherwise (i.e., no experience = 1 – some experience). Derive by hand the slope and intercept in this model (i.e., do not estimate the parameters in STATA). Show your work.
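For parts (a) and (b), one possible way to produce the summary table and a regression on a dummy variable is sketched below. It uses Stata's bundled auto dataset as a stand-in, with price in the role of the outcome and the 0/1 variable foreign in the role of the dummy. Both are assumptions for illustration only.

```stata
* Stand-in data shipped with Stata: price is the outcome, foreign is a 0/1 dummy.
sysuse auto, clear

* (a) One table with the mean, standard deviation, minimum, and maximum.
*     Because foreign is 0/1, its mean is the share of observations with foreign == 1.
tabstat price foreign, statistics(mean sd min max) columns(statistics)

* (b) Regression on a dummy: the intercept is the mean of y when the dummy is 0,
*     and the slope is the difference in means between the two groups.
regress price foreign, robust

* The group means themselves, for comparison with the estimates above.
tabstat price, by(foreign) statistics(mean)
```

Comparing the last table with the regression output shows how the two group means map into the intercept and slope.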