ECMT2150 INTERMEDIATE ECONOMETRICS, S1 2023 ASSIGNMENT
Due Date: 21 May 2023 (11:59pm sharp)
Instructions:
• Anonymous marking: Do NOT put your name anywhere on your assignment or in the file name. Identify yourself only by your student number.
• Answer all questions.
• A total of 100 points are available and marks for each question are indicated throughout.
• The assignment is worth 15% of your final grade for this UoS.
• You will need to use STATA (or another regression software program, e.g. R) to complete this assignment. Do not use Excel.
• Please read the information I provide on the next couple of pages carefully. I am available to discuss the data and the context, and am happy to answer questions via Ed if anything is not clear.
Submission Instructions:
• Answers to Parts A-D are to be submitted via the Canvas Quiz, “Assignment Quiz …”
o I encourage you to work through all of the data analysis following the questions in this document on Stata or another software package before heading to the quiz to answer the questions there. There are no trick questions, so if you have completed each of the following questions, kept a copy of your output and made a note of your answers, there will be no surprises when you are taking the quiz. You should not need to use Stata during the quiz at all. That said, the quiz is untimed, so you could leave and come back to the quiz if you need to.
o Remember: because it is an untimed quiz, it will NOT automatically submit at the due date. You must click submit yourself.
o You will get only one attempt at the quiz.
• You must upload your Stata do file (or commands & output) in the final question of the Canvas quiz.
o This upload is worth 5 points.
o Think of this as a way of showing your working.
o If you used Stata, then just the content of your do file is enough. But you will need to copy it into a Word doc or save it as a pdf to upload it.
o If you did not create a do file, or did not use Stata, then you should upload a document (no longer than 5 pages) showing your commands and output.
• NB: In Stata, if you highlight some of the output in the Results window, then right-click, you can
a) copy-paste this into Word or some other word processing software. (In Word, Font Courier New in size 9 works well), or
b) copy-paste as a picture. This will capture your commands and output and you can paste this image into Word or some other word processing software.
• You must submit your answer to Part E through the Assignment dropbox. Part E must be typed. It will be checked using Turnitin for plagiarism.
Assignment: Multiple Linear Regression
Inference, Heteroskedasticity, Endogeneity and IVs
The topic and information on the dataset
This assignment involves the application of a range of econometric methods to analyse how police should respond to domestic violence in order to reduce the number of repeat offences.
The issues are well summarised by Angrist and Pischke (2015, p. 116):
“abuse victims are often reluctant to press charges. Arresting batterers without victim cooperation may be pointless and could serve to aggravate an already bad situation... At the same time, victim advocates worry that the failure to arrest batterers signals social tolerance for violent acts that, if observed between strangers [as opposed to between domestic partners], would likely provoke a vigorous law enforcement response.”
We have a data set** used by Angrist and Pischke (2015) and Angrist (2006). The data set is named ‘Assignment.dta’.
• Download the data from the Assignment tab in our Canvas site.
• **Note: I have created a few different versions of the data and each student will have a link to just one of these. I have edited the data slightly for each version, but by enough that you need to work on your own data. If you work on one of your classmates’ data sets, you may answer one or more questions in the quiz incorrectly and lose marks or be referred to the academic integrity office.
This data was originally collected and analysed by Sherman and Berk (1984) and Berk and Sherman (1988). The original data come from the Minneapolis Domestic Violence Experiment (MDVE). The experiment was designed to assess the effect of arresting batterers or using a softer response on the re-occurrence of a domestic violence (DV) assault within 6 months at the same address.
The research design for the experiment randomly assigned two possible responses that the police officers attending the DV incident should follow. The police officers had to either:
• arrest the DV offender (the batterer), or
• use a softer response. In the softer response, which Angrist calls the “coddled” response, the police officers would either separate the DV offender by ordering the offender to stay away from the address for 8 hours, or counsel/offer advice to the DV offender.
The incident had to meet certain criteria in order for the experiment to go ahead: both the suspect and the victim had to be present when the officers arrived, and the police officers had to have probable cause to believe that the cohabitant or spouse had committed a misdemeanor assault against a partner in the past 4 hours. Importantly, cases of life-threatening or severe injury (felony assault) were excluded.
So, we can see the two options the police officers had as two possible treatments: arrest or the “soft response”. Arrest usually meant a night in jail for the DV suspect.
How was the randomization supposed to work in the experiment?
Police officers had a physical pad of paper report forms that were randomly coloured and different colours indicated the police should follow a particular response to the incident. For the experiment, police officers were directed to act according to the colour of the report form on the top of the pad of forms. For example, if the top report form was green the police should arrest the DV offender and if the top report form was blue they should use the soft response on the DV offender.
Note that if the experiment had been carried out properly and the treatments (arrest or soft response) were truly randomised, we would be able to simply compare the probability of a re-occurrence of DV within 6 months for those offenders who were arrested to those who received the “soft response”.
But, in practice, police officers did not always follow the actions prescribed by the colour of the report form on the top of the pad. In some cases, the DV suspect was arrested even though the randomly assigned response by the report form directed the police to use the soft response. Sometimes there were other circumstances that led the police officers to elect to arrest the DV suspect. Other times, the police officers simply forgot their report forms. So, in practice, which treatment was delivered by the police officers (arrest or soft response) was not truly random. Police officers had some role in choosing the response in at least some cases and so the choice they made may be related to other factors. I won’t say more here because I want you to think about this and answer some questions on it in your assignment.
So, we have a “broken” experiment. But all is not lost. IV methods can be a way to solve the problem and still identify the causal effect of the “soft response” versus arrest. In our data, for each DV incident, we have the following key variables:
• reoccur
• actual_soft
• assigned_soft
• actual_arrest
• assigned_arrest
as well as some useful other explanatory variables. See below for full details.
We will use the variable assigned_soft as an instrumental variable for the variable actual_soft.
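For intuition only, here is the IV idea in its simplest form. With a binary instrument, a binary treatment and no other explanatory variables, the IV estimator reduces to the Wald ratio below. The symbols are shorthand introduced here, not notation from the lectures: y is reoccur, x is actual_soft and z is assigned_soft, and a bar denotes a sample mean within the indicated group. The models you estimate in Part D include additional controls, so treat this as a sketch of the idea rather than the formula you will apply.

```latex
\hat{\beta}_1^{IV}
  = \frac{\bar{y}_{z=1} - \bar{y}_{z=0}}{\bar{x}_{z=1} - \bar{x}_{z=0}}
```

The numerator compares reoccurrence rates by assigned response, and the denominator measures how strongly the assignment moved the response actually delivered.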
References *you are not required to read these, but they could be helpful/interesting.
Angrist (2006) “Instrumental variables methods in experimental criminological research: what, why and how”, Journal of Experimental Criminology, 2:23-44.
- Link through the library:
https://sydney.primo.exlibrisgroup.com/permalink/61USYD_INST/2rsddf/cdi_proquest_journals_821707513
Angrist and Pischke (2015), Chapter 3 in Mastering ‘Metrics: The path from cause to effect, Princeton University Press.
- Library link:
https://sydney.primo.exlibrisgroup.com/permalink/61USYD_INST/1c0ug48/alma991032063095205106
Sherman and Berk (1984) “The Specific Deterrent Effects of Arrest for Domestic Assault”, American Sociological Review, 49(2), 261–272.
- https://www.jstor.org/stable/2095575
Berk and Sherman (1988) “Police Responses to Family Violence Incidents: An Analysis of an Experimental Design with Incomplete Randomization”, Journal of the American Statistical Association, 83(401), 70–76.
More info on the variables and data
The data is a cross-section from 1981/82 on 290 DV incidents. There are 290 rows (one for each incident) and 21 columns. The columns correspond to the variables:
Variable name       Description
id                  Incident identification number
reoccur             = 1 if a DV incident occurs again at the same address within 6 months, and 0 otherwise
actual_soft         = 1 if actual police response was the soft response (i.e. separate or give advice), and 0 if the actual response was arrest
assigned_soft       = 1 if the assigned police response was the soft response (i.e. separate or give advice), and 0 if the assigned response was arrest
s_influence         = 1 if the suspect was under the influence of drugs/alcohol and 0 otherwise
anyweapon           = 1 if any weapon (gun or other weapon such as a blunt or sharp object) was present and 0 otherwise
black               = 1 if the victim’s race was black and 0 otherwise
native              = 1 if the victim’s race was Native American and 0 otherwise
other_nw            = 1 if the victim’s race was other non-white and 0 otherwise (i.e. if white, black or Native American)
mixedrace           = 1 if the suspect and victim’s race is not the same as one another, and 0 otherwise
gun                 = 1 if a gun was present and 0 otherwise
o_weapon            = 1 if a blunt or sharp object was present and 0 otherwise
year                year in which the incident occurred, 1981 or 1982
quarter             quarter of the year in which the incident occurred
time                time spent at the scene by police in minutes (missing for some incidents)
actual_arrest       = 1 if actual police response was arrest and 0 if the actual response was the soft response (note this is the reverse of the actual_soft dummy)
assigned_arrest     = 1 if assigned police response was arrest and 0 if the assigned response was the soft response (note this is the reverse of the assigned_soft dummy)
actual_separate     = 1 if actual police response was to separate and 0 otherwise
assigned_separate   = 1 if assigned police response was to separate and 0 otherwise
actual_advice       = 1 if actual police response was to give advice and 0 otherwise
assigned_advice     = 1 if assigned police response was to give advice and 0 otherwise
Part A: Descriptive Statistics for the Sample [9 marks]
Quiz questions 1-4: [7 marks]
Investigate the distribution of the variables:
reoccur, actual_soft, assigned_soft
s_influence, anyweapon, black, native, other_nw, mixedrace
For each, find the average, standard deviation, minimum, and maximum of its sample distribution.
Also work out or tabulate the number of incidents that were:
(i) randomly assigned a soft response,
(ii) randomly assigned a soft response but where the police officers arrested the DV suspect,
(iii) randomly assigned to arrest, and
(iv) randomly assigned to arrest but where the police officers chose a soft response and did not arrest the DV suspect.
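The four counts above all come from a two-way tabulation of the assigned response against the actual response. A minimal sketch in Python, using a handful of made-up (assigned_soft, actual_soft) pairs rather than the assignment data, shows the bookkeeping; the variable names on the left are illustrative only.

```python
from collections import Counter

# Synthetic (assigned_soft, actual_soft) pairs, NOT the assignment data.
incidents = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0), (0, 1), (1, 1), (0, 0)]

# Count how often each (assigned, actual) combination occurs.
counts = Counter(incidents)

assigned_soft_total = counts[(1, 1)] + counts[(1, 0)]    # (i)
assigned_soft_arrested = counts[(1, 0)]                  # (ii)
assigned_arrest_total = counts[(0, 0)] + counts[(0, 1)]  # (iii)
assigned_arrest_soft = counts[(0, 1)]                    # (iv)
```

In Stata, a two-way tabulate of the two dummies reports the same four cells.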
In the quiz you will be asked to report selected summary statistics rounded to 2 decimal places. You will also report on the numbers of incidents assigned particular responses and how many incidents had differences in the assigned and actual responses.
Quiz question 5: [2 marks]
Pause, review and think about what you learn from these descriptive statistics. In the quiz you will be asked to briefly describe one useful, unusual or noteworthy thing you discovered from these descriptive statistics.
Part B: Simple & Multiple Regression Model - Estimation and Testing [22 marks]
Quiz question 6: [3 marks]
Estimate the following simple regression model by OLS:
$$\text{reoccur} = \beta_0 + \beta_1\,\text{actual\_soft} + u \qquad \text{(EQ. 1)}$$
In the quiz you will report selected coefficient estimates, standard errors and the R-squared, rounded to 4 decimal places.
Quiz questions 7-8: [5 marks]
What is the sign of your estimated slope coefficient? Based on these estimates, is the soft response associated with a higher or lower probability of re-occurrence of a DV assault? Interpret the estimated intercept and estimated slope coefficient from (EQ.1). In particular, what is the share of incidents where a DV assault reoccurs within 6 months when:
a) a soft response was used, and
b) the DV suspect was arrested?
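When the only regressor is a 0/1 dummy, the OLS intercept equals the sample mean of the outcome in the group where the dummy is 0, and intercept plus slope equals the mean in the group where the dummy is 1. The sketch below checks this on eight made-up observations (not the assignment data), using the textbook single-regressor OLS formulas.

```python
# Synthetic data, NOT the assignment data.
reoccur = [1, 0, 0, 1, 0, 0, 1, 0]
actual_soft = [1, 1, 1, 1, 0, 0, 0, 0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Group shares computed directly from the data.
share_arrest = mean([y for y, x in zip(reoccur, actual_soft) if x == 0])
share_soft = mean([y for y, x in zip(reoccur, actual_soft) if x == 1])

# Single-regressor OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
x_bar, y_bar = mean(actual_soft), mean(reoccur)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(actual_soft, reoccur))
s_xx = sum((x - x_bar) ** 2 for x in actual_soft)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
```

Here intercept matches share_arrest and intercept + slope matches share_soft, which is why the two shares can be read straight off the regression output.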
Quiz questions 9-11: [4 marks]
Is the estimated slope coefficient in (EQ.1) significantly different from zero at the 5% level of significance?
In the quiz, you will not need to set out all the steps of the hypothesis test, but you will need to write down the null and alternative hypotheses for the test, report the p-value, and report whether it is or is not statistically significant.
In your quiz answers, writing H0 and H1, beta1, beta1hat, etc is fine – you are not required to use subscript formatting or typeset maths in your quiz answers. But distinguishing between beta1hat and beta1, and using the right one, is important. To write not equal to 0, you can write it out in words, or write neq or not=.
Quiz question 12: [3 marks]
Do you think the estimated slope coefficient in (EQ.1) is a causal estimate? Briefly explain.
Quiz question 13: [3 marks]
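Now add additional explanatory variables to the model as shown here in (EQ.2):
$$
\begin{aligned}
\text{reoccur} = {} & \beta_0 + \beta_1\,\text{actual\_soft} + \beta_2\,\text{s\_influence} + \beta_3\,\text{anyweapon} \\
& + \beta_4\,\text{black} + \beta_5\,\text{native} + \beta_6\,\text{other\_nw} + \beta_7\,\text{mixedrace} \\
& + \delta_1\,\text{y1982} + \gamma_1\,\text{qtr2} + \gamma_2\,\text{qtr3} + \gamma_3\,\text{qtr4} + u
\end{aligned}
\qquad \text{(EQ. 2)}
$$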
Notice that you will need to create some additional dummy variables:
y1982 = 1 if the incident occurred in year 1982 and 0 otherwise
qtr2 = 1 if the incident occurred in quarter 2 and 0 otherwise
qtr3 = 1 if the incident occurred in quarter 3 and 0 otherwise
qtr4 = 1 if the incident occurred in quarter 4 and 0 otherwise
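The dummy definitions above can be sketched as follows in Python. The rows are made up, and the sketch assumes quarter is coded 1 to 4 and year as 1981 or 1982; check the coding in your own data before relying on it. Year 1981 and quarter 1 get no dummy of their own, so they act as the base categories.

```python
# Synthetic rows, NOT the assignment data.
# Assumption: quarter is coded 1-4 and year is 1981 or 1982.
rows = [
    {"year": 1981, "quarter": 1},
    {"year": 1981, "quarter": 4},
    {"year": 1982, "quarter": 2},
    {"year": 1982, "quarter": 3},
]

for row in rows:
    # 1 if the incident occurred in 1982, 0 otherwise.
    row["y1982"] = int(row["year"] == 1982)
    # One dummy per quarter 2, 3 and 4; quarter 1 is the omitted base.
    for q in (2, 3, 4):
        row[f"qtr{q}"] = int(row["quarter"] == q)
```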
In the quiz you will report selected coefficient estimates and the R-squared rounded to 4 decimal places.
Quiz questions 14-16: [3 marks]
• Find the 90% confidence interval for the coefficient $\beta_1$ on actual_soft in (EQ.2).
o In the quiz, you will report one of the bounds of the confidence interval, rounded to 3 decimal places.
o You can calculate this yourself – if so, be sure to make any calculations using all of the decimal places given in your Stata regression output.
o Or, you can use a Stata command – check out the options on the command regress. To see all the options for the regress command, type help regress in the Stata command window.
• Using your confidence interval, is actual_soft statistically significant in EQ.2 at the 10% significance level? (Yes/No)
• State how you used the confidence interval you calculated in order to determine whether actual_soft is statistically significant at the 10% significance level.
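If you calculate the interval by hand, the arithmetic is estimate plus or minus critical value times standard error. The sketch below uses a made-up estimate and standard error, not results from the assignment data. It also assumes a large-sample normal critical value: regression software normally uses the t distribution with the residual degrees of freedom, which gives a slightly wider interval, so do not use this approximation for the bounds you report.

```python
from statistics import NormalDist

# Hypothetical estimate and standard error, NOT results from the assignment data.
beta1_hat = 0.10
se_beta1 = 0.05

# A two-sided 90% interval leaves 5% in each tail.
# Assumption: normal approximation instead of the t distribution.
critical_value = NormalDist().inv_cdf(0.95)

lower = beta1_hat - critical_value * se_beta1
upper = beta1_hat + critical_value * se_beta1

# Check whether zero lies inside the interval.
zero_inside_interval = lower <= 0.0 <= upper
```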
Quiz question 17: [1 mark]
Based on your estimated results for (EQ.2), is the soft response associated with a higher or lower probability of reoccurrence? You can ignore whether or not it is statistically significant for the purposes of this answer.
Part C: Heteroskedasticity [13 marks]
Quiz questions 18-23: [9 marks]
Apply the modified White test for the presence of heteroskedasticity to model (EQ.2), using a 1% significance level. What do you conclude?
• Please use an F-test for your test.
• NB. For full marks, you must conduct all the steps of the test as per the lecture notes or as described in the textbook.
In the quiz you will
• report selected coefficient estimates and the R-squared from your auxiliary regression each to 4 decimal places,
• report the test statistic, the degrees of freedom and either the critical value or the p-value for the test,
• provide the conclusion from your test, and
• comment on why your conclusion regarding whether the errors are heteroskedastic or not in this case is entirely expected.
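The mechanics of the test can be sketched as follows, assuming that "modified White test" means the special-case form in which the squared OLS residuals are regressed on the fitted values and their squares, followed by an F-test of joint significance. The data are simulated, the variable names are illustrative, and the code is Python with NumPy rather than Stata. Follow the lecture notes if they define the steps differently.

```python
import numpy as np

# Simulated data, NOT the assignment data.
rng = np.random.default_rng(0)
n = 500
x1 = rng.integers(0, 2, n).astype(float)
x2 = rng.integers(0, 2, n).astype(float)
y = (rng.random(n) < 0.2 + 0.3 * x1 + 0.2 * x2).astype(float)


def ols(X, y):
    """Return coefficients, fitted values, residuals and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return beta, fitted, resid, r2


# Step 1: estimate the main model and keep residuals and fitted values.
X = np.column_stack([np.ones(n), x1, x2])
_, fitted, resid, _ = ols(X, y)

# Step 2: auxiliary regression of squared residuals on fitted and fitted^2.
Z = np.column_stack([np.ones(n), fitted, fitted ** 2])
_, _, _, r2_aux = ols(Z, resid ** 2)

# Step 3: F statistic for the joint significance of the two auxiliary slopes.
q = 2                 # number of restrictions
df_denom = n - q - 1  # residual degrees of freedom in the auxiliary regression
f_stat = (r2_aux / q) / ((1.0 - r2_aux) / df_denom)
```

The statistic is then compared with the critical value of the F distribution with q and df_denom degrees of freedom at the chosen significance level.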
Quiz question 24: [2 marks]
Re-estimate the model (EQ.2) with robust standard errors. In the quiz you will report selected standard errors to 4 decimal places.
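To see what changes when you ask for robust standard errors, the sketch below computes both the conventional and the heteroskedasticity-robust standard errors by hand on simulated data (not the assignment data). It assumes the HC1 variant, which rescales the sandwich estimator by n/(n-k). The coefficient estimates are identical in both cases; only the estimated variance differs.

```python
import numpy as np

# Simulated data, NOT the assignment data.
rng = np.random.default_rng(1)
n = 400
x = rng.integers(0, 2, n).astype(float)
y = (rng.random(n) < 0.2 + 0.4 * x).astype(float)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Conventional variance: sigma^2 * (X'X)^-1, valid under homoskedasticity.
sigma2 = resid @ resid / (n - k)
se_conventional = np.sqrt(np.diag(sigma2 * XtX_inv))

# Robust (HC1) variance: n/(n-k) * (X'X)^-1 X' diag(u^2) X (X'X)^-1.
meat = (X * resid[:, None] ** 2).T @ X
cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(cov_hc1))
```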
Quiz question 25: [2 marks]
Here you will answer a MCQ about the differences between the robust standard errors and the regular standard errors you found above in Part B for (EQ.2).
** Regardless of your findings in Part C, use robust standard errors from this point forwards **
Part D: Endogeneity and Instrumental Variables [41 marks]
Quiz question 26: [4 marks]
actual_soft is likely endogenous in (EQ.2) despite the additional explanatory variables we have included. If so, does the multiple regression model in (EQ.2) capture a causal relationship between a soft response to a DV incident and the reoccurrence of a DV assault within 6 months? Why or why not? What does this imply about E(u | actual_soft)?
Quiz question 27: [3 marks]
Provide a clear, careful and intuitive explanation for why actual_soft is endogenous in (EQ.2).
Quiz question 28: [3 marks]
What is the impact of the endogeneity of the variable actual_soft in (EQ.2) on your estimates and inference if you estimate model (EQ.2) using OLS?
Quiz question 29: [4 marks]
The variable assigned_soft provides a potential instrumental variable (IV) we could use to cleanly identify the causal effect of the soft response on the reoccurrence of a DV assault within 6 months. What two key conditions must the IV satisfy in order for the IV estimator to be consistent? State whether each of these conditions can be tested.
Quiz question 30: [4 marks]
Discuss whether, and why or why not, we could expect the IV, assigned_soft, to satisfy these two conditions that you gave in Question 29. To do this, use intuition or simple economic theory.
Quiz question 31: [3 marks]
Estimate the first stage equation if we are going to use assigned_soft as an IV for actual_soft in (EQ.2). Use robust standard errors.
In the quiz you will report selected coefficient estimates to 4 decimal places.
Quiz questions 32-33: [3 marks]
Using the estimation results from your first stage that you estimated for Question 31, test the relevance of the IV, assigned_soft (also known as a test of identification). Use a 1% level of significance.
In the quiz, you will
• report the test statistic to 2 decimal places and
• select the correct formal conclusion for your test (MCQ).
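A first stage regresses the endogenous variable on the instrument and the other exogenous regressors. With a single instrument, the F statistic for excluding it equals the square of its t statistic, so the relevance test can be read off one coefficient. The sketch below illustrates this on simulated data (not the assignment data) with one illustrative control and HC1 robust standard errors; the names assigned, actual and control are placeholders, not the assignment variables.

```python
import numpy as np

# Simulated data, NOT the assignment data.
rng = np.random.default_rng(2)
n = 400
assigned = rng.integers(0, 2, n).astype(float)  # instrument
control = rng.integers(0, 2, n).astype(float)   # an exogenous control
complied = rng.random(n) < 0.8                  # imperfect compliance
actual = np.where(complied, assigned, 1.0 - assigned)

# First stage: actual on a constant, the instrument and the control.
X = np.column_stack([np.ones(n), assigned, control])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ actual
resid = actual - X @ beta

# HC1 robust covariance matrix of the first-stage coefficients.
meat = (X * resid[:, None] ** 2).T @ X
cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
se = np.sqrt(np.diag(cov_hc1))

# With one instrument, the exclusion F statistic is the squared t statistic.
t_stat = beta[1] / se[1]
f_stat = t_stat ** 2
```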
Quiz question 34: [4 marks]
Re-estimate model (EQ.2) by 2SLS using assigned_soft as an IV for actual_soft. Use robust standard errors.
In the quiz you will report selected coefficient estimates and standard errors to 4 decimal places.
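The two stages of 2SLS can be sketched by hand as below. Everything here is simulated and simplified: there are no controls, the outcome is continuous rather than binary, and the robust variance is the plain sandwich form without a small-sample adjustment, so your software's standard errors may differ slightly. Note that the residuals are built from the actual regressor, not the first-stage fitted values, which is the usual mistake when running the two stages manually.

```python
import numpy as np

# Simulated data, NOT the assignment data.
rng = np.random.default_rng(3)
n = 2000
z = rng.integers(0, 2, n).astype(float)  # instrument (assigned treatment)
v = rng.normal(size=n)                   # unobserved factor
# Treatment depends on the instrument AND on v, so it is endogenous.
x = (0.5 * z + 0.3 * v + rng.normal(scale=0.3, size=n) > 0.25).astype(float)
y = 0.2 + 0.1 * x + 0.3 * v + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])  # regressors
Z = np.column_stack([np.ones(n), z])  # instruments (constant instruments itself)

# OLS for comparison.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Stage 1: project the regressors on the instruments.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
# Stage 2: use the projected regressors to estimate the coefficients.
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# Residuals use the ACTUAL regressors, not the first-stage fitted values.
resid = y - X @ beta_2sls
bread = np.linalg.inv(X_hat.T @ X_hat)
meat = (X_hat * resid[:, None] ** 2).T @ X_hat
se_2sls = np.sqrt(np.diag(bread @ meat @ bread))

# Without controls, the 2SLS slope equals the ratio of mean differences by z.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
```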
Quiz question 35: [3 marks]
Interpret the 2SLS-IV estimate for $\beta_1$, the coefficient on actual_soft.
Quiz question 36: [4 marks]
Comment on the differences between the 2SLS-IV and OLS estimates and their robust standard errors for $\beta_1$, the coefficient on actual_soft, in (EQ.2). For reference: these are the estimates from Question 34 and Question 24, respectively. When we used OLS, did we over or underestimate the effect of the soft response on the reoccurrence of DV?
Quiz question 37: [1 mark]
What can we now conclude? That is, from the 2SLS-IV estimates for (EQ.2) - does it appear that the soft response leads to a higher or lower probability of reoccurrence of DV?
Quiz question 38: [5 marks]
Upload your Stata do file in the final question of the Canvas quiz. See the directions on the first page of this document.
Part E: Conclusions [15 marks]
Provide a short summary or conclusion for your findings on the research question: “what is the effect of the soft response on the probability of reoccurrence of DV?”. Be sure to comment on whether you have identified a causal effect. Is the effect big or small? Support your conclusions.
NB:
• Your answer for Part E should be only 5 or 6 sentences.
• Answers must be 250 words or less. Include your word count in your document. Answers that exceed the word count may be penalized.
• The best answers are short, to the point, and focused on the key take-aways from our analysis. Imagine you just have 1-2 minutes to tell someone what the research is about and what you found.
• Do not describe everything you did. If you do so, you may be penalized. Many steps in our analysis are necessary parts of a research project, but we do not need to list all these steps and things we have checked when we are reporting our key results.
• You must type up your answer to this question in your own words and submit it through the assignment dropbox.
• It will be checked using Turnitin for plagiarism.
• It should be a doc, docx or pdf. No other file types will be accepted.
2023-05-10