POLS0012 Causal Analysis in Data Science

ESSAY QUESTIONS 2025

Guidelines for Completing and Submitting POLS0012 Essay

•    Read the guidelines below to avoid losing marks unnecessarily.

•    The assessment is due on Wednesday 21st January 2026, 2.00pm. It has two parts (A and B), which must be submitted together as a single document. Parts A and B are worth 50 marks each.

•    Please follow all designated Department of Political Science submission guidelines. These may be different to those of your home department. You must submit one copy of your essay via Turnitin.

•    The datasets for the essay can be found in the ‘Final Essay Part A Materials’ folder in the ‘Assessment Information and Materials’ section of Moodle.

•    The word limit for both Parts A and B is 3,000 words in total, excluding your R script appendix (see below). You can divide the word limit as you like between the two parts, and you will not be penalised for using more of your words in Part B.

•    This is an assessed piece of coursework for the POLS0012 module; collaboration and/or discussion with anyone is strictly prohibited. The rules for plagiarism apply and any cases of suspected plagiarism of published work or the work of classmates will be taken very seriously.

•    You must read the guidelines on the use of AI posted on Moodle. AI usage is permissible in limited circumstances only, and AI must not be used to write the essay itself. All use of AI must be disclosed via citations. Further guidance on this is on Moodle.

•    You may open the datasets and work on the essay questions at any time up until the submission date. There is no limit on the number of times you may open the data files. Be sure to save your data files and R script file.

•    You should include a copy of your R script as an appendix to your essay. The essay answers should not contain any code or screenshots from R. FAILURE TO INCLUDE THE R SCRIPT WILL INCUR A 10 POINT PENALTY. Note that your R script file should be neatly presented and easy to follow, including comments indicating the question being addressed. Do not include any code that does not reproduce your essay answers.

•    Any figures must be included within your answers to the essay, not in the code appendix.

•    You may assume the methods you have used (e.g. a difference in means) are understood by the reader and do not need definitions, but you do need to say which techniques you have used and why.

PART A: QUANTITATIVE QUESTIONS

QUESTION 1: American Sheriffs’ Impact on Immigration Policy [25 Points]

In this question you will replicate and extend some of the work in:

Daniel M. Thompson (2019). "How Partisan is Local Law Enforcement? Evidence from Sheriff Cooperation with Immigration Authorities." American Political Science Review 114 (1): 222-236.

In the United States, sheriffs (law-enforcement figures in each county) have the power to enforce, or refuse to enforce, requests from the federal Immigration and Customs Enforcement agency (ICE) to detain unauthorised immigrants. Most sheriffs are directly elected in partisan elections, where the candidates run either as Democrats or Republicans and the candidate with the most votes in a given county is the winner. Many have speculated that there may be a causal effect of the partisanship of the sheriff on immigration outcomes. Specifically, a common view is that Democratic sheriffs are less likely to enforce ICE detention requests than their Republican counterparts. You will use data from the paper to assess the truth of this assertion. Your results will not be the same as in the paper, as we are doing a slightly different analysis. The dataset is contained in the file “2025essayQ1.Rda”, where each observation is a county with the following variables recorded:

Variable name             Variable description
treat                     =1 if the Democratic candidate was elected, 0 otherwise
dem_vote_share            The Democratic candidate’s share of the total votes
share_detained_sheriff    The share of ICE detention requests enforced by the sheriff (from 0 to 1)
running_var               The Democratic candidate’s vote share, minus 50
year                      Year of the observation
county_id                 Unique ID for each county

Answer the following questions:

a) [5 points] Explain why a regression discontinuity design (RDD) could help estimate the causal effect of the election of a Democratic sheriff on the enforcement rate, and why this is likely to be superior to calculating the mean difference between Democratic and Republican sheriffs.

b) [5 points] Briefly discuss two potential violations of the RDD assumptions that could affect the inferences we make. How plausible are these violations, in your view? Provide evidence from the data to evaluate one of them quantitatively.

c) [2 points] Estimate the Local Average Treatment Effect of electing a Democratic sheriff on enforcement using RDD analysis with the optimal bandwidth. Report your results.

d) [5 points] Re-estimate the effects in question (c) using all whole-number bandwidths from 2 to 10. Use a figure to illustrate how sensitive your results are to these changes. Briefly interpret your results.

e) [3 points] An alternative way to evaluate the hypothesis that the election of a Democratic sheriff has a causal effect on enforcement of ICE detention requests is to use a two-way fixed effects design. Implement an appropriate model to estimate this and report your result.

f) [5 points] Which of the two approaches in this question – regression discontinuity or fixed effects – do you find more convincing and useful as an estimate of the causal effect? Why?
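As a purely illustrative aside, the general shape of a local linear RDD estimate, re-run across whole-number bandwidths as in parts (c) and (d), can be sketched on simulated data. This is a minimal sketch under stated assumptions, not a model answer: it is written in Python rather than the R used in this module, it uses made-up data rather than “2025essayQ1.Rda”, it fixes the bandwidths by hand instead of selecting an optimal one, and it uses a uniform kernel. The function names `simulate` and `rdd_estimate` and the simulated jump of -0.2 are hypothetical.

```python
import numpy as np

def simulate(n=8000, jump=-0.2, seed=0):
    """Simulated data (assumption): running variable centred at 0, true jump = `jump`."""
    rng = np.random.default_rng(seed)
    running = rng.uniform(-50, 50, n)        # e.g. vote share minus 50
    treat = (running > 0).astype(float)      # treated when just above the cutoff
    # Outcome is linear in the running variable plus a discontinuity at 0.
    # It is not clipped to [0, 1]; this is only a toy example.
    outcome = 0.6 + 0.002 * running + jump * treat + rng.normal(0, 0.1, n)
    return running, treat, outcome

def rdd_estimate(running, treat, outcome, bandwidth):
    """Local linear regression with separate slopes on each side of the cutoff."""
    keep = np.abs(running) <= bandwidth      # uniform kernel: equal weight inside the window
    x, d, y = running[keep], treat[keep], outcome[keep]
    X = np.column_stack([np.ones_like(x), d, x, d * x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]                           # coefficient on treat = estimated jump at the cutoff

running, treat, outcome = simulate()
estimates = {h: rdd_estimate(running, treat, outcome, h) for h in range(2, 11)}
for h, est in estimates.items():
    print(f"bandwidth {h:2d}: estimated jump {est:+.3f}")
```

Narrow bandwidths use fewer observations, so their estimates are noisier; wider bandwidths are more precise but lean more heavily on the linear functional form. The sketch reports point estimates only and does not compute standard errors.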

QUESTION 2: Empowering Young Women in Uganda [25 Points]

In this question you will analyse and discuss some of the work in:

Oriana Bandiera et al. (2020). "Women’s Empowerment in Action: Evidence from a Randomised Control Trial in Africa." American Economic Journal: Applied Economics 12 (1): 210-259.

In Uganda, many adolescent girls leave education early, have poor employment prospects and tend to have children at a young age, leaving them impoverished and dependent on men for survival. This paper analyses a large-scale experiment carried out in Uganda which was designed to empower young women politically, economically and in their relationships. You may find it useful to read the paper, which is on Moodle. Again, your results will not be the same as in the paper, as we are doing a slightly different analysis. The paper’s analysis is also often more complex than what we have covered in this module.

The experiment used block randomisation within ten large geographic areas (called ‘branches’ in the paper). In each block, ten localised communities were randomly assigned to treatment and five to control. The treatment involved setting up girls’ development clubs in the treated communities, targeted at teenage girls. The clubs provided vocational training for employment, as well as life skills training focused on sexual and reproductive health, family planning and preventing sexual violence. No clubs were established in the control communities.

Note: because of the use of block randomisation, you should control for the blocks in all regressions.

Although the intervention was randomised by community, the authors measure results for individual girls. They surveyed girls in the treatment and control communities, asking whether the intervention led (amongst other outcomes) to (i) improvements in their economic empowerment, measured through questions about their employment and earnings, and (ii) greater control over their own bodies, measured through questions about their knowledge of sexual health, control over sex and reproduction, use of contraception, and marriage and children. Surveys were carried out prior to the experiment, after two years (called “midline” in the paper), and after four years (called “endline” in the paper). We will focus only on the endline results after four years. You will analyse a simplified dataset containing a subset of the paper’s variables, in the file “2025essayQ2.Rda”. Each observation of the dataset contains survey responses from a girl in the sample, with the following variables included:

Variable name        Variable description
treatment            =1 if respondent was in a treatment community, 0 otherwise
attended             =1 if respondent attended a girls’ club, 0 otherwise
completed            =1 if respondent answered both the survey prior to the experiment and the survey after four years, 0 if they answered only the survey prior to the experiment
branch_name          Name of the branch used for block randomisation
age                  Age in years, prior to the experiment
enrolled             =1 if respondent was enrolled in education prior to the experiment
children             =1 if respondent had children prior to the experiment
gempowerment_pre     Gender empowerment index, ranging from 0 to 100, where higher values = greater self-reported empowerment. Measured prior to the experiment
econ_scale           Scale measuring economic empowerment after four years, standardised (i.e., it measures the number of standard deviations away from the mean)
controlbody_scale    Scale measuring control over the body after four years, standardised (i.e., it measures the number of standard deviations away from the mean)

Answer the following questions:

a) [4 points] Briefly discuss the ethics of this experiment. Do you think that it meets ethical guidelines for experimental research?

b) [3 points] Conduct appropriate tests for randomisation failure and report the results. What do you conclude about the success of randomisation?

c) [5 points] Estimate both intent-to-treat effects and local average treatment effects for the impact of girls’ development clubs on economic empowerment (econ_scale) and control over the body (controlbody_scale) after four years. Report and interpret the results.

d) [8 points] Analyse patterns of attrition in the experiment, reporting and explaining your results in all cases. Here, attrition means that a respondent answered the survey prior to the experiment but not the survey after four years. Specifically,

i. Is attrition related to observable baseline characteristics?

ii. Do attrition rates differ between girls in the treatment and control communities?

iii. Do attrition rates differ between girls in the treatment and control communities who had the same school enrolment status prior to the experiment?

iv. Using your answers from (i), (ii) and (iii), how much of a threat do you think attrition poses to this experiment’s internal validity?

e) [5 points] Estimate and report the effects of being in a treated community for girls who did not attend a development club, for both the economic empowerment and control over the body scales after four years. Based on these results, and on the design of the experiment, how likely do you think it is that spillovers occurred?
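As a purely illustrative aside, the relationship between an intent-to-treat effect and a local average treatment effect under block randomisation can be sketched on simulated data. This is a minimal sketch under stated assumptions, not a model answer: it is written in Python rather than the R used in this module, it uses made-up data rather than “2025essayQ2.Rda”, it assumes nobody in a control community can attend a club, and it computes point estimates only (no standard errors, which in a community-randomised design would need to account for clustering). The names `simulate` and `coef_with_block_dummies`, the 40% attendance rate and the effect size of 0.5 are hypothetical.

```python
import numpy as np

def simulate(n_blocks=10, girls_per_community=100, effect=0.5, seed=1):
    """Simulated data (assumption): 15 communities per block, 10 treated and 5 control."""
    rng = np.random.default_rng(seed)
    block, assigned = [], []
    for b in range(n_blocks):
        # Randomise treatment across the 15 communities within each block.
        for a in rng.permutation([1] * 10 + [0] * 5):
            block += [b] * girls_per_community
            assigned += [a] * girls_per_community
    block = np.array(block)
    assigned = np.array(assigned, dtype=float)
    n = len(block)
    block_effect = rng.normal(0, 0.3, n_blocks)[block]  # blocks differ in baseline outcome
    would_attend = rng.random(n) < 0.4                   # 40% attend if a club is offered
    attended = assigned * would_attend                   # no attendance in control communities
    outcome = block_effect + effect * attended + rng.normal(0, 1, n)
    return block, assigned, attended, outcome

def coef_with_block_dummies(y, x, block):
    """OLS coefficient on x, controlling for a full set of block dummies."""
    dummies = (block[:, None] == np.unique(block)[None, :]).astype(float)
    X = np.column_stack([x, dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

block, assigned, attended, outcome = simulate()
itt = coef_with_block_dummies(outcome, assigned, block)           # effect of assignment on outcome
first_stage = coef_with_block_dummies(attended, assigned, block)  # effect of assignment on attendance
late = itt / first_stage                                          # Wald estimator
print(f"ITT: {itt:.3f}  first stage: {first_stage:.3f}  LATE: {late:.3f}")
```

With a single binary instrument and the same block controls in both regressions, this ratio equals the two-stage least squares coefficient on attendance. Because the intent-to-treat effect is divided by a take-up rate below one, the local average treatment effect is larger in magnitude than the intent-to-treat effect.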

PART B: YOUR OWN RESEARCH PROPOSAL [50 Points]

Your task in this part is to design a research paper that answers a causal question. Present your design in a written research proposal that outlines how you would carry out your research, justifies your proposed methodology, and explains its potential limitations. You should design a single research paper, like the examples you encountered in this module. Your proposed paper can address any issue of your choice from economics, human geography, political science, public health or public policy, provided you are asking a clear causal question. You are free to propose any method (or combination of methods) to answer it, on three conditions:

1.  You MUST propose using at least one of the new techniques of causal inference covered in this module: experiments, matching, instrumental variables, regression discontinuity, difference-in-differences/fixed effects, or synthetic control. For example, you could design a field experiment, or propose a suitable instrumental variable.

2.  You MUST NOT use your undergraduate dissertation topic for this question, in order to ensure that the workload for this assignment is not disproportionately small for those students whose dissertations involve using a technique of causal analysis.

3.  You MUST NOT plagiarise an existing study, published or not. It is perfectly acceptable for your proposal to be inspired by an existing study, but if this is the case you must say so, citing the existing study. This has caused issues in the past. Remember that we can very easily find out about plagiarism - don’t try it.

Your proposal should be achievable and realistic. The data that you require should exist, or should be collectible in principle. However, you may assume that you have a large research budget, so that it is feasible to carry out a project that is expensive or logistically complex, where necessary.

Your essay should consist of a written research proposal.  It must contain details of:

•  The causal question you have chosen to answer.

•  The data that would be used to carry out your study.

• A technique (or set of techniques) of causal analysis that is designed to answer your question, and an explanation of why it is appropriate.

•  The statistical analysis that you will carry out.

• Any potential limitations of your approach: what assumptions are needed for your study to yield a causal effect? How valid will your results be?

• Any robustness checks or other exercises that you will carry out to help mitigate these potential concerns.

Please note the following guidance:

•  This is a written task only. No statistical analysis should be included.

• Your proposal can be structured in sections with sub-headings.

•  Choose a research question that is clearly defined. Outline concrete steps that you would take to answer it.

•  Do not worry about choosing a research design that is perfect. Such a thing almost certainly doesn’t exist. Instead, show that you have thought carefully about the strengths and drawbacks of your proposal.

• Higher marks will be reserved for proposals that are original, i.e. not a minor variation on an existing study. I recommend thinking of a research question that interests you and designing a study from there, rather than taking an existing paper and trying to tweak it for this essay.

• Higher marks will also be reserved for proposals that would be feasible, e.g. clearly laying out where the data will come from.