
POLS0012 2023-24

Causal Analysis in Data Science

CONTENT

This module introduces the rapidly-growing field of causal inference. Increasingly, social scientists are no longer willing to merely establish correlations and assert that these patterns are causal. Instead, there is a new focus on design-based inference: designing research studies in advance so that they yield causal effects. We will begin by asking what it means for X to cause Y, using the framework of potential outcomes. We will then look at the most popular research designs in causal analysis, including experiments (also known as randomised control trials), natural experiments that we can analyse with instrumental variables and regression discontinuity techniques, and causal inference over time with difference-in-differences and synthetic control. We will also evaluate ‘observational’ methods – regression and the related technique of matching – from the standpoint of causal inference. This course has a hands-on, practical emphasis. Students will learn to design effective studies and implement these methods in R, and will become critical consumers and evaluators of cutting-edge research. Examples will be drawn from economics, geography, political science, public health and public policy.

By the end of the module, students will be able to:

•    Understand the concept of causation in the social sciences

•    Distinguish between observational and causal analysis

•    Design research studies that can yield causal effects

•    Implement a range of techniques of causal analysis including experiments, matching, instrumental variables, regression discontinuity, difference-in-differences and synthetic control

•    Evaluate the advantages and disadvantages of different research designs and methods

•   Critically evaluate quantitative journal articles in the social sciences

LECTURES, TUTORIALS AND LEARNING MATERIALS

Each week there will be an introductory lecture followed by a tutorial. The lecture will last two hours and the tutorial one hour. The lectures will introduce students to the course material. The tutorials will be largely computer-based: students will learn to implement the techniques in R through weekly coding exercises.

In contrast to other Social Data Science courses, this module contains material that is not very mathematically challenging but can be conceptually difficult. For that reason, the lectures rely far less on slides, and much of the teaching will take place on the whiteboard. A few slides are used and will be provided in advance so that students can prepare for the class. To make sure that students have a good record of what is discussed in class, full typed lecture notes will be provided after each session, and lecture recordings will be available via Lecturecast. To do well in the course, you should complete the recommended readings before class, and then review the lecture notes and readings again after class to make sure you have understood everything.

All class materials (lecture notes, slides, lecture recordings, tutorial coding exercises) will be available on the class Moodle page.

Teaching:

•   The lectures will be on Wednesdays, 9-11am in Lecture Theatre G6 in the Institute of Archaeology on Gordon Square

•   The tutorials are on Thursdays, various times and locations (see your timetable)

ASSESSMENT

The module is assessed through the completion of a final 3,000-word essay, worth 100% of the final grade. It will contain two parts. Part A will be a set of quantitative questions that require you to implement techniques from the course in R and write up the results, similar to the weekly tutorial assignments. Part B requires you to design an original research study using one of the techniques taught in the module. Part B will be available at the start of term so that you can work on it at your own pace.

The deadline for the essay is TBC.

Please remember that plagiarism is taken extremely seriously and can disqualify you from the module (see https://www.ucl.ac.uk/students/exams-and-assessments/academic-integrity).  If you are in doubt about any of this, ask the tutor.

OTHER NON-ASSESSED WORK

The tutorials will allow students to apply and test their knowledge of the material covered on the module. You will be assigned exercises to complete in R, which may take longer to complete than the one-hour slot. If you do not finish during class time, you must finish them in your own time. Full solutions will be posted on the course Moodle page.

READING MATERIALS

To fully understand the concepts and techniques taught in this module, students will need to do background reading. Causal analysis is a relatively new and rapidly-evolving field. As such, there is no single textbook that covers the whole course, although we will read much of Gerber and Green’s book on experiments and Angrist and Pischke’s textbook on causal inference, both listed below. Other required readings on the techniques that we cover are drawn from a variety of other textbooks and journal articles. In addition, the reading list contains applied journal articles that implement the methods we learn about. These are useful to help understand how the methods are applied in practice, especially for the second part of the final essay. It is not always necessary to read and understand every detail of each article; focus on how and why they apply the methods we learn about.

The main textbooks for this course are:

•   Alan S. Gerber and Donald P. Green. Field Experiments: Design, Analysis and Interpretation. WW Norton and Co., 2012

[available on short loan from UCL library]

•   Joshua D. Angrist and Jörn-Steffen Pischke. Mastering Metrics: The Path from Cause to Effect. Princeton University Press, 2015

[available online through UCL library]

Many other textbooks cover parts of the course, often in a more advanced fashion. Here’s a list of works to consult for more information on certain topics. We’ll also read individual chapters from some of them:

•   Joshua D. Angrist and Jörn-Steffen Pischke. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press, 2009

•   Scott Cunningham. Causal Inference: The Mixtape. Yale University Press, 2021 [available online at https://mixtape.scunning.com/]

•   Thad Dunning. Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press, 2012

•    Miguel Hernán and James M. Robins. Causal Inference: What If. CRC Press, 2020.

•   Guido Imbens and Donald Rubin. Causal Inference for Statistics, Social and Biomedical Sciences: An Introduction. Cambridge University Press, 2015

•   Stephen L. Morgan and Christopher Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research 2nd ed. Cambridge University Press, 2014

•    Rebecca Morton and Kenneth Williams. Experimental Political Science and the Study of Causality: from Nature to the Lab. Cambridge University Press, 2010

•    Paul R. Rosenbaum. Observation and Experiment: An Introduction to Causal Inference. Harvard University Press, 2017

Finally, the following two ‘popular science’ books contain accessible introductions to experiments, causal inference and their applications. They may be of interest as background reading:

•   Abhijit Banerjee and Esther Duflo. Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty. PublicAffairs, 2012.

•   Judea Pearl and Dana Mackenzie. The Book of Why: The New Science of Cause and Effect. Allen Lane, 2018.

Wherever possible, readings will be posted on Moodle. This includes all papers as well as chapters from online books. However, please note that Gerber and Green is only available as a physical copy, which can be accessed on short loan from UCL library.

WEEKLY OUTLINE

Week 1: Statistical Preliminaries

Week 2: Causation and Randomised Experiments

Week 3: Randomised Experiments: Internal Validity

Week 4: Randomised Experiments: Inference and External Validity

Week 5: Matching, Propensity Scores and Regression

Week 6: Reading Week (no classes)

Week 7: Compliance, Instrumental Variables and Natural Experiments

Week 8: Instrumental Variables and Natural Experiments in Practice

Week 9: Regression Discontinuity

Week 10: Difference-in-Differences and Fixed Effects Estimation

Week 11: Synthetic Control Analysis

WEEKLY MODULE CONTENTS AND READINGS

Week 1. Statistical Preliminaries

We’ll start with a recap of some core skills and concepts from statistics, as well as R code, bringing everyone up to speed in preparation for the rest of the module.

Week 2. Causation and Randomised Experiments

We’ll develop a counterfactual model of causation that explains the distinction between correlation and causation, illustrated by epidemiological debates about diets and health outcomes. We’ll use the model to examine why randomised experiments offer a solution to the “fundamental problem of causal inference”, and we’ll learn how to analyse experiments using average treatment effects.
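The logic of the average treatment effect can be sketched in code. The course uses R; the following toy simulation is in Python purely for illustration, and all numbers are made up. It shows why we can only ever observe one potential outcome per unit, and why random assignment makes the simple difference in group means an unbiased estimate of the ATE.

```python
# Toy simulation: the difference in mean outcomes between randomly assigned
# treatment and control groups estimates the average treatment effect (ATE).
import random

random.seed(1)

# Simulate potential outcomes: Y(0) is the untreated outcome, and the true
# treatment effect is +2 for every unit (all values invented).
n = 1000
y0 = [random.gauss(10, 2) for _ in range(n)]
y1 = [y + 2 for y in y0]

# Randomly assign half the units to treatment. We observe only one potential
# outcome per unit -- the "fundamental problem of causal inference".
treated = set(random.sample(range(n), n // 2))
observed = [y1[i] if i in treated else y0[i] for i in range(n)]

treat_mean = sum(observed[i] for i in range(n) if i in treated) / (n // 2)
control_mean = sum(observed[i] for i in range(n) if i not in treated) / (n - n // 2)
ate_hat = treat_mean - control_mean
print(round(ate_hat, 2))  # should be close to the true effect of 2
```

Because assignment is random, the treated and control groups have the same distribution of potential outcomes in expectation, so the estimate converges on the true effect as n grows.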

Required Reading:

•   Gerber and Green, Chapters 1 and 2.1-2.6

•    Gary Taubes, “Do We Really Know What Makes Us Healthy?”, New York Times Magazine, 16th September 2007. Available at: http://www.nytimes.com/2007/09/16/magazine/16epidemiology-t.html

Supplementary Reading:

•   Cunningham, Chapter 4

•    Imbens and Rubin, Chapters 1-2

Week 3. Randomised Experiments: Internal Validity

Experiments are statistically simple, but complex to administer in practice. We’ll cover the concept of internal validity: does an experiment truly uncover a causal effect? We’ll learn how to use balance tests to detect failures of randomisation, as well as how to cope with attrition. A famous experiment on class size reduction in primary schools provides a key example of the challenges of achieving internal validity in practice.
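The idea behind a balance test can be made concrete with a small sketch. The course uses R; this Python version with simulated data is only illustrative, and the covariate and threshold are assumptions for the example. Under successful randomisation, pre-treatment covariates should be similar across arms, so a large standardised difference is a warning sign.

```python
# Toy balance test: compare a pre-treatment covariate (e.g. age) across the
# treatment and control groups. A large standardised difference suggests
# randomisation failed or was implemented incorrectly.
import random
import statistics

random.seed(11)

# Simulated pre-treatment covariate under correct randomisation (made-up data).
age_treat = [random.gauss(35, 10) for _ in range(500)]
age_control = [random.gauss(35, 10) for _ in range(500)]

diff = statistics.mean(age_treat) - statistics.mean(age_control)
pooled_sd = statistics.pstdev(age_treat + age_control)
std_diff = diff / pooled_sd  # |values| much above ~0.1 merit scrutiny
print(round(std_diff, 3))
```

In practice one would run such a check for every pre-treatment covariate (often with a formal test), since randomisation balances all of them in expectation.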

Required Reading:

•   Gerber and Green, Chapters 2.7, 3.6, 4.3-4.4 and 7

•   Alan Krueger (1999). “Experimental estimates of education production functions.” Quarterly Journal of Economics 114 (2): 497-532 [focus on pp. 497-517]

Supplementary Reading:

•   Andrew Beath, Fotini Christia and Ruben Enikolopov (2013). “Empowering Women through Development Aid: Evidence from a Field Experiment in Afghanistan.” American Political Science Review 107 (3): 540-557

•   Gerber and Green, Chapter 8

•    Morton and Williams, Chapter 7.1-7.2

Week 4. Randomised Experiments: Inference and External Validity

This week we’ll finish learning to analyse experiments by looking at a new inference technique (Fisher’s Exact Test, aka randomisation inference). Then we’ll look briefly at external validity. The aim of experiments is to learn about causal effects in the real world, but they may take place in artificial settings or on samples that differ from the populations that we care about. We’ll ask how much we can hope to learn from experiments and how policy-makers can use experimental results in practice.
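The core of randomisation inference can be sketched in a few lines. The course uses R; this Python version with invented outcomes is only an illustration. Under Fisher’s sharp null of no effect for any unit, the observed outcomes are fixed, so we can re-randomise the assignment many times and ask how often a difference as large as the observed one arises by chance.

```python
# Toy randomisation inference: simulate the distribution of the difference in
# means under the sharp null hypothesis of zero effect for every unit.
import random

random.seed(42)

# Hypothetical outcomes for 6 treated and 6 control units (made-up data).
treated = [14, 12, 15, 13, 16, 12]
control = [10, 11, 9, 12, 10, 11]
observed_diff = sum(treated) / 6 - sum(control) / 6

pool = treated + control
n_sims = 10000
count_extreme = 0
for _ in range(n_sims):
    shuffled = random.sample(pool, len(pool))  # one alternative random assignment
    sim_diff = sum(shuffled[:6]) / 6 - sum(shuffled[6:]) / 6
    if sim_diff >= observed_diff:
        count_extreme += 1

p_value = count_extreme / n_sims  # one-sided randomisation p-value
print(round(observed_diff, 2), p_value)
```

With only 12 units the full set of assignments could be enumerated exactly; Monte Carlo sampling, as here, is the standard shortcut for larger experiments.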

Required Reading:

•   Cunningham, Chapter 4.2

•    Dani Rodrik (2008). “The new development economics: we shall experiment, but how shall we learn?” Harvard Kennedy School Research Paper

Supplementary Reading:

•    David Broockman and Josh Kalla (2015). “Campaign Contributions Facilitate Access to Congressional Officials: A Randomized Field Experiment.” American Journal of Political Science 60 (3): 545-558.

•    Morton and Williams, Chapters 7.3-9

Week 5. Observational Studies and Causal Inference: Matching, Propensity Scores and Regression

In many cases it is impossible to carry out experiments. Matching, often using propensity scores, offers a close analogy to experiments in an observational setting and involves a similar set of assumptions to regression. We’ll learn how to do matching, asking how closely observational methods can approximate experiments. Examples are drawn from literature on smoking and health, and violence in civil wars.
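The mechanics of matching can be sketched simply. The course uses R; this Python toy example is illustrative only, and it matches directly on a single made-up covariate x where real applications typically match on a propensity score estimated from many covariates.

```python
# Toy one-to-one nearest-neighbour matching: pair each treated unit with the
# control unit whose covariate x is closest, then average the within-pair
# outcome differences to estimate the effect of treatment on the treated (ATT).

# (unit_id, x, outcome) tuples -- invented observational data.
treated = [(1, 0.30, 5.1), (2, 0.55, 6.0), (3, 0.80, 7.2)]
controls = [(4, 0.25, 4.0), (5, 0.50, 4.9), (6, 0.85, 6.1), (7, 0.10, 3.5)]

pair_diffs = []
for _, x_t, y_t in treated:
    # Nearest control by covariate distance (matching with replacement).
    _, x_c, y_c = min(controls, key=lambda c: abs(c[1] - x_t))
    pair_diffs.append(y_t - y_c)

att_hat = sum(pair_diffs) / len(pair_diffs)
print(round(att_hat, 2))  # prints 1.1
```

The estimate is only credible if treatment is as-if random conditional on the matched covariates, which is the same selection-on-observables assumption that regression requires.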

Required Reading:

•    Elizabeth Stuart (2010). “Matching methods for causal inference: a review and look forward.” Statistical Science 25 (1), pp. 1-21

•    Donald Rubin (2007). “The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials.” Statistics in Medicine 26 (1): 20-36

•   Angrist and Pischke, Chapter 2 [including Appendix pp. 82-5]

Supplementary Reading:

•   Angrist and Pischke, Mostly Harmless Econometrics Chapter 3

•    Peter M. Aronow and Cyrus Samii (2016). “Does Regression Produce Representative Estimates of Causal Effects?” American Journal of Political Science 60 (1): 250-267.

•   Cunningham, Chapter 5

•   Jason Lyall (2010). “Are co-ethnics more effective counterinsurgents? Evidence from the second Chechen war.” American Political Science Review 104 (1): 1-20

Week 6: Reading Week (No Classes)

Week 7. Compliance, Instrumental Variables and Natural Experiments

Instrumental variables is a powerful technique that has been used in two different settings. We’ll first learn how to use instrumental variables to analyse randomised experiments where some units fail to comply with their assigned treatment. Second, natural experiments – where treatment is assigned as-if at random, without the intervention of the analyst – have become increasingly popular in the social sciences. We’ll define natural experiments and learn how to analyse them using the method of instrumental variables.
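The simplest instrumental-variables estimator, the Wald estimator, can be sketched for an experiment with noncompliance. The course uses R; this Python simulation is illustrative only and all parameters (compliance rate, effect size) are invented. Dividing the intention-to-treat effect of assignment Z by Z’s effect on take-up D recovers the complier average causal effect (CACE).

```python
# Toy Wald / IV estimator for an experiment with one-sided noncompliance.
import random

random.seed(7)

n = 20000
true_effect = 3.0  # effect of actual treatment D on outcome Y (made-up)

z = [random.random() < 0.5 for _ in range(n)]       # random assignment
complier = [random.random() < 0.6 for _ in range(n)]
d = [zi and ci for zi, ci in zip(z, complier)]      # only assigned compliers take treatment
y = [true_effect * di + random.gauss(0, 1) for di in d]

def mean(vals):
    return sum(vals) / len(vals)

# Intention-to-treat effects of assignment on the outcome and on take-up.
itt_y = mean([y[i] for i in range(n) if z[i]]) - mean([y[i] for i in range(n) if not z[i]])
itt_d = mean([1.0 * d[i] for i in range(n) if z[i]]) - mean([1.0 * d[i] for i in range(n) if not z[i]])

cace_hat = itt_y / itt_d  # Wald estimator
print(round(cace_hat, 2))  # should be close to 3
```

The same ratio logic applies to natural experiments: the instrument plays the role of Z, and the exclusion restriction (Z affects Y only through D) carries the causal weight.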

Required Reading:

•   Gerber and Green, Chapters 5 and 6

•    David Clingingsmith, Asim Ijaz Khwaja and Michael Kremer (2009). “Estimating the impact of the Hajj: religion and tolerance in Islam’s global gathering.” Quarterly Journal of Economics 124 (3): 1133-1170

Supplementary Reading:

•   Cunningham, Chapter 7

•    Dunning, Chapters 1 and 4

Week 8. Instrumental Variables and Natural Experiments in Practice

We’ll use recent studies to illustrate how instrumental variables can work when applied to natural experiments and how they can go wrong. We’ll discuss studies on the effect of western TV on support for communism in East Germany, the relationship between police numbers and crime, the political impact of the US ‘Tea Party’ protest movement, and how participation in the Hajj pilgrimage alters the beliefs of Muslims.

Required Reading:

•   Angrist and Pischke, Chapter 3

•   Jens Hainmueller and Holger L. Kern (2009). “Opium for the masses: how foreign media can stabilize authoritarian regimes.” Political Analysis 17 (4): 377-399

•   Steven D. Levitt (1997). “Using electoral cycles in police hiring to estimate the effects of police on crime.” American Economic Review 87 (3): 270-290

•    Madestam et al. (2013). “Do Political Protests Matter? Evidence from the Tea Party Movement.” Quarterly Journal of Economics 128 (4): 1633-1685

Supplementary Reading:

•   Dunning, Chapters 7-10

Week 9. Regression Discontinuity Designs

Regression discontinuity analysis exploits a natural experiment where treatment is assigned based on an arbitrary rule, such as exceeding a threshold. We’ll learn how to do the analysis, looking at a paper on whether British MPs are able to use their office to enrich themselves.
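A sharp regression discontinuity estimate can be sketched as two local regressions. The course uses R; this Python simulation is illustrative only, and the cutoff, bandwidth and jump size are invented. We fit a line to the outcome on each side of the cutoff within a bandwidth, and the gap between the two fitted values at the cutoff is the estimated local treatment effect.

```python
# Toy sharp regression discontinuity: estimate the jump in the outcome at the
# cutoff by fitting a separate line on each side of it.
import random

random.seed(3)

cutoff, bandwidth, jump = 0.0, 0.5, 2.0  # made-up design parameters
x = [random.uniform(-1, 1) for _ in range(4000)]
# Outcome: smooth trend in x plus a discontinuity of size 2 at the cutoff.
y = [1.5 * xi + (jump if xi >= cutoff else 0.0) + random.gauss(0, 0.5) for xi in x]

def fit_line(pts):
    """Ordinary least squares for y = a + b*x, returning (a, b)."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    b = sum((px - mx) * (py - my) for px, py in pts) / sum((px - mx) ** 2 for px, _ in pts)
    return my - b * mx, b

left = [(xi, yi) for xi, yi in zip(x, y) if cutoff - bandwidth <= xi < cutoff]
right = [(xi, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + bandwidth]
a_l, b_l = fit_line(left)
a_r, b_r = fit_line(right)

# Predicted outcomes just either side of the cutoff; the gap is the RD estimate.
rd_hat = (a_r + b_r * cutoff) - (a_l + b_l * cutoff)
print(round(rd_hat, 2))  # should be close to the true jump of 2
```

The bandwidth choice trades bias against variance: narrower windows rely less on the linearity assumption but use fewer observations.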

Required Reading:

•   Angrist and Pischke, Chapter 4

•   Andy Eggers and Jens Hainmueller (2009). “MPs for Sale? Returns to Office in Postwar British Politics.” American Political Science Review 103 (4): 513-533

Supplementary Reading:

•   Cunningham, Chapter 6

•    Dunning, Chapter 3

Week 10. Causal Inference over Time: Difference-in-Differences and Fixed Effects

Difference-in-differences or fixed effects can be used for causal inference with panel data, when a treatment varies over time in some units but not others. We’ll look at a very famous example that overturned economists’ thinking on minimum wages.
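The classic 2x2 difference-in-differences calculation is simple arithmetic. The course uses R; this Python sketch with invented group means is only illustrative. Comparing the before/after change in the treated group to the change in the untreated group nets out any common trend shared by both.

```python
# Toy 2x2 difference-in-differences with hypothetical mean outcomes
# (e.g. average employment) by group and period.
treated_before, treated_after = 20.4, 21.0
control_before, control_after = 23.3, 21.2

change_treated = treated_after - treated_before   # change in treated group
change_control = control_after - control_before   # common-trend change
did = change_treated - change_control             # difference-in-differences
print(round(did, 2))  # prints 2.7
```

The estimate is causal only under the parallel-trends assumption: absent treatment, the treated group would have changed by the same amount as the control group.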

Required Reading:

•   Angrist and Pischke, Chapter 5

•    David Card and Alan Krueger (1994). “Minimum wages and employment: a case study of the fast food industry in New Jersey and Pennsylvania.” American Economic Review 84 (4): 772-793

Supplementary Reading:

•   Cunningham, Chapter 9

Week 11. Synthetic Control Analysis

The new method of synthetic control is useful for causal inference over time with a small number of units, particularly when the treatment occurs in only one unit. We’ll learn how to create a synthetic control case to compare to the treated unit, based on an optimal combination of untreated units. We’ll look at applications including the impact of tobacco control measures and German reunification.
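The core idea can be sketched in miniature. The course uses R; this Python example is illustrative only, with made-up yearly data, two donor units, and a simple grid search where real applications optimise over many donors and predictor variables. We pick convex weights on untreated "donor" units so their weighted combination tracks the treated unit's pre-treatment outcomes, then use that synthetic unit as the post-treatment counterfactual.

```python
# Toy synthetic control: find the convex combination of two donor units that
# best matches the treated unit before treatment, then compare trajectories
# after treatment.

# Made-up yearly outcomes; treatment hits the treated unit after year 3.
treated = [10.0, 11.0, 12.0, 16.0, 17.0]   # last two years are post-treatment
donor_a = [9.0, 10.0, 11.0, 12.0, 13.0]
donor_b = [12.0, 13.0, 14.0, 15.0, 16.0]
pre_periods = 3

def pre_mse(w):
    """Pre-treatment fit of the synthetic unit w*A + (1-w)*B."""
    synth = [w * a + (1 - w) * b for a, b in zip(donor_a, donor_b)]
    return sum((treated[t] - synth[t]) ** 2 for t in range(pre_periods)) / pre_periods

# Grid search for the convex weight minimising pre-treatment error.
best_w = min((i / 1000 for i in range(1001)), key=pre_mse)
synth = [best_w * a + (1 - best_w) * b for a, b in zip(donor_a, donor_b)]

# Post-treatment gaps between the treated unit and its synthetic control.
effects = [treated[t] - synth[t] for t in range(pre_periods, len(treated))]
print(round(best_w, 2), [round(e, 2) for e in effects])
```

Because the weights are constrained to a convex combination, the synthetic unit never extrapolates beyond the range of the donor pool, which is a key appeal of the method.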

Required Reading:

•   Alberto Abadie, Alexis Diamond and Jens Hainmueller (2015). “Comparative politics and the synthetic control method.” American Journal of Political Science 59 (2): 495-510

•   Cunningham, Chapter 10

Supplementary Reading:

•   Alberto Abadie, Alexis Diamond and Jens Hainmueller (2010). “Synthetic control methods for comparative case studies: estimating the effect of California’s tobacco control program.” Journal of the American Statistical Association 105 (490): 493-505