DS-UA 201 Causal Inference Final Exam 2022
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
DS-UA 201
Causal Inference
August 18, 2022
Final Exam-100 points
Instructions
You should submit your writeup (as a knitted .pdf along with the accompanying .rmd file) to the course website before 11:59pm EST on Saturday, August 20th. Please upload your solutions as a .pdf file saved as Yourlastname Yourfirstname final .pdf. In addition, an electronic copy of your .rmd file (saved as Yourlastname Yourfirstname final .rmd) should accompany this submis- sion.
Late finals will not be accepted, so start early and plan to finish early. Remember that exams often take longer to finish than you might expect.
This exam has 5 questions and is worth a total of 100 points . Show your work in order to receive partial credit. Also, we will not accept un- compiled .rmd files .
In general, you will receive points (partial credit is possible) when you demonstrate knowledge about the questions we have asked, you will not receive points when you demonstrate knowledge about ques- tions we have not asked, and you will lose points when you make inaccurate statements (whether or not they relate to the question asked) . Be careful, however, that you provide an answer to all parts of each question.
You may use your notes, books, and internet resources to answer the questions below. However, you are to work on the exam by yourself. You are prohibited from corresponding with any human being regarding the exam (unless following the procedures below) .
We will answer clarifying questions during the exam. We will not answer statistical or compu- tational questions until after the exam is over. If you have a question, post it on Campuswire as a private post, visible only to the instructional staff. If your question is a clarifying one, we will reply.
Problem 1 (20 points)
This problem will have you replicate and analyze the results from Moser and Voena’s 2012 AER paper on the impact of the World War I Trading with the Enemy Act on U.S. domestic invention. The full citation is below
Moser, P., & Voena, A. (2012). Compulsory licensing: Evidence from the trading with the enemy act. American Economic Review, 102(1), 396-427.
The premise of the study is to evaluate the effect that ”compulsory licensing” policy - that is, policies that permit domestic firms to violate foreign patents and produce foreign inventions without needing to obtain a license from the owner of the foreign patent - have on domestic invention. Does access to foreign inventions make domestic firms more innovative? The authors leverage an exogenous event in U.S. licensing policy that arose from World War I - the 1917 ”Trading with the Enemy Act” (TWEA) which permitted U.S. firms to violate patents owned by enemy-country firms. This had the consequence of effectively licensing all patents from German-owned firms to U.S. firms after
1918 (that is, from 1919 onward), allowing them to produce these inventions without paying for a license from the German-owned company.
The authors look specifically at domestic innovation and patent activity in the organic chemicals sector. They note that only some of the sub-classes of organic chemicals (as defined by the US Patent Office) received any compulsory licenses under the Trading with the Enemy Act while others did not.
They leverage this variation in exposure to the ”treatment”of compulsory licensing to implement a differences-in-differences design looking at domestic firm patent activity in each of these sub-classes (comparing sub-classes that were exposed to compulsory licensing to those that were unexposed). The unit of the dataset is the sub-class/year (471,120 observations) of 7248 US Patent and Trade- mark Office (USPTO) patent sub-classes over 65 years.
The dataset is patents .csv and the relevant variables are:
• uspto class - USPTO Patent Sub-Class (unit)
• grntyr - Year of observation (year)
• count usa - Count of patents granted to US-owned firms in the year
• count for - Count of patents granted to foreign-owned (non-US) firms in the year
• treat - Treatment indicator - Whether the patent sub-class received any German patents under TWEA (after 1918 when the policy went into effect) (Note that this is not an indicator for the overall treatment group (whether the unit ever received treatment) - it is only 1 after
1918 for units that receive treatment but is still 0 for those ”treated” units prior to the initiation of treatment)
Question A ( 5 points)
If you try to use a two-way fixed effects estimator on the dataset as it is, it will likely freeze up your computer as this is a very large dataset. We’ll instead first aggregate the data in a way that will let you use a simple difference-in-differences estimator to estimate the treatment effect.
Generate a point estimate for the average treatment effect of receiving treatment on the average annual count of US patents using a difference-in-differences estimator (using all post-treatment (1919-1939) and pretreatment (1875-1918) time periods. You should aggregate your data such that the outcome is the post-/pre-difference in the outcome (preferably using tidyverse functions like group by and summarize) and each row is a USPTO patent sub-class (rather than a sub-class/year observation) and use a difference-in-means estimator with the differenced outcome. Again, if you use 1m robust or even 1m with two-way fixed effects, your computer will likely freeze up as there are many FE parameters to estimate.
Provide a 95% robust confidence interval and interpret your point estimate. Do we reject the null of no treatment effect at the α = .05 level?
Question B (5 points)
A colleague suggests that you should instead just compare the average differences in the count of US patents in the post-1918 period between exposed and unexposed sub-classes to estimate the treatment effect. Based on what we observe in the pre-1919 period, is ignorability of the treatment likely to hold under this strategy? Discuss why or why not - what do you observe in the patent counts in the pre-treatment period between exposed and unexposed subclasses?
Question C (5 points)
We might be concerned that there are differential trends in pre-treatment patenting between those sub-classes exposed to the treatment and those exposed to control. Estimate the difference in the trend in US patents between exposed and unexposed sub-classes from 1918 to 1917 , 1916, 1915, and 1914 (four estimates in total: 1918-1917, 1918-1916, 1918-1915, 1918-1914). Provide a 95% robust confidence interval for each of these estimates and interpret your results. Do we reject the null that any of these differ from 0 (at α = .05) ? If the outcome trends were evolving in parallel, what would we expect these estimates to be? What do your results suggest for the validity of the parallel trends assumption?
Question D (5 points)
The authors adjust for covariates in addition to their out of concern for possible parallel trends violations. One possible confounder that might be driving a parallel trends violation is the overall amount of foreign patenting in the sub-class and its change over time - reflecting general techno- logical differences that might differ between the patent sub-classes. Since the treatment does not affect the amount of foreign patenting, this is a valid control.
Create a variable for the change between the post- and pre-treatment count of foreign patents in the USPTO subclass. Bin this variable into six (6) roughly-equally sized strata and estimate the effect of the treatment on US patenting (again using the differenced outcome) using a stratified difference-in-means estimator. Provide a robust 95% confidence interval and interpret your results. Do we reject the null of no treatment effect at the α = .05 level? Compare your results to your estimate from Question A and discuss why they might differ.
Problem 2 (25 points)
In this problem you will be analyzing a dataset from a 2011 paper by Carpenter and Dobkin. The full citation for the paper is:
Carpenter, C., & Dobkin, C. (2011). The minimum legal drinking age and public health. Journal of Economic Perspectives, 25(2), 133-56.
This paper examines evidence linking the legal alcohol drinking age in the US (21) to increased likelihood of accidents, hospitalization, and health hazards in general. The main identification strategy employed by the authors is a sharp Regression Discontinuity Design (RDD), where age is the running variable, and 21 is the cutoff.
The dataset contains 80 observations, where each unit is an age group, and values are collected over 4 US states.
The dataset is ER .csv and it contains five variables:
• age - The age of the unit, where the decimal indicates month of the year
• all - The total number of ER admissions
• injury - The total number of ER admissions due to injury
• illness - The total number of ER admissions due to viral illness
• alcohol - An adjusted index of how many ER admissions were linked to alcohol consumption
Question A (10 points)
Estimate the effect of being legally able to purchase alcohol (age ≥ 21 ) on the all, injury, and alcohol variables using an RDD with bandwidth = 1. For each of the three outcomes report point estimates, standard errors, and 95% confidence intervals. Repeat the analysis for bandwidth = 0.5 years, and bandwidth = 2 years. Discuss and interpret your results. Which outcome variable seems to be associated with the largest effect? Does bandwidth selection influence results?
Question B (5 points)
Using the entire dataset, create and show RDD plots that visualize the discontinuity for each of the three outcome variables used in Question A. The plots should display both points and regression lines.
Question C (10 points)
Conduct a placebo RDD analysis using the illness variable as outcome: since viral illnesses are not caused by alcohol consumption, we have no reason to expect that being legally able to drink will have an effect on this variable. Report both RDD estimates with standard errors and 95% CIs, and make a RDD plot for this outcome variable. Is there a treatment effect and is it statistically significant? What does this suggest about the plausibility of the RDD assumptions?
Problem 3 (20 points)
In this problem, you will examine whether family income affects an individual’s likelihood to enroll in college by analyzing a survey of approximately 4739 high school seniors that was conducted in 1980 with a follow-up survey taken in 1986.
This dataset is based on a dataset from
Rouse, Cecilia Elena. Democratization or diversion? The effect of community colleges on educa- tional attainment. Journal of Business & Economic Statistics 13, no. 2 (1995): 217-224.
The dataset is college .csv and it contains the following variables:
• college Indicator for whether an individual attended college. (Outcome)
• income Is the family income above USD 25,000 per year (Treatment)
• distance distance from 4-year college (in 10s of miles).
• score These are achievement tests given to high school seniors in the sample in 1980 .
• fcollege Is the father a college graduate?
• tuition Average state 4-year college tuition (in 1000 USD).
• wage State hourly wage in manufacturing in 1980.
• urban Does the family live in an urban area?
Question A (5 points)
Draw a DAG of the variables included in the dataset, and explain why you think arrows between variables are present or absent. You can use any tool you want to create an image of your DAG, but make sure you embed it on your compiled .pdf file. Assuming that there are no unobserved confounders, what variables should you condition on in order to estimate the effect of the treatment on the outcome, according to the DAG you drew?
Question B (5 points)
Choose one among the methodologies for ATE estimation under conditional ignorability that we have covered in class to apply to this dataset. Explain why you made your choice, and discuss the assumptions that are needed to apply your method of choice to this dataset. State if and why you think these assumptions hold in this dataset. In addition, choose a method to compute variance estimates for the estimator you chose, and discuss the reasons behind your choice in the context of this dataset.
Question C (10 points)
Using the methodology you chose in Question B to control for the confounders you have selected in Question A, as well as the relevant R packages, provide your estimate of the ATE of the treatment on the outcome. Using your variance estimator of choice, report standard errors and 95% confidence intervals around your estimates. Interpret your results and discuss both their statistical significance and their substantive implications.
Variable |
Description |
shareself |
proportion of self-employed potential voters |
shareblue |
proportion of blue-collar potential voters |
sharewhite |
proportion of white-collar potential voters |
sharedomestic |
proportion of domestically employed potential voters |
shareunemployed |
proportion of unemployed potential voters |
nvoter |
number of eligible voters |
nazivote |
number of votes for Nazis |
Table 1: 1932 German Election Data.
Problem 4 (25 points)
Who voted for the Nazis? Researchers attempted to answer this question by analyzing aggregate election data from the 1932 German election during the Weimar Republic. This question is based on the following article:
G. King, O. Rosen, M. Tanner, A. F. Wagner(2008). Ordinary economic voting behavior in the extraordinary election of Adolf Hitler. Journal of Economic History, vol. 68, pp. 951-996. 2008
We analyze a simplified version of the election outcome data, which records, for each precinct, the number of eligible voters as well as the number of votes for the Nazi party. In addition, the data set contains the aggregate occupation statistics for each precinct. Table 1 presents the variable names and descriptions of the CSV data file nazis .csv. Each observation represents a German precinct.
The goal of the analysis is to investigate which types of voters (based on their occupation category) cast ballots for the Nazi party in 1932. One hypothesis says that the Nazis received much support from blue-collar workers. Since the data do not directly tell us how many blue-collar workers voted for the Nazis, we must infer this information using a statistical analysis with certain assumptions. Such an analysis, where researchers try to infer individual behaviors from aggregate data, is called ecological inference.
To think about ecological inference more carefully in this context, consider the following simplified table for each precinct i.
The data at hand tells us only the proportion of blue-collar voters Xi and the vote share for the Nazis Yi in each precinct, but we would like to know the Nazi vote share among the blue-collar voters Wi1 and among the non-blue-collar voters Wi2 . Then, there is a deterministic relationship
between X,Y , and {W1,W2 }. Indeed, for each precinct i, we can express the overall Nazi vote share as the weighted average of the Nazi vote share of each occupation:
Yi = XiWi1 + (1 − Xi) Wi2 (1)
We exploit the linear relationship between the Nazi vote share Yi and the proportion of blue-collar voters Xi given in equation (1) by regressing the former on the latter. That is, fit the following linear regression model:
Compute the estimated slope coefficient, its standard error, and the 95% confidence interval. Give a substantive interpretation of each quantity.
Question B ( 5 points)
Based on the fitted regression model from the previous question, predict the average Nazi vote share Yi given various proportions of blue-collar voters Xi . Specifically, plot the predicted value of Yi (the vertical axis) against various values of Xi within its observed range (the horizontal axis) as a solid line. Add 95% confidence intervals as dashed lines. Give a substantive interpretation of the plot.
Question C ( 5 points)
Fit the following alternative linear regression model:
E(Yi ∣ Xi) = α Xi+ (1* − Xi) β* . (3)
Note that this model does not have an intercept. How should one interpret α* and β* ? How are these parameters related to the linear regression model given in equation (2)?
Question D ( 5 points)
Fit a linear regression model where the overall Nazi vote share is regressed on the proportion of each occupation. The model should contain no intercept and 5 predictors, each representing the proportion of a certain occupation type. Interpret the estimate of each coefficient and its 95% confidence interval. What assumption is necessary to permit your interpretation?
Question E (bonus 5 points)
Finally, we consider a model-free approach to ecological inference. That is, we ask how much we can learn from the data alone without making an additional modeling assumption. Given the relationship in equation (1), for each precinct, obtain the smallest value that is logically possible for
Wi1 by considering the scenario in which all non-blue-collar voters in precinct i vote for the Nazis. Express this value as a function of Xi and Yi . Similarly, what is the largest possible value for Wi1 ? Calculate these bounds, keeping in mind that the value for Wi1 cannot be negative or greater than 1. Finally, compute the bounds for the nationwide proportion of blue-collar voters who voted for the Nazis (i.e., combining the blue-collar voters from all precincts by computing their weighted average based on the number of blue-collar voters). Give a brief substantive interpretation of the results.
Problem 5 (20 points)
In this problem we are interested in the question of whether an extra year of education causes increased wages. We will use wage2 .csv data used in the following paper:
M. Blackburn and D. Neumark (1992), Unobserved Ability, Efficiency Wages, and Interindustry Wage Differentials, Quarterly Journal of Economics 107, 1421-1436. https://doi.org/10.3386/w3857 This dataset includes a bunch of different variables. These are the variables we care about for this
question:
Variable name
wage
educ
feduc
meduc
Question A (4 points)
1. (2 points) First create a naive model of the relationship between years of education and wage, by treating wage as a dependent variable. What is the effect of education on wage as predicted by the model?
2. (2 points) Does this naive model correctly estimates the effect of education on wages? Why or why not?
Question B (6 points)
We next want to apply 2SLS (two-stage least squares) approach to account for unobservable con- founders. For this we need to identify an instrument variable (IV). Remember, for an instrument to be valid, it should meet these criteria:
1. Relevance: Instrument is correlated with policy variable
2. Exclusion: Instrument is correlated with outcome only through the policy variable
3. Exogeneity: Instrument isn’t correlated with anything else in the model (i.e. omitted variables)
We have two choices for the instrument variable: years of education of father and years of education of mother. Select one of them and show using the data that it satisfies all the three criteria to be an instrument variable.
Question C (5 points)
Using the IV identified in Question B, you will next perform 2SLS manually.
1. (2 points) In the first stage, predict education using the IV.
2. (3 points) In the second stage, use the predicted education to estimate the exogenous effect of education on wages. What is the causal effect of education on wage using this 2SLS analysis?
Question D (5 points)
Now using iv robust() perform 2SLS in a single step.
1. (2 points) Compare the exogenous effect of education on wages predicted by the manual 2SLS from Question C with the effect obtained using iv robust(). Are they same or different?
2. (3 points) Next compare the standard errors of the effect estimate from the two approaches. Which approach correctly estimates the standard error and why?
2022-08-20