APEC 3003 Final
May 6th, 2023
Instructions
This exam has 4 questions and one extra credit opportunity. Please answer all questions. You may use external resources that comply with the campus academic integrity policy to answer these questions, but you must state all responses in your own words. Please type your responses in a separate document, noting which question each response addresses, then upload that document to Canvas.
Some of the parts – not all! – of this exam build on each other. If you come to a part of the exam and get stuck, I strongly recommend the following:
1. Read the rest of the parts of the exam, and consider if you can answer the question without the previous parts.
Answer all parts of the exam you can. For instance, just because 1(b) comes after 1(a) does not mean that you have to answer 1(a) in order to answer 1(b).
2. If you believe a part of a question depends on a previous question, you can still obtain partial credit on that part.
The strategy here is to say “suppose I had found x in the previous part. Then my answer would be ...” Show as much of what you know as possible.
For questions 1 through 3, you will work with replication data from a paper entitled “The Price of Political Opposition: Evidence from Venezuela’s ‘Maisanta’.” The authors look at the effects of publishing the names of those who had signed a (failed) recall petition against Chávez in 2003. Chávez won 59% of the vote in the recall election triggered by this petition, and so remained in power. In 2004, a database of all registered voters was released, which identified voters who had signed the third of three recall petitions. This clearly identified political opponents of the Chávez regime and made this information broadly available. The authors argue that the database was used to look up the political leanings of job applicants, and that friends and neighbors consulted it as well. This allowed Chávez’s regime to retaliate against political opponents, particularly as he had consolidated power and so had no need to conciliate the opposition. We will examine whether there is evidence of political retribution in the earnings of opponents of the Chávez regime whose identities were revealed.
The data maisanta.rdata are available on Canvas in the same space as this final exam paper. You can find a description of the variables in the data below.
Variable name | Class | Description | Range
ingreso_wk | numeric | Log of annual income (measured in 2000 thousand bolívares) | -5.069 to 11.579
maisanta | numeric | Identity as signer of 3rd round petition has been revealed | 0 to 1
female | numeric | Sex is female | 0 to 1
educ | numeric | Years of schooling | 0 to 18
year | numeric | Year of sample | 1997 to 2006
caracas | numeric | Person lives in Caracas | 0 to 1
edad | numeric | Age | 0 to 99
chavista | numeric | Signed for Chávez | 0 to 1
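As a minimal sketch of getting started, the R snippet below shows how an .rdata file can be loaded and checked against the variable table. Since the real maisanta.rdata is not reproduced here, it first writes a small made-up data frame to a temporary file. The object name maisanta_toy, the temporary file name, and every value are assumptions for illustration only; the object name stored in the real file may differ.

```r
# Toy stand-in for the exam data: all values are invented,
# only the column names follow the variable table above.
maisanta_toy <- data.frame(
  ingreso_wk = c(7.2, 6.8, 7.9, 6.1),
  maisanta   = c(0, 1, 0, 1),
  female     = c(1, 0, 0, 1),
  educ       = c(11, 9, 16, 6),
  year       = c(1999, 2004, 2005, 2006),
  caracas    = c(0, 1, 1, 0),
  edad       = c(34, 45, 29, 52),
  chavista   = c(0, 0, 1, 0)
)

# Save to a temporary .rdata file, then remove the object from memory.
path <- file.path(tempdir(), "maisanta_toy.rdata")
save(maisanta_toy, file = path)
rm(maisanta_toy)

# load() restores objects under their saved names and returns those names,
# which is a convenient way to learn what a file contains.
loaded_names <- load(path)
print(loaded_names)

# Compare classes and ranges with the variable table.
str(maisanta_toy)
summary(maisanta_toy$ingreso_wk)
```

Capturing the return value of load() is useful because an .rdata file does not have to contain an object named after the file.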
Question 1: Univariate OLS regression (35 points)
(a) (5 points) Write down the linear model for a regression of an individual’s log wages per year on a constant and an indicator that that person’s participation in the third petition to recall Chávez (the maisanta) was publicly available in that year. Be sure to include all relevant notation, including coefficients and subscripts.
(b) (5 points) Explain how the phrase “least squares” in Ordinary Least Squares (OLS) provides an intuitive explanation for the mechanism by which OLS produces estimates of coefficients in the model.
(c) (5 points) Estimate the regression you described in part 1(a). Report and interpret the coefficient you estimate on maisanta in the regression of annual log wages.
Note: the left-hand-side variable is measured as the natural logarithm of bolívares per year. This is not a purely linear model!
(d) (10 points) What assumption(s) need(s) to be true for your estimate from 1(c) to provide an unbiased estimate of the effect of having your opposition to the Chávez regime known on earnings?
Do you think that this assumption is plausible? Why or why not?
(e) (5 points) Report a heteroskedasticity robust standard error for the estimate you computed in part 1(c).
Using the heteroskedasticity robust standard error, construct and report a 95% confidence interval around the estimated coefficient on maisanta from part 1(c).
What does this confidence interval tell you about the statistical significance of the estimate?
(f) (5 points) What assumption are you no longer making about the structure of the error terms by using heteroskedasticity robust standard errors?
Explain informally what that assumption says about the structure of the error terms.
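The mechanics behind parts 1(c) and 1(e) can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the exam data or a required method: the variables d and y, the sample size, and the true coefficient are invented, and the HC1 correction used here is only one common choice of heteroskedasticity robust variance estimator.

```r
set.seed(42)  # fixed seed so the simulated data are reproducible

# Simulated data (not the exam data): binary regressor d, log outcome y,
# with an error variance that differs by group (heteroskedasticity).
n <- 500
d <- rbinom(n, 1, 0.3)
y <- 7 - 0.05 * d + rnorm(n, sd = 0.5 + 0.5 * d)

fit <- lm(y ~ d)
X <- model.matrix(fit)  # n x 2 design matrix: intercept and d
e <- resid(fit)         # OLS residuals
k <- ncol(X)

# Sandwich estimator with the HC1 degrees-of-freedom correction n / (n - k).
bread <- solve(crossprod(X))  # (X'X)^{-1}
meat  <- crossprod(X * e)     # sum over i of e_i^2 * x_i x_i'
vcov_hc1 <- bread %*% meat %*% bread * n / (n - k)
se_hc1 <- sqrt(diag(vcov_hc1))
names(se_hc1) <- colnames(X)

# 95% confidence interval using the normal critical value.
b  <- coef(fit)[["d"]]
ci <- b + c(-1, 1) * qnorm(0.975) * se_hc1[["d"]]

# With a log outcome, 100 * (exp(b) - 1) is the implied percentage
# difference in the level of the outcome between the two groups.
pct <- 100 * (exp(b) - 1)

print(c(estimate = b, se_hc1 = se_hc1[["d"]],
        lower = ci[1], upper = ci[2], pct = pct))
```

Writing the sandwich formula out by hand keeps the sketch free of add-on packages; in practice a package-based estimator may be more convenient.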
Question 2: Multivariate OLS regression (35 points)
(a) (5 points) We’ll expand the regression from 1(a) to include year fixed effects, an indicator that the individual is female, and years of education.
$$\ln(\text{earnings})_{it} = \alpha_0 + \alpha_1 \text{maisanta}_{it} + \alpha_2 \text{female}_i + \alpha_3 \text{educ}_{it} + \gamma_t + e_{it}$$
Estimate the expanded model and report your results.
(b) (10 points) Interpret the coefficient on maisanta from the results obtained in 2(a). Also, note how the interpretation of the coefficient changes compared to part 1(c).
(c) (5 points) Is the effect of having one’s opposition to Chávez revealed on earnings statistically significant? Why or why not?
(d) (10 points) Is the effect practically significant? Give your reasoning.
(e) (5 points) Suppose you added an indicator that the respondent was male (equal to 1 when the indicator for female is equal to 0, and equal to 0 when the indicator for female is 1) to this regression.
$$\ln(\text{earnings})_{it} = \alpha_0 + \alpha_1 \text{maisanta}_{it} + \alpha_2 \text{female}_i + \alpha_3 \text{educ}_{it} + \alpha_4 \text{male}_i + \gamma_t + e_{it}$$
What will R report for the coefficient $\alpha_4$ on male? Why?
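One way to include the year fixed effects from the specification in 2(a) is to wrap the year variable in factor() inside the lm() formula. The sketch below runs on a simulated repeated cross-section, not the exam data: the data frame toy, the outcome name log_earn, and all coefficients used to generate it are assumptions for illustration.

```r
set.seed(7)

# Simulated repeated cross-section (not the exam data); all values are invented.
n <- 600
toy <- data.frame(
  year     = sample(1997:2006, n, replace = TRUE),
  female   = rbinom(n, 1, 0.5),
  educ     = sample(0:18, n, replace = TRUE),
  maisanta = rbinom(n, 1, 0.2)
)
year_effect <- 0.02 * (toy$year - 1997)
toy$log_earn <- 6 - 0.04 * toy$maisanta - 0.2 * toy$female +
  0.08 * toy$educ + year_effect + rnorm(n, sd = 0.5)

# factor(year) expands into one indicator per year, with one base year omitted.
fit_fe <- lm(log_earn ~ maisanta + female + educ + factor(year), data = toy)

# Show only the coefficients of interest, not the year indicators.
print(round(coef(summary(fit_fe))[c("maisanta", "female", "educ"), ], 3))
```

Treating year as a factor lets each year have its own intercept shift, rather than imposing a linear time trend.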
Question 3: Difference-in-differences (15 points)
The authors of this study obtain their data by linking two data sources together. This link gives them a unique data set - they can observe which people in the earnings data are potentially exposed to political retribution. However, the earnings data come from a survey with a high attrition rate. That is, many people who start out in the survey sample stop responding to the survey over time. To maintain the sample size of the survey, the people who run the survey incorporate new respondents when others leave. This means that the structure of the survey data is a repeated cross-section rather than a panel.
(a) (5 points) Suppose that the authors had panel data and were able to reliably observe the same individuals for each year.
Then they could estimate the regression specification
$$\ln(\text{earnings})_{it} = \beta_0 + \beta_1 \text{maisanta}_{it} + \delta_i + \tau_t + \varepsilon_{it}$$
where i indexes individuals and t indexes years.
How would you interpret the coefficient $\beta_1$ in this case?
(b) (5 points) What assumption is necessary for the coefficient $\beta_1$ to represent the causal effect of maisanta disclosure on earnings?
Informally describe what this assumption says about the data.
(c) (5 points) Can we ever test this assumption? Why or why not?
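To see what estimating a specification with individual and year effects involves, here is a minimal sketch on a simulated balanced panel. It is not the exam data: the identifiers id and t, the treated group, the switch-on year, and true_beta are all hypothetical choices made for the illustration.

```r
set.seed(11)

# Simulated balanced panel (not the exam data): 50 people observed for 6 years.
n_id <- 50
n_t  <- 6
panel <- expand.grid(id = 1:n_id, t = 1:n_t)

alpha_i <- rnorm(n_id)[panel$id]                    # person-specific level
tau_t   <- seq(0, 0.5, length.out = n_t)[panel$t]   # shocks common to a year
treated <- panel$id <= 20                           # hypothetical treated group
panel$d <- as.numeric(treated & panel$t >= 4)       # indicator switches on in year 4

true_beta <- -0.05
panel$y <- 2 + true_beta * panel$d + alpha_i + tau_t +
  rnorm(nrow(panel), sd = 0.05)

# Individual and year indicators absorb alpha_i and tau_t.
fit_twfe <- lm(y ~ d + factor(id) + factor(t), data = panel)
print(coef(fit_twfe)[["d"]])
```

The simulated outcome is built so that treated and untreated people share the same year shocks, which is why the estimate lands close to true_beta here.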
Question 4: Simulations (15 points)
In case you hadn’t heard, job cuts at Disney mean that ABC has fired Nate Silver, who is behind 538, a major player in election forecasting. Suppose Nate is starting a new company and you are interviewing with them.
Someone didn’t do a good job paying attention to who owns what intellectual property at 538, and Nate still owns the simulation code shown on the last page of the exam.
This code is used to test how sensitive their predictions are to the relationship between independent vote share and predicted Republican vote share.
Your interviewer asks you the following questions to assess your skill at interpreting code and understanding the underlying statistics.
(a) (5 points) What is the purpose of the command set.seed(2168) on line 1?
(b) (5 points) There will be 1,000 observed values of the t statistic in the data frame null at the end of the simulation.
For how many of those 1,000 observed t statistics do you expect the absolute value to be more than 1.96?
Why is this your expectation? And what line(s) of code makes you think this?
(c) (5 points) What do you expect the average value of betahat to be across all observations in the data frame notnull?
Why is this your expectation? And what line(s) of code makes you think this?
Extra credit (10 points)
Make a meme about econometrics.
You can’t use one that already exists.
Explain to me, as if I am your less econometrically savvy relative, why the meme is funny.
set.seed(2168)
load("observedvoteshares.rdata")
sim <- 1000
betanull <- 0
betanotnull <- 0.1
# make a place to put results to compare them
null <- data.frame(iteration = 1:sim, betahat = numeric(sim), tstat = numeric(sim), prederror = numeric(sim))
notnull <- data.frame(iteration = 1:sim, betahat = numeric(sim), tstat = numeric(sim), prederror = numeric(sim))
for (s in 1:sim) {
  # generate outcome under null
  sharernull <- 0.05 + 0.3 * observedvoteshares$incumbr +
    0.9 * observedvoteshares$prevsharer +
    betanull * observedvoteshares$prevsharei -
    0.95 * observedvoteshares$prevshared +
    rnorm(nrow(observedvoteshares), 0, 0.25)
  # generate outcome under not null
  sharernotnull <- 0.05 + 0.3 * observedvoteshares$incumbr +
    0.9 * observedvoteshares$prevsharer +
    betanotnull * observedvoteshares$prevsharei -
    0.95 * observedvoteshares$prevshared +
    rnorm(nrow(observedvoteshares), 0, 0.25)
  # Run OLS regression for null
  nullmodel <- lm(sharernull ~ incumbr + prevsharer + prevsharei + prevshared, data = observedvoteshares)
  # Store null coefficient on prevsharei in results data frame
  null[s, "betahat"] <- coef(summary(nullmodel))["prevsharei", "Estimate"]
  null[s, "tstat"] <- coef(summary(nullmodel))["prevsharei", "t value"]
  # compare predicted values under null to observed shares in 2020
  null[s, "prederror"] <- mean(predict(nullmodel) - obssharer)
  # Run OLS regression for not null
  notnullmodel <- lm(sharernotnull ~ incumbr + prevsharer + prevsharei + prevshared, data = observedvoteshares)
  # Store not null coefficient on prevsharei in results data frame
  notnull[s, "betahat"] <- coef(summary(notnullmodel))["prevsharei", "Estimate"]
  notnull[s, "tstat"] <- coef(summary(notnullmodel))["prevsharei", "t value"]
  #
2023-05-04