Econ 4550/6550

Fall 2023

Empirical Project 1

Where is the Land of Opportunity? Intergenerational Mobility in the United States

Due: Thursday, September 21st, 2023, at the beginning of class

Introduction

Chetty et al. (2014) study intergenerational mobility in the United States using de-identified tax records on over 40 million children and their parents. In this problem set, you will explore several of the main findings from this paper.

Data 

The data files you will use in this problem set are on Canvas (Files → Empirical Project 1). Download the data files from Canvas and create a folder on your computer to store the files associated with this problem set. You can use the change directory (cd) command in your do-file to tell Stata where to find these data files. See Documentation of Selected Variables at the end of this document for descriptions of some of the variables in these files that you will use in the problem set.
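
As a rough illustration of the setup described above, a do-file might begin as follows. This is only a sketch: the folder path and the log file name are hypothetical placeholders, and the data file name is the one referenced in Question 1.

```stata
* Hypothetical folder path: replace with the folder where you saved the data
cd "C:/Users/yourname/Documents/econ4550/project1"

* Start a plain-text log of the output (hypothetical log file name)
log using "project1.log", replace text

* Load one of the data files and inspect its variables
use "onlinedata6_top100cz.dta", clear
describe

* Close the log at the end of the do-file
log close
```

Opening the log near the top of the do-file and closing it at the end means a single run produces both the output and the log file requested in the submission instructions.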

Coding Resources 

I have provided some coding hints in the questions below, but I anticipate that some of you will still have questions about coding. Stata has built-in help for syntax: simply type “help command”. For instance, typing “help generate” in Stata will bring up the help page for the command “generate”. Alternatively, the vast majority of coding issues can be solved with some quality Googling. Finally, I encourage you to use your classmates as well as my office hours to help with coding questions.

Collaboration Policy

You are encouraged to consult with your classmates as you work on problem sets. However, after discussions with peers, you must write your own code and solutions to turn in. In addition, you must list the names of students with whom you have collaborated on problem sets in your do-file and in your Word or pdf document.

Submission Instructions

Please submit an electronic version of your problem set on Canvas and a physical copy of your problem set in class. Your submission should include three files:

1. Answers to the questions below as a Word or pdf document

2. A do-file with your Stata code

3. A log file of your Stata output

Grading

Scored out of 25 points (5 points per question). To obtain full credit on questions 1-4, you must show evidence that you have written Stata code to produce the answers you report. Chetty et al. (2014) may be a helpful guide to “check your work” on several of the questions.

Questions

1) Exploring Measures of Upward and Downward Mobility. Using a version of Online Data Table 6 (“onlinedata6_top100cz.dta”), create “Top 10” and “Bottom 10” lists among the 100 largest commuting zones (CZs) for each of the statistics described in (a) and (b). Report CZ names, state abbreviations, and the values of the statistic.

a. Upward mobility as measured by the probability of rising from the bottom quintile to the top quintile of the income distribution, P(Child in Q5|Parent in Q1). [Stata hints: Use the sort command and the list command to display your “Bottom 10” list. The command gsort -X sorts a dataset in descending order of the variable X.]

b. Downward mobility as measured by the probability of falling out of the top quintile of the income distribution, 1 - P(Child in Q5|Parent in Q5). [Stata hint: Use the command generate to create the downward mobility variable.]

c. Compute the correlation between the measures in (a) and (b). Are areas with higher upward mobility for the poor worse for the rich? [Stata hint: Use the command correlate.]
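
The hints in Question 1 can be sketched roughly as below. This is one possible outline rather than a required approach: the variable names come from the Documentation of Selected Variables, while the new variable name downward is a hypothetical choice.

```stata
use "onlinedata6_top100cz.dta", clear

* Bottom 10: sort ascending, then list the first 10 observations
sort prob_p1_k5
list czname stateabbrv prob_p1_k5 in 1/10

* Top 10: sort descending, then list the first 10 observations
gsort -prob_p1_k5
list czname stateabbrv prob_p1_k5 in 1/10

* Downward mobility = 1 - P(Child in Q5|Parent in Q5)
* ("downward" is a hypothetical variable name)
generate downward = 1 - prob_p5_k5

* Correlation between the upward and downward mobility measures
correlate prob_p1_k5 downward
```

The same sort and list steps can be repeated with downward in place of prob_p1_k5 to build the lists for part (b).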

2) Relative Mobility: The Rank-Rank Slope. Let us now explore a different measure of mobility, the rank-rank slope, the second measure of relative mobility discussed in Section II (p. 1561).  For this question you will use “Statistics_By_Parent_Income_Percentile.dta”.

a. First, replicate Figure IIA “Mean Child Income Rank vs. Parent Income Rank in the U.S.” (p. 1576). [Stata hints: First, use the command generate kid_fam_rank_percentile = kid_fam_rank*100. To create the figure, use the command twoway scatter kid_fam_rank_percentile par_bin. You will also use the lfit plot type and the xtitle, ytitle, title, and legend(off) options. Use graph export to save your figure as a pdf, and insert this figure in your problem set answers document.]

b. State the value of the rank-rank slope and explain the interpretation of this coefficient estimate in one sentence. [Stata hint: use the command regress.]

c. The relationship between parent income rank and mean child income rank (the rank-rank slope) is linear. Explain conceptually what it means if the line is steeper or flatter.
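
The steps hinted at in Question 2 might be combined along the following lines. Treat this as a sketch under assumptions: the axis titles and the output file name are illustrative choices, not taken from the assignment.

```stata
use "Statistics_By_Parent_Income_Percentile.dta", clear

* Rescale the child rank from a 0-1 share to a 0-100 percentile
generate kid_fam_rank_percentile = kid_fam_rank*100

* Scatter plot with a fitted line overlaid; /// continues a command
* across lines in a do-file
twoway (scatter kid_fam_rank_percentile par_bin) ///
       (lfit kid_fam_rank_percentile par_bin), ///
       xtitle("Parent Income Rank") ///
       ytitle("Mean Child Income Rank") ///
       title("Mean Child Income Rank vs. Parent Income Rank in the U.S.") ///
       legend(off)

* Save the figure (hypothetical file name)
graph export "figure2a.pdf", replace

* The coefficient on par_bin is the rank-rank slope
regress kid_fam_rank_percentile par_bin
```

Because both variables are measured in percentile points here, the coefficient on par_bin can be read as the change in mean child rank associated with a one-percentile increase in parent rank.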

3) Other Outcomes: College Attendance and Teen Birth. Next we’ll explore the relationship between parent income and two other outcomes for children: college attendance and teenage birth. For this question you will also use “Statistics_By_Parent_Income_Percentile.dta”.

a. Replicate Figure IVA (p. 1584), but only the college attendance rate line (no need to graph college quality rank). Report and interpret the college attendance slope. [Stata hints: Follow similar steps as in Questions 2a and 2b. First, you will want to generate a new variable, college_percent, with the command generate college_percent = college*100.]

b. Replicate Figure IVB, “Female Children’s Teenage Birth Rate vs. Parent Income Rank”. Report and interpret the teenage birth rate slope. [Stata hints: Follow similar steps as in Questions 2a and 2b. Again, you will want to generate a new variable, kid_teenbirth_givenf_percent, with the command generate kid_teenbirth_givenf_percent = kid_teenbirth_givenf*100.]

4) Correlations with Area Characteristics. Having gained some familiarity with the measures of mobility, let us now explore the association between upward mobility and area-level characteristics. To answer this question, we will use the “correlates.dta” dataset, which I have created for you as a merged file of Online Data Tables 6 and 8.

a. Construct a table showing correlations between upward mobility, P(Child in Q5|Parent in Q1), and the following variables: racial segregation, the Gini coefficient of income inequality, school expenditures per capita, fraction religious, and the fraction of single parents. Interpret each of these estimates qualitatively. [Stata hint: use the command correlate.]

b. How do you think college tuition rates (“tuition”) are associated with upward mobility? What do you find in the data (describe the sign and magnitude of the correlation)? Are you surprised or not? Provide an explanation for the pattern you find. [Stata hint: use the command correlate.]

c. Regress P(Child in Q5|Parent in Q1) on the fraction of African Americans (“cs_race_bla”) and interpret the slope estimate you get along with the 95% confidence interval. [Stata hint: run the command regress prob_p1_k5 cs_race_bla, robust.]

d. Regress P(Child in Q5|Parent in Q1) on the share of married households (“cs_married”) and interpret the slope estimate you get along with the 95% confidence interval.

e. Now regress P(Child in Q5|Parent in Q1) on both the share of African Americans (“cs_race_bla”) and the share of married households (“cs_married”) and interpret the coefficients you get.
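
A possible outline of the commands for Question 4 is sketched below, using the variable names listed in the Documentation of Selected Variables. The robust option appears in the hint for part (c) only; applying it in parts (d) and (e) is an assumption made here for consistency.

```stata
use "correlates.dta", clear

* (a) Correlations of upward mobility with area characteristics
*     (the first column of the output matrix is the one of interest)
correlate prob_p1_k5 cs_race_theil_2000 gini ccd_exp_tot rel_tot ///
    cs_fam_wkidsinglemom

* (b) Correlation with college tuition
correlate prob_p1_k5 tuition

* (c) Bivariate regression on the fraction of African Americans
regress prob_p1_k5 cs_race_bla, robust

* (d) Bivariate regression on the fraction of married households
*     (robust is an assumption carried over from part c)
regress prob_p1_k5 cs_married, robust

* (e) Regression on both variables
regress prob_p1_k5 cs_race_bla cs_married, robust
```

Note that correlate drops any observation with a missing value in any listed variable, so a multi-variable call can use a smaller sample than separate pairwise calls; pwcorr computes each correlation on all observations available for that pair.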

5) Suggest a new testable hypothesis for differences in upward mobility. Please propose a new potential mechanism for spatial variation in upward mobility that is not considered in Section VI (p. 1603) of Chetty et al. (2014). How would you measure this mechanism quantitatively (i.e., what data would you need to construct a variable to explore this potential mechanism)? Do you think this variable would be positively or negatively correlated with upward mobility? Why?

References

Chetty, Raj, Nathaniel Hendren, Patrick Kline, and Emmanuel Saez. 2014. “Where Is the Land of Opportunity? The Geography of Intergenerational Mobility in the United States.” Quarterly Journal of Economics 129 (4): 1553–1623.

Documentation of Selected Variables

onlinedata6.dta 

Variable                Definition
cz                      Commuting Zone ID
czname                  Commuting Zone name
stateabbrv              State abbreviation
prob_p1_k5              Probability of rising from the bottom quintile to the top quintile of the income distribution, P(Child in Q5|Parent in Q1)
prob_p5_k5              Probability of remaining in the top quintile of the income distribution, P(Child in Q5|Parent in Q5)

 

Statistics_By_Parent_Income_Percentile.dta

Variable                Definition
par_bin                 Centile of parent family income
kid_fam_rank            Average child family income rank in parent bin
college                 Share of children in parent bin ever attending college during ages 18-21
kid_teenbirth_givenf    Share of female children in parent bin ever claiming a dependent born during child’s ages 13-19

 

correlates.dta 

Variable                Definition
prob_p1_k5              Probability of rising from the bottom quintile to the top quintile of the income distribution, P(Child in Q5|Parent in Q1)
cs_race_theil_2000      Theil index of racial segregation
gini                    Gini coefficient
ccd_exp_tot             School expenditures per capita
rel_tot                 Fraction religious
cs_fam_wkidsinglemom    Fraction of children with single mothers
tuition                 College tuition
cs_race_bla             Fraction of African Americans
cs_married              Fraction of married households