POLI3148 Homework 1
Introduction
Background: Fading American Dream
While people still talk about the “American Dream”, the idea that everyone can attain success and wealth through perseverance and diligence, “burned out” young people may feel differently, since hard work no longer seems to lead to success. Chetty, an economist, refers to this phenomenon as the “fading American dream” in a paper published in Science, implying that young people are unable to earn as much as previous generations.
This assignment aims to enhance your data wrangling and data visualization abilities through the use of data from Chetty’s paper. Moreover, we seek to gain insight into the concept of the “fading American Dream” by analyzing the data.
You should use RMarkdown to do this assignment. You can do it on either Posit Cloud or your local environment. To do it on Posit Cloud, please open the Assignment Project named Homework One on Posit Cloud. You can also download the file to your own computer. This assignment is due March 24 at 5pm.
Question One
Data Description
For the first question, run the following code to get started:
library(tidyverse)
url_main <- "https://raw.githubusercontent.com/sscihz/DaSPPA/main/"
url_file <- "Fading_AmericanDream/ForAssignment/Data/Q1.csv"
abs_mobility <- read_csv(paste0(url_main, url_file))
This table reports the probability that a child earns more than their parents at age 30 by parent income percentile for each child birth cohort in the range 1940-1984. All estimates are conditional on positive parent income. The table also reports the fraction of parents with zero income for each child birth cohort and the probability that a child in a given birth cohort earns more than their parents.
cohort is the child birth cohort.
base_absmob_pos_par[percentile] is the probability that a child at a given parent income percentile earns more than their parents.
par_frac0 is the fraction of parents with zero income.
Questions
Q1.1 (1 point) Draw a line plot that shows the mobility of several birth cohorts (1940, 1950, 1960, 1970, 1980), where the x-axis is the parent income percentile and the y-axis is the probability of a child earning more than their parents. Describe your findings.
Hint:
● You should transform the wide data into long data.
● The information about income percentile is in the variable names. You can use as.numeric(str_extract(par, "[0-9]+")) to extract the percentile information.
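The two hints above can be sketched on a small made-up table. The column names follow the base_absmob_pos_par[percentile] pattern from the data description, but the exact names in Q1.csv, the toy values, and the object names toy_wide, toy_long and p are assumptions for illustration only.

```r
library(tidyverse)

# Toy wide table mimicking the described layout (values are invented).
toy_wide <- tibble(
  cohort = c(1940, 1980),
  base_absmob_pos_par1 = c(0.95, 0.70),
  base_absmob_pos_par50 = c(0.93, 0.50),
  base_absmob_pos_par100 = c(0.60, 0.30)
)

# Wide to long: one row per cohort-percentile pair.
toy_long <- toy_wide %>%
  pivot_longer(
    cols = starts_with("base_absmob_pos_par"),
    names_to = "par",
    values_to = "prob"
  ) %>%
  # Recover the percentile from the digits in the column name.
  mutate(percentile = as.numeric(str_extract(par, "[0-9]+")))

# One line per cohort across parent income percentiles.
p <- ggplot(toy_long, aes(x = percentile, y = prob, colour = factor(cohort))) +
  geom_line() +
  labs(x = "Parent income percentile",
       y = "P(child earns more than parents)",
       colour = "Birth cohort")
```

With the real data you would first keep only the cohorts you want to compare, for example with filter(cohort %in% c(...)).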
Q1.2 (1 point) Draw a line plot that shows the trend in mean mobility over time, where the x-axis is the child’s birth cohort and the y-axis is the probability of a child earning more than their parents. Describe your findings.
Hint: Your final output should look like:
Question Two
Data Description
The data for the first question are tidy and contain only relevant information. Nevertheless, in the field of data science, we are often confronted with untidy data, which requires data wrangling skills.
In Question Two, we will work with US census data, which is the raw data behind the first question. Due to the size of the data, we’ll focus on a sample of the census data:
url_main <- "https://raw.githubusercontent.com/sscihz/DaSPPA/main/"
url_file <- "Fading_AmericanDream/ForAssignment/Data/Census_sample.csv"
usa_census_sub <- read_csv(paste0(url_main, url_file))
The data description is in the Appendix. Please read the description before you solve this question.
Please note that “each year” in Question Two refers only to 1960, 1970, 1980, 1990, 2000 and 2010.
Questions
Q2.1 (1 point) Describe the data set:
● How many observations and variables are there in this data set?
● Select YEAR, GENDER, INCTOT, INCWAGE, FTOTINC and report summary statistics of these variables.
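As a minimal sketch of this kind of description, the snippet below counts rows and columns and summarises selected columns of a made-up table. The object toy_census and its values are hypothetical, and only a subset of the listed variables is included.

```r
library(tidyverse)

# Toy stand-in for the census sample (values are invented).
toy_census <- tibble(
  YEAR = c(1960, 1970, 1980),
  INCTOT = c(3000, 9999999, 12000),
  INCWAGE = c(2500, 999999, 0)
)

n_obs <- nrow(toy_census)   # number of observations (rows)
n_vars <- ncol(toy_census)  # number of variables (columns)

# Minimum, quartiles, mean and maximum for each selected column.
toy_summary <- toy_census %>%
  select(YEAR, INCTOT, INCWAGE) %>%
  summary()
```

Very large maxima in such a summary are a sign that missing-value codes have not yet been recoded.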
Q2.2 (1 point) Select INCWAGE, INCTOT, FTOTINC, which represent wage and salary income, total personal income and total family income. Plot the distribution of each variable in each year. Describe the distribution of each variable.
Q2.3 (2 points) The density plot in Q2.2 needs to be improved for four reasons:
● The census usually uses an extremely large number to denote a missing value.
● Scholars usually log income when plotting its distribution.
● Many people in this data set have no income.
● Inflation over the years.
Please plot the distribution of INCWAGE again, addressing the problems above. In addition, add three vertical lines indicating the 25%, 50% and 75% quantiles of the distribution.
Hint:
● The missing value indicators are: incwage = 999999, inctot = 9999999, ftotinc = 9999999 | ftotinc = 9999998.
● Real wage in year1 = (Wage in year1 / CPI in year1) * CPI in yearbase; let’s use year 2000 as yearbase.
● You can use the following code to get the CPI in each year:
CPI <- read_csv("https://raw.githubusercontent.com/sscihz/DaSPPA/main/Fading_AmericanDream/ForAssignment/Data/CPI.csv")
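The cleaning steps in the hints can be sketched on made-up data as follows. The CPI column names (YEAR, CPI), the CPI values, and the objects toy_census, toy_cpi, toy_clean and toy_quartiles are assumptions for illustration; check the real CPI file before reusing the join.

```r
library(tidyverse)

# Toy person-level data and toy CPI table (all values are invented).
toy_census <- tibble(
  YEAR = c(rep(1960, 5), rep(2000, 5)),
  INCWAGE = c(2000, 4000, 6000, 0, 999999,
              20000, 30000, 45000, 0, 999999)
)
toy_cpi <- tibble(YEAR = c(1960, 2000), CPI = c(30, 170))

# CPI in the base year (2000).
cpi_base <- toy_cpi %>% filter(YEAR == 2000) %>% pull(CPI)

toy_clean <- toy_census %>%
  mutate(INCWAGE = na_if(INCWAGE, 999999)) %>%  # missing-value code -> NA
  filter(!is.na(INCWAGE), INCWAGE > 0) %>%       # drop missing and zero wages
  left_join(toy_cpi, by = "YEAR") %>%
  mutate(
    real_wage = INCWAGE / CPI * cpi_base,        # express in year-2000 dollars
    log_real_wage = log(real_wage)
  )

# 25%, 50% and 75% quantiles of log real wage within each year.
toy_quartiles <- toy_clean %>%
  group_by(YEAR) %>%
  summarise(
    q25 = quantile(log_real_wage, 0.25, names = FALSE),
    q50 = quantile(log_real_wage, 0.50, names = FALSE),
    q75 = quantile(log_real_wage, 0.75, names = FALSE)
  ) %>%
  pivot_longer(-YEAR, names_to = "quartile", values_to = "value")

# Density per year with dashed vertical lines at the quartiles.
p <- ggplot(toy_clean, aes(x = log_real_wage)) +
  geom_density() +
  geom_vline(data = toy_quartiles, aes(xintercept = value),
             linetype = "dashed") +
  facet_wrap(~ YEAR)
```

Dropping zero wages before taking logs is one possible choice; adding a small constant before logging is another, and the two give different pictures of the lower tail.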
Q2.4 (2 points) Examine the relationship between variables:
● Draw a plot that shows the relationship between age and income in each year. Describe the patterns you find.
● Draw a plot that shows how the income distribution differs by gender in each year. Then draw another plot that shows the relationship between age, gender and income. Describe the patterns you find.
Question Three
The variable SERIAL is a unique identifier of a household. That is, if two individuals have the same SERIAL, they are in the same household.
Q3.1 (0.5 point) Use the SERIAL to calculate what percent of individuals have family members.
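A minimal sketch of the household-size approach on made-up data is shown below. It assumes that SERIAL identifies a household within a census year, so households are counted by YEAR and SERIAL together; that assumption, the toy values, and the names toy_census, hh_size and share_with_others are illustrative only.

```r
library(tidyverse)

# Toy data: three households in one year, of sizes 2, 1 and 3.
toy_census <- tibble(
  YEAR = rep(1960, 6),
  SERIAL = c(1, 1, 2, 3, 3, 3),
  PERNUM = c(1, 2, 1, 1, 2, 3)
)

share_with_others <- toy_census %>%
  # Attach the number of people sharing each (YEAR, SERIAL) pair.
  add_count(YEAR, SERIAL, name = "hh_size") %>%
  # Percent of individuals whose household has more than one person.
  summarise(pct = mean(hh_size > 1) * 100) %>%
  pull(pct)
```

Note that sharing a household is not necessarily the same as being family, which is worth keeping in mind when comparing against other variables.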
Q3.2 (1.5 points) Check the variable description table in the Appendix, find another variable, and use it to calculate what percent of individuals have family members. Is the result the same as that of Q3.1? Why are the two results different?
Question Four (Bonus 2 points)
Explore a new data set that may fix the problem with the previous data set, and report your findings (fewer than 300 words). You should include appropriate graphs and tables to demonstrate your findings. There are two possible topics to consider:
● Calculate the Gini coefficient. In addition, how are the coefficients related to GDP or unemployment?
● Explore the pattern of household income distribution. In what percentage of households do women earn more than men? Does the pattern change over time?
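For the first topic, one common way to compute a Gini coefficient from individual incomes is the sorted-values formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n. The sketch below is unweighted and ignores survey weights such as PERWT or HHWT; the function name gini and the toy data are hypothetical.

```r
library(tidyverse)

# Unweighted Gini coefficient of a non-negative numeric vector.
gini <- function(x) {
  x <- sort(x[!is.na(x)])  # drop missing values, sort ascending
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Toy incomes (invented): perfectly equal in 1960, concentrated in 2000.
toy_income <- tibble(
  YEAR = c(rep(1960, 4), rep(2000, 4)),
  INCTOT = c(1, 1, 1, 1, 0, 0, 0, 1)
)

# One Gini coefficient per year.
toy_gini <- toy_income %>%
  group_by(YEAR) %>%
  summarise(gini = gini(INCTOT))
```

In this toy table the coefficient is 0 when everyone earns the same and 0.75 when one of four people holds all income. A yearly table like toy_gini could then be joined to the GDP or unemployment series by year.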
## A larger sample is also available:
## https://www.dropbox.com/s/fclh8s185tv9lem/Census_sample_two.csv?dl=0
## If you are interested in it, you can download it to your own computer.
url_main <- "https://raw.githubusercontent.com/sscihz/DaSPPA/main/"
url_census_sub2 <- "Fading_AmericanDream/ForAssignment/Data/Census_sample_two.csv"
url_gdp <- "Fading_AmericanDream/ForAssignment/Data/gdp.csv"
url_unemployment <- "Fading_AmericanDream/ForAssignment/Data/unemployment.csv"
usa_census_sub2 <- read_csv(paste0(url_main, url_census_sub2))
gdp <- read_csv(paste0(url_main, url_gdp))
unemployment <- read_csv(paste0(url_main, url_unemployment))
Submission Instruction
You are expected to use RMarkdown to do this assignment. To submit your assignment, you should:
● Compile your RMarkdown into a PDF. Upload both the RMarkdown file and the PDF to Moodle.
● Print the PDF, organize the pages properly, and submit them as one stapled document to the General Office, Department of Politics & Public Administration, Room 963, The Jockey Club Tower.
Appendix
Variable Description Table
Variable | Description
YEAR | Census year
SAMPLE | IPUMS sample identifier
SERIAL | Household serial number
CBSERIAL | Original Census Bureau household serial number
HHWT | Household weight
CLUSTER | Household cluster for variance estimation
STATEFIP | State (FIPS code)
STRATA | Household strata for variance estimation
GQ | Group quarters status
GQTYPE | Group quarters type [general version]
GQTYPED | Group quarters type [detailed version]
PERNUM | Person number in sample unit
PERWT | Person weight
SELFWTSL | Self-weighting sample-line person
FAMUNIT | Family unit membership
FAMSIZE | Number of own family members in household
MOMLOC | Mother’s location in the household
POPLOC | Father’s location in the household
SPLOC | Spouse’s location in household
NCHILD | Number of own children in the household
SEX | Sex
AGE | Age
MARST | Marital status
RACE | Race [general version]
RACED | Race [detailed version]
BPL | Birthplace [general version]
BPLD | Birthplace [detailed version]
OCC1950 | Occupation, 1950 basis
INCTOT | Total personal income
FTOTINC | Total family income
INCWAGE | Wage and salary income
INCNONWG | Had non-wage/salary income over $50
INCWELFR | Welfare (public assistance) income
INCSUPP | Supplementary Security Income
2023-03-15