POLI3148 Homework 1

Introduction

Background: Fading American Dream

While people still talk about the “American Dream,” the idea that everyone can attain success and wealth through perseverance and diligence, “burned out” young people may feel differently, because hard work no longer seems to lead to success. Chetty, an economist, refers to this phenomenon as the “fading American dream” in a paper published in Science, implying that young people are unable to earn as much as previous generations.

This assignment aims to enhance your data wrangling and data visualization abilities through the use of data from Chetty’s paper. Moreover, we seek to gain insight into the concept of the “fading American Dream” by analyzing the data.

You should use RMarkdown to do this assignment. You can do it either on Posit Cloud or in your local environment. To do it on Posit Cloud, please open the Assignment Project named Homework One on Posit Cloud. You can also download the file to your own computer. This assignment is due March 24 at 5pm.

Question One

Data Description

For the first question, run the following code to get started:

library(tidyverse)
url_main <- "https://raw.githubusercontent.com/sscihz/DaSPPA/main/"
url_file <- "Fading_AmericanDream/ForAssignment/Data/Q1.csv"
abs_mobility <- read_csv(paste0(url_main, url_file))

This table reports the probability that a child earns more than their parents at age 30 by parent income percentile for each child birth cohort in the range 1940-1984. All estimates are conditional on positive parent income. The table also reports the fraction of parents with zero income for each child birth cohort and the probability that a child in a given birth cohort earns more than their parents.

cohort is the child birth cohort.

base_absmob_pos_par[percentile] is the probability that a child at a given parent income percentile earns more than their parents.

par_frac0 is the fraction of parents with zero income.

Questions

Q1.1 (1 point) Draw a line plot that shows the mobility of various birth cohorts (1940, 1950, 1960, 1970, 1980), where the x-axis shows the parent income percentile and the y-axis shows the probability of a child earning more than their parents. Describe your findings.

Hint:

● You should transform the data from wide format to long format.

● The income percentile is encoded in the variable name. You can use as.numeric(str_extract(par, "[0-9]+")) to extract the percentile.
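The two hints above can be sketched on a toy table. This is only an illustration: the values are made up, and the column names (base_absmob_pos_par1, base_absmob_pos_par50, base_absmob_pos_par100) are assumed from the naming pattern in the data description, so check names(abs_mobility) before reusing it.

```r
library(tidyverse)

# Toy stand-in for abs_mobility: two cohorts, three percentiles (made-up values).
# Column names are an assumption based on the pattern base_absmob_pos_par[percentile].
toy_wide <- tibble(
  cohort = c(1940, 1980),
  base_absmob_pos_par1 = c(0.95, 0.70),
  base_absmob_pos_par50 = c(0.93, 0.45),
  base_absmob_pos_par100 = c(0.40, 0.10)
)

toy_long <- toy_wide %>%
  # Wide to long: one row per cohort-percentile pair.
  pivot_longer(
    cols = starts_with("base_absmob_pos_par"),
    names_to = "par",
    values_to = "prob"
  ) %>%
  # Pull the percentile out of the old column name.
  mutate(percentile = as.numeric(str_extract(par, "[0-9]+")))

toy_long
```

In long format, percentile can be mapped to the x-axis and prob to the y-axis, with one line per cohort.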

Q1.2 (1 point) Draw a line plot that shows the trend in mean mobility over time, where the x-axis shows the child’s birth cohort and the y-axis shows the probability of a child earning more than their parents. Describe your findings.

Hint: Your final output should look like:

[Figure: expected output for Q1.2]

Question Two

Data Description

The data for the first question are tidy and contain only relevant information. In data science, however, we are often confronted with untidy data, which require data wrangling skills.

In Question Two, we will work with US census data, which are the raw data behind the first question. Because the full data set is large, we will focus on a sample of the census data:

url_main <- "https://raw.githubusercontent.com/sscihz/DaSPPA/main/"
url_file <- "Fading_AmericanDream/ForAssignment/Data/Census_sample.csv"
usa_census_sub <- read_csv(paste0(url_main, url_file))

The data description is in the Appendix. Please read the description before you attempt this question.

Please note that “each year” in Question Two refers only to 1960, 1970, 1980, 1990, 2000, and 2010.

Questions

Q2.1 (1 point) Describe the data set:

● How many observations and variables are there in this data set?

● Select YEAR, GENDER, INCTOT, INCWAGE, and FTOTINC, and report summary statistics for these variables.

Q2.2 (1 point) Select INCWAGE, INCTOT, and FTOTINC, which represent wage and salary income, total personal income, and total family income, and plot the distribution of each variable in each year. Describe the distribution of each variable.

Q2.3 (2 points) The density plot in Q2.2 needs to be improved for four reasons:

● The census usually uses a very large number to denote a missing value.

● Scholars usually log-transform income when plotting its distribution.

● Many people in this data set have no income.

● Prices inflate over the years.

Please plot the distribution of INCWAGE again, addressing the problems above. In addition, add three vertical lines indicating the 25th, 50th, and 75th percentiles of the distribution.

Hint:

● The missing value indicators are: incwage = 999999, inctot = 9999999, ftotinc = 9999999 | ftotinc = 9999998.

● Real wage in year_1 = (nominal wage in year_1 / CPI in year_1) * CPI in year_base; let’s use year 2000 as year_base.

● You can use the following code to get the CPI for each year:

CPI <- read_csv("https://raw.githubusercontent.com/sscihz/DaSPPA/main/Fading_AmericanDream/ForAssignment/Data/CPI.csv")
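As a rough illustration of the hints above, the sketch below recodes the missing-value indicator, converts nominal wages to year-2000 values, and takes logs on a toy table. The CPI column names (YEAR, cpi) and all numbers are hypothetical; the real CPI.csv may be laid out differently. Dropping zero incomes before taking logs is one possible choice, not the only one.

```r
library(tidyverse)

# Hypothetical CPI table with made-up values; check the real column names first.
toy_cpi <- tibble(YEAR = c(1960, 2000), cpi = c(30, 170))

# Toy census rows: one missing-value code (999999) and one zero income.
toy_census <- tibble(
  YEAR = c(1960, 1960, 2000, 2000),
  INCWAGE = c(3000, 999999, 0, 34000)
)

# CPI in the base year (2000).
cpi_base <- toy_cpi$cpi[toy_cpi$YEAR == 2000]

toy_clean <- toy_census %>%
  mutate(INCWAGE = na_if(INCWAGE, 999999)) %>%   # missing-value code -> NA
  filter(!is.na(INCWAGE), INCWAGE > 0) %>%       # log() is undefined at zero
  left_join(toy_cpi, by = "YEAR") %>%
  mutate(
    real_wage = INCWAGE / cpi * cpi_base,        # deflate to base-year prices
    log_real_wage = log(real_wage)
  )

# Quartiles that could be drawn as vertical lines on a density plot.
quantile(toy_clean$log_real_wage, probs = c(0.25, 0.5, 0.75))
```

The three quantiles can be passed to geom_vline(xintercept = ...) when plotting the density of log_real_wage.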

Q2.4 (2 points) Examine the relationship between variables:

● Draw a plot that shows the relationship between age and income in each year. Describe the patterns you find.

● Draw a plot that shows how the income distribution differs by gender in each year. Then draw another plot that shows the relationship between age, gender, and income. Describe the patterns you find.

Question Three

The variable SERIAL is a unique identifier of a household. That is, if two individuals have the same SERIAL, they are in the same household.
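To make the identifier concrete, here is a small sketch on made-up rows that counts how many people share each SERIAL. It assumes SERIAL alone identifies a household; if serial numbers repeat across census years in the real data, YEAR would need to be included in the count as well, which is worth checking.

```r
library(tidyverse)

# Made-up sample: household 1 has two people, household 2 has one, household 3 has three.
toy_people <- tibble(
  SERIAL = c(1, 1, 2, 3, 3, 3),
  PERNUM = c(1, 2, 1, 1, 2, 3)
)

toy_people <- toy_people %>%
  add_count(SERIAL, name = "hh_size") %>%        # people per household
  mutate(shares_household = hh_size > 1)         # TRUE if anyone else has the same SERIAL

# Percent of individuals who share a household with at least one other person.
mean(toy_people$shares_household) * 100
```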

Q3.1 (0.5 point) Use the SERIAL to calculate what percent of individuals have family members.

Q3.2 (1.5 points) Check the variable description table on the last page, find another variable, and use it to calculate what percent of individuals have family members. Is the result the same as that of Q3.1? If not, why are the two results different?

Question Four (Bonus 2 points)

Explore a new data set that may fix the problem with the previous data set, and report your findings (fewer than 300 words). You should include appropriate graphs and tables to demonstrate your findings. There are two possible topics to consider:

● Calculate the Gini coefficient. In addition, how are the coefficients related to GDP or unemployment?

● Explore the pattern of household income distribution. In what percentage of households do women earn more than men? Does the pattern change over time?

## There is another, larger data set:
## https://www.dropbox.com/s/fclh8s185tv9lem/Census_sample_two.csv?dl=0
## If you are interested in it, you can download it to your own computer.
url_main <- "https://raw.githubusercontent.com/sscihz/DaSPPA/main/"
url_census_sub2 <- "Fading_AmericanDream/ForAssignment/Data/Census_sample_two.csv"
url_gdp <- "Fading_AmericanDream/ForAssignment/Data/gdp.csv"
url_unemployment <- "Fading_AmericanDream/ForAssignment/Data/unemployment.csv"
usa_census_sub2 <- read_csv(paste0(url_main, url_census_sub2))
gdp <- read_csv(paste0(url_main, url_gdp))
unemployment <- read_csv(paste0(url_main, url_unemployment))
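For the first topic listed above, one common way to compute a Gini coefficient is the sorted-values formula G = sum((2i - n - 1) * x_i) / (n * sum(x_i)), with incomes x sorted in ascending order. The helper below is a hypothetical base-R sketch: it assumes non-negative incomes and ignores the survey weights (PERWT, HHWT), so treat it as a starting point only.

```r
# Unweighted Gini coefficient from the sorted-values formula.
# Assumes non-negative incomes with a positive total; NA values are dropped.
gini <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

gini(c(1, 1, 1, 1))  # everyone earns the same
gini(c(0, 0, 0, 1))  # one person earns everything
```

Under this formula a perfectly equal distribution gives 0, and a distribution where one of n people holds all income gives (n - 1) / n.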

Submission Instruction

You are expected to use RMarkdown to do this assignment. To submit your assignment, you should:

● Compile your RMarkdown file into a PDF. Upload both the RMarkdown file and the PDF to Moodle.

● Print the PDF, organize the pages properly, and submit them as one stapled document to the General Office, Department of Politics & Public Administration, Room 963, The Jockey Club Tower.

Appendix

Variable Description Table

Variable	Description

YEAR	Census year
SAMPLE	IPUMS sample identifier
SERIAL	Household serial number
CBSERIAL	Original Census Bureau household serial number
HHWT	Household weight
CLUSTER	Household cluster for variance estimation
STATEFIP	State (FIPS code)
STRATA	Household strata for variance estimation
GQ	Group quarters status
GQTYPE	Group quarters type [general version]
GQTYPED	Group quarters type [detailed version]
PERNUM	Person number in sample unit
PERWT	Person weight
SELFWTSL	Self-weighting sample-line person
FAMUNIT	Family unit membership
FAMSIZE	Number of own family members in household
MOMLOC	Mother’s location in the household
POPLOC	Father’s location in the household
SPLOC	Spouse’s location in household
NCHILD	Number of own children in the household
SEX	Sex
AGE	Age
MARST	Marital status
RACE	Race [general version]
RACED	Race [detailed version]
BPL	Birthplace [general version]
BPLD	Birthplace [detailed version]
OCC1950	Occupation, 1950 basis
INCTOT	Total personal income
FTOTINC	Total family income
INCWAGE	Wage and salary income
INCNONWG	Had non-wage/salary income over $50
INCWELFR	Welfare (public assistance) income
INCSUPP	Supplementary Security Income