EMET3006/4301/8001 Applied Micro Econometrics Semester 2, 2022
Published: 2022-09-24
Assignment
Due Date: Monday, October 10, 10am
In this assignment you will download data from Wattle. The data are based on a simplified version of the Longitudinal Surveys of Australian Youth (LSAY). The first data set is from Wave 1 of the LSAY, a survey of Year 9 students in 1995. The second data set contains subsequent years. The third contains a new cohort, from 2009.
Type your answers in an R Markdown document, which you upload when you are done. Please also upload your html/pdf document with your output and answers.
You are welcome to discuss the questions with classmates; however, please upload your own work.
I’ve provided some basic commands, but you may want to look some other things up on Google.
Questions:
1. Begin with LSAYwave1.dta.
(a) How many Year 9 students are surveyed by LSAY in 1995 according to the data? [The data contain the full set of Year 9 students surveyed by LSAY.]
(b) What is the share of male students?
(c) What is the share of students born outside Australia?
(d) What is the average mathematics score for students born outside Australia? What is the average mathematics score for other students?
(e) What is the distribution of female students’ month of birth? A possible code is:
library(dplyr)
library(lubridate)
LSAYwave1$dob <- format(as.Date(LSAYwave1$dob, format = "%m/%d/%y"), "%Y-%m-%d")
LSAYwave1 <- LSAYwave1 %>%
  mutate(month = month(dob),
         yob = year(dob))
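As a minimal base-R sketch of the same idea on made-up data (the `toy` data frame and the text coding of `sex` are assumptions for illustration, not the actual LSAY coding), the month of birth can be extracted and tabulated for female students like this:

```r
# Toy data standing in for LSAYwave1 (values are made up for illustration).
toy <- data.frame(
  sex = c("female", "female", "male", "female"),  # hypothetical coding
  dob = c("03/15/80", "03/02/81", "07/21/80", "11/30/80")
)

# Parse the two-digit-year strings; as.Date() reads %y values 69-99 as 19xx.
toy$dob_date <- as.Date(toy$dob, format = "%m/%d/%y")
toy$month <- as.integer(format(toy$dob_date, "%m"))

# Frequency and share of each birth month among female students.
female_months <- table(toy$month[toy$sex == "female"])
print(female_months)
print(prop.table(female_months))
```

In the real data the gender variable is probably numeric or labelled, so the filter condition would need to be adapted.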
(f) Do students whose mothers’ level of education is “3. all years of secondary” have better mathematics and reading scores than those whose mothers’ level of education is “2. Some secondary school”? Break this down by gender. Here you need to do a t-test rather than just compare the mean values; a possible code is:
ttest_subset <- filter(LSAYwave1, EDUC_M > 1 & EDUC_M < 4)
test_math <- t.test(TOT_MATH ~ EDUC_M, data = ttest_subset)
I’m sure there are other ways to do this and the code will need to be adjusted for each question.
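One possible way to break the t-test down by gender is to split the data and run the test within each group. The sketch below uses simulated data; the `sex` coding (1 and 2) and the score values are assumptions made only for illustration.

```r
set.seed(1)
toy <- data.frame(
  sex      = rep(c(1, 2), each = 40),   # hypothetical coding of gender
  EDUC_M   = rep(c(2, 3), times = 40),  # the two education levels being compared
  TOT_MATH = round(rnorm(80, mean = 12, sd = 3))
)

# Run the Welch two-sample t-test separately within each gender group.
results <- lapply(split(toy, toy$sex), function(d) {
  t.test(TOT_MATH ~ EDUC_M, data = d)
})

# Collect the group means and p-values into one small table.
summary_tab <- do.call(rbind, lapply(names(results), function(g) {
  data.frame(sex        = g,
             mean_educ2 = unname(results[[g]]$estimate[1]),
             mean_educ3 = unname(results[[g]]$estimate[2]),
             p_value    = results[[g]]$p.value)
}))
print(summary_tab)
```

The same pattern can be repeated with `TOT_READ` as the outcome.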
(g) Do students born overseas in English speaking countries outperform students born overseas in non-English speaking countries?
(h) Do students born in 1981 outperform students born in 1980? Do students born in 1980 outperform students born in 1979?
(i) How many schools are covered in each state?
(j) What is the average number of native and migrant students in the
schools of each state, respectively?
(k) What is the average share of male students in the schools of each
state?
(l) Which state has the highest average reading score?
(m) Which state has the highest share of students with a reading score of 20?
(n) Do you find that male students who have higher reading scores also have higher math scores? Do you find that female students who have higher reading scores also have higher math scores? Do
you find any gender differences here? Use empirical results to support your answer here.
(o) Do you find that students from schools located in Metro areas have
higher mathematics and/or reading scores than other schools?
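For the state-level questions in parts (i) to (m), one approach is to collapse to one row per school first and then average across schools within a state. The following is a sketch on made-up data; the `male` indicator and all values are hypothetical, and only `state`, `schno` and `TOT_READ` are names taken from the assignment.

```r
library(dplyr)

toy <- data.frame(
  state    = c("NSW", "NSW", "NSW", "VIC", "VIC", "VIC"),
  schno    = c(1, 1, 2, 3, 3, 3),
  male     = c(1, 0, 1, 0, 0, 1),   # hypothetical 0/1 indicator
  TOT_READ = c(14, 20, 11, 20, 20, 9)
)

# Step 1: one row per school, with the school's share of male students.
school_level <- toy %>%
  group_by(state, schno) %>%
  summarise(share_male = mean(male), n_students = n(), .groups = "drop")

# Step 2: count schools and average the school-level shares within each state.
state_level <- school_level %>%
  group_by(state) %>%
  summarise(n_schools = n(),
            avg_share_male = mean(share_male),
            .groups = "drop")

# Student-level summaries by state: mean reading score and share scoring 20.
reading <- toy %>%
  group_by(state) %>%
  summarise(mean_read = mean(TOT_READ),
            share_read20 = mean(TOT_READ == 20),
            .groups = "drop")

print(state_level)
print(reading)
```

Averaging school-level shares weights every school equally, which differs from the share of male students in the state as a whole when schools differ in size.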
2. Continue with LSAYpanel.dta. This is a data set in wide format. That is, each respondent’s responses in each wave of the survey are recorded as separate variables.
(a) Merge this dataset with the one you used in Question 1 using R’s
merge command. Type ?merge in the console to see how to do
this. I needed to merge by stuidno.
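As a small illustration of how `merge` behaves, the sketch below joins two made-up data frames on `stuidno`. The column `XLFS1998` and all values are hypothetical stand-ins for the real files.

```r
wave1 <- data.frame(stuidno = c(1, 2, 3), TOT_MATH = c(12, 15, 9))
panel <- data.frame(stuidno = c(2, 3, 4), XLFS1998 = c(1, 3, 1))

# Default merge is an inner join: only students present in both data sets.
inner <- merge(wave1, panel, by = "stuidno")

# all.x = TRUE keeps every wave 1 student; unmatched panel values become NA.
left <- merge(wave1, panel, by = "stuidno", all.x = TRUE)

print(inner)
print(left)
```

Which variant is appropriate depends on whether students missing from the panel file should stay in the sample.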
(b) Use the R pivot_longer command to put the data in long form.
Long form will have all the years for each respondent in rows
followed by all the years of the next respondent. (Depending on how
you merged you may have gotten duplicate variables that R then
appended with .x or .y, that’s ok, just remove the .y’s.)
LSAY_long <- pivot_longer(LSAY_merge,
  cols = !c(stuidno, state, size, schno, schtyp, schclass, sex, indig,
            dob, TOT_MATH, TOT_READ, COB_S3, COB_ARR, disab, EDUC_M,
            EDUC_F, wt, dob_date, mob, yob, high_read, score_diff, female, metro),
  names_to = c(".value", "year"),
  names_sep = -4   # split each column name 4 characters from its end
)
# Sort by respondent and year (columns referenced explicitly, so attach() is not needed)
LSAY_long <- LSAY_long[order(LSAY_long$stuidno, LSAY_long$year), ]
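To see what `names_to = c(".value", "year")` with `names_sep = -4` does, here is a sketch on a tiny made-up wide data set. It assumes the wide columns are named as a variable stem followed by a four-digit year (for example `XLFS1998`), which is an assumption about the file rather than something stated in the assignment.

```r
library(tidyr)

wide <- data.frame(
  stuidno  = c(1, 2),
  XLFS1998 = c(1, 2),
  XLFS1999 = c(1, 1),
  XHRS1998 = c(20, NA),
  XHRS1999 = c(38, 15)
)

# names_sep = -4 splits each name 4 characters from the right:
# "XLFS1998" becomes the value column "XLFS" and the year "1998".
long <- pivot_longer(
  wide,
  cols = !stuidno,
  names_to = c(".value", "year"),
  names_sep = -4
)

long$year <- as.integer(long$year)
long <- long[order(long$stuidno, long$year), ]
print(long)
```

Each respondent now has one row per year, with `XLFS` and `XHRS` as ordinary columns.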
(c) Relabel the variables so that you know what they mean (you need to install the “expss” package, there are others but I’m going with this for now):
# variable labels
LSAY_long = apply_labels(LSAY_long,
XHSL = "highest school level completed",
XCSL = "current school level",
XCEL = "current qualification level",
XFTS = "full-time or part-time study status",
XHEL = "highest qualification level completed",
XBAC = "study status in bachelor degree or higher",
XVET = "study status in VET",
XLFS = "labour force status",
XHRS = "average weekly hours worked",
XFTP = "full-time or part-time employment status",
XEMP = "permanent or casual employment",
XMOB = "job mobility during last year",
XATR = "status in apprenticeship/traineeship",
XWKP = "average weekly pay",
XHRP = "average hourly pay",
XOCC = "occupation",
XCHI = "number of dependent children",
XATH = "living with parent(s)",
XOWN = "living in own home",
XMAR = "marital status",
XUNE = "any spell of unemployment during the year",
XFTE = "in full-time employment or full-time education"
)
(d) You may also want to rename the variables so that their names are not capitalized or to remove the leading X (you will need to do some of these for when you append a third data set in Q3):
LSAY_long <- dplyr::rename(LSAY_long, pid = stuidno)
LSAY_long <- dplyr::rename(LSAY_long, schoolid = schno)
LSAY_long <- dplyr::rename(LSAY_long, hrs = XHRS)
LSAY_long$dob_date = as.Date(format(as.Date(LSAY_long$dob, "%m/%d/%y"), "19%y-%m-%d"))
LSAY_long$yob = as.numeric(format(LSAY_long$dob_date, "%y"))
etc.
You may need to come back to this command and make a few other changes before you can append properly in Q3.
(e) In order to interpret the results from the questions below more
clearly, generate a female dummy equal to 1 if the respondent is a woman and 0 if the respondent is a man.
(f) What is the proportion of the sample whose highest school level completed is less than year 12 in each year from 1998 to 2006?
Why is this changing over time?
(g) What is the proportion of the female sample whose highest school
level completed is less than year 12 in each year from 1998 to 2006?
(h) What is the proportion of the sample whose highest qualification level completed is a Bachelor degree in each year from 1998 to 2006?
(i) What is the proportion of the male sample whose highest qualification level completed is a Bachelor degree in each year from 1998 to 2006?
(j) What is the proportion of the sample whose labour force status is employed in each year from 1998 to 2006?
(k) What is the proportion of those living with a parent whose labour force status is employed in each year from 1998 to 2006?
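The year-by-year proportions in parts (f) to (k) all follow one pattern: group by year and take the mean of a logical condition. A sketch on made-up long-format data is below; it assumes `XLFS == 1` means employed (the coding used in the Q3 snippet of this assignment), and the `female` dummy and all values are hypothetical.

```r
library(dplyr)

toy_long <- data.frame(
  pid    = rep(1:4, each = 2),
  year   = rep(c(1998, 1999), times = 4),
  female = rep(c(1, 1, 0, 0), each = 2),
  XLFS   = c(1, 1, 2, 1, 3, 1, 1, NA)   # assumed: 1 = employed
)

# Share employed in each year, ignoring missing labour force status.
by_year <- toy_long %>%
  group_by(year) %>%
  summarise(share_employed = mean(XLFS == 1, na.rm = TRUE),
            n_nonmissing   = sum(!is.na(XLFS)),
            .groups = "drop")

# Same calculation restricted to the female sample.
by_year_female <- toy_long %>%
  filter(female == 1) %>%
  group_by(year) %>%
  summarise(share_employed = mean(XLFS == 1, na.rm = TRUE), .groups = "drop")

print(by_year)
print(by_year_female)
```

Reporting the number of non-missing observations alongside each share helps show whether changes over time reflect attrition rather than behaviour.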
(l) Explore the gender pay gap: regress average hourly earnings on whether or not the respondent is female and working. Restrict each regression to only one year for now. Try a few different specifications including variables that you think might be important. Which of your specifications is your preferred specification? Why? (This is an open-ended question, explore a few options.)
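One way to set up the pay gap regressions is sketched below with simulated data for a single year. The variable `employed`, the choice of year, and every number in the simulation (including the built-in gap of -1.5) are made up for illustration; only `XHRP`, `hrs` and `female` echo names from the assignment.

```r
set.seed(42)
n <- 200
toy <- data.frame(
  year     = 2004,
  female   = rbinom(n, 1, 0.5),
  hrs      = round(runif(n, 10, 45)),
  employed = 1                       # hypothetical working indicator
)
# Simulated hourly pay with a built-in gap of -1.5 for women (made-up number).
toy$XHRP <- 18 - 1.5 * toy$female + 0.05 * toy$hrs + rnorm(n, sd = 2)

# Restrict to one year and to respondents who are working.
one_year <- subset(toy, year == 2004 & employed == 1)

# Specification 1: raw gap. Specification 2: gap conditional on hours worked.
m1 <- lm(XHRP ~ female, data = one_year)
m2 <- lm(XHRP ~ female + hrs, data = one_year)

print(coef(summary(m1)))
print(coef(summary(m2)))
```

Comparing the coefficient on `female` across specifications shows how much of the raw gap is accounted for by the added controls.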
3. Comparing two cohorts
(a) We want to append the LSAY2009.dta file to our data set from Q2. Install the plyr package. R will complain that the package clashes with dplyr, but for this one task it will be fine. The two data sets are not quite compatible: some variables have different names across the two data sets, so you may need to go back to your code in Question 2 part (d) and rename some of the variables to match the names in the 2009 data set.
It will also be helpful if in the Q2 dataset you generate a variable called cohort and set it equal to 1995.
(b) We will also need to adjust some variables from Q2 so that we are measuring the same things as the variables in Q3. For example, the unemployment measures across the two datasets are different:
LSAY_long$employ <- ifelse(LSAY_long$XLFS == 1, 1, 0)
LSAY_long$unemp <- ifelse(LSAY_long$XLFS == 2, 1, 0)
LSAY_long$nlf <- ifelse(LSAY_long$XLFS == 3, 1, 0)
(c) You can now append the two data sets:
library(haven)   # provides read_dta()
LSAY2009 <- read_dta("Data/LSAY2009.dta")
LSAY_append <- plyr::rbind.fill(LSAY_long, LSAY2009)   # plyr:: prefix avoids masking dplyr functions
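To illustrate what `rbind.fill` does when the two data sets have different columns, here is a sketch with two tiny made-up data frames. The 2009 column layout shown is hypothetical; the real LSAY2009 file will differ.

```r
# Made-up 1995 cohort rows, with the employment dummy built from XLFS.
cohort1995 <- data.frame(pid = c(1, 2), XLFS = c(1, 3), hrs = c(38, NA))
cohort1995$cohort <- 1995
cohort1995$employ <- ifelse(cohort1995$XLFS == 1, 1, 0)

# Hypothetical 2009 rows: 'employ' already exists, 'XLFS' and 'hrs' do not.
cohort2009 <- data.frame(pid = c(101, 102), employ = c(0, 1), cohort = 2009)

# rbind.fill stacks the rows and fills columns missing from one input with NA.
appended <- plyr::rbind.fill(cohort1995, cohort2009)
print(appended)
```

Columns with matching names are stacked, so variables that measure the same thing need the same name in both data sets before appending.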
(d) You are ready to run regressions that determine the likelihood of being employed and to compare the likelihoods by cohort. Regress employment status on age, age squared, cohort and gender.
(e) Redo the regression but add the school type.
(f) Redo the regression, including the state a respondent is in.
(g) Based on your regression results, would you consider the 2009 cohort to be less attached to the labour market? Explain. (You may want to run additional regressions to test your ideas)
(h) Based on your answer to the previous question, propose an intervention where you would like to increase the labour force attachment of the 2009 cohort. Discuss the treatment you would use and how you would measure the causal effect of the treatment.
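The employment regressions in parts (d) to (f) might be set up along the following lines. This is a sketch on simulated data: the `age` variable, the simulation parameters, and the choice between a linear probability model and a logit are all assumptions, not something the assignment prescribes.

```r
set.seed(7)
n <- 500
toy <- data.frame(
  age    = sample(17:25, n, replace = TRUE),
  cohort = sample(c(1995, 2009), n, replace = TRUE),
  female = rbinom(n, 1, 0.5)
)
# Simulated employment probability rising with age (made-up parameters).
p <- plogis(-6 + 0.5 * toy$age - 0.008 * toy$age^2 - 0.3 * (toy$cohort == 2009))
toy$employ <- rbinom(n, 1, p)

# Linear probability model: coefficients are changes in the employment probability.
lpm <- lm(employ ~ age + I(age^2) + factor(cohort) + female, data = toy)

# Logit alternative for a binary outcome.
logit <- glm(employ ~ age + I(age^2) + factor(cohort) + female,
             data = toy, family = binomial)

print(round(coef(lpm), 3))
print(round(coef(logit), 3))
```

School type and state can be added to either formula as further `factor()` terms; the coefficient on the 2009 cohort indicator is the quantity to compare across specifications.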
You are done! Now you know that most of what you do in data analysis is data cleaning.