
EMET3006/4301/8001 Applied Micro Econometrics Semester 2, 2022

Published: 2022-09-24



Assignment

Due date: Monday, October 10, 10am

In this assignment you will download data from Wattle. The data are based on a simplified version of the Longitudinal Surveys of Australian Youth (LSAY). The first data set is from Wave 1 of the LSAY, a survey of Year 9 students in 1995. The second data set contains subsequent years. The third contains a new cohort, from 2009.

Type your answers in an Rmarkdown document, which you upload when you are done. Please also upload your html/pdf document with your output and answers.

You are welcome to discuss the questions with classmates, but please upload your own work.

I’ve provided some basic commands, but you may want to look some other things up on Google.

Questions:

1. Begin with LSAYwave1.dta.

(a) How many Year 9 students are surveyed by LSAY in 1995 according to the data? [The data contain the full set of Year 9 students surveyed by LSAY.]

(b) What is the share of male students?

(d) What is the average mathematics score for students born outside Australia? What is the average mathematics score for other students?

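(e) What is the distribution of female students’ month of birth? One possible approach (it needs the dplyr and lubridate packages):

    # Parse the m/d/y strings and store them in year-month-day form
    LSAYwave1$dob <- format(as.Date(LSAYwave1$dob, format = "%m/%d/%y"), "%Y-%m-%d")
    # Extract month and year of birth with lubridate's month() and year()
    LSAYwave1 <- LSAYwave1 %>%
      mutate(month = month(dob),
             yob = year(dob))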

(f) Do students whose mothers’ level of education is “3. all years of secondary” have better mathematics and reading scores than those whose mothers’ level of education is “2. Some secondary school”? Break this down by gender. Here you need to do a t-test rather than just compare the mean values. One possible approach:

    # Keep only mothers' education levels 2 and 3, then compare mean maths scores
    ttest_subset <- filter(LSAYwave1, EDUC_M > 1 & EDUC_M < 4)
    test_math <- t.test(TOT_MATH ~ EDUC_M, data = ttest_subset)

I’m sure there are other ways to do this and the code will need to be adjusted for each question.
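As one hedged illustration of "break this down by gender", the sketch below runs the same t-test once per gender on simulated data. The toy data frame, the coding of sex (1 = male, 2 = female) and the simulated scores are all assumptions made for the example; check the real coding in LSAYwave1 before adapting it.

```r
# Hypothetical toy data: column names mirror the assignment, but the
# values and the sex coding (1 = male, 2 = female) are assumptions.
set.seed(1)
toy <- data.frame(
  sex      = rep(c(1, 2), each = 40),
  EDUC_M   = rep(c(2, 3), times = 40),
  TOT_MATH = round(rnorm(80, mean = 12, sd = 3))
)

# Run the same Welch two-sample t-test separately for each gender
ttest_by_sex <- lapply(split(toy, toy$sex), function(d) {
  t.test(TOT_MATH ~ EDUC_M, data = d)
})

# Collect one row per gender: the two group means and the p-value
results <- do.call(rbind, lapply(names(ttest_by_sex), function(s) {
  tt <- ttest_by_sex[[s]]
  data.frame(sex        = s,
             mean_educ2 = unname(tt$estimate[1]),
             mean_educ3 = unname(tt$estimate[2]),
             p_value    = tt$p.value)
}))
print(results)
```

The same pattern should carry over to TOT_READ by swapping the left-hand side of the formula.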

(g) Do students born overseas in English speaking countries outperform students born overseas in non-English speaking countries?

(h) Do students born in 1981 outperform students born in 1980? Do students born in 1980 outperform students born in 1979?

(i) How many schools are covered in each state?

(j) What is the average number of native and migrant students in the schools of each state, respectively?

(k) What is the average share of male students in the schools of each state?

(l) Which state has the highest average reading score?

(m) Which state has the highest share of students with a reading score of 20?

(n) Do you find that male students who have higher reading scores also have higher math scores? Do you find that female students who have higher reading scores also have higher math scores? Do you find any gender differences here? Use empirical results to support your answer.

(o) Do you find that students from schools located in Metro areas have higher mathematics and/or reading scores than other schools?
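Several of the state-level questions above (for example (i), (k) and (m)) come down to grouped summaries. The following is a minimal sketch with dplyr on made-up data; the toy values and the 0/1 `male` dummy are assumptions, so the real variable names and codings in LSAYwave1 may differ.

```r
library(dplyr)

# Hypothetical toy data: two states, three schools, six students
toy <- data.frame(
  state    = c("NSW", "NSW", "NSW", "NSW", "VIC", "VIC"),
  schno    = c(1, 1, 2, 2, 3, 3),
  male     = c(1, 0, 1, 1, 0, 0),   # assumed 0/1 dummy
  TOT_READ = c(12, 15, 9, 20, 20, 14)
)

# Step 1: collapse to one row per school
by_school <- toy %>%
  group_by(state, schno) %>%
  summarise(share_male = mean(male), n_students = n(), .groups = "drop")

# Step 2: average the school-level figures within each state
by_state <- by_school %>%
  group_by(state) %>%
  summarise(n_schools      = n(),
            avg_share_male = mean(share_male),
            .groups = "drop")

# Student-level share with a reading score of exactly 20, by state
read20 <- toy %>%
  group_by(state) %>%
  summarise(share_read20 = mean(TOT_READ == 20), .groups = "drop")

print(by_state)
print(read20)
```

Collapsing to schools first matters for "average in the schools of each state": averaging over students directly would weight large schools more heavily.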

2. Continue with LSAYpanel.dta. This is a data set in wide format. That is, each respondent’s responses in each wave of the survey are recorded as separate variables.

(a) Merge this dataset with the one you used in Question 1 using R’s merge command. Type ?merge in the console to see how to do this. I needed to merge by stuidno.

(b) Use the R pivot_longer command (from the tidyr package) to put the data in long form. Long form will have all the years for each respondent in rows, followed by all the years of the next respondent. (Depending on how you merged, you may have gotten duplicate variables that R then appended with .x or .y. That’s ok, just remove the .y’s.)

    LSAY_long <- pivot_longer(LSAY_merge,
      # Pivot every column except the time-invariant ones listed here
      cols = !c(stuidno, state, size, schno, schtyp, schclass, sex, indig,
                dob, TOT_MATH, TOT_READ, COB_S3, COB_ARR, disab, EDUC_M,
                EDUC_F, wt, dob_date, mob, yob, high_read, score_diff, female, metro),
      names_to = c(".value", "year"),
      # Split each name 4 characters from the end: variable stem, then year
      names_sep = -4
    )
    # Sort by respondent, then by year (explicit columns, so attach() is not needed)
    LSAY_long <- LSAY_long[order(LSAY_long$stuidno, LSAY_long$year), ]
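To see what the merge and the `names_sep = -4` split do, here is a small sketch on invented data. The two toy data frames and their values are assumptions; only the naming pattern (a four-character stem followed by a four-digit year) is taken from the assignment.

```r
library(tidyr)

# Hypothetical wave-1 data: one row per student
wave1 <- data.frame(stuidno = c(1, 2), sex = c(1, 2))

# Hypothetical wide panel: each name is a 4-letter stem plus a 4-digit year
panel <- data.frame(
  stuidno  = c(1, 2),
  XLFS1998 = c(1, 2),
  XLFS1999 = c(1, 1),
  XHRS1998 = c(38, NA),
  XHRS1999 = c(40, 20)
)

# Merge on the student identifier
merged <- merge(wave1, panel, by = "stuidno")

# Pivot everything except the time-invariant columns
long <- pivot_longer(
  merged,
  cols      = !c(stuidno, sex),
  names_to  = c(".value", "year"),
  names_sep = -4   # split 4 characters from the end of each name
)
long$year <- as.integer(long$year)
long <- long[order(long$stuidno, long$year), ]
print(long)
```

The special name ".value" tells pivot_longer that the first piece of each column name (here XLFS or XHRS) should become its own column, so the result has one row per student and year.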

(c) Relabel the variables so that you know what they mean (you need to install the “expss” package, there are others but I’m going with this for now):

    # variable labels
    LSAY_long = apply_labels(LSAY_long,
      XHSL = "highest school level completed",
      XCSL = "current school level",
      XCEL = "current qualification level",
      XFTS = "full-time or part-time study status",
      XHEL = "highest qualification level completed",
      XBAC = "study status in bachelor degree or higher",
      XVET = "study status in VET",
      XLFS = "labour force status",
      XHRS = "average weekly hours worked",
      XFTP = "full-time or part-time employment status",
      XEMP = "permanent or casual employment",
      XMOB = "job mobility during last year",
      XATR = "status in apprenticeship/traineeship",
      XWKP = "average weekly pay",
      XHRP = "average hourly pay",
      XOCC = "occupation",
      XCHI = "number of dependent children",
      XATH = "living with parent(s)",
      XOWN = "living in own home",
      XMAR = "marital status",
      XUNE = "any spell of unemployment during the year",
      XFTE = "in full-time employment or full-time education"
    )

(d) You may also want to rename the variables so that their names are not capitalized, or to remove the leading X (you will need to do some of this before you append a third data set in Q3):

    LSAY_long <- dplyr::rename(LSAY_long, pid = stuidno)
    LSAY_long <- dplyr::rename(LSAY_long, schoolid = schno)
    LSAY_long <- dplyr::rename(LSAY_long, hrs = XHRS)
    # Rebuild the date of birth with an explicit 19xx century
    LSAY_long$dob_date <- as.Date(format(as.Date(LSAY_long$dob, "%m/%d/%y"), "19%y-%m-%d"))
    # Two-digit year of birth
    LSAY_long$yob <- as.numeric(format(LSAY_long$dob_date, "%y"))
    # etc.

You may need to come back to this command and make a few other changes before you can append properly in Q3.

(e) To interpret the results from the questions below more clearly, generate a female dummy equal to 1 if the respondent is female and 0 if the respondent is male.

(f) What is the proportion of the sample whose highest school level completed is less than year 12 in each year from 1998 to 2006? Why is this changing over time?

(g) What is the proportion of the female sample whose highest school level completed is less than year 12 in each year from 1998 to 2006?

(h) What is the proportion of the sample whose highest qualification level completed is a Bachelor degree in each year from 1998 to 2006?

(i) What is the proportion of the male sample whose highest qualification level completed is a Bachelor degree in each year from 1998 to 2006?

(j) What is the proportion of the sample whose labour force status is employed in each year from 1998 to 2006?

(k) What is the proportion of those living with a parent whose labour force status is employed in each year from 1998 to 2006?

(l) Explore the gender pay gap: regress average hourly earnings on whether or not the respondent is female and working. Restrict each regression to only one year for now. Try a few different specifications, including variables that you think might be important. Which of your specifications is your preferred specification? Why? (This is an open-ended question; explore a few options.)
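A possible starting point for the single-year regressions in (l), sketched on simulated data. The variable names `hrp` and `hrs`, the simulated pay equation and the built-in gap are all assumptions for illustration; they are not results from the LSAY data.

```r
# Hypothetical simulated data: two survey years, 100 respondents each
set.seed(42)
n <- 200
toy <- data.frame(
  year   = rep(c(2005, 2006), each = n / 2),
  female = rbinom(n, 1, 0.5),
  hrs    = round(runif(n, 10, 45))
)
# Simulated hourly pay with an assumed gap of -2 for women
toy$hrp <- 20 + 0.1 * toy$hrs - 2 * toy$female + rnorm(n, sd = 3)

# Restrict to one year, as the question asks
one_year <- subset(toy, year == 2006)

m1 <- lm(hrp ~ female, data = one_year)        # raw gap
m2 <- lm(hrp ~ female + hrs, data = one_year)  # gap conditional on hours

# Compare the female coefficient across the two specifications
print(c(raw = unname(coef(m1)["female"]),
        adjusted = unname(coef(m2)["female"])))
```

Comparing how the female coefficient moves as controls are added is one way to argue for a preferred specification.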

3. Comparing two cohorts

(a) We want to append the LSAY2009.dta file to our data set from Q2. Install the plyr package. R will complain that the package clashes with dplyr, but for this one thing it will be fine. The two data sets are not quite compatible: some variables have different names across the two data sets, so you may need to go back to your code in Question 2 part (d) and rename some of the variables to match the names in the 2009 data set.

It will also be helpful if in the Q2 dataset you generate a variable called cohort and set it equal to 1995.

(b) We will also need to adjust some variables from Q2 so that we are measuring the same things as the variables in Q3. For example, the unemployment measures across the two datasets are different:

    # Dummies for each labour force status
    LSAY_long$employ <- ifelse(LSAY_long$XLFS == 1, 1, 0)
    LSAY_long$unemp <- ifelse(LSAY_long$XLFS == 2, 1, 0)
    LSAY_long$nlf <- ifelse(LSAY_long$XLFS == 3, 1, 0)

    # read_dta() is from the haven package; rbind.fill() is from plyr
    LSAY2009 <- read_dta("Data/LSAY2009.dta")
    LSAY_append <- rbind.fill(LSAY_long, LSAY2009)

(d) You are ready to run regressions that determine the likelihood of being employed and to compare the likelihoods by cohort. Regress employment status on age, age squared, cohort and gender.

(e) Redo the regression but add the school type.

(f) Redo the regression, including the state a respondent is in.

(g) Based on your regression results, would you consider the 2009 cohort to be less attached to the labour market? Explain. (You may want to run additional regressions to test your ideas)

(h) Based on your answer to the previous question, propose an intervention where you would like to increase the labour force attachment of the 2009 cohort. Discuss the treatment you would use and how you would measure the causal effect of the treatment.
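The employment regressions in parts (d) and (e) can be sketched as linear probability models. Everything below is simulated: the `age` variable, the school type labels and the employment outcome are assumptions for the example, and a logit via `glm(..., family = binomial)` would be an equally reasonable choice.

```r
# Hypothetical simulated data with two cohorts
set.seed(7)
n <- 400
toy <- data.frame(
  age    = sample(17:25, n, replace = TRUE),
  cohort = rep(c(1995, 2009), each = n / 2),
  female = rbinom(n, 1, 0.5),
  schtyp = sample(c("government", "catholic", "independent"), n, replace = TRUE)
)
toy$employ <- rbinom(n, 1, 0.6)   # assumed 0/1 employment dummy

# Part (d): age, age squared, cohort and gender
lpm_d <- lm(employ ~ age + I(age^2) + factor(cohort) + female, data = toy)

# Part (e): add school type as a set of dummies
lpm_e <- update(lpm_d, . ~ . + factor(schtyp))

print(coef(summary(lpm_e)))
```

Wrapping cohort in `factor()` makes its coefficient the difference in employment probability between the 2009 and 1995 cohorts, holding the other regressors fixed.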

You are done! Now you know that most of what you do in data analysis is data cleaning.