Applications of Econometrics

Assessed Group Project

Spring 2022

The role of unions in labour markets is a long-standing research topic in labour economics. It is often found, for example, that unionised workers are paid higher wages, and some authors argue that unionisation could lead to lower inequality. Unions declined somewhat in importance in the second half of the 20th century, but there is currently a resurgence in unionisation in some countries, arguably as a response to increasing inequality.

In this project we try to estimate several effects of unions for the U.S. using the Survey of Income and Program Participation (SIPP). This is a household panel dataset with detailed information for a sample of U.S. households. It is representative of the U.S. population and has been used in many applied research projects. See the section 'Getting Started' below for how to obtain the data and prepare it for analysis. We think it makes sense to limit your sample to prime-age workers (age 25-50) for the entirety of the project.

All parts of the project carry equal weight. Groups have to submit a Word/PDF file that has answers to the questions below along with a dofile that has all the commands in it that the group used.

● Both the Word/PDF document and the dofile have to be submitted before the deadline. Projects submitted without a dofile will incur the default penalty of a late submission (5 marks).

● Answers to questions should be limited to 3 pages per question (1-2 pages is probably enough).

● The dofile should be written in such a way that anyone with access to the raw data files can replicate the analysis.

● Stata outputs (tables/figures) have to be included in the document. It is not enough to refer to outputs in the Stata log/dofile.

● Before submission groups have to declare that the project is their own work. There is no separate form to complete; it can be done directly on Learn.

● Make sure that you are aware of the requirements for appropriate citation of references and data sources. Read the guidance on plagiarism in Section 4.4.1 of the Economics Honours Handbook and/or the general University guidance. If you include anything from another source it must be properly acknowledged, whether it's a figure/table or a text passage or anything else.

● You are welcome to ask questions on Piazza or come to helpdesks. We will try to help as much as possible with data preparation and are of course happy to clarify where things are unclear. To be fair to all students, we will typically not answer topical questions.

Time Series Questions

For this part you have to aggregate the SIPP data to a monthly time series (see ’Getting Started’ below).

1. Calculate averages by month and year of unionisation, log real wages and unemployment. Plot the time series for these over time. Make sure you label the axes correctly. Then run a simple regression of log real wages on unionisation rates, and unemployment on unionisation rates. The former could capture whether unionisation rates positively or negatively affect average wages, the latter could capture whether unionisation rates increase/decrease unemployment. Interpret your findings.

2. Is there evidence for seasonality and trends in unionisation rates, log real wages, and unemployment rates? What about serial correlation or heteroskedasticity? Investigate these issues and try to make your results in (1.) robust to those concerns.

3. The regressions we ran in (1.) and (2.) might suffer from omitted variable bias. This could be due to dynamics (e.g. lags) that you haven't already dealt with in (2.) or important other factors not captured by our variables. Explain why this might be the case and propose solutions. Run regressions including controls you think are sensible and interpret your results. The variables you include don't have to be from the SIPP, e.g. you could try to include quarterly log GDP.1 Hint: Don't include too many controls; we have a limited number of observations here. I'd say 10 is the absolute max. We're not looking for a perfect specification here, that's almost impossible. We're looking for two or three specific concerns and how you could deal with those. You don't have to try to solve all the potential problems.

Panel Questions

For this part you have to use the panel dimension of SIPP. A panel entity in this dataset is a person and time is measured in months.

4. Provide some descriptive statistics for your variables, such as the mean, minimum, and maximum of key variables (unionisation, wages, age, education etc.). Make sure you provide clear indications of what you are reporting. This means do not include the raw variable names in the table. Instead, use a descriptive label like 'hourly wage in $'. Hints: Report shares instead of means for categorical variables (e.g. education). It probably makes sense to report descriptive statistics for the sample you are using in the following questions (Q5-Q7), e.g. workers currently earning a positive wage. It might also make sense to check how representative your sample is of the population. In this question formatting is especially important, so make sure your tables/figures are clearly labelled and self-explanatory.

5. Estimate the union wage premium by pooled OLS. We usually do this by regressing log real wages on an indicator for whether the worker belongs to a union. Include your own choice of control variables. Some suggestions: education, time trends, whether there are children in the household, marital status, age, and race.

6. Estimate the union wage premium using random effects and fixed effects and compare your estimates to the POLS results (include the same controls as far as possible). Make sure your results are robust to heteroskedasticity and serial correlation. Explain why the numbers are different, which estimates we trust most, and discuss your findings.2

7. Now we look at some heterogeneity. Using the fixed effects estimator (and again your set of controls), estimate whether the union wage premium is different for e.g. women and men. You can be creative here. It would be interesting to compare the premium in different industries, for example. Discuss your findings.

1 E.g. available at https://fred.stlouisfed.org/.

2 When discussing your findings a flavor of theoretical reasoning is a plus here, i.e. why do economists think there's a link between wages and being in a union? See e.g. 'Labor Economics' by George Borjas, McGraw Hill 2020, or many online resources, e.g. https://economics.mit.edu/files/4689 is advanced but very good.

Getting Started

In this section we provide basic instructions on how to download the dataset and make it ready for analysis. The extracts below are from data-prep.do, which is available on Learn. We will update this section if many students struggle with something (we also might have overlooked something of course). You are very welcome to come to the helpdesks or ask on Piazza if you have problems.

You can find all the raw datasets at https://www.census.gov/programs-surveys/sipp/data/datasets.html. Since these datasets can be very big we uploaded files on Learn that exclude some probably unnecessary variables and only include prime-age workers (age 25-50). You can use those and/or download additional files from the U.S. Census Bureau directly. Each file (wave) contains the 12 months of a year, so we observe the same person roughly 12 times per wave. Here we show you how to append the datasets we put on Learn into one file. That gives you 6 years (72 months) of data. You are free to extend the data further back but be warned that this is not easy because the structure of the survey changed.

/*==============================================================================

PREPARATION OF SIPP DATA, by AofE Teaching Crew

Description: Append waves of SIPP data and select variables.

Download data from Learn or https://www.census.gov/programs-surveys/sipp.html

==============================================================================*/

* Type in path/folder where the dataset is located
global datapath "."

* Open the file containing 2020 wave 1 data (covering January-December 2019)

* We already pre-selected prime-age workers (keep if tage >= 25 & tage <= 50)

* and dropped unnecessary variables

use "$datapath/pu2020_prime", clear

* Keep variables (you won’t need all of them, this is just a selection of potentially useful ones)

* NOTE: If you want to add variables from the full dataset just enter them here

keep eplaydif eddelay tjb1_mwkhrs tjb1_msum esex ems erp spanel ssuid erace tage eeduc rmesr ///
    edisabl efree_lunch edaycare tutils tosavval pnum tjb1_occ tjb1_ind ejb1_scrnr eafnow ///
    monthcode tpearn tmwkhrs rwksperm tage rmwkwjb twkhrs1-twkhrs5 *_union *_cntrc

gen refyear = 2019

lab var refyear "Calendar year"

lab var monthcode "Calendar month"

* Note that ’refyear’ and ’monthcode’ are crucial variables for the analysis as

* they capture the time period all of the other variables refer to

*=======================================

* APPEND ADDITIONAL YEARS *

*=======================================

* Append data from 2019 wave 1 (covering January-December 2018)

append using "$datapath/pu2019_prime", keep(eplaydif eddelay tjb1_mwkhrs tjb1_msum esex ems ///
    erp spanel ssuid erace tage eeduc rmesr edisabl efree_lunch edaycare tutils ///
    tosavval pnum tjb1_occ tjb1_ind ejb1_scrnr eafnow monthcode tpearn tmwkhrs rwksperm ///
    tage rmwkwjb twkhrs1-twkhrs5 *_union *_cntrc)

replace refyear = 2018 if missing(refyear)

* Append data from 2018 wave 1 (covering January-December 2017)

append using "$datapath/pu2018_prime", keep(eplaydif eddelay tjb1_mwkhrs tjb1_msum esex ems ///
    erp spanel ssuid erace tage eeduc rmesr edisabl efree_lunch edaycare tutils ///
    tosavval pnum tjb1_occ tjb1_ind ejb1_scrnr eafnow monthcode tpearn tmwkhrs rwksperm ///
    tage rmwkwjb twkhrs1-twkhrs5 *_union *_cntrc)

replace refyear = 2017 if missing(refyear)

* Append data from 2014 wave 4

append using "$datapath/pu2014w4_v13_prime", keep(eplaydif eddelay tjb1_mwkhrs tjb1_msum esex ems ///
    erp spanel ssuid erace tage eeduc rmesr edisabl efree_lunch edaycare tutils ///
    tosavval pnum tjb1_occ tjb1_ind ejb1_scrnr eafnow monthcode tpearn tmwkhrs rwksperm ///
    tage rmwkwjb twkhrs1-twkhrs5 *_union *_cntrc)

replace refyear = 2016 if missing(refyear)

* Append data from 2014 wave 3

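* (variable list assumed to be the same as for the other waves above)
append using "$datapath/pu2014w3_v13_prime", keep(eplaydif eddelay tjb1_mwkhrs tjb1_msum esex ems ///
    erp spanel ssuid erace tage eeduc rmesr edisabl efree_lunch edaycare tutils ///
    tosavval pnum tjb1_occ tjb1_ind ejb1_scrnr eafnow monthcode tpearn tmwkhrs rwksperm ///
    tage rmwkwjb twkhrs1-twkhrs5 *_union *_cntrc)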

replace refyear = 2015 if missing(refyear)

* Append data from 2014 wave 2

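* (variable list assumed to be the same as for the other waves above)
append using "$datapath/pu2014w2_v13_prime", keep(eplaydif eddelay tjb1_mwkhrs tjb1_msum esex ems ///
    erp spanel ssuid erace tage eeduc rmesr edisabl efree_lunch edaycare tutils ///
    tosavval pnum tjb1_occ tjb1_ind ejb1_scrnr eafnow monthcode tpearn tmwkhrs rwksperm ///
    tage rmwkwjb twkhrs1-twkhrs5 *_union *_cntrc)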

replace refyear = 2014 if missing(refyear)

tab monthcode refyear // Tabulates number of observations per reference year and month

* Save data

compress // often saves memory by making the dataset smaller (see help compress)

save $datapath/SIPPdata, replace
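The append pattern above relies on one detail: rows that come in through append have no value for refyear yet, so `if missing(refyear)` picks out exactly the newly appended wave. The following is a minimal sketch of that mechanism on made-up data. The temporary files, the variables id, x and notneeded, and the local macro keepvars are all hypothetical and not part of data-prep.do. Keeping the variable list in one local macro is just one way to avoid retyping it for every wave.

```stata
* Sketch with made-up data: two tiny "waves" saved as temporary files
clear
set obs 3
gen id = _n
gen x = 10 * _n
gen notneeded = 0
tempfile wave2019 wave2018
save `wave2019'

replace x = x + 1
save `wave2018'

* Store the variable list once so it cannot drift between commands
local keepvars id x

use `wave2019', clear
keep `keepvars'
gen refyear = 2019

* Appended rows have no refyear yet, so it is missing for exactly those rows
append using `wave2018', keep(`keepvars')
replace refyear = 2018 if missing(refyear)

tab refyear
```

Run this as a single dofile, since local macros and temporary files disappear once the dofile finishes.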

If you want to add additional variables, a helpful command is lookfor. It searches the variable names and labels for a search term. For example, you could find all variables that have 'children' in the name or label by using lookfor children.
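As a small illustration of lookfor that runs without the SIPP files, the sketch below uses the auto dataset that ships with Stata. The search term and the local macro found are only for illustration. lookfor leaves the matching variable names in r(varlist), which we copy into a local straight away because later commands may overwrite r().

```stata
* Sketch on Stata's built-in example data (not the SIPP)
sysuse auto, clear

* Search variable names and labels for the term "mileage"
lookfor mileage

* Save the matches before another command overwrites r()
local found `r(varlist)'

* Inspect the variables that were found
describe `found'
summarize `found'
```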

Once you have something like our “SIPPdata.dta” that contains monthly data for respondents you can start preparing variables. Here we show you how we might get hourly wages, unionisation status, and unemployment status. We also convert nominal to real wages using the CPI (also available on Learn).

* Hourly earnings (wage)

g wage = tpearn / (tmwkhrs*4*rmwkwjb/rwksperm)

* It’s a survey so we get some weird values, e.g. negative wages

* We'll just 'fix' those by setting them to zero
replace wage = 0 if wage < 0

* Similarly, some wages will be unrealistically high. Topcode those at the 95th percentile
su wage, d

replace wage = r(p95) if wage > r(p95) & !missing(wage)

* Merge in the CPI to get real wages

merge m:1 refyear monthcode using $datapath/cpi, keep(match) nogen

* Get the real wage (in 2018 prices); you can use ’rwage’ as a measure for real

* wages for all questions in the project
g rwage = wage / cpi

* Create monthly indicator for union status
egen temp = rowmin(ejb*_union)

g union = temp == 1

* Create monthly indicator for unemployment

g unemployed = rmesr == 5 | rmesr == 6 | rmesr == 7

label var wage "Nominal wage in $"

label var rwage "Real wage in 2018$"

label var union "Member of a union this month"

label var unemployed "Unemployed this month"
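To see what the bottom- and top-coding steps do, here is a minimal sketch on simulated wages. The data, the seed and the scalar name p95cap are made up for illustration. The only difference from the extract above is that the 95th percentile is stored in a scalar first, which is optional but keeps the value safe if another command overwrites r() in between.

```stata
* Sketch with simulated wages (not SIPP data)
clear
set seed 12345
set obs 1000
gen double wage = exp(rnormal(3, 0.8))   // made-up hourly wages
replace wage = -5 in 1/10                // a few implausible negative values

* Set negative wages to zero
replace wage = 0 if wage < 0

* Top-code at the 95th percentile
summarize wage, detail
scalar p95cap = r(p95)
replace wage = p95cap if wage > p95cap & !missing(wage)

* The maximum should now equal the 95th percentile from before
summarize wage
display "Top-coded at " p95cap ", maximum is now " r(max)
```

The `!missing(wage)` condition matters because Stata treats missing values as larger than any number, so without it missing wages would be replaced by the cap as well.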

To answer the time series questions you will need to aggregate the individual-level survey data and calculate monthly averages. We show you one way to do this here.

* collapse the dataset to averages by month and year

collapse (mean) union rwage unemployed, by(monthcode refyear)

g monthly_date = ym(refyear,monthcode)

tsset monthly_date

format %tm monthly_date
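The sketch below runs the same collapse and tsset steps on simulated person-month data, so you can check the mechanics without the SIPP files. Everything in it is made up: 100 hypothetical persons, 24 months starting in January 2014, and arbitrary union, unemployment and wage values. The last lines show one thing tsset makes possible, namely time-series operators such as L. for lags.

```stata
* Sketch with simulated person-month data (not SIPP data)
clear
set seed 2022
set obs 2400                                  // 100 made-up persons x 24 months
gen person = ceil(_n / 24)
bysort person: gen t = _n                     // month counter 1..24 within person
gen refyear   = 2014 + floor((t - 1) / 12)
gen monthcode = mod(t - 1, 12) + 1
gen union      = runiform() < 0.10
gen unemployed = runiform() < 0.05
gen rwage      = exp(rnormal(3, 0.5))

* One row per calendar month, holding the cross-sectional means
collapse (mean) union rwage unemployed, by(monthcode refyear)

* Build a Stata monthly date and declare the data a time series
gen monthly_date = ym(refyear, monthcode)
format %tm monthly_date
tsset monthly_date

* Lag operators work once the data are tsset
gen union_lag = L.union
list monthly_date union union_lag in 1/3
```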

To work with the panel data we need to create a unique person id that lets Stata know what the panel unit is. You could do this as follows.

egen id = group(ssuid pnum)

g monthly_date = ym(refyear,monthcode)

xtset id monthly_date
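A minimal sketch of what egen group() does here, using a few typed-in rows. The household and person numbers are invented, and storing ssuid as a string is an assumption for this example only. Two people in the same household share ssuid but differ in pnum, so they receive different ids. The isid line is an optional check that each person appears at most once per month, which xtset also requires.

```stata
* Sketch with made-up identifiers (not SIPP data)
clear
input str12 ssuid pnum refyear monthcode
"000114285134" 101 2019 1
"000114285134" 101 2019 2
"000114285134" 102 2019 1
"000114285134" 102 2019 2
"000114552343" 101 2019 1
"000114552343" 101 2019 2
end

* One id per distinct combination of household (ssuid) and person number (pnum)
egen id = group(ssuid pnum)

gen monthly_date = ym(refyear, monthcode)
format %tm monthly_date

* Stops with an error if a person-month appears more than once
isid id monthly_date

xtset id monthly_date
xtdescribe
```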

Finally, adding additional variables or determining what the codes correspond to can be a bit tricky. We show you an example of how to generate a dummy for 'married' here. First we need to find any variable that has 'married' in the label:

lookfor married

>               storage   display    value
> variable name   type    format     label      variable label
> ---------------------------------------------------------------------------
> ems             byte    %12.0g                Is ... currently married, ...

tab ems

>      Is ... |
>   currently |
>    married, |
>    widowed, |
>   divorced, |
>  separated, |
>    or never |
>    married? |
>------------+-----------------------------------
>           1 |
>           2 |
>           3 |
>           4 |
>           5 |
>           6 |
>------------+-----------------------------------
>       Total |    928,704      100.00

Then we need to find out what '1', '2' etc. correspond to. You can find this in the 'Metadata' PDF file that is available on the Census Bureau SIPP homepage. Here's the entry for 'ems':