
Term Project and Homework Assignments

Returns to Education

ECON 4400

1 Overview

Human capital, defined as the skills, knowledge, and abilities that an individual possesses, has been a focal point of economic research in labor, development, and political economy, to name a few fields. Education and job training are important human capital investments, leading to higher earnings and non-pecuniary benefits. Your ECON 4400 project focuses on the former. You will quantify the returns to human capital by estimating the effect of a year of schooling on an individual’s wage. While economics has developed sound theoretical foundations, empirical work on the return to human capital has been at the center of considerable debate.

As part of your project, you will explore a part of that debate by replicating (approximately; I have simplified the analysis to a degree) the results of Angrist and Krueger (1990) using the 2021 American Community Survey (ACS). I chose this approach to foster critical thinking and deepen econometric knowledge. Our analysis will also draw upon Bound, Jaeger, and Baker’s (1995) critique of the instrumental variables approach used in Angrist and Krueger (1990).

Throughout the term, you will complete parts of the analysis and submit each component as a homework assignment. In doing so, I can assist with your learning of econometrics in practice. Additionally, the homework assignments enable me to address issues with coding or analysis.

For each assignment, you need only to submit what is requested.  You will include the tables created for each assignment with your term project.  A homework assignment will also ask you to introduce, discuss, and explain particular sections of your term project, e.g., data, regression analysis, results, and econometric methodologies. After I return the assignment, you should edit and expand the section following the outline below, addressing any notes or needed corrections.

You will analyze the returns to education and labor force participation at the state level.  Refer to Table 1 to see your assigned state. To download your data file, log on to Carmen, go to Modules, scroll toward the bottom of the page, and download the state data file assigned to you.

2 Paper Requirements and Expectations

You will write a three- to four-page analysis (not including tables; it can be longer if needed) of the returns to education (and of labor force participation) and submit it at the beginning of class on Tuesday, 12/05. The paper will include three tables: a table of summary statistics, labor force participation estimates, and returns to education estimates (see Sections 3.1, 3.2, 3.3, and 3.4). You need to attach your do-file to the paper. If you do not submit a working do-file, you will receive, at most, half credit for this assessment.


Your do-file needs to be cleaned of any redundant or incorrect commands.  The entire do-file needs to be executable. In other words, if you click the execute icon, Stata executes every command without error.
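One way to help a do-file run from top to bottom is to open it with a few housekeeping commands. The lines below are a minimal sketch and reflect an assumption about your setup; they are not a required part of the project, and the placeholder comment stands in for your own commands.

```stata
* Hypothetical opening lines for a do-file; adjust to your own setup
clear all              // start from an empty session so reruns behave the same
set more off           // keep output from pausing partway through a run
capture log close      // close a log left open by an earlier run, if any

* ... project commands go here ...
```

Because clear all drops the data in memory, these lines belong at the very top of the do-file, before the use command that loads your data.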

Your write-up of the analysis should follow the general outline below; the sub-items do not need to follow the stated order. At a minimum, you must address each enumerated item. Your writing needs to flow (it should not read as an itemized list). Each paragraph must consist of one key idea and include supportive statements (evidence, results, etc.) of that key idea. Additionally, you need to ensure your writing includes transitions between key ideas (paragraphs).

1.  Introduction

(a)  Discuss the importance and benefit of education in the context of earnings.  For background, read the following papers:

•  “Economic Returns to Education: What We Know, What We Don’t Know, and Where We Are Going–Some Brief Pointers” by Dickson and Harmon (2011)

•  “Does Compulsory School Attendance Affect Schooling and Earnings?” by Angrist and Krueger (1990)

•  “Educational Attainment and Quarter of Birth: A Cautionary Tale of LATE” by Barua and Lang (2008)

•  “Problems With Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak” by Bound, Jaeger, and Baker (1995)

You can access the papers on Carmen under Modules, in Articles for Term Project (at the bottom of the Modules page).

2.  Data and Methodology

(a)  Cite and discuss the data used for the analysis

(b)  Discuss the sub-samples used for the analysis, referencing the summary statistics

3.  Labor Force Participation

(a)  State the objective of using regression analysis to explain labor force participation

(b)  Include the labor force participation model (see Section 3.2)

(c)  Discuss the OLS and Logistic results

4.  Returns to Education

(a)  Introduce and discuss the wage equation (see Section 3.3)

(b)  Discuss OLS return to education

(c)  Discuss why the OLS estimate for the return to education is biased

(d)  Discuss the Two Stage Least Squares (2SLS) estimator: how does it address the endogeneity problem?

(e)  Discuss the instrumental variables (see Section 3.4), including the relevance and validity requirements

(f)  Discuss the 2SLS return to education

(g)  Compare and discuss OLS versus 2SLS estimates. Do the results meet expectations? Explain (Hint: why is OLS biased?). Discuss the F-statistic from the test for weak instruments. What insights does the test provide regarding the results?

5.  Discussion and Conclusion

2.1 Paper Formatting

Font: 11pt Times New Roman font

Margins: One-inch margins (top, bottom, left, and right)

Line spacing: 1.5 lines

Start of new paragraph: Indent (no additional spacing between paragraphs)

Text Alignment: justified

Make sure to include your first and last name on the paper

References and Citations - Chicago Style

If you choose to support an argument by drawing on the work of other scholars, you need to follow the citation and reference style below (Chicago). When you cite an article or research paper, you must include a reference section with your paper.

Citation and reference examples:

General format

In-text citation: Author Year

Reference list: First author’s last name, first author’s first name, second author’s first and last names, third author’s first and last names, . . . , and last author’s first and last names. Year of publication. “Title of article.” Title of Journal, volume number(issue/number, or date/month of publication if volume and issue are absent): page numbers (if any).

Example - Parenthetical

In-text citation: (Tesseur 2022)

Reference list: Tesseur, W. 2022. “Translation as inclusion? An analysis of international NGOs’ translation policy documents.” Language Problems and Language Planning, 45(3): 261–283.

Example - Narrative

In-text citation: Piketty and Saez (2003)

Reference list: Piketty, Thomas, and Emmanuel Saez. 2003. “Income Inequality in the United States, 1913–1998.” The Quarterly Journal of Economics, 118(1): 1–41.

2.2 Stata Do-File

You will generate one do-file for this project.  Each assignment will have you add to your code document (do-file).  You must save your do-file at each step of the project (I recommend saving it regularly when working on an assignment). Separate each part using asterisks. For example:

********************


**ECON  4400  Project:    Name  -  Assigned  State

********************

********************

**Homework  1  -  Summary  Statistics

. . .code  here . . .

********************

********************

**Homework  2  -  Labor  Force  Participation

. . .code  here . . .

********************

********************

**Homework  3  -  OLS  Returns  to  Education

. . .code  here . . .

********************

********************

**Homework  4  -  2SLS  Returns  to  Education

. . .code  here . . .

********************

2.3 Data Assignments

Table 1: Data Assignments for Term Project (and Homework Assignments)


3 Homework: Putting Together Your Analysis

3.1 Homework 1, Due Tuesday, 09/26

Overview of assignment and what you will submit: Generate a table reporting summary statistics of various samples. Write one to two paragraphs summarizing, characterizing, and noting the similarities or dissimilarities between the samples. You will submit a paper copy of your write-up with the summary statistics table and a print-out of your do-file at the beginning of class on Tuesday, 09/26.

What to Submit - three items:

1.  A write-up discussing the data source, the samples, and summary statistics

2.  A table of summary statistics

3.  Attach a printout of your do-file (the entire document)

We will generate three subsamples for our analysis. The first sample consists of all individuals between the ages of 19 and 65 who are not on active duty. The second sample consists of individuals in the labor force who reported a wage or salary in 2020 (the 2021 ACS reports income from the prior year). The third sample includes only individuals between 29 and 40 years old who reported a wage or salary and participated in the labor force. We will use the latter sample to estimate the returns to education.

Your first homework assignment will require you to complete a process known as data cleaning. Researchers often need to recode or generate new variables from survey data. The commands below will walk you through how to “clean” the ACS data to estimate the returns to education and the probability that an individual participates in the labor force.

The task of data cleaning is often an arduous one. To cultivate skills in command-based coding and data analytics using Stata, I provide code enabling us to use the ACS data for regression analysis. There is one exception (see below), where I ask you to generate a dummy variable indicating whether a person is employed. All other variable recoding or generation processes are provided in this section.

In Stata, combine two conditions with & to indicate a range. For example, to tabulate incwage between 20,000 and 40,000, i.e., 20,000 ≤ incwage ≤ 40,000, the code is tab incwage if incwage>=20000 & incwage<=40000. For an “or” statement, use |. For example, to count respondents who are married: count if marst==1 | marst==2, where a value of one indicates a married person and two indicates married but separated (for assigned values and designations regarding marital status: label list marst_lbl). The vertical line | denotes “or” and & denotes “and” in Stata.
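The two operators can be tried out on Stata's built-in auto data before you apply them to the ACS file. This is only an illustrative sketch: the variables price and rep78 belong to that example data set, not to your assigned data, and inrange() is a built-in function shown here as an optional shorthand.

```stata
* Illustrative sketch using Stata's built-in auto data (not the ACS file)
sysuse auto, clear

* Range condition with & ("and"): cars priced from 4,000 to 6,000 inclusive
count if price>=4000 & price<=6000

* inrange() is a built-in shorthand for the same closed interval
count if inrange(price, 4000, 6000)

* "Or" condition with |: cars with a repair record of 1 or 2
count if rep78==1 | rep78==2
```

The first two commands should report the same count, since both describe the same closed interval. Run the sketch in a separate session, because sysuse auto, clear replaces the data in memory.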

It is best practice to describe (label) newly generated variables. A label describes the variable so that you can tell what it represents or measures when you return to it. I am leaving variable labeling to you. It is not something you need to do, but it may be helpful later in the term.

label var variable_name "Description"

To begin, load your assigned data into Stata (Note: If you copy the Stata code from this PDF, some characters may not correctly reproduce onto the do-file. If you receive an error message after executing your do-file, check whether the source is due to an incorrectly copied character.):

use "path/acs_2021_X.dta", clear

where path denotes the directory path where the data file is saved on your computer. The double quotes allow the path to contain spaces. The “X” is a placeholder for the State FIPS code, e.g., if assigned California, the State FIPS code is 6.

Define sample:  To estimate the returns to education and labor force participation, we need to define the appropriate subsamples for analysis.

Keep all observations between the ages of 19 and 65.

keep  if  age>=19  &  age<=65
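A quick way to confirm that a sample restriction did what you intended is to follow it with an assert. The sketch below uses a handful of made-up ages rather than the ACS data, so the values and the toy data set are assumptions for illustration only.

```stata
* Toy data standing in for the ACS age variable (values are made up)
clear
input age
17
19
42
65
70
end

* Same restriction as above
keep if age>=19 & age<=65

* The do-file stops here with an error if any observation is out of range
assert inrange(age, 19, 65)
summarize age
```

In your own do-file, only the assert and summarize lines would follow the keep command; the clear and input lines exist solely to make the sketch self-contained.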

Generating variables for analysis:

•  Generate a dummy variable indicating whether a respondent reports participating in the labor force

gen lf=(labforce==2)

•  Generate a dummy variable indicating whether a respondent reports being enrolled in school

gen attending=(school==2)

•  Generate a set of dummy variables indicating which quarter of the year they were born, e.g., 1st, 2nd, 3rd, or 4th. The below command will produce four dummy variables labeled qtr1, qtr2, qtr3, and qtr4.

tab birthqtr, gen(qtr)

•  Generate a variable byear indicating a respondent’s birth year. The variable will be used to generate dummy variables for birth year, capturing variation in wages by birth cohort (see Homework 2).

gen byear=year-age

•  Generate a variable for the square of age

gen age2=age^2

•  Generate a dummy variable indicating if a respondent is married

gen married=(marst==1 | marst==2)

•  Generate an interaction term between the variable married and the number of children under the age of five in the household (nchlt5)

gen marchlt5=married*nchlt5

•  Generate a dummy variable if respondent identified as male

gen male=(sex==1)

•  Generate dummy variables for race and ethnicity.

Generate a dummy variable if respondent identified race as White non-Hispanic

gen white=(race==1 & hispan==0)

Generate a dummy variable if respondent identified race as Black

gen black=(race==2)

Generate a dummy variable if respondent identified race as Asian or Pacific Islander

gen asian=(race>=4 & race<=6)

Generate a dummy variable if respondent identified as Hispanic

gen hispan2=(hispan>=1 & hispan<=4)

Generate a dummy variable indicating whether a respondent self-identifies other than White

gen drace=(race!=1)

•  Generate a dummy variable indicating whether a respondent works in a Metropolitan Statistical Area (MSA)

gen msa=(pwtype==1 | pwtype==2 | pwtype==3 | pwtype==4 | pwtype==5)

•  You try: Generate a dummy variable indicating whether a respondent reports being employed. You will create a variable named employed using the ACS variable empstat. To do so, type label list empstat_lbl in the Command window. Stata will display the labels and corresponding values associated with each employment category. Using that information, you will generate a binary variable that takes on the value of one if employed and zero otherwise.

•  Generate a new variable for years of schooling. When using the ACS, researchers need to recode educational attainment to properly reflect a respondent’s years of schooling. To see why, type label list educd_lbl in the Stata Command window. We will name the new educational attainment variable grade, denoting highest completed schooling. The code for generating a variable reflecting years of schooling is

gen grade=.
replace grade=0 if educd>=0 & educd<=12
replace grade=1 if educd==14
replace grade=2 if educd==15
replace grade=3 if educd==16
replace grade=4 if educd==17
replace grade=5 if educd==22
replace grade=6 if educd==23
replace grade=7 if educd==25
replace grade=8 if educd==26
replace grade=9 if educd==30
replace grade=10 if educd==40
replace grade=11 if educd==50 | educd==61
replace grade=12 if educd==60 | educd==62 | educd==63 | educd==64
replace grade=12.5 if educd==65
replace grade=13 if educd==70
replace grade=13.5 if educd==71
replace grade=14 if educd>=80 & educd<=83
replace grade=15.5 if educd==100
replace grade=16 if educd==101
replace grade=18 if educd==114
replace grade=19 if educd==115
replace grade=20 if educd==116
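After a long recode like this one, it can help to check that every education code received a value. The sketch below is illustrative only: it builds a toy data set with a few educd codes (not real ACS records) and repeats a subset of the replace lines so that it runs on its own.

```stata
* Toy data with a few educd codes from the recode above (not real ACS records)
clear
input educd
2
30
63
101
116
end

gen grade=.
replace grade=0  if educd>=0 & educd<=12
replace grade=9  if educd==30
replace grade=12 if educd==60 | educd==62 | educd==63 | educd==64
replace grade=16 if educd==101
replace grade=20 if educd==116

* Cross-tabulate codes against years of schooling, showing any missing values
tab educd grade, missing

* Count observations the recode failed to cover
count if missing(grade)
```

With your assigned data, only the last two commands are needed after the full recode. A nonzero count would suggest that some educd value was not covered by any replace line, possibly because of a mistyped or miscopied character.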

•  Generate dummy variables for reported occupation using two-digit SOC classifications. To “clean” the ACS variable indicating occupation (occsoc) requires advanced coding skills. I am providing the code below; copy it into your do-file to generate the occupational dummy variables. Make sure the code that you copied into your do-file has all the same characters. If not, you may need to edit the copied content in your do-file.

gen  occupation=o