
ECN 723: Applied Research Methods, Fall 2022

Short Course Description

This course is structured around the completion of an independent applied econometrics project; detailed instructions are included on the following pages.

It is important to emphasize that the purpose of this course is to give you guidance in completing your project, not to teach you basic econometrics or how to use R (that was the purpose of ECN 627: Econometrics I, a prerequisite to this course).

Success with this project will require three main ingredients: (i) a solid understanding of basic statistical theory, (ii) some rudimentary coding skills, and (iii) the ability to write clearly and concisely in English. This is essentially a course in applied econometrics.

However, one cannot do applied econometrics without having a solid understanding of basic statistical theory as well as some elementary coding skills. Thus, the goal of the course is to ensure that you can actually use what you already “learned” in your previous courses.

This will be accomplished by having you produce an empirically-oriented research project of your own.

Course Materials

I will be posting a series of short videos reviewing the background material you are expected to understand from your previous coursework. Chapters 1-7 of the following text (used in ECN 627) may also help to serve as a review of this material:

Stock, J.H. and M.W. Watson (2011). Introduction to Econometrics, 3rd edition. Pearson.

As noted above, it is not the purpose of this course to teach you things that were covered in your previous coursework; the videos and text mentioned above are intended only for your own review.

Course Delivery


The project will be completed through 4 “instalments”, each worth 25% of your final grade. These instalments will be due by 8am on October 14, November 4, November 25, and December 16 (these are all Fridays).

In case something goes wrong  (e.g., you get sick, you have computer problems, etc.), you may submit any instalment up to 72 hours late without penalty, but submissions more than 72 hours late will not be accepted whatsoever.

Please understand that each instalment will involve an enormous amount of work; if you think you can get started on an instalment a few days before it is due, you are setting yourself up for failure.

For each instalment, there is a set of minimum tasks you are expected to complete, but there is nothing to stop you from working ahead; indeed, if you achieve all of the tasks for the final instalment with one of your earlier instalments, I will give you full marks on that and all remaining instalments.

Academic Integrity

It is absolutely essential that you familiarize yourself with Toronto Metropolitan’s policies on academic integrity (see https://www.torontomu.ca/academicintegrity/ for more information). Please understand that these policies fully apply to computer code. The good news is that it’s very easy to avoid any issues: all you need to do is refrain from discussing your project with other students!

Academic Accommodation Support

If you have a diagnosed disability that impacts your academic experience, please contact Academic Accommodation Support (https://www.torontomu.ca/accommodations/). Requests for accommodation must be made at the beginning of the semester, i.e., please do not ask for an “extension” immediately before (or anytime after) a deadline.

Departmental Policies

Please see the student handbook:  http://economics.ryerson.ca/files/handbook.pdf (LINK NEEDS TO BE UPDATED FOR TMU)

Project Instructions

For this project, you will be required to use data from one of the following articles:

Ater, Itai, Yehonatan Givati, and Oren Rigbi (2017). “The Economics of Rights: Does the Right to Counsel Increase Crime?”. American Economic Journal: Economic Policy 9(2) 1–27. [Data]

Bagues, Mario, Mauro Sylos-Labini, and Natalia Zinovyeva (2017). “Does the Gender Composition of Scientific Committees Matter?”. American Economic Review 107(4) 1207–1238. [Data]

Baskaran, Thushyanthan, and Zohal Hessami (2018). “Does the Election of a Female Leader Clear the Way for More Women in Politics?”. American Economic Journal: Economic Policy 10(3) 95–121. [Data]

Dahl, Gordon B., and Lance Lochner (2012). “The Impact of Family Income on Child Achievement: Evidence from the Earned Income Tax Credit”. American Economic Review 102(5) 1927–1956. [Data]

Cellini, Stephanie Riegg, and Claudia Goldin (2014). “Does Federal Student Aid Raise Tuition? New Evidence on For-Profit Colleges”. American Economic Journal: Economic Policy 6(4) 174–206. [Data]

You can work in pairs for this project, and you are encouraged to choose your own partners.  Please email me with the name of the paper you wish to work on along with the name of your partner. You may work alone if you prefer, but must let me know, either way.  You must let me know (via email) which of these articles you are interested in no later than 8am on September 21; if I do not hear from you by then, I will make the choice for you.

You are not being asked to replicate the article that you are getting your data from. Instead, once you let me know which of these articles you are interested in, I will suggest a slight variation on it for you to do (e.g., if the original article analyzed performance for all students, I might suggest that you focus only on the performance of boys).

Ultimately, your aim in this project is to answer a causal  (not “casual”) question such as the following: Does being placed into a small class cause students to perform better academically?  To do so, you will be required to use a regression model of the following form:

$$\text{Outcome}_i = \alpha + \beta\,\text{Treatment}_i + X_i V + U_i,$$

where $\text{Outcome}_i$ is the outcome (e.g., a test score) for the $i$th individual, $\text{Treatment}_i$ is equal to 1 if the $i$th individual receives the treatment (e.g., being placed into a small class) and 0 otherwise, $X_i$ is a vector of control variables (e.g., age, gender, etc.) for the $i$th individual, and $U_i$ is an idiosyncratic error term.

The main parameter of interest is β, which is known as the “average treatment effect” or ATE (no one really cares about α or V). Thus, the null hypothesis you will want to test is $H_0: \beta = 0$. If any of this is unclear to you, please make sure to spend some time watching my videos that review the background material you are expected to be familiar with from your previous coursework.
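As a rough illustration of the regression model above, the following sketch estimates it by OLS in base R. The data are simulated and every variable name (outcome, treatment, age, female) is a hypothetical stand-in rather than anything taken from the assigned articles; it is only meant to show where the estimate of β comes from.

```r
# Hypothetical simulated data; names and numbers are illustrative assumptions.
set.seed(723)
n <- 500
age <- rnorm(n, mean = 10, sd = 1)            # control variable
female <- rbinom(n, size = 1, prob = 0.5)     # control variable
treatment <- rbinom(n, size = 1, prob = 0.4)  # 1 = treated, 0 = non-treated
outcome <- 50 + 5 * treatment + 2 * age + 1 * female + rnorm(n, sd = 10)
dat <- data.frame(outcome, treatment, age, female)

# Full model: Outcome_i = alpha + beta * Treatment_i + X_i V + U_i
full <- lm(outcome ~ treatment + age + female, data = dat)

# beta is the coefficient on the treatment variable (the ATE estimate)
betahat.full <- unname(coef(full)["treatment"])
print("OLS estimate of beta (full model)")
print(betahat.full)
```

The remaining coefficients (the intercept and the coefficients on the controls) correspond to α and V and are not of direct interest.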

Submission Instructions

The project will be completed through 4 “instalments”, each worth 25% of your final grade. You should think of these instalments not as 4 separate pieces of work, but rather as 4 versions of the same piece of work, each one being “better” than the one that came before it. That is, each instalment should not only add new features, but also improve the existing features (this means fixing any technical errors you had previously, making your writing clearer, etc.).

Instalment submissions must be made via a private Google Drive folder that I will share with you once you let me know which article you are interested in. Specifically, each instalment will require you to upload files named paper-x.pdf and code-x.txt, where x is the instalment number (e.g., your first instalment will require files named paper-1.pdf and code-1.txt). The file named paper-x.pdf is to be a PDF file containing the latest version of the write-up for your project. The file named code-x.txt is to be a plain text file containing the latest version of your R code. For the first instalment, you will also need to upload your data file (do not modify this file in any way, i.e., do not rename it or convert it to a different format). All of these files must be contained entirely within the Google Drive folder I share with you; please do not create any subfolders within this folder or place anything inside a “zip file”! If you have done everything correctly, you will have uploaded exactly 9 files in this folder by the time you are done with this project (2 for each instalment plus 1 containing your data).

Please note that failure to precisely follow the above submission guidelines will result in a mark of zero. For example, if you were to upload your write-up as a Microsoft Word file rather than a PDF file, or your R code as a rich text file rather than a plain text file, you would get a zero.

Your R Code

The single most important thing to keep in mind about this project is that I need to be able to replicate all of your results. This means you need to submit code that actually works for me (and it is not my job to “debug” it for you in order to make it work for me). Accordingly, your R code needs to be “universal” in the sense that it is written in plain text and will run on any computer.

To run your code, I will set my working directory to the Google Drive folder I have shared with you and enter the following command in the R console:

source("code-x.txt")

(where x is the instalment number). Your code needs to be written so that the above produces every single number that appears in your write-up as output (you will need to use the print() function to print out the results you want me to see). If this doesn’t work for any reason, you will get a mark of zero. You should work in exactly the same fashion yourself rather than “interactively” (i.e., typing commands directly into the R console). Indeed, before submitting any instalment, you should re-start the R console and run the above command to make sure you get the results you are expecting (if it doesn’t work for you, it obviously won’t work for me).

For the purpose of this project, you are not permitted to use any R packages except for haven (for reading data saved in Stata format) and sandwich (for computing HC standard errors).

Below are some general guidelines for your R code. If you fail to follow any of these guidelines, you will get a mark of zero.

• Do not include any line beginning with > (i.e., lines that you copied from the R console).  Such lines are not valid R code.

• Do use the following as your very first line to ensure R’s memory is cleared: rm(list=ls())

• Do not include any calls to the install.packages() function or the remove.packages() function. However, do make sure to include a call to the library() function for the packages you use. I would strongly recommend that any such calls come immediately after the first line described in the point above, i.e., the first 3 lines of your code should be

rm(list=ls())
library(haven)
library(sandwich)

You would then read in your data and so on.

• Do not include any calls to the setwd() function.

• Do not include any “path” references when reading in your data. That is, you should have something like read_dta("data.dta") rather than read_dta("/Users/JaneDoe/ECN723/data.dta"), and just manually set the working directory in R to the location where you’ve saved your data file (remember: when I run your code, I will set my working directory to the Google Drive folder I have shared with you, i.e., the folder containing your data file).

• Do not include any calls to functions that open a graphical interface such as the View() function (you can use this yourself if you would like, but it will just create an error for me).

• Do not create separate data frames for your treated and non-treated groups. You should have a single data frame containing all of your observations, and within this data frame, there should be a treatment variable equal to 1 for observations in the treated group and 0 for observations in the non-treated group.

• Use the attach() function exactly once (make sure to do so only after you have “cleaned” your data).

• Do not “hard code” any numbers. For example, even if you knew that the OLS estimate of β in your basic model (see below) is 0.383648, the number 0.383648 itself should never appear anywhere in your code. Instead, you could use something like the coef() function to assign this number to the variable betahat.basic (or whatever you want to call it). In other words, you are doing something very wrong if you need to run your code in order to find a result that you then plug back into your code as input.

• Do make sure to have your code print out some description of the numbers it produces. For example, rather than just



you should do something like the following:

print("Number of observations in the treated group")


print("Number of observations in the non-treated group")


This ensures that someone running your code sees more than just a bunch of numbers.
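Several of the guidelines above (a single data frame with a treatment variable, no hard-coded numbers, and labelled output) can be sketched together as follows. This is only an illustration under stated assumptions: the data are simulated, and the names dat, outcome, treatment, n.treated, and n.nontreated are hypothetical; betahat.basic follows the naming suggested in the guideline on hard-coding.

```r
# Hypothetical simulated data standing in for a cleaned data set.
set.seed(1)
dat <- data.frame(treatment = rbinom(200, size = 1, prob = 0.5))
dat$outcome <- 1 + 0.4 * dat$treatment + rnorm(200)

# One data frame; groups are identified by the treatment variable
n.treated <- sum(dat$treatment == 1)
n.nontreated <- sum(dat$treatment == 0)

# Every number is printed together with a description
print("Number of observations in the treated group")
print(n.treated)
print("Number of observations in the non-treated group")
print(n.nontreated)

# No hard-coded results: the estimate is pulled out with coef()
basic <- lm(outcome ~ treatment, data = dat)
betahat.basic <- unname(coef(basic)["treatment"])
print("OLS estimate of beta (basic model)")
print(betahat.basic)
```

Because betahat.basic is computed rather than typed in, any later calculation that uses it stays correct if the data or the model change.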

Your Write-up

You will need to create a short write-up describing precisely what you have done/found. It must be no more than 10 pages in length, but all else equal, shorter is better (clearly explain everything you are doing in detail, but keep it concise).

Your write-up should be written so that it would be easy for another student in this course to read it and understand exactly what you have done/found. That is, your “target audience” consists of readers who know something about economics and econometrics, but don’t necessarily know anything about the specific topic you are writing about (do not assume that your readers have read the article that you obtained your data from). This means that you can skip explaining straightforward things like how to calculate a T-statistic and put all of your energy into explaining the design of your experiment, what all of your variables measure, and what your results tell you about your causal question of interest. Please do not try to “sound smart” by using “big words” and overly formal language: just write clearly and concisely in plain English.

Your write-up must be split into 3 sections:

1. Introduction

This section should very clearly explain what your causal question of interest is, and how your experiment is designed. Make sure to explain exactly what your “treatment” is. Again, do not assume that your readers have read the article that you obtained your data from; it’s your job to briefly summarize the experiment here. You should also explain how what you are doing is different (e.g., you might only be using a subset of the data used in the original article).

This section should be 1 to 2 pages in length.

2. Data and Model

This section should provide a very clear explanation of the model you are estimating and how all the different variables in it are defined. Be very specific. For example, if your outcome variable is “TestScore”, you need to explain exactly what this is measuring, i.e., what kind of test it is, when the test took place, what the score is out of, etc. Information about your outcome, treatment, and control variables should be summarized in a table (call it Table 1; see ecn723-project-sample.pdf for an example).

You should also include a table here providing the sample mean (and its standard error) of all of these variables for the entire sample and also for each group (treated and non-treated) separately; this table should also clearly list the total number of observations as well as the number of observations in each group (call it Table 2; see ecn723-project-sample.pdf for an example).

After describing your variables and providing sample statistics, you will need to specify your regression model in a formal equation as is done in the Project Instructions section above (of course, you will need to use your own variable names, e.g., “TestScore” rather than “Outcome”). In addition to your “full” regression model that includes all of your variables, you will also be required to estimate a “basic” version of it that does not include your control variables, i.e., a model of the following form:

$$\text{Outcome}_i = \alpha + \beta\,\text{Treatment}_i + U_i$$

Rather than writing out equations for both models, however, just write out the equation for your full model and then explain in words that your basic model is identical but excludes the control variables (i.e., your write-up should include exactly 1 equation of the form given in the Project Instructions section above).

You do not need to go into any of the details about your econometric methods, but you should clearly state what methods you are using. For example, you might tell us that you are estimating the parameters in your model using OLS and that you are providing us with HC standard errors for them. Finally, make sure to clearly describe exactly what hypothesis you will be testing (namely $H_0: \beta = 0$) and how this relates back to your causal question of interest.

Overall, this section should be 3 to 4 pages in length.

3. Results

This section should clearly describe your results. You should have a table here showing your average treatment effect estimates (and their standard errors) from your basic and full models (call it Table 3; see ecn723-project-sample.pdf for an example). Remember that no one cares about the estimates of α or V; all that we care about is your estimate of β (the ATE).

Most importantly, you need to formally test the hypothesis you described in Section 2 (do this using the results from both your basic model and your full model, but base your overall conclusion on the full model as it should provide a more accurate estimate of the average treatment effect). Specifically, make sure to report the T-statistic and its corresponding p-value from each of your models. Whatever you do, please do not compare your T-statistics to any critical values (i.e., do not ever write anything like “Since $|T| > 1.96$, we reject $H_0$”). Instead, focus on interpreting the corresponding p-value as a measure of strength of evidence against the null.

This section should be 1 to 2 pages in length.
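The test described for the Results section can be sketched in base R as follows. Several things here are assumptions rather than course requirements: the data are simulated, the variable names are hypothetical, the HC standard error is computed by hand using the HC1 variant (in a submission the permitted sandwich package would presumably supply it through its vcovHC() function), and the p-value uses the large-sample standard normal approximation.

```r
# Hypothetical simulated data; names and numbers are illustrative assumptions.
set.seed(2)
n <- 400
treatment <- rbinom(n, size = 1, prob = 0.5)
age <- rnorm(n, mean = 10, sd = 1)
outcome <- 20 + 1.5 * treatment + 0.5 * age + rnorm(n, sd = 4)
dat <- data.frame(outcome, treatment, age)

full <- lm(outcome ~ treatment + age, data = dat)
betahat.full <- unname(coef(full)["treatment"])

# Heteroskedasticity-consistent (HC1) variance, computed by hand
X <- model.matrix(full)        # n x k regressor matrix (with intercept)
e <- resid(full)               # OLS residuals
k <- ncol(X)
bread <- solve(crossprod(X))   # (X'X)^{-1}
meat <- crossprod(X * e)       # sum over i of e_i^2 * x_i x_i'
vcov.hc1 <- (n / (n - k)) * bread %*% meat %*% bread
j <- which(colnames(X) == "treatment")
se.full <- sqrt(vcov.hc1[j, j])

# T-statistic for H0: beta = 0 and its two-sided p-value
t.full <- betahat.full / se.full
p.full <- 2 * pnorm(-abs(t.full))

print("T-statistic for H0: beta = 0 (full model)")
print(t.full)
print("Two-sided p-value (full model)")
print(p.full)
```

The p-value is then read as a measure of the strength of evidence against the null (smaller values meaning stronger evidence), without reference to any critical value.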

Your write-up does not need a “Conclusions” section or any appendices (remember that I have your R code, so there is no need to include it in your write-up).  You only need the 3 sections (and 3 tables) described above; no more, no less.

In addition to this outline, you must adhere to the following formatting guidelines:

• Use 1 inch margins on all sides, and number each page inside the bottom margin (centered).

• Use “justified” alignment for all paragraphs (i.e., text stretched out from the left margin to the right margin).

• Double-space everything (except footnotes and notes for tables, which should be single-spaced).  Do not include an extra space between paragraphs or between sections. In other words, the space between any two paragraphs should be exactly the same as the space between any two lines within a paragraph. Similarly, the space between the last line of a section and the title of the next section (or between the title of a section and the first line of a section) should be exactly the same as the space between any two lines within a paragraph.

• Use a 12 pt font size for everything (except footnotes and notes for tables, which should be 10 pt).

• Do not include a title page. The first line of text should be your main title (centered and in bold), the second line of text should be your name (centered), the third line of text should be the title of the first section (left-justified and in bold), and so on.

• Do not indent the first line of the first paragraph of a section, but do indent the first line of each subsequent paragraph.

• Use bold for your main title, the number/title of each section, and the title of each table, but nowhere else.

• Use footnotes rather than endnotes.

• Do not paste any R code or output into your write-up.

• Tables should only contain horizontal lines, and these horizontal lines should only be at the top of the table, after the header row, and at the bottom of the table.

• Above each table, you must write “Table X: Blah blah blah” (without the quotation marks) where “X” is the table number and “Blah blah blah” is the description.

• Always refer to tables by writing “Table X” (without the quotation marks) where X is the table number (notice that Table is capitalized). For example, you might write “... are shown in Table 1”.

• You do not need a “References” section since you are only going to cite one paper (the paper you obtained your data from). Instead, include a full reference to this paper in a footnote the first time you mention it, and always refer to it as “Lastname1 and Lastname2 (year)” (if there are two authors) or “Lastname1 et al. (year)” (if there are 3 or more authors). For example, you might write something like “Angrist and Lavy (2009) estimate...” or “Banerjee et al. (2015) examine...”. Do not ever write first names, article titles, or journal names in the main body of text.

All of these formatting rules are demonstrated in the file named ecn723-project-sample.pdf. Please read it very closely. If you don’t follow these formatting guidelines, I will just stop reading and give you a zero.


For each instalment, there is a set of minimum tasks that you need to achieve:

Instalment 1 (due 8am, October 14). Minimum tasks:

- Create the basic layout of your write-up and ensure you have it formatted

- Write the entirety of Section 1.

- In Section 2, give a detailed description of what your outcome/treatment/control variables are and complete Table 1.

- After getting rid of any observations with NA values (for any of your variables), compute the number of observations you have in the entire sample and in both the treated and non-treated groups (you will know you are on the right path if the total number in these two groups is equal to the number in the entire sample). Fill these values into the bottom row of Table 2.

- Make sure that your data file is uploaded into the Google Drive folder I share with you.

Instalment 2 (due 8am, November 4). Minimum tasks:

- Compute your summary statistics and complete Table 2. To check that you are on the right path, make sure that (a) the sample mean of your treatment variable for the entire sample is equal to the number of observations in the treated group divided by the number of observations in the entire sample, and (b) for every variable, the sample mean for the entire sample lies somewhere between the sample mean for the treated group and the sample mean for the non-treated group. If either of these conditions is not satisfied, you have done something severely wrong.

- Specify your regression model and describe the hypothesis you will be testing in order to complete Section 2 (this should come after Table 2).

- Read over Section 1 again and spend some time to improve your writing (please don’t think it is already “perfect”; your writing can always be improved). Do not neglect this step!

Instalment 3 (due 8am, November 25). Minimum tasks:

- Use OLS to estimate your basic and full regression models and fill in the first column of Table 3. To check that you have done things correctly, use the numbers in Table 2 to compute the two-sample T-statistic for comparing the mean of the outcome variable between the treated and non-treated groups (you can just do this by hand to check for yourself; do not use the t.test() function in R as it is based on some silly assumptions); the numerator and denominator of this test statistic should be equal to the estimated coefficient on your treatment variable and its standard error, respectively (you don’t need to report the value of this test statistic in your write-up; just compute it in R to check that you are on the right path). If this condition is not satisfied, you have done something severely wrong.

- Use the results from Table 3 to test the null hypothesis that the coefficient on your treatment variable is equal to zero (you will have two separate tests, one using the basic model and one using the full model; you should compute a T-statistic for each and also provide their corresponding p-values). Discuss your findings in Section 3 right below Table 3.

- Read over Sections 1 and 2 again and spend some time to improve your writing. Again, do not neglect this step!

Instalment 4 (due 8am, December 16). Minimum tasks:

- Address any issues from your previous instalments.

- Read over your entire write-up again and spend some serious time to improve your writing. Even if all of your analysis is “correct”, you will get a poor mark if your writing is poor.
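The two Instalment 2 checks on the summary statistics, (a) and (b), can be sketched in base R as follows. The data are simulated and all names (dat, treated, se.mean, between, check.a, check.b) are hypothetical; the standard error of a sample mean is taken here to be the sample standard deviation divided by the square root of the number of observations, which is an assumption about the convention used in Table 2.

```r
# Hypothetical simulated data; names and numbers are illustrative assumptions.
set.seed(3)
n <- 300
dat <- data.frame(treatment = rbinom(n, size = 1, prob = 0.3),
                  age = rnorm(n, mean = 10, sd = 1))
dat$outcome <- 60 + 4 * dat$treatment + rnorm(n, sd = 8)

# Standard error of a sample mean (assumed convention: sd / sqrt(n))
se.mean <- function(x) sd(x) / sqrt(length(x))

# Logical index for the treated group (no separate data frames needed)
treated <- dat$treatment == 1
n.all <- nrow(dat)
n.treated <- sum(treated)
n.nontreated <- sum(!treated)

print("Sample mean of the outcome: entire sample, treated, non-treated")
print(c(mean(dat$outcome), mean(dat$outcome[treated]), mean(dat$outcome[!treated])))
print("Standard errors of those means")
print(c(se.mean(dat$outcome), se.mean(dat$outcome[treated]), se.mean(dat$outcome[!treated])))

# Check (a): mean of the treatment variable equals the treated share
check.a <- isTRUE(all.equal(mean(dat$treatment), n.treated / n.all))

# Check (b): each overall mean lies between the two group means
between <- function(x) {
  m <- c(mean(x[treated]), mean(x[!treated]))
  mean(x) >= min(m) && mean(x) <= max(m)
}
check.b <- all(sapply(dat, between))

print("Check (a) satisfied")
print(check.a)
print("Check (b) satisfied for every variable")
print(check.b)
```

Check (b) holds because the overall mean is a weighted average of the two group means, with weights equal to the group shares.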

code-1.txt, respectively, and so on. Nothing is “written in stone”; you can add/remove/modify any part of your code or write-up for any new instalment. For example, even though you will have written your introduction for the first instalment, you still need to put some effort into improving that section in each subsequent instalment (i.e., you aren’t “done” with the introduction after the first instalment). Think of this as a process of continuous improvement.
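The Instalment 3 check in the task list above (the two-sample T-statistic versus the basic regression) can be sketched in base R as follows. The data are simulated and the names are hypothetical. One assumption matters here: the denominator matches the regression standard error exactly when the group variances use the usual n − 1 divisor and the HC standard error is the HC2 variant, so the sketch computes HC2 by hand; with other HC variants the two numbers are close but not identical.

```r
# Hypothetical simulated data; names and numbers are illustrative assumptions.
set.seed(4)
n <- 250
treatment <- rbinom(n, size = 1, prob = 0.4)
outcome <- 10 + 2 * treatment + rnorm(n, sd = 3)

# Two-sample T-statistic built from group means and their standard errors
y1 <- outcome[treatment == 1]
y0 <- outcome[treatment == 0]
numerator <- mean(y1) - mean(y0)
denominator <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))

# Basic regression: Outcome_i = alpha + beta * Treatment_i + U_i
basic <- lm(outcome ~ treatment)
betahat.basic <- unname(coef(basic)["treatment"])

# HC2 variance by hand: residuals are rescaled by 1 / sqrt(1 - leverage)
X <- model.matrix(basic)
e <- resid(basic)
h <- hatvalues(basic)
bread <- solve(crossprod(X))               # (X'X)^{-1}
meat <- crossprod(X * (e / sqrt(1 - h)))   # sum of e_i^2 / (1 - h_i) * x_i x_i'
vcov.hc2 <- bread %*% meat %*% bread
se.basic <- sqrt(vcov.hc2[2, 2])           # second column is the treatment variable

print("Numerator equals the coefficient on the treatment variable")
print(isTRUE(all.equal(numerator, betahat.basic)))
print("Denominator equals the HC2 standard error of that coefficient")
print(isTRUE(all.equal(denominator, se.basic)))
```

The numerator matches for any standard error choice, since OLS on a single binary regressor reproduces the difference in group means.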

Feedback and One-on-one Meetings

Inside the private Google Drive folder I share with you, there will be a file named feedback.txt that I will use to give you feedback on each instalment (this will be updated within one week of every new submission; I will indicate which instalment I am referring to so that there is no confusion). Please make sure to incorporate all of the feedback I leave into your next instalment. The absolute worst thing you can possibly do in this course is to ignore this feedback. If I start reading a new instalment and see that you have ignored the feedback I gave on your previous instalment, I will just stop reading and give you a zero.

In the week following the week in which I provide feedback, you will have the opportunity to meet with me one-on-one to review that feedback and ask any questions you might have about the next instalment (e.g., the first instalment is due October 14th, so you will be receiving written feedback from me on that during the week of October 17th, and will then have the opportunity to meet with me one-on-one during the week of October 24th). You will also have the opportunity to meet with me one-on-one prior to October 14th in order to confirm that you are on the right track with your first instalment (if you are going to meet with me then, I would strongly suggest having the 3 files needed for your first instalment uploaded ahead of time so that I can go through them with you and let you know if there are any major issues; you can always make changes to your code and/or write-up after meeting with me but before the October 14th deadline). Sign-up sheets for all one-on-one meetings will be provided via D2L.