EMET8005 Assignment

Instructions

The assignment is due 12 noon on Tuesday 16 May 2023. Your report should be uploaded to Wattle using the Turnitin link provided. Late submissions will receive a mark of 0 unless an extension has been granted before the deadline, as per the course outline.

Your report must be all your own original work. Your report should be typed and the file should be in either Word or PDF format. Part of the assignment is to present results ‘professionally’. This means that there should be no Stata commands or Stata output in the main text. Extract the information you need from the Stata output, and create nice tables and figures similar to those you see in textbooks and journal articles. Attach your Stata do and log files as appendices. The do file must be annotated with explanatory comments, so that it is clear what results are sought, and it must run without syntax errors (assuming the data file is in the current working directory).

There is no strict word limit but, everything else equal, a clear and concise writing style may attract higher marks.  We anticipate most reports will be between 600 and 1400 words (excluding tables).

If you have any questions about the assignment, please email [email protected]. There is no penalty for clarification questions.

Student beauty and grades

Before working on this question, read sections 1–3 of Adrian Mehic’s study ‘Student beauty and grades under in-person and remote teaching’ published in Economics Letters 219 (2022), article number 110782. (You can find it through the ANU library.) In this question, we will analyse Mehic’s data. We will take a different approach, so you don’t need to read section 4 in his paper.

Download the file lundbeauty2023.dta from Wattle. The data are described in Mehic’s article. The dataset has brief variable labels in English, so hopefully you can understand what each variable represents. The dataset is a panel, where the observational entity is a student and the course code plays the role of time (cf eg a state-year panel dataset).

• Note the data are arranged in ‘long form’: there is a row for each examination result for each student.

• Mehic analysed standardised log grades. We are not going to bother here, and we will just use grades directly as the regressand. We will use the standardised beauty measure however.

• Beware there are 15 courses in the program, and one course is missing from Table 1 in the paper.

• Something to keep in mind is that the cohorts starting in each year may have different (observed and unobserved) characteristics on average (eg ability, ambition, beauty).

• Something to keep in mind is that average grades vary across courses; in particular, grades are lower in the advanced courses 13–15 compared to courses 1–12.

• Unfortunately, the standard errors will be relatively large, so we will not be able to make any firm conclusions in this analysis.

Questions follow below.

(a) To get familiar with the data, examine the properties and write a short description of each variable. Report the unit of measurement, the mean and standard deviation for variables where these quantities make sense. For categorical variables, describe how many categories there are and what the distribution is. Check if there are any missing values in any variables. Include histograms for standardised beauty and for grades.

Note: When you compute and discuss summary statistics, beware of possible ‘double-counting’. There are multiple exam results per student, multiple students in each cohort, multiple students enrolled in each course, etc.
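The double-counting point can be illustrated with a small sketch. The data below are made up, and the variable names id, course, stdbeauty and grade are assumptions (the real names in the dataset may differ, so check with describe). The egen function tag() marks one row per student, so that a student-level variable such as beauty is summarised once per student rather than once per exam.

```stata
* Toy long-form data (made up): student 1 sat two exams, student 2 sat one.
clear
input id course stdbeauty grade
1 1  0.5 4
1 2  0.5 3
2 1 -0.2 5
end

* Flag exactly one row per student (id is an assumed identifier name).
egen byte onerow = tag(id)

* Student-level variable: summarise over students, not over exam rows.
summarize stdbeauty if onerow

* Exam-level variable: every row is a distinct exam result, so use all rows.
summarize grade
```

Without the if onerow restriction, a student with many exam results would be weighted more heavily in the beauty statistics than a student with few.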

(b) The dataset concerns students who started in each of the years 2015–2019. So there are 5 student cohorts. During the first two years of study, these students have to take 15 mandatory courses. A different instance of each course is offered for each cohort. We need a word to refer to that, and I am proposing to use ‘unit’. So in total there are data for 75 different units (15 courses for 5 cohorts).

We can think of the difference between being taught online vs being taught in person on campus as the treatment vs the control.  To understand the pattern of treated and untreated units, compute the proportion treated for each unit.  Present the proportions for the cohorts and courses in a two-way table. Comment on the pattern (eg which cohort is treated when, how many treated vs untreated units).
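One possible way to get the proportions is sketched below. It assumes the variables are called cohort, course and covid, with covid coded 0/1; these names are guesses based on the notation in this assignment, so check them with describe first. Because the mean of a 0/1 indicator is a proportion, a two-way table of cell means gives the proportion treated in each unit.

```stata
* Assumed variable names: cohort, course, covid (0/1). Verify with -describe-.
use lundbeauty2023.dta, clear

* Cell means of a 0/1 indicator are the proportion treated in each unit.
* The -means- option shows only the means (no SDs or frequencies).
tabulate cohort course, summarize(covid) means
```

This is only a way to inspect the pattern; the table in the report should still be typed up separately, as the instructions require.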

(c) To begin, let’s investigate the relationship between beauty and grades in normal times, ie in units that are taught in person and on campus. Since the cohorts are different and the courses are different, we need to allow for different average grades across units, but for simplicity let’s estimate common coefficients for standardised beauty. (It should represent some weighted average of the unit-specific effects.)

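We can use subscript i for the student and subscript c for the course. To state the models, we need notation for all the category dummies. Define the 5 cohort dummies $\mathrm{cohort}^j_i = 1(\mathrm{cohort}_i = j)$, the 15 course dummies $\mathrm{course}^k_c = 1(\mathrm{course}_c = k)$, the 75 unit dummies $\mathrm{unit}^j_{ci} = 1(\mathrm{unit}_{ci} = j)$, and the age dummies $\mathrm{age}^k_i = 1(\mathrm{age}_i = k)$. Then a basic short model is

$$\mathrm{grade}_{ci} = \beta_1\,\mathrm{stdbeauty}_i + \sum_{j=16}^{19} \beta_{2j}\,\mathrm{cohort}^j_i + \sum_{k=2}^{15} \beta_{3k}\,\mathrm{course}^k_c + \beta_0 + U_i .$$

A long model with more detailed controls is

$$\mathrm{grade}_{ci} = \beta_1\,\mathrm{stdbeauty}_i + \sum_{j=2}^{75} \beta_{2j}\,\mathrm{unit}^j_{ci} + \sum_{k=19}^{25} \beta_{3k}\,\mathrm{age}^k_i + \beta_0 + U_i .$$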

Estimate these models using the data for the subsample of units not affected by pandemic-related restrictions. Cluster the standard errors at the student level. (Check that you have estimated 20 coefficients in the short model and 69 in the long model.)
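The count of 20 for the short model can be read off the summation limits, assuming Stata drops no further dummy for collinearity in the untreated subsample:

```latex
\underbrace{1}_{\beta_1}
+ \underbrace{(19 - 16 + 1)}_{\text{cohort dummies}}
+ \underbrace{(15 - 2 + 1)}_{\text{course dummies}}
+ \underbrace{1}_{\beta_0}
= 1 + 4 + 14 + 1 = 20 .
```

The long-model count depends on which units and ages actually occur in the untreated subsample, so it is not reproduced here.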

Present the key estimates and discuss the implications (eg the magnitude and the uncertainty of the estimates). Here and elsewhere, all estimates should be accompanied by a standard error and a confidence interval.

Note: You will need to create the unit variable, $\mathrm{unit}_{ci}$, from $\mathrm{cohort}_i$ and $\mathrm{course}_c$. Anything that assigns a unique code to each cohort-course combination will do. For example, gen int unit=cohort*100+course. Or use egen if you prefer values like 1, 2, ....
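The two coding schemes mentioned in this note can be compared on a toy example. The data below are made up, and the two-digit cohort codes are an assumption suggested by the summation limits 16 to 19. That assumption matters, because Stata's int type only holds values up to 32,740, so four-digit cohort years would not fit in an int after multiplying by 100.

```stata
* Toy cohort-course rows (made up); the repeated row mimics several students.
clear
input cohort course
15 1
15 1
15 2
16 1
16 2
end

* Scheme from the note: readable codes such as 1501 for cohort 15, course 1.
gen int unit = cohort*100 + course

* Alternative: consecutive codes 1, 2, 3, ... in sorted (cohort, course) order.
egen int unit2 = group(cohort course)

* Check the two codings are one-to-one; -assert- stops with an error if not.
bysort unit2 (unit): assert unit[1] == unit[_N]
bysort unit (unit2): assert unit2[1] == unit2[_N]
```

Either variable can then be used to build the unit dummies, since only the grouping matters and not the particular code values.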

Note: The grades are clearly not independent across courses for the same student, so at least we should cluster the standard errors at the student level. Grades are probably not independent within courses either, since there is usually a single teacher lecturing and writing the exam, but hopefully we can capture most of that dependence by including course or unit dummies in the regressions.

(d)  Next, write the equations for extended models that allow for both different levels of average grades and different beauty coefficients across the 4 combinations of male/female gender and quantitative/non-quantitative courses, but keep the other controls (cohort, course, unit, age) the same for all 4 combinations.

Estimate the extended models using only untreated units. Present the key estimates in a table and discuss the implications.

Note: Beware that $\mathrm{quant}_c$ is collinear with the set of $\mathrm{unit}^j_{ci}$ dummies, so Stata will probably omit one of the latter dummies to avoid the dummy variable trap. The key estimates should be the same no matter which dummy Stata omits.

(e) Now on to comparing online vs on-campus courses. Remember only certain courses for certain cohorts were online, so we have to think about what is a good comparison group.

Suppose we focus on the cohorts which were treated in some but not all courses. Discuss the merit of comparing grades in online and on-campus courses for the specific cohorts which had both.

Suppose we focus on courses which were taught on-campus for some cohorts and online for other cohorts. Discuss the merit of comparing grades in online and on-campus courses for those specific courses.

(f) Let’s try a DD approach to investigate how the relationship was affected when teaching moved online. Let’s first see if there is any treatment effect on average grades. Here is a basic DD model

$$\mathrm{grade}_{ci} = \beta_1\,\mathrm{covid}_{ci} + \sum_{j=16}^{19} \beta_{2j}\,\mathrm{cohort}^j_i + \sum_{k=2}^{15} \beta_{3k}\,\mathrm{course}^k_c + \beta_0 + U_i .$$

Estimate the model using all units (ie the full sample), present the key result, and comment.

(g) Now, let’s see if the relationship between grades and facial attractiveness is different in treated and untreated units. This is an extension of the DD methodology we’ve discussed previously, since we are looking at differences in ‘slope’ as opposed to differences in ‘level’, but the idea is the same. Here is an extended DD model

$$\mathrm{grade}_{ci} = \beta_1\,\mathrm{stdbeauty}_i + \beta_2\,\mathrm{covid}_{ci}\,\mathrm{stdbeauty}_i + \beta_3\,\mathrm{covid}_{ci} + \sum_{j=16}^{19} \beta_{4j}\,\mathrm{cohort}^j_i + \sum_{k=2}^{15} \beta_{5k}\,\mathrm{course}^k_c + \beta_0 + U_i .$$

You can verify (using the ‘plug-in’ method) that $\beta_1$ represents the effect (‘slope’) of $\mathrm{stdbeauty}_i$ in untreated units and $\beta_2$ is the change in that slope for treated units, so the slope in treated units is $\beta_1 + \beta_2$.
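As a sketch of the plug-in step, set the covid indicator to 0 and then to 1 in the extended DD model and collect the terms in stdbeauty. The cohort and course sums are unchanged, so they are abbreviated here:

```latex
\begin{aligned}
\mathrm{covid}_{ci} = 0:\quad
  \mathrm{grade}_{ci} &= \beta_1\,\mathrm{stdbeauty}_i
    + (\text{cohort and course terms}) + \beta_0 + U_i, \\
\mathrm{covid}_{ci} = 1:\quad
  \mathrm{grade}_{ci} &= (\beta_1 + \beta_2)\,\mathrm{stdbeauty}_i + \beta_3
    + (\text{cohort and course terms}) + \beta_0 + U_i .
\end{aligned}
```

The coefficient on stdbeauty is $\beta_1$ in the first line and $\beta_1 + \beta_2$ in the second, so $\beta_2$ measures the difference in slope between treated and untreated units, while $\beta_3$ shifts the level.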

Estimate the model (using all units), present the key results, and comment.

(h) Now, let’s further extend the model in part (g) to see if there are gender differences. (For simplicity, let’s ignore differences across quantitative and non-quantitative courses.)

Write the DD equation extending the model in part (g) to allow for both the mean level of grades and the coefficient on beauty to vary by both $\mathrm{female}_i$ and $\mathrm{covid}_{ci}$, but keep the cohort and course part of the model unchanged. (Probably you should have 26 coefficients.) Estimate the model. Tabulate and discuss the results.

(i)  Explain the conditions that must hold if we are going to interpret the estimated treatment coefficient(s) in part (h) as causal.

Optional: Suggest a graphical way of examining the main condition. Present and discuss the graph. (You may be disappointed: the dataset is small and the estimates are very ‘noisy’, so your graph may not yield clear conclusions.)