ECO220Y5Y: Introduction to Data Analysis and Applied Econometrics

Data Project Two

Winter Term 2021


Is COVID-19 likely to increase educational inequality amongst less developed countries? Does cross-country evidence indicate some countries are at greater risk of an educational crisis?


0.1 Project Overview

Your goal is to describe the cross-country variation in COVID-19-related educational inequality and in the risks facing the education sector. You should comment on the articles that discuss the severity of the situation for many countries and what COVID-19 might mean for educational vulnerabilities. You will do this using the ‘projectdata_Fall2020.xlsx’ file provided.

    Continuing from your previous analysis, you will produce additional evidence using your knowledge of linear regression. You must now include and discuss results from multivariate regression techniques, for example: output tables, interpretation of coefficients, goodness-of-fit statistics and plots of residuals. You should consider carefully the best specification and provide evidence (formal testing or analysis of plots) to support your model selection. You should consider creating indicator variables that shift the intercept or enter as interaction terms. You should look for common violations of the OLS assumptions in your regressions, such as heteroskedasticity, serial correlation and non-normality. You should conclude by using your model to determine whether COVID-19 has increased educational inequality for developing countries. You should comment on the statistical significance of your variables of interest and on what economic significance you can uncover relating COVID-19 to educational outcomes across countries.
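    As a rough illustration of the techniques listed above, a minimal R sketch on simulated data might look like the following. Every variable name here (closure, stringency, gdp_pc, low_income) is a hypothetical placeholder and does not correspond to a column in the project file, and the heteroskedasticity check is a hand-computed LM test in the n times R-squared form rather than a packaged routine. It is one possible workflow under those assumptions, not a required specification.

```r
# Simulated cross-section. All names are hypothetical placeholders,
# NOT columns from the project data file.
set.seed(220)
n <- 120
gdp_pc     <- exp(rnorm(n, mean = 8.5, sd = 1))    # GDP per capita
low_income <- as.integer(gdp_pc < median(gdp_pc))  # indicator variable
stringency <- runif(n, 0, 100)                     # policy response index
closure    <- 20 + 0.3 * stringency - 2 * log(gdp_pc) +
  0.2 * low_income * stringency +
  rnorm(n, sd = 5 + 0.05 * stringency)             # error variance grows with stringency
dat <- data.frame(closure, stringency, gdp_pc, low_income)

# Restricted model: no indicator variable
m0 <- lm(closure ~ stringency + log(gdp_pc), data = dat)

# Unrestricted model: the indicator shifts the intercept and,
# through the interaction term, the slope on stringency
m1 <- lm(closure ~ stringency * low_income + log(gdp_pc), data = dat)

print(summary(m1))       # coefficients, t-tests, R-squared, adjusted R-squared
ftest <- anova(m0, m1)   # joint F-test of the two added terms
print(ftest)

# Residuals against fitted values: look for fanning or curvature
plot(fitted(m1), resid(m1), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Heteroskedasticity: regress squared residuals on the regressors and
# compare n * R-squared with a chi-squared distribution
dat$u2  <- resid(m1)^2
aux     <- lm(u2 ~ stringency * low_income + log(gdp_pc), data = dat)
lm_stat <- n * summary(aux)$r.squared
bp_pval <- pchisq(lm_stat, df = length(coef(aux)) - 1, lower.tail = FALSE)

# Normality of residuals: Shapiro-Wilk test
sw <- shapiro.test(resid(m1))

cat("LM test p-value:", bp_pval, "\n")
cat("Shapiro-Wilk p-value:", sw$p.value, "\n")
```

    A small p-value in the F-test favours keeping the indicator and interaction terms. Small p-values in the last two tests point to heteroskedasticity and non-normal residuals respectively, and each should be weighed alongside the residual plot.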

    As with data project one, the data are a set of variables downloaded and combined from the Federal Reserve Economic Data, OurWorldinData.org, the OECD data repository and UNICEF. The file comprises cross-sectional data for various countries measured on similar dates (e.g. data on the severity of COVID-19 and the response to it are from April 1, 2020). A brief description of the available variables is given in the Excel file (further description is available on the corresponding websites). Using suitable quantitative techniques from ECO220, describe some interesting characteristics of the variables of interest to you. In interpreting, explaining and assessing the validity of your output, you should read the articles provided. Try to pick out variables that might be related in some way to the question and discuss these. You can also search out your own literature to guide your discussion, but be sure to include any other sources in a bibliography.
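    For the descriptive step, a short R sketch along the following lines may be a useful starting point. It runs on simulated data, and the names cases_pm, internet and region are hypothetical placeholders rather than variables from the project file.

```r
# Simulated data. Names are hypothetical placeholders,
# NOT variables from the project data file.
set.seed(1)
n <- 80
dat <- data.frame(
  cases_pm = rexp(n, rate = 1 / 200),   # right-skewed, like many count-based rates
  internet = runif(n, 5, 95),           # a percentage-type variable
  region   = sample(c("Africa", "Asia", "Americas"), n, replace = TRUE)
)

# Five-number summaries and means
print(summary(dat[, c("cases_pm", "internet")]))

# Pairwise correlation between two variables of interest
r <- cor(dat$cases_pm, dat$internet)

# Group means by a categorical variable
grp <- tapply(dat$internet, dat$region, mean)
print(grp)

# A histogram shows skewness, which may motivate a log transformation
hist(dat$cases_pm, main = "Simulated cases per million", xlab = "cases_pm")
```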


0.2 Project Submission

As with project 1, project 2 will not be marked based on length but rather on how well you address the question. Your submission should not exceed 1200 words of text and 4 pages of graphs and tables. If it is written in a clear and concise style, and you have a good handle on generating useful graphs, this limit will be sufficient for a full mark. Write an assessment that is smart, not long. Highlight the findings that are puzzling, practically useful, thought-provoking or seemingly counter-intuitive. Try to deliver a submission that is interesting and easy to follow, a short piece of statistical analysis that you yourself would like to read.

    This Data Project is worth 7.5% of your final mark. All statistical analysis should be done using either Stata or R. The final report should be submitted as a single written document in .pdf format, and you must also include your DO file for Stata or SCRIPT file for R. The submission deadline is Monday, April 19th.


0.3 Software Help

Several videos on how to use econometrics software are available online. An additional help lecture will be provided. Alternatively, there are some good handbooks available for Stata.