Final Project, Coding Supplement Guidelines

ECO250Y0, Professor Khazra and Professor Farhoodi

This document summarises the final project guidelines that we talked about in class.

Your final project should include all your previous projects. You will have three main sections for your project (and some subsections that you title yourself): "Project One," "Project Two," and "Final Project." You can add titles by adding a "#" sign before your title in your markdown cell; check this link for more details on titling: https://www.datacamp.com/community/tutorials/markdown-in-jupyter-notebook

Formatting

A project loses 15 points if it does not meet ANY of the following requirements:

1. Submitted projects must be in PDF. Any other format, including a .ipynb file, is not accepted.

2. The project should have clear "section titles." Besides the subsections you choose to have, you should have three main sections: Project One, Project Two, and Project Three.

3. Clear title.

4. You should write your project in Jupyter Notebook (Python) and submit it in PDF format. If you have problems converting your notebook to PDF, first download it as HTML, then print/save the HTML version as a PDF.

5. This is an individual project. However, you are encouraged to check projects on Kaggle.com that use similar data. We have provided some "useful links" on the data list on Quercus. You can use these sources, but the coding and explanations must be yours. Do not copy and paste chunks of code from them into your project.

Please note that if you include a graph or a table, you should explain what you learn from it. Do not add an output without any explanation. All projects should have an introduction and a conclusion. Suppose you want to send your project to a company or school that you are applying to. The final product should be a clean and comprehensive report.

• Do not include unnecessary chunks of code, outputs, or error messages.

• Any graph, summary, or output should be followed by an explanation. Why did you include it in your report, and what do you understand from it?

Part One: Introduction

(10 points) As always, you should update your intro and conclusion according to your latest findings. Your introduction should be similar to that of an academic paper and of similar quality. Your paper should be publishable.

In your introduction, cite the relevant literature, how your paper adds to the existing literature, and how it differs. The expected length for your introduction is a minimum of one and a half pages.

Papers with no citations and no literature review lose 5-10 marks.

Part Two: Previous Projects

• (20 points) Incorporate the comments that you have received in your previous projects. Finish the parts that you missed in your previous projects, even if you did not lose marks for them. This is the first part of your final project, which will be graded again. We will check the first section of your projects (your updated previous project) and grade it again. If you have not received any comments, try to improve your previous report by doing a better job on the introduction or literature review, fine-tuning your visualizations, or adding more meaningful analysis.

You should significantly improve the quality of your previous projects compared to your initial submissions. Simply printing the previous projects will not be acceptable and such work will not receive the points for this part.

Final Project Details

You are almost there. We built upon the past projects towards a complete academic paper on a hot topic using real-world data and cool Python techniques, just excellent! Here’s what we are going to do for your final project.

IMPORTANT: Think hard and answer ALL of the questions below in economic terms. Show your code and results, provide economic explanations for your results for all parts, and connect them to your research question and main message. We will evaluate your work’s quality and accuracy; simply answering all questions does not guarantee a full mark.

Part Three

In this part, I ask you to add information to your dataset by merging it with new data.

• Look for existing datasets that can complement your data. Data on local characteristics (income, age, population, etc.) should work well with your project. For instance, the household-level American Community Survey (https://www.census.gov/data.html) and kaggle.com (you should create an account first) are good data sources.

• (20 points) Merge the new dataset with your data, provide visualizations of the new data, and explain how they relate to your project (use maps, trends, histograms, etc.).
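As a rough illustration of the merge step, here is a minimal sketch using pandas. The column names (`listing_id`, `county_fips`, `median_income`, and so on) and all values are made up for the example; your own key and variables will differ. The sketch assumes the new data has one row per location, so several rows of your data can match the same location.

```python
import pandas as pd

# Hypothetical project data: one row per observation, keyed by a county code.
# Codes are kept as strings so that leading zeros are preserved.
main = pd.DataFrame({
    "listing_id": [1, 2, 3, 4],
    "county_fips": ["06037", "06037", "36061", "48201"],
    "price": [850_000, 720_000, 1_200_000, 310_000],
})

# Hypothetical local characteristics: one row per county (values are made up).
local = pd.DataFrame({
    "county_fips": ["06037", "36061", "17031"],
    "median_income": [71_000, 89_000, 68_000],
    "population": [10_040_000, 1_630_000, 5_150_000],
})

# Left merge keeps every row of the project data.
# validate= raises an error if the county table has duplicate keys;
# indicator= adds a "_merge" column showing which rows found a match.
merged = main.merge(
    local,
    on="county_fips",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Report how many observations did not find local data before moving on.
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(merged)} rows have no local data")

merged = merged.drop(columns="_merge")
print(merged)
```

Checking the unmatched rows before plotting is worthwhile, because missing local data can silently shrink the sample used in later regressions.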

Part Four: OLS Regression

(30 points)

1. In your first project, you determined your dependent variable and your research question. Do you think the economic relationship between your Y and X is linear or non-linear? Provide economic intuition using economic theories and facts from your data to back up your answer.

2. Choose your Xs based on the existing theories. Why do you think that your Xs should be in your regressions, and why do you think they can explain your Y?

3. Run four separate regressions and compare your estimates. These four regressions can include different Xs, be of various forms (linear, non-linear), include different years or subgroups of observations, etc.

4. Justify why you chose to run these regressions. What is your economic reasoning for running each of them?

5. Choose your preferred specification and explain why you chose it. In the last section, you will be comparing your results from this preferred regression with your ML results.

6. How do you evaluate your regressions? What measures should you use to assess the performance of your regressions? Talk about each of these measures and interpret them.

7. What do you understand from your regression results? Briefly explain how these results help you answer your research question.
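The steps above could be organised along the following lines. This is only a sketch on simulated data, assuming the statsmodels formula API; the variables (`income`, `age`, `urban`), the four specifications, and the choice of robust (HC1) standard errors are illustrative assumptions, not the required setup. Your own Xs and functional forms should come from your economic reasoning.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data standing in for a real project dataset (all values made up).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "income": rng.uniform(20, 120, n),
    "age": rng.uniform(18, 80, n),
    "urban": rng.integers(0, 2, n),
})
df["log_income"] = np.log(df["income"])
df["income_sq"] = df["income"] ** 2
df["y"] = (2.0 + 0.8 * df["log_income"] + 0.01 * df["age"]
           + 0.3 * df["urban"] + rng.normal(0, 0.5, n))

# Four hypothetical specifications: linear, log, log with controls, quadratic.
specs = {
    "(1) linear": "y ~ income",
    "(2) log": "y ~ log_income",
    "(3) log + controls": "y ~ log_income + age + urban",
    "(4) quadratic": "y ~ income + income_sq + age + urban",
}

# Fit each model with heteroskedasticity-robust (HC1) standard errors.
results = {name: smf.ols(formula, data=df).fit(cov_type="HC1")
           for name, formula in specs.items()}

# Collect a few common fit measures side by side for comparison.
comparison = pd.DataFrame({
    name: {"R2": res.rsquared, "adj_R2": res.rsquared_adj, "AIC": res.aic}
    for name, res in results.items()
}).T
print(comparison.round(3))

# Full table for one specification (coefficients, standard errors, p-values).
print(results["(3) log + controls"].summary())
```

A table like `comparison` only supports the discussion; the fit measures still need to be interpreted in words, alongside the signs and magnitudes of the coefficients.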

An extra 20 bonus points are available for causal analysis, such as (but not limited to) difference-in-differences and IV regressions.
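For the bonus, one possible shape of a difference-in-differences regression is sketched below on simulated data, again assuming the statsmodels formula API. The variables `treated` and `post` and the effect size are hypothetical. The interaction coefficient can be read as a causal effect only if the treated and control groups would have followed parallel trends without the treatment, which you would need to argue for in your own setting.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: a treated/control indicator and a before/after indicator.
rng = np.random.default_rng(1)
n = 2000
panel = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})

# The outcome is built with a known treatment effect so the estimate can be checked.
true_effect = 1.5
panel["y"] = (10.0 + 2.0 * panel["treated"] + 1.0 * panel["post"]
              + true_effect * panel["treated"] * panel["post"]
              + rng.normal(0, 1, n))

# "treated * post" expands to treated + post + treated:post.
# The coefficient on treated:post is the difference-in-differences estimate.
did = smf.ols("y ~ treated * post", data=panel).fit(cov_type="HC1")

did_estimate = did.params["treated:post"]
did_se = did.bse["treated:post"]
print(f"DiD estimate: {did_estimate:.3f} (robust SE {did_se:.3f})")
```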

Machine Learning

(15 points)

1. This part is harder and may take more time than expected. So, make sure you have done a good job in the previous parts first. I do not want you to spend too much time on this part before having a good introduction and OLS part.

2. Re-write the objective function for a regression tree with your variables and clearly explain the objective function in your own words.

3. Explain the regularization parameters and talk about how changing them can affect your model and results.

4. Run the regression tree using the Xs you can justify including in the model.

5. Output your tree and explain your results.

6. Talk about your error of prediction.

7. (5 out of 15 points for this part) Compare your results from running a regression with your results from running a regression tree. What extra information have you extracted using the regression tree that was not possible to extract with a linear regression? Your explanations should tackle both the econometrics of the OLS and regression trees and the economic intuition behind both models.
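A minimal sketch of fitting and inspecting a regression tree follows, assuming scikit-learn. The data is simulated, and the feature names and parameter values are illustrative. Here `max_depth`, `min_samples_leaf`, and `ccp_alpha` are scikit-learn's names for parameters that limit tree complexity; the notation used in class may differ.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

# Simulated data with a jump in y at income = 60 (all values made up).
rng = np.random.default_rng(2)
n = 600
X = np.column_stack([rng.uniform(20, 120, n), rng.uniform(18, 80, n)])
feature_names = ["income", "age"]
y = np.where(X[:, 0] > 60, 5.0, 2.0) + 0.02 * X[:, 1] + rng.normal(0, 0.5, n)

# Hold out a test set so the prediction error is measured out of sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Complexity controls: a shallower tree, larger leaves, or a larger
# ccp_alpha (cost-complexity pruning penalty) all give a simpler tree.
tree = DecisionTreeRegressor(
    max_depth=3, min_samples_leaf=20, ccp_alpha=0.0, random_state=0
)
tree.fit(X_train, y_train)

# Text rendering of the fitted splits and leaf predictions.
print(export_text(tree, feature_names=feature_names))

# Test-set mean squared error, next to a predict-the-training-mean baseline.
mse_tree = mean_squared_error(y_test, tree.predict(X_test))
mse_mean = mean_squared_error(y_test, np.full_like(y_test, y_train.mean()))
print(f"Tree MSE: {mse_tree:.3f}  Baseline MSE: {mse_mean:.3f}")
```

Refitting with different values of these parameters and comparing the test error is one way to show how regularization changes the model. To compare with OLS, compute the same test-set error for your preferred regression.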

Part Five: Conclusion and Future Work

(5 points) As always, you should update your conclusion according to your latest findings. Your conclusion should be similar to an academic paper’s conclusion and of equal quality. You should include your findings and summarize your paper (at least one page). Once more, your paper should be publishable.

Explain how you want to improve your paper in the future. Think about whether you need to add more data or try different types of regressions. Talk about the limitations of your work and the next steps to address these limitations. This part should be backed by evidence and written in a scientific style. Vague and general writing is not acceptable.

Upload your Jupyter Notebook (your code and explanation) in PDF format on Crowdmark for marking. Also, please push your final Jupyter Notebook to Git.