



ECMT6002/6702: Assignment (20 marks)

Due October 23, 2022

Word Limit Guide: 1800 words max


The objectives of this assignment are two-fold. First, you will be required to apply econometric methods from this course to a real problem. This involves describing characteristics of the data, some data manipulation, running estimations, interpretation, and testing. Second, I would like to give you some experience at writing up results in a concise report. Reports should contain a cover page, introduction, data summary, method, results with discussion, and a brief conclusion. The report should be appropriate for a business/work situation, so marks will be given for how the report is presented. Use of appendices is also encouraged for larger tables of results, tests or non-essential charts.


Your assignment is to analyse the results of children’s maths proficiency over a four-year period. This will require the use of a linear regression model and a probability model. You should include sufficient models in your report to answer the questions below, which are designed to guide the modelling.


The dataset includes three test results for each student from publicly run maths tests: when the student is in primary school (8.5 yrs), middle school (10.5 yrs) and the first year of high school (12.5 yrs). We have a rich set of explanatory variables about the location of the school, school type, languages at home, parents’ jobs and parents’ educational backgrounds. The dataset instructions follow on page 3.


• Linear model

Please answer the following questions within your report.

– Explain the determinants of the first-test result.

– Explain the determinants of the third-test result.

– Explain the determinants of the change in result between the first and third test.

You may use any variables in the dataset known at the time of each test (with the exception of the second-test result) and are free to transform variables by creating quadratics, dummies, or interaction variables. In the third-test model, you should account for ability via a proxy variable. In the first-test model you do not have this luxury, which means what for your estimates?

You should test your model using your econometric knowledge and make suitable modifications where appropriate. Your report should discuss interesting results, explore heterogeneity and explain deficiencies/difficulties/concerns of your models.

• Probability models

Please answer the following questions within your report.

– Explain the determinants of continued improvement in students’ test results.

– Explain the determinants of continued deterioration in students’ test results.

Here you will need to create dummy variables that meet the continued improvement and continued deterioration criteria. Compare test2 to test1 and compare test3 to test2. [As a hint, you need to specify what constitutes improvement/deterioration, and the final model should control for the starting point in test1.] You may transform variables by creating quadratics, other dummies, and interaction variables as before.

The model will have similar determinants to the linear models, so you should arrive at a similar final model. The key difference is the interpretation of the probability model’s marginal effects. Either a probit or a logit model can be used to answer this question.
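To illustrate why marginal effects in a probability model are read differently from linear coefficients, here is a minimal Python sketch of the standard logit result: the marginal effect of a continuous regressor equals its coefficient times p(1 − p), where p is the predicted probability. The coefficient 0.8 and the index values are hypothetical and are not taken from the assignment data.

```python
import math

def logistic(z):
    """Logistic CDF: maps the linear index x'beta to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(beta_j, index):
    """Marginal effect of a continuous regressor in a logit model.

    d P(y=1|x) / d x_j = beta_j * p * (1 - p), with p = logistic(index).
    """
    p = logistic(index)
    return beta_j * p * (1.0 - p)

# Hypothetical coefficient and index values, for illustration only.
beta_j = 0.8
me_at_zero = logit_marginal_effect(beta_j, 0.0)  # evaluated where p = 0.5
me_at_two = logit_marginal_effect(beta_j, 2.0)   # evaluated further into the tail
```

The same coefficient produces a smaller effect on the probability as the index moves away from zero, so a reported marginal effect needs to say where it was evaluated (for example, at the means or averaged over the sample).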

As a guide, options for measuring continued improvement include a comparison to a mean (arithmetic or geometric) or a Z-score. This is preferable to comparing the raw percentage, as the tests may vary in relative difficulty. For example, let Mi equal your chosen measure for test i; then the continued improvement dummy equals 1 when M3 > M2 and M2 > M1, and 0 otherwise.
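The rule above can be sketched in Python as follows, assuming the Z-score is the chosen measure Mi. The mirror-image rule for continued deterioration (M3 < M2 and M2 < M1) is an assumption, since the brief leaves that definition to you, and the scores are made up for illustration.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Standardise one test's raw scores against that test's own mean and spread."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / sigma for s in scores]

def continued_dummies(test1, test2, test3):
    """Return (improve, deteriorate) 0/1 lists, using Z-scores as the measure M_i."""
    m1, m2, m3 = z_scores(test1), z_scores(test2), z_scores(test3)
    # Continued improvement: M3 > M2 and M2 > M1.
    improve = [int(c > b and b > a) for a, b, c in zip(m1, m2, m3)]
    # Assumed mirror-image definition of continued deterioration.
    deteriorate = [int(c < b and b < a) for a, b, c in zip(m1, m2, m3)]
    return improve, deteriorate

# Hypothetical raw scores for four students (not from the assignment dataset).
test1 = [40, 50, 60, 70]
test2 = [50, 50, 55, 75]
test3 = [65, 45, 50, 70]
improve, deteriorate = continued_dummies(test1, test2, test3)
```

Note that the second student's raw score is unchanged between test1 and test2, yet the Z-score falls because the rest of the group moved. This is the kind of relative comparison the Z-score is meant to capture.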


The allocation of marks will be as follows:

Analysis and Discussion (15 marks)

• Appropriate use of econometric models

• Correct evaluation of results

• Appropriate testing of econometric model

• Depth of analysis

Report mechanics (5 marks)

• Overall presentation of report

• Clear answers to the 5 research questions

• Conciseness

• Logical progression


Instructions and Data for assignment

The source dataset is the same for all students; however, each assignment will use a different sub-sample of the dataset.

Stata has a function that creates a randomised sub-dataset from a larger dataset. This is replicable by setting the random number generator seed. Each assignment will set the seed based on the student id number (sid). For those working in pairs, use the student id of the person who is first alphabetically by last name (then by first name if the last names are the same).

Please use the assignment do file provided and input your sid as the seed. The master dataset has 15,000 observations and you will use 10,000. Using the seed will enable markers to replicate your results.
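The idea of a seed-replicable sub-sample can be illustrated with a short Python sketch. This is not the assignment do file, and the rows it selects will not match those chosen by Stata; the function name and the sid value are hypothetical.

```python
import random

def draw_subsample(n_master, n_keep, sid):
    """Pick n_keep distinct row indices out of n_master, seeded by the student id."""
    rng = random.Random(sid)  # a private generator, so the draw depends only on sid
    return sorted(rng.sample(range(n_master), n_keep))

sid = 123456789  # hypothetical student id
rows_a = draw_subsample(15000, 10000, sid)
rows_b = draw_subsample(15000, 10000, sid)  # same seed, so the same rows
```

Because the draw depends only on the seed, a marker who runs the same code with the same sid recovers exactly the same 10,000 observations.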


Below is a description of the Stata dataset:

Further information about dataset

The dataset contains numerous dummy variables that have already been created from categorical variables. It is fine to use them, or to create your own dummy variables. Remember that including categorical variables directly in regressions is meaningless and will create an incorrect model. Points will be deducted if you do this!
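As a minimal sketch of what creating your own dummy variables means, the Python below turns a coded categorical variable into one 0/1 column per category, leaving out a base category to avoid perfect collinearity with the constant. The category codes and the `cat_` column names are hypothetical.

```python
def make_dummies(values, base):
    """Build one 0/1 column per category, omitting the base (reference) category."""
    categories = sorted(set(values) - {base})
    return {f"cat_{c}": [int(v == c) for v in values] for c in categories}

# Hypothetical category codes for five students (not from the assignment dataset).
lang = [1, 3, 2, 1, 3]
dummies = make_dummies(lang, base=1)
```

Each coefficient on a dummy is then read relative to the omitted base category, whereas entering the raw codes 1, 2, 3 would wrongly impose an ordering and equal spacing between categories.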

School starts for all children on the same day. However, the rules around starting school create two types of students, depending on birthday. If a child has a birthday between 1 January and 30 June, parents can choose whether to start school in January of the year the child turns 5 (the child will be young) or in the following year (the child will be older). If a child has a birthday between 1 July and 31 December, the child starts school in January of the following year, after turning 5.

When Choice is equal to 1, parents had the choice to send their child to school early or to wait another year, so that the child would be older. When Choice is equal to 0 (no_choice=1), parents had no choice and sent their child on the first day of school.
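The birthday rule behind Choice can be written out as a small Python sketch. The dataset already supplies this variable, so the code is only meant to make the rule explicit; the function name and the birth months are hypothetical.

```python
def choice_from_birth_month(birth_month):
    """Return 1 if parents could choose the start year, else 0."""
    if not 1 <= birth_month <= 12:
        raise ValueError("birth_month must be between 1 and 12")
    # January to June birthdays: parents choose between an early or a later start.
    # July to December birthdays: the start year is fixed by the rules.
    return 1 if birth_month <= 6 else 0

birth_months = [2, 6, 7, 11]  # hypothetical students
choice = [choice_from_birth_month(m) for m in birth_months]
no_choice = [1 - c for c in choice]  # complement, matching no_choice=1 when Choice=0
```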

The variable Choice could be important when determining whether Age (or other variables) has any benefit for the first test. If choice=0, then ∂test1/∂Age could be different from when choice=1.
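One way to let the effect of Age differ across the two groups is an interaction term. The specification below is only a sketch under that assumption: the coefficient names are illustrative, other controls are omitted, and the brief does not prescribe this particular model.

```latex
\begin{align*}
\text{test1}_i &= \beta_0 + \beta_1\,\text{Age}_i + \beta_2\,\text{Choice}_i
                 + \beta_3\,(\text{Age}_i \times \text{Choice}_i) + u_i \\[4pt]
\frac{\partial\,\text{test1}_i}{\partial\,\text{Age}_i} &= \beta_1 + \beta_3\,\text{Choice}_i
\end{align*}
```

Under this sketch the Age effect is β1 when choice=0 and β1 + β3 when choice=1, so a test of β3 = 0 checks whether the two groups differ.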

P1_OCC_test1, P1_SCH, P1_NONSCH are categorical variables described respectively below (where P1 stands for parent 1):

Language spoken at home, Lang, is categorical and has already been grouped. There is no need to know what the languages are, but they are likely to be important due to cultural differences around schoolwork outside of school.

School identifier information: In this school system, changing schools can indicate quality. It is reasonable to conclude the following:

• same_school_all - private school

• same_middle_high - private school

• same_school_never - streamed public school

Final Comments

These types of regression models are typical in micro-econometrics. We need to concern ourselves with what is in our model and what is not (confounders). Thinking about potential omitted variables and how they could be correlated with explanatory variables is important. Additionally, how we deal with omitted variables is often dictated by the dataset we have available. Please be clear in your reports about the approach taken to deal with any identified problem (on which I have given partial guidance), and describe any weaknesses or concerns. There is no 100% correct answer for this assignment. This assignment provides a mechanism to enhance your understanding of applied econometrics by working on a problem and explaining your results.