ECON4003 Econometrics 1: Introduction to Econometrics


Coursework Briefing 2022/23



Topic: Simple Linear Regression Model

Further details

This assignment consists of one question in two sections. Section A involves calculations, while Section B requires the use of Stata. See the questions on the next page.

Students should:

-    explain all steps in their analysis and their findings

-    not copy word-for-word from the course material without demonstrating their own understanding

-    use an equation editor to type equations, e.g. MS Word Equations; hand-written answers are not acceptable

-    use the graphics produced by Stata

-    present the regression results in a table format similar to Empirical Exercise 3.1 Solutions part (e)

-    include a word count at the end of the assignment

The word limit is a maximum, not an expectation. Equations, numbers in tables and Stata commands do not count.

Section A

A researcher is interested in the relationship between two variables, X and Z, and their effect on Y. They collect a random sample of individuals and estimate the following model:

$Y_i = \beta_1 X_i + \beta_2 Z_i + \epsilon_i$

where $Y_i$ is the outcome of individual $i$, $X_i$ is individual $i$’s first characteristic, $Z_i$ is individual $i$’s second characteristic, and $\epsilon_i$ is the error term. The researcher suspects that $Y$ may cause $Z$ in such a way that $Z_i = \gamma Y_i + \nu_i$.

Suppose that $\nu_i \mid Y \sim N(0, \sigma_\nu^2)$ and that $\nu_i$ and $\epsilon_i$ are statistically independent.

(a) Show that the error term $\epsilon_i = \left(\frac{1 - \gamma\beta_2}{\gamma}\right) Z_i - \beta_1 X_i - \frac{\nu_i}{\gamma}$.

(b) Express $E(\epsilon_i \mid Z)$ in terms of $Z_i$, $X_i$, $\nu_i$ and the model parameters. Explain all calculation steps.

(c) With reference to the relevant OLS assumption(s), explain whether the OLS estimators $(\hat{\beta}_1, \hat{\beta}_2)$ are unbiased.

Section B

Download the dataset weight.dta from Moodle.  The dataset contains the following variables with information on a random sample of 17,870 individuals:

-    identifier: individual identifier number

-    sex: 1=Male, 0=Female

-    weight: weight without shoes (in pounds)

-    height: height without shoes (in inches)

Use the dataset to answer parts (d) to (g). Include your Stata commands in an Appendix. Consider the following regression model:

$\text{weight}_i = \delta_0 + \delta_1 \text{height}_i + \nu_i$

where $\text{weight}_i$ is the weight of individual $i$ in pounds, $\text{height}_i$ is individual $i$’s height in inches, and $\nu_i$ is the error term.

(d) Using the relevant OLS assumption(s) and your intuition, explain whether you deem it appropriate to interpret the OLS estimator $\hat{\delta}_1$ as the causal effect of height on the weight of individuals.

(e) Estimate the regression model with both homoscedasticity-only and heteroscedasticity-robust standard errors. Present the estimation results in a table. Interpret the estimated intercept, slope coefficient, and $R^2$ with reference to this particular regression model. Can you draw any conclusion regarding the errors of the model?

(f) Compute the OLS residuals from the regression in part (e) and plot them against height. Explain whether any OLS assumption(s) appear(s) to be violated.

(g) Estimate the regression model again for (i) women and (ii) men. Present the estimation results as separate columns in a table. Interpret the slope coefficients and compare them with your answer in part (e).
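As a syntax reminder only, the kind of Stata commands that parts (e) to (g) call for can be sketched on simulated data. This is a minimal sketch under stated assumptions: the data below are generated at random and are not weight.dta, every number in the data-generating lines is made up, and the residual variable name uhat is hypothetical. It is not a model answer, and the output carries no information about the coursework dataset.

```stata
* Illustrative sketch on SIMULATED data (not weight.dta).
* All coefficients and sample sizes below are invented for illustration.
clear
set seed 12345
set obs 500

* Hypothetical variables, named after those in the coursework dataset
generate sex    = runiform() < 0.5
generate height = 64 + 5*sex + rnormal(0, 3)
generate weight = -100 + 4*height + rnormal(0, 20)

* OLS with homoscedasticity-only standard errors
regress weight height

* Same regression with heteroscedasticity-robust standard errors
regress weight height, vce(robust)

* Residuals from the most recent regression, plotted against the regressor
predict uhat, residuals
scatter uhat height, yline(0)

* Separate regressions for each subsample
regress weight height if sex == 0, vce(robust)
regress weight height if sex == 1, vce(robust)
```

With the real dataset, the simulation lines would be replaced by loading weight.dta (for example with the use command), and the commands actually run should be reported in the Appendix as requested above.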

Coursework Rubric

A holistic rubric provides a list of assessment criteria together with a broad description of the characteristics that would be expected for each level of performance.



Very Good





Demonstrate clear knowledge and application of statistical concepts. Calculations are correct, comprehensive enough to solve the problem, and easy to follow.

Demonstrate clear knowledge and application of statistical concepts. Calculations are presented in a logical manner. A few calculation steps are missing or incorrect.

Demonstrate some knowledge and application of statistical concepts. Calculations are difficult to follow at times. Some errors are found.

Demonstrate limited knowledge and application of statistical concepts. Limited understanding of the problem is evidenced. The calculation steps are difficult to follow.

Do not demonstrate knowledge and application of statistical concepts. No understanding of the problem is evidenced. The calculation steps cannot be followed.


Analysis and


Demonstrate excellent understanding of the core concepts of simple linear regression. Explanations are correct. Conclusions are appropriate and based on relevant theoretical or quantitative analysis.

Demonstrate very good understanding of the core concepts of simple linear regression. Analysis and explanations are generally correct with a few exceptions.

Demonstrate general understanding of the core concepts of simple linear regression. Analysis and explanations are sometimes inadequate, unclear and/or incorrect.

Demonstrate limited understanding of the core concepts of simple linear regression. Analysis and explanations are inadequate and often unclear and/or incorrect.

Do not demonstrate understanding of the core concepts of simple linear regression. Analysis and explanations are inadequate, mostly unclear and incorrect.

Estimation, Presentation and Interpretation of Results

Estimation procedure and results are all correct. Presentation of estimation results is clear and reader-friendly. Diagrams and/or estimation results are interpreted accurately.

Estimation procedure and results are correct. Presentation of estimation results is clear. Interpretation of diagrams and/or estimation results is generally correct with a few exceptions.

Estimation procedure and results are mostly correct. Presentation of estimation results is sometimes unclear. Some mistakes in interpreting the diagrams and/or results.

Some of the estimation procedure and results are correct. Presentation of estimation results is often unclear. Many mistakes in interpreting the diagrams and/or results.

Estimation procedure and results are erroneous. Presentation of estimation results is unclear. Interpretation of the diagrams and/or results is incorrect.


Concepts and solutions are communicated with clarity and fluency and are virtually error-free. Terminology is prevalent and used correctly.

Concepts and solutions are communicated clearly with a few exceptions. Most terminology is used correctly.

Communication of concepts and solutions is sometimes unclear. Some terminology is used but with mistakes.

Communication of concepts and solutions is mostly unclear. Little terminology is used, and with mistakes.

Communication of concepts and solutions is problematic. No terminology is used properly.

Feedback method

Individual feedback will normally be provided via Moodle.  Generic (class-level) feedback and grade profiles will normally be posted on Moodle.

Students can use academic staff office hours for additional feedback on their work.

Preparing your coursework

Document creation

1.    Please use this file naming convention: StudentID_CourseCode_QuestionNo, e.g. 7299019_ECON4003_1. If there is no question choice, use 1 as the default.