
ECOM30001/ECOM90001: Basic Econometrics Semester 1, 2022 Solutions: Tutorial 2

Published: 2022-07-16



Introduction

This tutorial reviews some basic operations using the econometrics software package R that we will be using in this subject. Specifically, this tutorial reviews:

- running an OLS regression in R

- plotting actual data values and fitted values

- running an OLS regression on a sub-sample of the data in R

- some simple data transformations (natural logarithms)

- calculation of marginal effects

This tutorial requires one data file:

- houseprices_2017.csv

This file can be obtained from the Canvas subject page.

In addition, the R script file tut2.R provides the program code necessary to complete the tutorial. This R script file uses the following packages, which need to be installed prior to running the R script file:

ggplot2 : for creating graphs and plots in R

stargazer : for easily generating summary statistics for an R data file

scales : for displaying thousands with commas in graphs in R

These can be installed directly in RStudio from the Packages tab or by using the command install.packages() and inserting the name of the package in the brackets.
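As a minimal sketch of the install step, the helper below installs only the packages that are not already present. The function name install_if_missing and its installer argument are hypothetical (they are not part of tut2.R); the underlying calls installed.packages() and install.packages() are standard R.

```r
# Hypothetical helper (not from tut2.R): install only the packages that are missing.
install_if_missing <- function(pkgs, installer = utils::install.packages) {
  # Names of all packages found in the current library paths
  installed <- rownames(utils::installed.packages())
  missing_pkgs <- setdiff(pkgs, installed)
  # Call the installer only when something is actually missing
  if (length(missing_pkgs) > 0) {
    installer(missing_pkgs)
  }
  invisible(missing_pkgs)
}

# Usage (needs an internet connection and a configured CRAN mirror):
# install_if_missing(c("ggplot2", "stargazer", "scales"))
```

Once installed, each package is loaded in a session with library(), for example library(ggplot2).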

Please feel free to play around with the code, particularly the plotting commands for ggplot2 and the table commands for stargazer.

Question

Download the data file tut2.csv from the Canvas page.

This file contains data on the selling prices of houses in metropolitan Melbourne during the 2016 calendar year. There are several variables of interest:

price = Selling price, thousands of dollars

distance = Distance from the C.B.D., in kilometres

bld_area = Dwelling size, square metres

landsize = Land size, square metres
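The following sketch shows one way to load and inspect a CSV file with these variables. So that it runs without the Canvas file, it writes a small invented data set to a temporary file first; the values are made up, and the column name bld_area assumes the underscore spelling of the building area variable.

```r
# Toy stand-in for the tutorial data (all values are invented for illustration)
toy <- data.frame(
  price    = c(850, 1200, 640, 1500),   # selling price, thousands of dollars
  distance = c(12.5, 6.0, 25.1, 4.2),   # distance from the C.B.D., km
  bld_area = c(140, 210, 120, 300),     # dwelling size, square metres
  landsize = c(450, 600, 700, 820)      # land size, square metres
)

# Write it to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# read.csv() returns a data frame; with the real file, pass its path instead
houses <- read.csv(csv_path)

str(houses)      # variable names and types
summary(houses)  # minimum, quartiles, mean and maximum of each variable
```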

Note that:

large =

a) Consider the following econometric model:

$$\text{price}_i = \beta_0 + \beta_1\,\text{bld\_area}_i + \varepsilon_i \tag{1}$$

What is the interpretation of the parameter β0? What is the interpretation of the parameter β1?

Solution: The parameter β0 represents the mean selling price for a house with a building area of zero, that is, the value of the land only. However, if the data do not include any properties with zero building area, interpreting this parameter involves out-of-sample prediction and it will not be estimated precisely. The parameter β1 represents the marginal effect of an additional square metre of building area on the mean selling price.

b) Estimate this model in R and provide a brief description of the point estimates. Produce a scatter plot of both price and the fitted values against bld_area. Comment on how well the estimated model fits the data.

Solution: Figure 1 provides the OLS estimation results. The estimate of β0 is 685.8360, which implies that the average value of land alone is $685,836. The estimate of β1 implies that the average selling price increases by $2,674.30 for each additional square metre of building area. Recall that the dependent variable is measured in thousands of dollars.
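A minimal sketch of this estimation step is given below. It uses synthetic data invented for illustration rather than the Canvas file, so the numbers will not match Figure 1; the data frame name houses and the underscore spelling bld_area are assumptions.

```r
# Synthetic data, invented for illustration (price in thousands of dollars)
set.seed(1)
n <- 200
bld_area <- runif(n, min = 50, max = 400)
price <- 700 + 2.5 * bld_area + rnorm(n, sd = 300)
houses <- data.frame(price, bld_area)

# OLS regression of price on building area, as in model (1)
fit_lin <- lm(price ~ bld_area, data = houses)
summary(fit_lin)

# Because price is in thousands of dollars, multiply by 1000 to read in dollars
b <- coef(fit_lin)
b["(Intercept)"] * 1000   # estimated mean price at zero building area, dollars
b["bld_area"] * 1000      # estimated dollar change per extra square metre
```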

The scatter plot is presented in Figure 2. The estimated model appears to adequately fit the data for properties with relatively smaller building areas. However, it tends to considerably ‘under-predict’ selling prices for properties with relatively smaller building areas that sell for relatively large prices. Additionally, it also tends to ‘over-predict’ selling prices for some properties with relatively larger building areas.

[Figure 2: Actual and Fitted Values for price: Part (b). Series: Actual Data; Fitted Values: Linear Model. Horizontal axis: Building Area, square metres.]
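A plot of this kind can be sketched with ggplot2 roughly as follows. The data are synthetic and invented for illustration, and the object names (houses, fit_lin, fitted_lin) are assumptions rather than the names used in tut2.R.

```r
library(ggplot2)

# Synthetic data, invented for illustration (price in thousands of dollars)
set.seed(1)
n <- 200
houses <- data.frame(bld_area = runif(n, 50, 400))
houses$price <- 700 + 2.5 * houses$bld_area + rnorm(n, sd = 300)

# Fit the linear model and store its fitted values alongside the data
fit_lin <- lm(price ~ bld_area, data = houses)
houses$fitted_lin <- fitted(fit_lin)

# Scatter of actual prices plus a line through the fitted values
p <- ggplot(houses, aes(x = bld_area)) +
  geom_point(aes(y = price, colour = "Actual Data"), alpha = 0.5) +
  geom_line(aes(y = fitted_lin, colour = "Fitted Values: Linear Model")) +
  scale_y_continuous(labels = scales::comma) +   # show thousands with commas
  labs(x = "Building Area, square metres", y = "price", colour = NULL)
print(p)
```

Mapping colour to a constant string inside aes() is one simple way to get a legend entry for each series.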

Ultimately, there are likely several other factors, beyond just building area, that determine selling prices. These factors have been collected in the random error of the econometric model (1). Some important variables might be the distance from the C.B.D., the age of the dwelling, the characteristics of the dwelling (such as the number of bedrooms, number of bathrooms, etc.), quality of local schools, and proximity to local amenities. Moreover, at least some of these omitted factors are also likely related to building area. We will be studying omitted variables and how they affect the estimated parameters of econometric models later in this subject.

c) Consider the following econometric model:

$$\text{price}_i = \beta_0 + \beta_1\,\text{bld\_area}_i + \beta_2\,\text{bld\_area}_i^2 + \varepsilon_i \tag{2}$$

What is the marginal (or partial) effect of an additional square metre of dwelling size (bld_area) on the selling price?

Estimate this equation in R. What is the estimated marginal effect of an additional square metre of dwelling size for a home with 300 square metres of building area?

Hint: You will need to generate a new variable representing the squared value of the variable bld_area.

Produce a scatter plot of price against bld_area. On the same graph, produce a line plot of the fitted values for the linear model (from part b) and the quadratic model.

Figure 4: Actual and Fitted Values: Part (c)

Solution: The marginal effect is given by:

$$\frac{\partial\,\text{price}}{\partial\,\text{bld\_area}} = \beta_1 + 2\beta_2\,\text{bld\_area}$$

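The estimation results are presented in Figure 3. The estimated marginal effect is:

$$\frac{\partial\,\widehat{\text{price}}}{\partial\,\text{bld\_area}}\bigg|_{\text{bld\_area}=300} = b_1 + 2 b_2 (300) \approx 2.495 \quad \text{(thousands of dollars)}$$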

In this quadratic model, an additional square metre of dwelling space for a house with 300 square metres of dwelling area is estimated to increase the sales price by $2,495. Compare this to the estimated effect in the linear model in part (b) of $2,674 (which restricts the marginal effect to be the same regardless of the dwelling area). Note that for properties with 300 square metres of dwelling area, the estimated marginal effect for model (2) is remarkably close to the estimated marginal effect for model (1). This is also confirmed through an examination of Figure 4, which indicates that the fitted lines for model (1) and model (2) are quite close to each other at a building area of 300 square metres.
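The calculation can be sketched in R as follows. The data are synthetic and invented for illustration, so the result will not reproduce the $2,495 figure; the names bld_area2 and marginal_effect are hypothetical.

```r
# Synthetic data with a concave shape, invented for illustration
set.seed(2)
n <- 300
bld_area <- runif(n, 50, 700)
price <- 400 + 6 * bld_area - 0.006 * bld_area^2 + rnorm(n, sd = 300)
houses <- data.frame(price, bld_area)

# Generate the squared term explicitly, then estimate model (2)
houses$bld_area2 <- houses$bld_area^2
fit_quad <- lm(price ~ bld_area + bld_area2, data = houses)

# Marginal effect b1 + 2 * b2 * bld_area, evaluated at a chosen building area
b <- coef(fit_quad)
marginal_effect <- function(area) {
  unname(b["bld_area"] + 2 * b["bld_area2"] * area)
}

marginal_effect(300) * 1000   # dollars per extra square metre at 300 square metres
```

An alternative that avoids creating a new column is to write the squared term as I(bld_area^2) directly in the model formula.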

Note that the estimate b2 = −0.006542 is negative. This implies that the estimated relationship between selling prices and building area has an ‘inverted U-shape’. For houses with sufficiently large dwelling areas, an additional square metre of dwelling area is estimated to reduce the selling price. We will be looking at issues associated with the appropriate functional form in econometric models, including quadratic functions, in a few weeks.
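As a rough worked example, the building area at which the estimated marginal effect changes sign can be backed out from the figures quoted in this solution. The estimated marginal effect is linear in building area, so it is zero where:

```latex
\[
b_1 + 2 b_2\,\text{bld\_area}^{*} = 0
\quad\Longrightarrow\quad
\text{bld\_area}^{*} = -\frac{b_1}{2 b_2}
\]
% Using the marginal effect at 300 square metres (about 2.495) and b_2 = -0.006542:
\[
\text{bld\_area}^{*} = 300 - \frac{2.495}{2 \times (-0.006542)}
= 300 + \frac{2.495}{0.013084}
\approx 300 + 190.7 \approx 491
\]
```

This is only approximate, because the 2.495 figure is itself rounded, but it suggests the fitted curve peaks at a building area of roughly 490 square metres.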

Aside: Is this likely a ‘causal’ effect? Is it likely that, for houses with sufficiently large dwelling areas, an additional square metre of dwelling area would actually reduce the selling price? In our simple model, it is likely that this is not a ‘causal’ effect. Why?

(a) Outliers: There are only a few observations for houses with a large building area and relatively low selling prices. It is feasible that these observations are not really representative of the population of houses sold in Melbourne.

(b) Omitted Variables: The econometric model (2) only relates selling prices to the dwelling area. There are likely omitted variables, related to the dwelling area, that also affect the selling prices. Effectively, the estimated negative relationship between dwelling area and price for large dwellings really reflects the effects of these omitted characteristics. For example, houses with a larger dwelling area will generally be located in different areas to houses with a smaller dwelling area, and these location characteristics might be important determinants of prices. For example, houses with a larger dwelling area tend to be located further from the C.B.D., and it is this characteristic that is associated with lower prices.

We will be exploring these issues throughout the subject.

The actual and the fitted values for the quadratic model and the linear model (part b) are presented in Figure 4. It appears that the quadratic model fits the data slightly better: it is slightly better at capturing the lower selling prices for houses with a larger building area. However, it still tends to ‘under-predict’ selling prices for properties with relatively smaller living areas that sell for relatively large prices. The RSS for the linear model (1) is 709,209,196 while for the quadratic model (2) it is 677,774,520. At first glance, the minimised value of the sum of squared residuals is lower for the quadratic model, so it is tempting to conclude that the quadratic model fits the data better. The R² reported in the estimation output points the same way: for the linear model in part (b), the R² in Figure 3 is 0.1220, while for the quadratic model the R² reported in Figure 3 is 0.1609. However, since the quadratic model includes an additional explanatory variable compared to the linear model, its RSS can never be higher (and its R² never lower) than that of the linear model, so this comparison alone does not show that the quadratic model is the better specification.
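The point about nested models can be illustrated with a small simulation. In the sketch below the synthetic data (invented for illustration) are generated from a truly linear relationship, yet the quadratic model still reports an RSS that is no higher and an R² that is no lower.

```r
# Synthetic data where the true relationship is linear (invented for illustration)
set.seed(3)
n <- 250
bld_area <- runif(n, 50, 700)
price <- 600 + 2.5 * bld_area + rnorm(n, sd = 400)
houses <- data.frame(price, bld_area)

fit_lin  <- lm(price ~ bld_area, data = houses)
fit_quad <- lm(price ~ bld_area + I(bld_area^2), data = houses)

# Residual sum of squares for each model
rss <- c(linear    = sum(resid(fit_lin)^2),
         quadratic = sum(resid(fit_quad)^2))

# R-squared for each model
r2 <- c(linear    = summary(fit_lin)$r.squared,
        quadratic = summary(fit_quad)$r.squared)

rss
r2
```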

d) Estimate the econometric model (2), restricting the sample to houses that are on large lots. Now repeat the estimation for houses not on large lots. Comment on how the estimations differ.

Hint: You will need to restrict the samples using the variable large.

Solution: The estimation results are presented in Figure 5. The estimated marginal effect of an additional square metre of dwelling area for a house with 300 square metres of dwelling area is $3,345.37 for large lots and $1,854.77 for smaller lots.

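$$\frac{\partial\,\widehat{\text{price}}}{\partial\,\text{bld\_area}}\bigg|_{\text{large}=1} = b_1 + 2 b_2\,\text{bld\_area} = 3.345367 \quad \text{(thousands of dollars)}$$

and:

$$\frac{\partial\,\widehat{\text{price}}}{\partial\,\text{bld\_area}}\bigg|_{\text{large}=0} = b_1 + 2 b_2\,\text{bld\_area} = 1.854769 \quad \text{(thousands of dollars)}$$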

The estimated marginal effect of an additional square metre of dwelling area on selling price is greater for properties on larger lots. This possibly reflects a preference for yard space. For smaller lots, an additional square metre of dwelling size substantially reduces the available yard space. For larger lots there is not as large a reduction in yard space, so buyers are prepared to pay more for the same square metre increase.
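One way to run the two sub-sample regressions is the subset argument of lm(). The sketch below uses synthetic data invented for illustration, including an invented 0/1 indicator large, so its output will not match Figure 5; the helper me_at is hypothetical.

```r
# Synthetic data with an invented large-lot indicator (1 = large lot)
set.seed(4)
n <- 400
houses <- data.frame(
  bld_area = runif(n, 50, 600),
  large    = rbinom(n, size = 1, prob = 0.5)
)
houses$price <- 500 + (4 + 2 * houses$large) * houses$bld_area -
  0.003 * houses$bld_area^2 + rnorm(n, sd = 250)

# Estimate model (2) separately on each sub-sample
fit_large <- lm(price ~ bld_area + I(bld_area^2), data = houses, subset = large == 1)
fit_small <- lm(price ~ bld_area + I(bld_area^2), data = houses, subset = large == 0)

# Marginal effect b1 + 2 * b2 * area for a fitted quadratic model
me_at <- function(fit, area) {
  b <- unname(coef(fit))   # order: intercept, bld_area, squared term
  b[2] + 2 * b[3] * area
}

# Marginal effects at 300 square metres, in thousands of dollars
c(large = me_at(fit_large, 300), small = me_at(fit_small, 300))
```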

e) Consider the following econometric models:

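$$\text{price}_i = \beta_0 + \beta_1\,\text{distance}_i + \varepsilon_i \qquad \text{(Model I)}$$

and:

$$\text{lnprice}_i = \beta_0 + \beta_1\,\text{distance}_i + \varepsilon_i \qquad \text{(Model II)}$$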

where lnprice represents the natural logarithm of the variable price.

Estimate Model I in R. Produce a scatter plot of price against distance and a line plot of the fitted values from Model I against distance.

Now generate a new variable lnprice, the natural logarithm of the selling price (price).

Estimate Model II in R. Produce a scatter plot of lnprice against distance and a line plot of the fitted values from Model II against distance.

Compare the scatter plots for each model (Model I and Model II). Which estimated model do you think fits the data better? Why?

[Figure 7: Actual and Fitted Values: Part (e). Series: Actual Data; Fitted Values: Linear Model. Horizontal axis: Distance from CBD, in kms.]

[Figure 8: Actual and Fitted Values, Log Selling Price: Part (e). Horizontal axis: Distance from CBD, in kms.]

Solution: The estimation results for Model I are presented in Figure 6. The mean selling price declines by $36,347.10 for each additional kilometre from the C.B.D. The estimated intercept implies that the average selling price of a house in the C.B.D. (at a distance of zero) is $1,636,508.
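As a worked example using the Model I estimates quoted above (in thousands of dollars), the predicted mean selling price for a house 10 kilometres from the C.B.D. would be:

```latex
\[
\widehat{\text{price}} = 1636.508 - 36.3471 \times 10 = 1636.508 - 363.471 = 1273.037
\]
```

That is roughly $1,273,037. The choice of 10 kilometres is purely illustrative.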

The estimation results for Model II are also presented in Figure 6. The mean selling price declines by approximately 3.37% for each additional kilometre from the C.B.D. The estimated intercept implies that the average selling price of a house in the C.B.D. (at a distance of zero) is:

$$1000 \times \exp(b_0) = 1000 \times \exp(7.3795) = \$1{,}602{,}788$$
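The same back-transformation can be checked in R. The intercept value is the one quoted in the text; the variable names are hypothetical.

```r
# Model II intercept, as quoted in the text (log of price in thousands of dollars)
b0 <- 7.3795

# exp() undoes the natural log; multiply by 1000 to convert thousands to dollars
price_at_cbd <- 1000 * exp(b0)

# Print rounded to the nearest dollar with comma separators
format(round(price_at_cbd), big.mark = ",")
```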

Later in this subject, we will be studying how to interpret estimates in econometric models involving different functional forms, such as natural logarithms.

The fitted values for Model I are presented in Figure 7 and the fitted values for Model II are presented in Figure 8. Comparing the two plots, Model II, which uses the natural log of the selling prices, appears to fit the data better. The under-prediction of selling prices, relative to the actual data, appears less of an issue in Model II. As noted in Question 1(e) in Tutorial 1, taking logs reduces the scale in which a variable is measured.
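The log transformation and the two estimations can be sketched as follows. The data are synthetic and invented for illustration (generated from a log-linear process), so the estimates will not match Figure 6; the object names are assumptions.

```r
# Synthetic data, invented for illustration (price in thousands of dollars)
set.seed(5)
n <- 300
distance <- runif(n, 1, 40)
price <- exp(7.4 - 0.03 * distance + rnorm(n, sd = 0.4))
houses <- data.frame(price, distance)

# log() is the natural logarithm in R
houses$lnprice <- log(houses$price)

fit_I  <- lm(price ~ distance, data = houses)     # Model I: price in levels
fit_II <- lm(lnprice ~ distance, data = houses)   # Model II: log of price

coef(fit_I)["distance"] * 1000   # Model I: dollar change per extra kilometre
coef(fit_II)["distance"] * 100   # Model II: approximate % change per extra kilometre
```

The fitted values from each model can then be plotted against distance in the same way as in part (b), using price on the vertical axis for Model I and lnprice for Model II.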

Note: Since the dependent variable in Model I is different to the dependent variable in Model II, it is not possible to use the R² for these two models to make any judgments about which model is better in terms of goodness of fit.