ECOM30001/ECOM90001: Basic Econometrics Semester 1, 2022 Solutions: Tutorial 2
Published: 2022-07-16
Introduction
This tutorial reviews some basic operations using the econometrics software package R that we will be using in this subject. Specifically, this tutorial reviews:
- running an OLS regression in R
- plotting actual data values and fitted values
- running an OLS regression on a sub-sample of the data in R
- some simple data transformations (natural logarithms)
- calculation of marginal effects
This tutorial requires one data file:
- houseprices_2017.csv
This file can be obtained from the Canvas subject page.
In addition, the R script file tut2.R provides the program code necessary to complete the tutorial. This R script file uses the following packages, which need to be installed before running it:
- ggplot2: for creating graphs and plots in R
- stargazer: for easily generating summary statistics for an R data file
- scales: for displaying thousands with commas in graphs in R
These can be installed directly in RStudio from the Packages tab, or by using the command install.packages() and inserting the name of the package (in quotation marks) in the brackets.
Please feel free to play around with the code, particularly the plotting commands for ggplot2 and the table commands for stargazer.
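As a minimal sketch, the following checks which of the three packages are already installed before the script is run. The object names `needed`, `installed` and `missing_pkgs` are illustrative and are not taken from tut2.R; the install line is left commented out so that nothing is installed unintentionally.

```r
# Packages named in this tutorial
needed <- c("ggplot2", "stargazer", "scales")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install the missing packages
} else {
  message("All required packages are installed.")
}
```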
Question
Download the data file tut2.csv from the Canvas page.
This file contains data on the selling prices of houses in metropolitan Melbourne during the 2016 calendar year. There are several variables of interest:
price = Selling price, dollars
distance = Distance from the C.B.D., in kilometres
bld_area = Dwelling size, metres squared
landsize = Land size, metres squared
Note that large is an indicator variable:
large = 1 if the house is on a large lot, and 0 otherwise
a) Consider the following econometric model:
$$\text{price}_i = \beta_0 + \beta_1\,\text{bld\_area}_i + e_i \qquad (1)$$
What is the interpretation of the parameter β0? What is the interpretation of the parameter β1?
Solution: The parameter β0 represents the mean selling price for a house with a building area of zero. It would represent the value of the land only. However, if the data do not include any properties with zero building area this parameter will not be estimated precisely (out of sample prediction). The parameter β1 represents the marginal effect of an additional square metre of building area on the mean selling price.
b) Estimate this model in R and provide a brief description of the point estimates. Produce a scatter plot of both price and the fitted values against bld_area. Comment on how well the estimated model fits the data.
Solution: Figure 1 provides the OLS estimation results. Recall that the dependent variable is measured in thousands of dollars. The estimated coefficient for β0 is 685.8360, which implies that the average value of land alone is $685,836. The estimated coefficient for β1 implies that the average selling price increases by $2,674.30 for each additional square metre of building area.
The scatter plot is presented in Figure 2. The estimated model appears to adequately fit the data for properties with relatively smaller building areas. However, it tends to considerably ‘under-predict’ selling prices for properties with relatively smaller building areas that sell for relatively large prices. Additionally, it also tends to ‘over-predict’ selling prices for some properties with relatively larger building areas.
Figure 2: Actual and Fitted Values for price: Part (b) (series: Actual Data; Fitted Values: Linear Model; horizontal axis: Building Area, square metres)
Ultimately, there are likely several other factors, beyond just building area, that determine selling prices. These factors have been collected in the random error of the econometric model (1). Some important variables might be the distance from the C.B.D., the age of the dwelling, the characteristics of the dwelling (such as the number of bedrooms, number of bathrooms, etc.), the quality of local schools, and proximity to local amenities. Moreover, at least some of these omitted factors are also likely related to building area. We will be studying omitted variables and how they affect the estimated parameters of econometric models later in this subject.
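The estimation and plot for part (b) can be sketched as follows. This is an illustration only: it uses simulated stand-in data rather than the tutorial data file, the coefficient values used to simulate the data are arbitrary, and it uses base R graphics instead of ggplot2 so that it runs without any extra packages. The names `houses` and `fit_lin` are assumptions, not names taken from tut2.R.

```r
# Simulated stand-in data (NOT the tutorial data): price in thousands of dollars
set.seed(30001)
n <- 200
bld_area <- runif(n, min = 50, max = 500)
price <- 700 + 2.5 * bld_area + rnorm(n, sd = 300)
houses <- data.frame(price = price, bld_area = bld_area)

# OLS estimation of model (1): price regressed on an intercept and bld_area
fit_lin <- lm(price ~ bld_area, data = houses)
print(summary(fit_lin))

# Fitted values, one per observation
houses$fitted_lin <- fitted(fit_lin)

# Scatter plot of the actual data with the fitted line overlaid
plot(houses$bld_area, houses$price,
     xlab = "Building Area, square metres",
     ylab = "Selling price, thousands of dollars")
abline(fit_lin, col = "red", lwd = 2)
legend("topleft",
       legend = c("Actual Data", "Fitted Values: Linear Model"),
       pch = c(1, NA), lty = c(NA, 1), col = c("black", "red"))
```

With the real data, the data frame would instead come from reading the CSV file (for example with read.csv()), and the rest of the code would be unchanged.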
c) Consider the following econometric model:
$$\text{price}_i = \beta_0 + \beta_1\,\text{bld\_area}_i + \beta_2\,\text{bld\_area}_i^2 + \varepsilon_i \qquad (2)$$
What is the marginal (or partial) effect of an additional square metre of dwelling size (bld_area) on the selling price?
Estimate this equation in R. What is the estimated marginal effect of an additional square metre of dwelling size for a home with 300 square metres of building area?
Hint: You will need to generate a new variable representing the squared value of the variable bld_area.
Produce a scatter plot of price against bld_area. On the same graph, produce a line plot of the fitted values for the linear model (from part b) and the quadratic model.
Figure 4: Actual and Fitted Values: Part (c)
Solution: The marginal effect is given by:
$$\frac{\partial\,\text{price}}{\partial\,\text{bld\_area}} = \beta_1 + 2\beta_2\,\text{bld\_area}$$
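The estimation results are presented in Figure 3. The estimated marginal effect at 300 square metres of building area, in thousands of dollars, is:
$$\left.\frac{\partial\,\widehat{\text{price}}}{\partial\,\text{bld\_area}}\right|_{\text{bld\_area}=300} = b_1 + 2 b_2 (300) \approx 2.495$$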
In this quadratic model, an additional square metre of dwelling space for a house with 300 square metres of dwelling area is estimated to increase the sales price by $2,495. Compare this to the estimated effect in the linear model in part (b) of $2,674 (which restricts the marginal effect to be the same regardless of the dwelling area). Note that for properties with 300 square metres of dwelling area, the estimated marginal effect for model (2) is remarkably close to the estimated marginal effect for model (1). This is also confirmed through an examination of Figure 4, which indicates that the fitted lines for model (1) and model (2) are quite close to each other at a building area of 300 square metres.
Note that the estimate of b2 is -0.006542. This implies that the estimated relationship between selling prices and building area has an ‘inverted u-shape’. For houses with sufficiently large dwelling areas, an additional square metre of dwelling area is estimated to reduce the selling price. We will be looking at issues associated with the appropriate functional form in econometric models, including quadratic functions, in a few weeks.
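The marginal effect calculation and the ‘inverted u-shape’ can be sketched in R as follows. This is an illustration under stated assumptions: the data are simulated (they are not the tutorial data, so the numbers will not match Figure 3), and `marginal_effect()` and `turning_point` are hypothetical helper names introduced here. The turning point is the building area at which the estimated marginal effect b1 + 2 b2 bld_area equals zero, that is, -b1 / (2 b2).

```r
# Simulated stand-in data (NOT the tutorial data): price in thousands of dollars
set.seed(30001)
n <- 500
bld_area <- runif(n, min = 50, max = 750)
price <- 300 + 6 * bld_area - 0.006 * bld_area^2 + rnorm(n, sd = 300)
houses <- data.frame(price = price, bld_area = bld_area)

# New variable holding the squared building area, as the hint suggests
houses$bld_area_sq <- houses$bld_area^2

# OLS estimation of model (2)
fit_quad <- lm(price ~ bld_area + bld_area_sq, data = houses)
b <- coef(fit_quad)

# Estimated marginal effect b1 + 2 * b2 * bld_area at a chosen building area
marginal_effect <- function(b, area) {
  unname(b["bld_area"] + 2 * b["bld_area_sq"] * area)
}
me_300 <- marginal_effect(b, 300)

# Building area at which the fitted quadratic peaks (marginal effect equals zero)
turning_point <- unname(-b["bld_area"] / (2 * b["bld_area_sq"]))

cat("Marginal effect at 300 sq m:", round(me_300, 3), "\n")
cat("Turning point (sq m):", round(turning_point, 1), "\n")
```

Beyond the turning point the estimated marginal effect is negative, which is what the negative estimate of b2 implies for sufficiently large dwellings.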
Aside: Is this likely a ‘causal’ effect? Is it likely that, for houses with sufficiently large dwelling areas, an additional square metre of dwelling area actually reduces the selling price? In our simple model, it is likely that this is not a ‘causal’ effect. Why?
(a) Outliers: There are only a few observations for houses with a large building area and relatively low selling prices. It is feasible that these observations are not really representative of the population of houses sold in Melbourne.
(b) Omitted Variables: The econometric model (2) only relates selling prices to the dwelling area. There are likely omitted variables, related to the dwelling area, that also affect selling prices. Effectively, the estimated negative relationship between dwelling area and price for large dwellings really reflects the effects of these omitted characteristics. For example, houses with a larger dwelling area will generally be located in different areas to houses with a smaller dwelling area, and these location characteristics might be important determinants of prices. In particular, houses with a larger dwelling area tend to be located further from the C.B.D., and it is this characteristic that is associated with lower prices.
We will be exploring these issues throughout the subject.
The actual and fitted values for the quadratic model and the linear model (part b) are presented in Figure 4. The quadratic model appears to fit the data slightly better: it is slightly better at capturing the lower selling prices for houses with a larger building area. However, it still tends to ‘under-predict’ selling prices for properties with relatively smaller living areas that sell for relatively large prices. The RSS for the linear model (1) is 709,209,196, while for the quadratic model (2) it is 677,774,520. At first glance, the lower minimised sum of squared residuals makes it tempting to conclude that the quadratic model fits the data better. The R2 values reported in Figure 3 point the same way: 0.1220 for the linear model in part (b) and 0.1609 for the quadratic model. However, the quadratic model nests the linear model (set β2 = 0), so adding the squared term can never increase the RSS or lower the R2. A lower RSS and a higher R2 are therefore not, on their own, evidence that the quadratic specification is preferable.
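The point about nested models can be illustrated with a small simulation. In the sketch below the data are generated from a relationship that is truly linear (an assumption made purely for illustration, using arbitrary parameter values), yet the quadratic model still has an RSS no higher, and an R2 no lower, than the linear model. The helper `rss()` is a hypothetical name introduced here.

```r
# Simulated stand-in data (NOT the tutorial data); the true relationship is linear
set.seed(30001)
n <- 500
bld_area <- runif(n, min = 50, max = 750)
price <- 700 + 2.5 * bld_area + rnorm(n, sd = 300)
houses <- data.frame(price = price, bld_area = bld_area)

fit_lin  <- lm(price ~ bld_area, data = houses)
fit_quad <- lm(price ~ bld_area + I(bld_area^2), data = houses)

# Residual sum of squares for a fitted model
rss <- function(fit) sum(resid(fit)^2)

# Total sum of squares of the dependent variable (same for both models)
tss <- sum((houses$price - mean(houses$price))^2)

rss_lin  <- rss(fit_lin)
rss_quad <- rss(fit_quad)
r2_lin   <- 1 - rss_lin / tss
r2_quad  <- 1 - rss_quad / tss

cat("RSS linear:", rss_lin, " RSS quadratic:", rss_quad, "\n")
cat("R2 linear:", r2_lin, " R2 quadratic:", r2_quad, "\n")
```

Because both models share the same dependent variable, and hence the same total sum of squares, the ranking by RSS and the ranking by R2 always agree.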
d) Estimate the econometric model (2), restricting the sample to houses that are on large lots. Now repeat the estimation for houses not on large lots. Comment on how the estimations differ.
Hint: You will need to restrict the samples using the variable large.
Solution: The estimation results are presented in Figure 5. The estimated marginal effect of an additional square metre of dwelling area for a house with 300 square metres of dwelling area is $3,345.37 for large lots and $1,854.77 for smaller lots.
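$$\left.\frac{\partial\,\widehat{\text{price}}}{\partial\,\text{bld\_area}}\right|_{\text{large}=1} = b_1 + 2 b_2\,\text{bld\_area} = 3.345367$$
and:
$$\left.\frac{\partial\,\widehat{\text{price}}}{\partial\,\text{bld\_area}}\right|_{\text{large}=0} = b_1 + 2 b_2\,\text{bld\_area} = 1.854769$$
Both are evaluated at a building area of 300 square metres, using the estimates from the relevant sub-sample, and are measured in thousands of dollars.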
The estimated marginal effect of an additional square metre of dwelling area on selling price is greater for properties on larger lots. This possibly reflects a preference for yard space. For smaller lots, an additional square metre of dwelling size substantially reduces the available yard space. For larger lots, there is not as large a reduction in yard space, so buyers are prepared to pay more for the same square metre increase.
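The sub-sample estimation in part (d) can be sketched as follows, using the `subset` argument of lm(). This is an illustration only: the data are simulated, the indicator `large` is generated at random rather than from land size, and `me_at()` is a hypothetical helper introduced here to evaluate b1 + 2 b2 bld_area for a fitted model.

```r
# Simulated stand-in data (NOT the tutorial data): price in thousands of dollars
set.seed(30001)
n <- 400
bld_area <- runif(n, min = 50, max = 500)
large <- rbinom(n, size = 1, prob = 0.4)  # hypothetical indicator: 1 = large lot
price <- 400 + (4 + 2 * large) * bld_area - 0.004 * bld_area^2 +
  rnorm(n, sd = 200)
houses <- data.frame(price, bld_area, large)

# Estimate model (2) separately on each sub-sample
fit_large <- lm(price ~ bld_area + I(bld_area^2), data = houses,
                subset = large == 1)
fit_small <- lm(price ~ bld_area + I(bld_area^2), data = houses,
                subset = large == 0)

# Estimated marginal effect b1 + 2 * b2 * area for a fitted quadratic model
me_at <- function(fit, area) {
  b <- unname(coef(fit))
  b[2] + 2 * b[3] * area
}

me_large <- me_at(fit_large, 300)
me_small <- me_at(fit_small, 300)
cat("Marginal effect at 300 sq m, large lots:", round(me_large, 3), "\n")
cat("Marginal effect at 300 sq m, other lots:", round(me_small, 3), "\n")
```

Each regression uses only the observations in its own sub-sample, so the two sets of estimates are free to differ in every coefficient.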
e) Consider the following econometric models:
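$$\text{price}_i = \beta_0 + \beta_1\,\text{distance}_i + \varepsilon_i \qquad \text{Model I}$$
and:
$$\text{lnprice}_i = \beta_0 + \beta_1\,\text{distance}_i + \varepsilon_i \qquad \text{Model II}$$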
where lnprice represents the natural logarithm of the variable price.
Estimate Model I in R. Produce a scatter plot of price against distance and a line plot of the fitted values from Model I against distance.
Now generate a new variable, lnprice, as the natural logarithm of the selling price (price).
Estimate Model II in R. Produce a scatter plot of lnprice against distance and a line plot of the fitted values from Model II against distance.
Compare the scatter plots for each model (Model I and Model II). Which estimated model do you think fits the data better? Why?
Figure 7: Actual and Fitted Values: Part (e) (series: Actual Data; Fitted Values: Linear Model; horizontal axis: Distance from CBD, in kms)
Figure 8: Actual and Fitted Values, Log Selling Price: Part (e) (horizontal axis: Distance from CBD, in kms)
Solution: The estimation results for Model I are presented in Figure 6. The mean selling price declines by $36,347.10 for each additional kilometre from the C.B.D. The estimated coefficient for the intercept implies that the average selling price of a house in the C.B.D. (with a zero distance) is $1,636,508.
The estimation results for Model II are also presented in Figure 6. The mean selling price declines by approximately 3.37% for each additional kilometre from the C.B.D. The estimated coefficient for the intercept implies that the average selling price of a house in the C.B.D. (with a zero distance) is:
$$1000 \times \exp(b_0) = 1000 \times \exp(7.3795) = \$1{,}602{,}788$$
Later in this subject, we will be studying how to interpret estimates in econometric models involving different functional forms, such as natural logarithms.
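The steps for Model II can be sketched in R as follows. This is an illustration only: the data are simulated with arbitrary parameter values, so the estimates will not match Figure 6, and the names `pct_approx`, `pct_exact` and `price_at_cbd` are introduced here for illustration. The sketch also shows the exact percentage change implied by a log-linear model, 100 × (exp(b1) − 1), alongside the usual approximation 100 × b1.

```r
# Simulated stand-in data (NOT the tutorial data): price in thousands of dollars
set.seed(30001)
n <- 400
distance <- runif(n, min = 0, max = 40)
price <- exp(7.4 - 0.03 * distance + rnorm(n, sd = 0.4))
houses <- data.frame(price, distance)

# log() is the natural logarithm in R
houses$lnprice <- log(houses$price)

# OLS estimation of Model II
fit_log <- lm(lnprice ~ distance, data = houses)
b <- unname(coef(fit_log))

pct_approx   <- 100 * b[2]              # approximate % change in price per km
pct_exact    <- 100 * (exp(b[2]) - 1)   # exact % change implied by the model
price_at_cbd <- 1000 * exp(b[1])        # dollars, at zero distance

cat("Approximate % change per km:", round(pct_approx, 2), "\n")
cat("Exact % change per km:", round(pct_exact, 2), "\n")
cat("Implied price at zero distance ($):", round(price_at_cbd), "\n")
```

For small coefficients the approximate and exact percentage changes are close, which is why the coefficient on distance is usually read directly as a percentage effect.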
The fitted values for Model I are presented in Figure 7 and the fitted values for Model II are presented in Figure 8. Comparing the two plots, Model II, which uses the natural log of the selling price, appears to fit the data better. The under-prediction of selling prices, relative to the actual data, appears to be less of an issue in Model II. As noted in Question 1(e) in Tutorial 1, taking logs reduces the scale in which a variable is measured.
Note: Since the dependent variable in Model I is different to the dependent variable in Model II, it is not possible to use the R2 for these two models to make any judgments about which model is better in terms of goodness of fit.