
STAT2402: Analysis of Observations

Assignment 1 Report

2022

Executive Summary

This report is on the sale prices of houses in Saratoga, New York. Data on 1063 randomly selected houses is available, and includes the sale prices along with property details. A linear statistical model was fitted to the data. The model indicates that the average sale price of a house depends in a complex manner on the floor area, the number of bathrooms, the age and the land size. The average sale price is also higher by $10,659 if the house contains a fireplace.

Introduction

Estimating and predicting house prices is important for buyers, sellers and economists. Typically, a linear statistical model can be used to estimate house prices from property data [2, 1]. Other models include artificial neural networks [5]. The property attributes that contribute to price differ by country and location.

Kaushal and Shankar [1] fitted a linear model to house price data in India, and built a machine learning platform so that buyers could estimate the price of a property of interest. Their model included the number of bedrooms, the number of bathrooms, the land size and central air conditioning. Md Yusof and Ismail [3] found, based on a linear model, that the most influential factor for house prices in Malaysia was locality, followed by building area, land area, distance from the city, age of the property and neighbourhood quality.

We will analyse the Saratoga House data to investigate the relationship between selling price and the characteristics of the property. The data contains a random sample of 1063 houses in the Saratoga suburb of New York.

This report is organised as follows. In the next section we describe the statistical methodology used, followed by the Results section. This is followed by a discussion of the findings, which are compared with the literature discussed in the Introduction.

Methodology

We will first explore the data using numerical and graphical summaries. Following this, a linear statistical model will be fitted to the data with Price as the response. Interaction terms will also be included. The model will then be reduced to significant terms only. The final model will be interpreted to explain the dependence of Price on property characteristics.

All statistical analysis will be conducted in the R statistical environment [4].

Results

A summary of the variables in the data set is given in Table 1. We note that four properties have zero land size (Acres). These four records otherwise do not appear unusual, so we will leave them in the data. Plots of the data showed that a linear model for Price against the other variables is feasible. Further, while Bedrooms and Bathrooms are count variables, it appears that Price does increase linearly with respect to these.

Table 1: Summary of data.

A linear model was fitted to Price, including all second-order interaction terms. The model was initially reduced using the stepAIC procedure from the MASS library. Following this, only one further interaction term (Age:Acres) was omitted. The model equation, including only significant variables, is given in equation (1). Note that the main effect for Age was not significant, but it is retained in the model because interaction terms involving it are significant.

Price = 102.12 + 27.90 × Size - 37.23 × Baths - 6.36 × Bedrooms
        + 10.66 × Fireplace + 11.31 × Acres - 0.08 × Age
        + 26.92 × Size:Baths - 5.04 × Size:Acres - 0.37 × Size:Age
        + 0.34 × Baths:Age                                          (1)
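As a rough illustration of the size of the model search described above, the following Python sketch lists the second-order interaction terms and writes out the AIC criterion that stepwise selection minimises. It assumes that the six predictors appearing in equation (1) form the full candidate set (the data set may contain others). The function `gaussian_aic` is a hypothetical helper written for this sketch and is not part of MASS. The analysis itself was carried out in R.

```python
from itertools import combinations
from math import log

# Assumption: the six predictors that appear in equation (1) are the
# full candidate set; the data set may contain further variables.
predictors = ["Size", "Baths", "Bedrooms", "Fireplace", "Acres", "Age"]

# All second-order (two-way) interaction terms, written R-style as A:B.
interactions = [f"{a}:{b}" for a, b in combinations(predictors, 2)]

# Terms in the full starting model, excluding the intercept.
full_terms = predictors + interactions


def gaussian_aic(rss, n, n_params):
    """AIC of a Gaussian linear model, up to an additive constant.

    rss      -- residual sum of squares of the fitted model
    n        -- number of observations
    n_params -- number of estimated regression coefficients
    """
    return n * log(rss / n) + 2 * n_params


print(len(interactions), "interaction terms;", len(full_terms), "terms in total")
```

With six predictors there are 15 two-way interactions, so the full starting model has 21 terms besides the intercept. Equation (1) retains 10 of them. A stepwise search drops a term when doing so lowers the criterion, that is, when the increase in residual sum of squares is too small to justify the extra parameter.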

Model diagnostics were generally satisfactory. Property prices commonly show large variation, and this was also evident in our data. This variation affected the normality of the residuals and produced several outliers. However, the plot of fitted Price against observed Price was fairly linear (see Figure 1).

Discussion

From the model equation (1), note that the main effects of Baths and Bedrooms are negative, but the overall effect of these variables is only evident when the interaction terms are also included. The model interpretation is as follows.

1. The largest effect on Price is that of Size. This can be seen from the complex combination of interaction terms as well as by considering the relative values of the variables.

• For a fixed number of Bathrooms, larger houses have a higher average Price.

• The overall effect of Size and land size is to increase mean Price.

• The overall effect of Size and Age is to increase mean Price. However, the data contains a few very old houses with low Price.

• The average Price is lower for houses with more Bedrooms. However, this is most likely due to the lower Price of the few houses with six Bedrooms. In addition, only one house has seven Bedrooms. A better model may be obtained by considering Bedrooms as categorical and combining 5, 6 and 7 Bedrooms into a single level.

• A Fireplace increases the average Price of the house by $10,660.

[Figure 1: Plot of fitted Price against observed Price. Axis: Price, 0 to 800.]
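The remark that the negative Baths main effect only makes sense alongside the interaction terms can be checked with a little arithmetic. The Python sketch below is illustrative only. It takes Price to be in thousands of dollars (implied by the Fireplace coefficient of 10.66 corresponding to $10,660), and it assumes the Size:Baths coefficient is positive, which is what the stated positive overall effect of Bathrooms requires. It fixes Age at 0 so that the Baths:Age term drops out. The Size values are hypothetical, since the report does not state the units of Size.

```python
# Coefficients copied from equation (1). Price is taken to be in $1000s.
BATHS_MAIN = -37.23   # main effect of Baths (negative, as the text states)
SIZE_BATHS = 26.92    # Size:Baths interaction; ASSUMED to be positive
FIREPLACE = 10.66     # Fireplace indicator coefficient


def baths_effect_new_house(size):
    """Change in mean Price ($1000s) per extra bathroom when Age = 0.

    With Age = 0 the Baths:Age term vanishes, so only the Baths main
    effect and the Size:Baths interaction contribute.
    """
    return BATHS_MAIN + SIZE_BATHS * size


# Size at which an extra bathroom stops lowering the fitted mean Price.
break_even_size = -BATHS_MAIN / SIZE_BATHS

# Fireplace effect converted from $1000s to dollars.
fireplace_dollars = round(FIREPLACE * 1000)

for size in (1.0, 1.5, 2.0):  # hypothetical Size values, not from the data
    print(f"Size = {size}: effect per bathroom = {baths_effect_new_house(size):.2f}")
print(f"Break-even Size = {break_even_size:.3f}")
print(f"Fireplace effect = ${fireplace_dollars:,}")
```

Under these assumptions an extra bathroom raises the fitted mean Price only once Size exceeds 37.23 / 26.92, so the sign of the Baths main effect on its own says little about the overall effect of Bathrooms.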

Overall, the findings from our model are similar to those found by others. In particular, Size of property, land size and number of Bathrooms have a positive effect on average Price.

We note the presence of several outliers in the data. These tend to render the residuals non-normal. However, the fitted and observed values show a reasonable correspondence, as can be seen in Figure 1. Consequently, given that our interest is in understanding the effect of property characteristics on Price, we ignore the outliers. Nonetheless, a transformed model may be investigated. In particular, one can take the log of Price. Also, Age shows strong positive skewness, and a log or square-root transform would make it more symmetric.
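The two refinements suggested in the Discussion, transforming a skewed variable and pooling the sparse Bedrooms levels, can be sketched in Python as follows. The ages below are hypothetical values chosen only to be right-skewed and are not taken from the Saratoga data. The functions `skewness` and `pool_bedrooms` are illustrative helpers written for this sketch.

```python
from math import log, sqrt


def skewness(values):
    """Moment-based sample skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pool_bedrooms(count, cap=5):
    """Label a bedroom count, merging counts of `cap` or more into one level."""
    return f"{cap}+" if count >= cap else str(count)


# Hypothetical ages in years, right-skewed by construction; NOT the report's data.
ages = [1, 2, 3, 5, 8, 13, 21, 34, 55, 144]
sqrt_ages = [sqrt(a) for a in ages]
log_ages = [log(a) for a in ages]  # the log requires Age > 0

for name, transformed in [("raw", ages), ("sqrt", sqrt_ages), ("log", log_ages)]:
    print(f"{name:>4}: skewness = {skewness(transformed):.2f}")

# Bedrooms recoded so that 5, 6 and 7 share a single categorical level.
print([pool_bedrooms(b) for b in (3, 4, 5, 6, 7)])
```

For these made-up values the square root reduces the skewness and the log reduces it further. If the real data contain houses with Age equal to zero, the plain log is undefined, and a shifted version such as log(Age + 1) is a common workaround.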

References

[1] A. Kaushal and A. Shankar. House price prediction using multiple linear regression. In Proceedings of the International Conference on Innovation and Computing (ICISS) 2021, 2021.

[2] A. S. Mark and W. B. John. Estimating price paths for residential real estate. Journal of Real Estate Research, 2012.

[3] Aminah Md Yusof and Syuhaida Ismail. Multiple regressions in analysing house price variations. Communications of the IBIMA, 2012, 2003.

[4] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2022.

[5] I. D. Wilson, S. D. Paris, J. A. Ware, and D. H. Jenkins. Residential property price time series forecasting with neural networks. Journal of Knowledge-Based Systems, 15, 2002.