STAT2401: Assignment 2
Overview
This study concerns a problem of interest to real estate appraisers, tax assessors, real estate investors, and homebuyers, namely the relationship between the appraised value of a property and its sale price. The sale price for any given property will vary depending on the price set by the seller, the strength of appeal of the property to a specific buyer, and the state of the money and real estate markets. Therefore, we can think of the sale price of a specific property as possessing a relative frequency distribution. The mean of this distribution might be regarded as a measure of the fair value of the property. Presumably, this is the value that a property appraiser or a tax assessor would like to attach to a given property.
The data for the study were supplied by the property appraiser’s office of Hillsborough County, Florida, and consist of the appraised land value (Land, in $1,000), the appraised improvement value (Imp, in $1,000), and the sale price (Sales, in $1,000) for residential properties sold in the city of Tampa, Florida, during the period May 2008 to June 2009. Four neighborhoods (Nbhd), namely Hyde Park (HYDEPARK), Cheval (CHEVAL), Hunter’s Green (HUNTERSGREEN), and Davis Isles (DAVISISLES), each relatively homogeneous but differing sociologically and in property types and values, were identified within the city and surrounding area. The subset of sales and appraisal data pertinent to these four neighborhoods, a total of 176 observations, was used to develop a prediction equation relating sale prices (Sales) to appraised land (Land) and improvement values (Imp).
The purpose of this case study is to examine the relationship between the mean sale price (Sales) of a property and the following independent variables:
1. Land: Appraised land value of the property
2. Imp: Appraised value of the improvements on the property
3. Nbhd: Neighborhood in which the property is listed
These data are saved in the file TamSales.txt, which can be read into R as follows. The first 10 data points are also shown.
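# stringsAsFactors = TRUE keeps Nbhd as a factor (as in the output below) in R >= 4.0
TamSales <- read.table("TamSales.txt", header = TRUE, stringsAsFactors = TRUE)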
str(TamSales)
## 'data.frame': 176 obs. of 7 variables:
## $ Folio : int 129295710 129295830 129453028 129453080 129453314 129453326 129453554 129453556 129453584 129453728 ...
## $ Sales : num 378 273 321 395 272 ...
## $ lnSales: num 5.93 5.61 5.77 5.98 5.61 ...
## $ Land : num 81.8 60.5 115.6 119.6 84.7 ...
## $ Imp : num 243 134 255 202 134 ...
## $ Totval : num 325 195 371 322 218 ...
## $ Nbhd : Factor w/ 4 levels "CHEVAL","DAVISISLES",..: 1 1 1 1 1 1 1 1 1 1 ...
head(TamSales,10)
## Folio Sales lnSales Land Imp Totval Nbhd
## 1 129295710 378.0 5.93489 81.84 243.30 325.14 CHEVAL
## 2 129295830 273.0 5.60947 60.48 134.47 194.95 CHEVAL
## 3 129453028 321.2 5.77206 115.58 255.42 371.00 CHEVAL
## 4 129453080 395.0 5.97889 119.61 202.05 321.66 CHEVAL
## 5 129453314 272.0 5.60580 84.69 133.58 218.26 CHEVAL
## 6 129453326 350.0 5.85793 78.69 154.70 233.39 CHEVAL
## 7 129453554 315.0 5.75257 59.85 148.29 208.15 CHEVAL
## 8 129453556 220.0 5.39363 59.25 176.31 235.56 CHEVAL
## 9 129453584 280.0 5.63479 85.68 147.06 232.74 CHEVAL
## 10 129453728 274.9 5.61641 57.96 129.07 187.03 CHEVAL
Contact the lecturer immediately if you have difficulty accessing this data set.
Aims
The objectives of the study are twofold:
1. To determine whether the data indicate that appraised value of land (Land) and appraised value of improvements (Imp) are related to sale prices (Sales). That is, do the data supply sufficient evidence to indicate that these variables contribute information for the prediction of sale price (Sales)?
2. To obtain the prediction equation relating appraised value of land (Land) and appraised value of improvements (Imp) to sale price (Sales) and to determine whether this relationship is the same for a variety of neighborhoods (Nbhd). In other words, do the appraisers use the same appraisal criteria for various types of neighborhoods (Nbhd)?
We want to relate sale price (Sales) to three independent variables: the qualitative factor, neighborhood (Nbhd) with four levels, and the two quantitative factors, appraised land value (Land) and appraised improvement value (Imp). We consider the following four models as candidates for this relationship.
1. Model 1 will assume that the response planes are identical for all four neighborhoods, that is, a single first-order model is appropriate for relating Sales to Land and Imp, and the relationship between the sale price and the appraised value of a property is the same for all neighborhoods:
\[ \text{Sales} = \beta_0 + \beta_1\,\text{Land} + \beta_2\,\text{Imp} + \varepsilon, \]
where \(\varepsilon\) is a normal random variable with mean 0 and variance \(\sigma^2\).
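2. Model 2 will assume that the relationship between Sales and (Land, Imp) is linear, but that the Sales-intercepts differ depending on the neighborhood. We define 3 indicator variables to distinguish the 4 neighborhoods, so that Davis Isles is the baseline with all three indicators equal to 0:
\[ \text{HYDEPARK} = \begin{cases} 1 & \text{if the property is in Hyde Park} \\ 0 & \text{otherwise} \end{cases} \]
\[ \text{CHEVAL} = \begin{cases} 1 & \text{if the property is in Cheval} \\ 0 & \text{otherwise} \end{cases} \]
\[ \text{HUNTERSGREEN} = \begin{cases} 1 & \text{if the property is in Hunter’s Green} \\ 0 & \text{otherwise} \end{cases} \]
In R, we could create these three variables by comparing Nbhd with each level name, for example as.numeric(TamSales$Nbhd == "HYDEPARK").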
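A minimal sketch of this construction follows. It assumes that the level names HUNTERSGREEN and HYDEPARK are spelled as in the overview (the str() output above shows only the first two levels), and it builds a five-row toy Nbhd column only when TamSales has not already been loaded.

```r
# Toy stand-in (an assumption for illustration) used only if the real
# TamSales data frame has not been read in yet.
if (!exists("TamSales")) {
  TamSales <- data.frame(
    Nbhd = factor(c("CHEVAL", "DAVISISLES", "HUNTERSGREEN", "HYDEPARK", "CHEVAL"))
  )
}

# 0/1 indicators; Davis Isles is the baseline (all three indicators are 0).
TamSales$HYDEPARK     <- as.numeric(TamSales$Nbhd == "HYDEPARK")
TamSales$CHEVAL       <- as.numeric(TamSales$Nbhd == "CHEVAL")
TamSales$HUNTERSGREEN <- as.numeric(TamSales$Nbhd == "HUNTERSGREEN")

# Cross-tabulate to check that each neighborhood maps to the intended indicator.
print(table(TamSales$Nbhd, TamSales$HYDEPARK))
```

Each row has at most one indicator equal to 1, and rows from Davis Isles have none.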
This model would be appropriate if the appraiser’s procedure for establishing appraised values produced a relationship between mean sale price (Sales) and (Land, Imp) that differed in at least two neighborhoods, but the differences remained constant for different values of Land and Imp. Model 2 is given by
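\[ \text{Sales} = \beta'_0 + \beta'_1\,\text{Land} + \beta'_2\,\text{Imp} + \beta'_3\,\text{HYDEPARK} + \beta'_4\,\text{CHEVAL} + \beta'_5\,\text{HUNTERSGREEN} + \varepsilon', \]
where \(\varepsilon'\) is a normal random variable with mean 0 and variance \((\sigma')^2\).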
3. Model 3 is similar to Model 2 except that we will add interaction terms (i) between the neighborhood dummy variables and Land and (ii) between the neighborhood dummy variables and Imp. These interaction terms allow the change in Sales for increases in Land or Imp to vary depending on the neighborhood. The equation of Model 3 is
\[
\begin{aligned}
\text{Sales} ={}& \beta''_0 + \beta''_1\,\text{Land} + \beta''_2\,\text{Imp} \\
&+ \beta''_3\,\text{HYDEPARK} + \beta''_4\,\text{CHEVAL} + \beta''_5\,\text{HUNTERSGREEN} \\
&+ \beta''_6(\text{HYDEPARK} \times \text{Land}) + \beta''_7(\text{CHEVAL} \times \text{Land}) + \beta''_8(\text{HUNTERSGREEN} \times \text{Land}) \\
&+ \beta''_9(\text{HYDEPARK} \times \text{Imp}) + \beta''_{10}(\text{CHEVAL} \times \text{Imp}) + \beta''_{11}(\text{HUNTERSGREEN} \times \text{Imp}) + \varepsilon'',
\end{aligned}
\]
where \(\varepsilon''\) is a normal random variable with mean 0 and variance \((\sigma'')^2\). Note that for Model 3, the change in sale price (Sales) for every $1,000 increase in appraised land value (Land), holding Imp fixed, is \(\beta''_1 + \beta''_8\) (× $1,000) in the neighborhood Hunter’s Green (HUNTERSGREEN) and \(\beta''_1 + \beta''_7\) (× $1,000) in the neighborhood Cheval (CHEVAL).
4. Model 4 differs from the previous three models by the addition of a term for the interaction between Land and Imp. Thus, the response surface of Model 4 is a twisted plane rather than a flat plane, and it differs from neighborhood to neighborhood. The interaction model follows:
\[
\begin{aligned}
\text{Sales} ={}& \beta'''_0 + \beta'''_1\,\text{Land} + \beta'''_2\,\text{Imp} \\
&+ \beta'''_3\,\text{HYDEPARK} + \beta'''_4\,\text{CHEVAL} + \beta'''_5\,\text{HUNTERSGREEN} \\
&+ \beta'''_6(\text{HYDEPARK} \times \text{Land}) + \beta'''_7(\text{CHEVAL} \times \text{Land}) + \beta'''_8(\text{HUNTERSGREEN} \times \text{Land}) \\
&+ \beta'''_9(\text{HYDEPARK} \times \text{Imp}) + \beta'''_{10}(\text{CHEVAL} \times \text{Imp}) + \beta'''_{11}(\text{HUNTERSGREEN} \times \text{Imp}) \\
&+ \beta'''_{12}(\text{Land} \times \text{Imp}) + \varepsilon''',
\end{aligned}
\]
where \(\varepsilon'''\) is a normal random variable with mean 0 and variance \((\sigma''')^2\). Unlike Models 1 to 3, Model 4 allows the change in Sales for increases in Land to depend on Imp, and vice versa. For example, the change in sale price (Sales) for a $1,000 increase in appraised land value (Land) in the neighborhood Hunter’s Green (HUNTERSGREEN) is \((\beta'''_1 + \beta'''_8) + \beta'''_{12}\,\text{Imp}\) (× $1,000). Model 4 also allows these sale price (Sales) changes to vary from neighborhood to neighborhood (due to the neighborhood interaction terms).
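The four model specifications above can be written as R formulas roughly as follows. This is a sketch only: the data frame dat is synthetic, generated from made-up coefficients so that the code runs without TamSales.txt, and the object names (dat, m1 to m4, slope_hg_m3, slope_hg_m4, cmp) are assumptions introduced for illustration. Its numerical output says nothing about the Tampa data.

```r
# Synthetic stand-in data (NOT the Tampa data); all coefficients are made up.
set.seed(1)
n <- 176
dat <- data.frame(
  Nbhd = factor(sample(c("CHEVAL", "DAVISISLES", "HUNTERSGREEN", "HYDEPARK"),
                       n, replace = TRUE)),
  Land = runif(n, 50, 300),
  Imp  = runif(n, 100, 500)
)
dat$Sales <- 20 + 1.2 * dat$Land + 1.1 * dat$Imp +
  30 * (dat$Nbhd == "HYDEPARK") + rnorm(n, sd = 25)

# 0/1 indicators with Davis Isles as the baseline, as in Model 2.
dat$HYDEPARK     <- as.numeric(dat$Nbhd == "HYDEPARK")
dat$CHEVAL       <- as.numeric(dat$Nbhd == "CHEVAL")
dat$HUNTERSGREEN <- as.numeric(dat$Nbhd == "HUNTERSGREEN")

# Model 1: one plane for all neighborhoods.
m1 <- lm(Sales ~ Land + Imp, data = dat)
# Model 2: neighborhood-specific intercepts.
m2 <- update(m1, . ~ . + HYDEPARK + CHEVAL + HUNTERSGREEN)
# Model 3: neighborhood-specific slopes for Land and Imp.
m3 <- update(m2, . ~ . + Land:HYDEPARK + Land:CHEVAL + Land:HUNTERSGREEN +
                         Imp:HYDEPARK + Imp:CHEVAL + Imp:HUNTERSGREEN)
# Model 4: adds the Land x Imp interaction.
m4 <- update(m3, . ~ . + Land:Imp)

# Land slope in Hunter's Green under Model 3: beta1'' + beta8''.
slope_hg_m3 <- unname(coef(m3)["Land"] + coef(m3)["Land:HUNTERSGREEN"])
# Under Model 4 the slope also depends on Imp; evaluated here at Imp = 200.
slope_hg_m4 <- unname(coef(m4)["Land"] + coef(m4)["Land:HUNTERSGREEN"] +
                      coef(m4)["Land:Imp"] * 200)

# Nested-model F-test comparing Model 1 with Model 2.
cmp <- anova(m1, m2)
print(cmp)
```

Explicit indicator columns are used here so that the coefficients line up with the notation of the models above. Passing the factor directly, as in lm(Sales ~ Land + Imp + Nbhd), would by default take the first level (CHEVAL) as the baseline; relevel(dat$Nbhd, ref = "DAVISISLES") would restore Davis Isles as the reference.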
Lastly, please state and indicate your answers clearly. The markers have no responsibility to pick out the right answers for you, in particular for those who leave a large chunk of R output as their answer.
Instructions & Questions
1. Produce the scatterplots of (i) Sales against Land by Nbhd and (ii) Sales against Imp by Nbhd.
[4 Marks]
2. Comment on the plots produced in part (1).
[2 Marks]
3. Fit Model 1 using R (Report your R-code and R-output). Report also the fitted line.
[4 Marks]
4. Fit Model 2 using R (Report your R-code and R-output). Report also the fitted line.
[4 Marks]
5. Fit Model 3 using R (Report your R-code and R-output). Report also the fitted line.
[4 Marks]
6. Fit Model 4 using R (Report your R-code and R-output). Report also the fitted line.
[4 Marks]
7. Compare Model 1 and Model 2 using an F-test. Report your R-code, R-output and the p-value. Which model do you prefer?
[4 Marks]
8. Compare Model 2 and Model 3 using an F-test. Report your R-code, R-output and the p-value. Which model do you prefer?
[4 Marks]
9. Compare Model 3 and Model 4 using an F-test. Report your R-code, R-output and the p-value. Which model do you prefer?
[4 Marks]
10. Comment on the outcomes in parts (7), (8), and (9).
[2 Marks]
11. Based on the above analysis, give your comments on the aims of this analysis.
[2 Marks]
[Total: 38 Marks]
2022-05-11