MIS770 FOUNDATION SKILLS IN DATA ANALYSIS
SAMPLE EXAMINATION PAPER 2
QUESTION 1 (2+4+7+1+2+5 = 21 Marks)
In the preliminary comment above it is mentioned that the housing data was collected on a sample of 120 houses.
(a) Using the Baycoast case study, describe an example of the use of:
(i) Stratified sampling.
(ii) Non-random sampling.
(b) One of the objectives of the survey is to estimate the average age of all the houses sold over the past 12 months to within plus or minus 2 years, with 95% confidence. Assuming that the population standard deviation for age is approximately 10 years, what sample size would be required?
(c) The sample of 120 houses was taken from a population of approximately 5000 houses that were sold in the city of Baycoast over the past 12 months. The REIV has been asked to undertake a similar survey in another city, where the population size is 50000, but they are concerned that a much larger sample would be required for the same level of accuracy. What advice would you give?
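As a rough check on the reasoning behind parts (b) and (c), the sketch below applies the usual sample-size formula for a mean, n = (z × σ / e)², and then a finite population correction for the two population sizes mentioned. The function names, the critical value z = 1.96 for 95% confidence, and the choice of correction formula are assumptions made for illustration; they are not taken from the exam paper.

```python
import math

Z_95 = 1.96  # assumed critical value for 95% confidence


def sample_size_for_mean(sigma, margin, z=Z_95):
    """Unadjusted sample size n0 = (z * sigma / margin)^2, before rounding up."""
    return (z * sigma / margin) ** 2


def finite_population_adjust(n0, population):
    """Finite population correction: n = n0 * N / (n0 + N - 1)."""
    return n0 * population / (n0 + population - 1)


# Part (b): sigma of about 10 years, margin of error of 2 years
n0 = sample_size_for_mean(sigma=10, margin=2)
print("unadjusted:", math.ceil(n0))  # always round up to a whole house

# Part (c): compare the two population sizes
for N in (5000, 50000):
    print(N, math.ceil(finite_population_adjust(n0, N)))
```

Under these assumptions the required sample size changes very little when the population grows tenfold, which is the point part (c) is probing.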
Read the following Newspaper article from the Melbourne Herald Sun.
(d) What type of data is Number of Bedrooms?
(e) Would a column chart be an appropriate graph to display the Number of Bedrooms data? Explain.
(f) Below is a partially completed frequency cross-tabulation showing number of bedrooms versus the age of a house (houses built from 2004 onwards are considered ‘new’ and those built before 2004 are considered ‘old’).
i. Complete the frequency cross-tabulation.
Frequency | Bedrooms
Age | 2 | 3 | 4 | 5 | Total
New | 12 19
Old |
Total | 32 39 24 120
ii. Complete the row percentage cross-tabulation below.
Row Percent | Bedrooms
Age | 2 | 3 | 4 | 5 | Total
New | | | | |
Old | | | | |
Total | | | | |
iii. Using your results from (ii) does it seem that Bedrooms and Age of house are related? Explain.
QUESTION 2 (3+7+6 = 16 Marks)
(a) It can be assumed that residential lot sizes in Baycoast are normally distributed with a mean of 1150 square metres and standard deviation of 350 square metres.
Any residential lot size in Baycoast larger than 1400 square metres can be considered for redevelopment as two units on the one lot. What proportion of Baycoast lots are suitable for development as two units?
(b) The REIV recently released a report stating that the vacancy rate of rental properties in Baycoast is currently only 2%. The Baycoast council believes this figure is too low, so it took a random sample of 20 rental properties and found that three (3) were vacant.
i. Explain why the Binomial distribution would be suitable in this situation for calculating probabilities.
ii. Using the Binomial distribution calculate the probability of finding 3 or more vacant properties from a sample of 20 (assume the REIV vacancy rate of 2% is correct).
iii. Based on your answer in part (ii) above, comment on the validity of the REIV vacancy rate of 2%.
(c) On average, 2.5 customers per hour return a product for a refund.
i. Explain why the Poisson distribution would be suitable in this situation for calculating probabilities.
ii. Using the Poisson distribution, calculate the probability of fewer than 2 customers returning a product in a given hour.
iii. Using the Poisson distribution, calculate the probability of at least 4 customers returning a product in a given hour.
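The three probability calculations in this question can be checked numerically with the Python standard library alone. The sketch below is one possible way to do so, using the parameters stated in parts (a), (b) and (c); the helper names are illustrative, and in an exam the same values would normally be read from statistical tables.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# (a) P(lot size > 1400) with mean 1150 and standard deviation 350
p_large_lot = 1.0 - normal_cdf(1400, mean=1150, sd=350)

# (b) P(X >= 3) for X ~ Binomial(n=20, p=0.02), via the complement
p_three_or_more = 1.0 - sum(binomial_pmf(k, 20, 0.02) for k in range(3))

# (c) X ~ Poisson(2.5): P(X < 2) and P(X >= 4)
p_fewer_than_two = sum(poisson_pmf(k, 2.5) for k in range(2))
p_at_least_four = 1.0 - sum(poisson_pmf(k, 2.5) for k in range(4))

print(p_large_lot, p_three_or_more, p_fewer_than_two, p_at_least_four)
```

Both "at least" probabilities are computed as one minus the sum of the lower outcomes, which avoids summing an open-ended or long upper tail.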
QUESTION 3 (6+10 = 16 Marks)
(a) We wish to estimate the mean Lot Size (square metres) of all houses in the Baycoast region. Assume the random sample of 120 houses sold is representative of all houses in Baycoast.
i. Calculate the 95% confidence interval estimate of the mean lot size (square metres).
ii. Suppose that the mean lot size for Melbourne overall is 1000 square metres. From your confidence interval in part (i), what can we say about the lot sizes of Baycoast houses compared to Melbourne overall?
(b) The Baycoast council has reviewed the cost of the basket in 2016 and deemed that up to $190 is acceptable, but that anything over $190 is excessively expensive. You wish to estimate the proportion of all supermarkets in Baycoast falling into the “excessive” category. From the sample of 150 supermarkets, we have calculated that the sample proportion of supermarkets classified as “excessive” is 10.67%.
i. Calculate the 95% confidence interval estimate of the proportion of all supermarkets in the Baycoast chain in the “excessive” category.
ii. Provide a plain language interpretation of the interval you have just constructed.
iii. The Baycoast council has publicly admitted that some supermarkets are excessively priced, but that the proportion is only 5%. Do you have any evidence to contradict this claim? Explain your answer. [Note: No calculations required]
iv. If a 99% confidence interval was used instead of a 95% confidence interval, would you have come to the same conclusion in part (iii)? Explain your answer. [Note: No calculations required]
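For part (b), the interval can be sketched with the normal approximation for a proportion, using the sample proportion of 10.67% and n = 150 stated in the question. The function name and the critical values (1.96 for 95%, 2.576 for 99%) are assumptions for illustration. Part (a) is not covered here because the sample mean and standard deviation of lot size are not given in this section.

```python
import math


def proportion_ci(p_hat, n, z):
    """Normal-approximation confidence interval for a population proportion."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se


p_hat, n = 0.1067, 150
ci_95 = proportion_ci(p_hat, n, z=1.96)   # assumed 95% critical value
ci_99 = proportion_ci(p_hat, n, z=2.576)  # assumed 99% critical value

print("95%:", ci_95)
print("99%:", ci_99)
```

Comparing the council's claimed 5% against the lower limit of each interval shows how the conclusion in parts (iii) and (iv) can depend on the confidence level chosen.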
QUESTION 4 (4+8+2 = 14 Marks)
The Real Estate Institute of Victoria (REIV) is also interested in the price of houses in Baycoast and affordability.
The senior manager at REIV also believes that housing affordability in Baycoast has got so bad that currently less than 20% of houses are worth below $500,000.
You are to check this claim by performing a hypothesis test:
(a) Write down the null and alternative hypotheses in both symbols and words for the above situation.
(b) Of the sample of 120 randomly selected houses, 15 were worth less than $500,000. Using this information, conduct a hypothesis test as to whether the true proportion of all houses in Baycoast valued under $500,000 is less than 20%. (Use α = 5%)
(c) Could your conclusion in part (b) above have been different if α = 10% was used? What about if α = 1% was used? (No calculations required)
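The test in this question can be sketched as a one-sample, lower-tail z test for a proportion, with the standard error computed under the null value of 20%. This is one common approach and is an assumption here; the helper names are illustrative and not from the exam paper.

```python
import math


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_proportion_z(successes, n, p0):
    """z statistic for a proportion, using the null-hypothesis standard error."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    return (p_hat - p0) / se


z_stat = one_proportion_z(successes=15, n=120, p0=0.20)
p_value = standard_normal_cdf(z_stat)  # lower tail, since H1 is "less than 20%"

print("z =", z_stat, "p-value =", p_value)
for alpha in (0.10, 0.05, 0.01):
    print(alpha, "reject H0" if p_value < alpha else "do not reject H0")
```

Looping over the three significance levels makes the sensitivity of the decision to α explicit, which is what part (c) asks about.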
QUESTION 5 (2+6+10 = 18 Marks)
The REIV would like to develop a linear regression model whereby they can predict the price of a house on the basis of readily available variables. The new graduate analyst has been given the task of constructing the regression model.
(a) A partial correlation coefficient matrix is given below:
Correlation Coefficient, r | Price($'000)
Price($'000) | 1.000
Lot Size(sq m) | 0.411
Age | -0.364
Area (sq m) | 0.568
Bathrooms | 0.131
Bedrooms | 0.540
Based on this table only, the graduate analyst has decided to exclude the variable Bathrooms from his analysis as he has concluded that there is no relationship with Price.
You believe he should have used more information before making that decision: what extra information would you require? Explain.
(b) Next, the graduate analyst decided to use the variable Area to construct a simple linear regression model for Price (in $’000). Below is part of his output:
Indep. X Variables | Coefficient | Standard Error | t-Statistic | p-Value | 95% Conf. Lower | 95% Conf. Upper
Intercept | 365.622 | 73.729 | 4.9590 | 0.00000 | 219.617 | 511.626
Area (sq m) | 1.96 | 0.262 | 7.4923 | 0.00000 | 1.442 | 2.478
(i) Write down the regression equation in full.
(ii) Interpret, from a practical point of view, the coefficients b0 and b1 in the regression model equation.
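To make the roles of b0 and b1 concrete, the sketch below evaluates the fitted line using the two coefficients reported in the output for part (b). The function name and the 200 sq m example house are hypothetical, chosen only to illustrate the arithmetic.

```python
B0 = 365.622  # intercept from the regression output, in $'000
B1 = 1.96     # slope: change in price ($'000) per additional sq m of area


def predict_price(area_sq_m):
    """Predicted price in $'000 from the fitted line: price = b0 + b1 * area."""
    return B0 + B1 * area_sq_m


# Hypothetical 200 sq m house (illustrative value, not from the sample)
print(round(predict_price(200), 3))

# One extra square metre changes the prediction by b1
print(round(predict_price(201) - predict_price(200), 3))
```

Whether the intercept has a sensible practical meaning depends on whether an area of zero lies within the range of the observed data, which the output shown does not reveal.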
(c) Lastly, the graduate analyst has produced a multiple regression model for price using the independent variables Area, Street Appeal and Bedrooms.
Analysis of Variance, ANOVA
 | Degrees Freedom | Sum of Squares, SS | Mean Sq., MS | F-Ratio | p-Value
Regression | 3 | 10189947 | 3396649 | 165.878 | 0.00000
Error | 116 | 2375301 | 20476 | |
Total | 119 | 12565249 | | |
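The quantities usually derived from an ANOVA table of this kind can be recomputed from the sums of squares and degrees of freedom shown. The sketch below is a consistency check only, since the wording of what part (c) asks is not included in this section; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table
ssr, sse, sst = 10189947, 2375301, 12565249
df_reg, df_err, df_tot = 3, 116, 119

# Coefficient of determination: share of price variation explained by the model
r_squared = ssr / sst

# Adjusted R-squared penalises the number of independent variables
adj_r_squared = 1.0 - (sse / df_err) / (sst / df_tot)

# Overall F ratio: mean square regression over mean square error
f_ratio = (ssr / df_reg) / (sse / df_err)

# Standard error of the estimate, in $'000
std_error = math.sqrt(sse / df_err)

print(r_squared, adj_r_squared, f_ratio, std_error)
```

The recomputed F ratio agrees with the tabulated 165.878 up to rounding, which suggests the table entries are internally consistent.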
2022-06-11