MIS770 FOUNDATION SKILLS IN DATA ANALYSIS

SAMPLE EXAMINATION PAPER 2

QUESTION 1 (2+4+7+1+2+5 = 21 Marks)

In the preliminary comment above it is mentioned that the housing data was collected on a sample of 120 houses.

(a)     Using the Baycoast case study, describe an example of the use of:

(i)      Stratified sampling.

(ii)     Non-random sampling.

(b)     One of the objectives of the survey is to estimate the average age of all the houses sold over the past 12 months to within plus or minus 2 years, with 95% confidence. Assuming that the population standard deviation for age is approximately 10 years, what sample size would be required?

(c)     The sample of 120 houses was taken from a population of approximately 5000 houses that were sold in the city of Baycoast over the past 12 months. The REIV has been asked to undertake a similar survey in another city, where the population size is 50000, but they are concerned that a much larger sample would be required for the same level of accuracy. What advice would you give?
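
For part (b), the usual sample-size formula for estimating a mean is n = (z × σ / e)², rounded up. The sketch below applies it to the figures given in the question (σ = 10 years, e = 2 years, 95% confidence). The variable names and the helper `fpc_adjusted` are illustrative only, and the finite population correction is shown as one assumed way of exploring part (c), not as a method the paper prescribes.

```python
import math
from statistics import NormalDist

sigma = 10.0        # population standard deviation of age (years), as assumed in the question
margin = 2.0        # required margin of error (years)
confidence = 0.95

# Two-sided critical value, about 1.96 for 95% confidence
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Required sample size ignoring population size; always round up
n0 = (z * sigma / margin) ** 2
required_n = math.ceil(n0)
print(required_n)


def fpc_adjusted(n0, population_size):
    """Sample size after a finite population correction (illustrative helper)."""
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))


# Compare the two population sizes mentioned in part (c)
for population_size in (5000, 50000):
    print(population_size, fpc_adjusted(n0, population_size))
```

The basic formula does not involve the population size at all, and the correction changes the result by only a house or two at either population size.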

Read the following newspaper article from the Melbourne Herald Sun.

(d)    What type of data is Number of Bedrooms?

(e)     Would a column chart be an appropriate graph to display the Number of Bedrooms data? Explain.

(f)      Below is a partially completed frequency cross-tabulation showing number of bedrooms versus the age of a house (houses built from 2004 onwards are considered ‘new’ and those built before 2004 are considered ‘old’).

i.    Complete the frequency cross-tabulation.

Frequency     Bedrooms
New           12    19
Old
Total         32    39    24    120

ii.    Complete the row percentage cross-tabulation below.

Row Percent   Bedrooms
Age           2      3      4      5      Total
New
Old
Total

iii.    Using your results from (ii) does it seem that Bedrooms and Age of house are related? Explain.

QUESTION 2 (3+7+6 = 16 Marks)

(a)   It can be assumed that residential lot sizes in Baycoast are normally distributed with a mean of 1150 square metres and standard deviation of 350 square metres.

Any residential lot size in Baycoast larger than 1400 square metres can be considered for redevelopment as two units on the one lot. What proportion of Baycoast lots are suitable for development as two units?
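
An answer to part (a) can be checked by standardising the threshold, z = (1400 − 1150) / 350, and taking the upper-tail area of the normal distribution. The following is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Lot size distribution as stated in the question (square metres)
lot_size = NormalDist(mu=1150, sigma=350)
threshold = 1400

# Standardised value of the threshold
z = (threshold - lot_size.mean) / lot_size.stdev

# Proportion of lots larger than the threshold (upper tail)
p_above = 1 - lot_size.cdf(threshold)

print(round(z, 4), round(p_above, 4))
```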

(b)    The REIV recently released a report stating that the vacancy rate of rental properties in Baycoast is currently only 2%. The Baycoast council believes this figure is too low, so it took a random sample of 20 rental properties and found that three (3) were vacant.

i.     Explain why the Binomial distribution would be suitable in this situation for calculating probabilities.

ii.    Using the Binomial distribution calculate the probability of finding 3 or more vacant properties from a sample of 20 (assume the REIV vacancy rate of 2% is correct).

iii.   Based on your answer in part (ii) above, comment on the validity of the REIV vacancy rate of 2%.
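
The probability asked for in part (ii) can be obtained from the complement, P(X ≥ 3) = 1 − P(X ≤ 2), with n = 20 and p = 0.02. The sketch below is one way to check the arithmetic; the helper `binomial_pmf` is an illustrative name, not part of any library.

```python
from math import comb

n = 20       # rental properties sampled
p = 0.02     # vacancy rate claimed by the REIV, assumed correct


def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# P(X >= 3) = 1 - [P(0) + P(1) + P(2)]
p_at_most_2 = sum(binomial_pmf(k, n, p) for k in range(3))
p_at_least_3 = 1 - p_at_most_2

print(round(p_at_least_3, 4))
```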

(c)  On average, 2.5 customers per hour return a product for a refund.

i.     Explain why the Poisson distribution would be suitable in this situation for calculating probabilities.

ii.    Using the Poisson distribution, calculate the probability of fewer than 2 customers returning a product in a given hour.

iii.   Using the Poisson distribution, calculate the probability of at least 4 customers returning a product in a given hour.
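
Both Poisson probabilities in part (c) reduce to sums of P(X = k) = e^(−λ) λ^k / k! with λ = 2.5. The sketch below checks the arithmetic; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial

rate = 2.5   # average number of returns per hour, from the question


def poisson_pmf(k, rate):
    """P(X = k) for a Poisson random variable with the given mean."""
    return exp(-rate) * rate ** k / factorial(k)


# Part (ii): fewer than 2 customers means X = 0 or X = 1
p_fewer_than_2 = sum(poisson_pmf(k, rate) for k in range(2))

# Part (iii): at least 4 customers is the complement of X <= 3
p_at_least_4 = 1 - sum(poisson_pmf(k, rate) for k in range(4))

print(round(p_fewer_than_2, 4), round(p_at_least_4, 4))
```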

QUESTION 3 (6+10 = 16 Marks)

(a)  We wish to estimate the mean Lot Size (square metres) of all houses in the Baycoast region.

Assume the random sample of 120 houses sold are representative of all houses in Baycoast.

i.    Calculate the 95% confidence interval estimate of the mean lot size (square metres).

ii.    Suppose that the mean lot size for Melbourne overall is 1000 square metres. From your confidence interval in part (a), what can we say about the lot sizes of Baycoast houses compared to Melbourne overall?

(b)     The Baycoast council has reviewed the cost of the basket in 2016 and deemed that up to $190 is acceptable, but that anything over $190 is excessively expensive. You wish to estimate the proportion of all supermarkets in Baycoast falling into the “excessive” category. From the sample of 150 supermarkets, we have calculated that the sample proportion of supermarkets classified as “excessive” is 10.67%.

i.    Calculate the 95% confidence interval estimate of the proportion of all supermarkets in the Baycoast chain in the “excessive” category.

ii.    Provide a plain language interpretation of the interval you have just constructed.

iii.    The Baycoast council has publicly admitted that some supermarkets are excessively priced, but that the proportion is only 5%. Do you have any evidence to contradict this claim? Explain your answer. [Note: No calculations required]

iv.     If a 99% confidence interval was used instead of a 95% confidence interval, would you have come to the same conclusion in part (iii)? Explain your answer. [Note: No calculations required]
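
The interval in part (b)(i) is commonly built as p ± z × sqrt(p(1 − p) / n). The sketch below applies that large-sample formula to the figures in the question (p = 10.67%, n = 150). It is a minimal check under that assumption, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

p_sample = 0.1067   # sample proportion classified as "excessive", from the question
n = 150             # supermarkets sampled
confidence = 0.95

# Two-sided critical value, about 1.96 for 95% confidence
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error of the sample proportion and the resulting interval
std_error = math.sqrt(p_sample * (1 - p_sample) / n)
lower = p_sample - z * std_error
upper = p_sample + z * std_error

print(round(lower, 4), round(upper, 4))
```

Changing `confidence` to 0.99 widens the interval, which is the comparison part (iv) asks about.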

QUESTION 4 (4+8+2 = 14 Marks)

The Real Estate Institute of Victoria (REIV) is also interested in the price of houses in Baycoast and affordability.

The senior manager at REIV also believes that housing affordability in Baycoast has got so bad that currently less than 20% of houses are worth below $500,000.

You are to check this claim by performing a hypothesis test:

(a)   Write down the null and alternative hypotheses in both symbols and words for the above situation.

(b)   Of the sample of 120 randomly selected houses, 15 were worth less than $500,000. Using this information, conduct a hypothesis test as to whether the true proportion of all houses in Baycoast valued under $500,000 is less than 20%. (Use α = 5%)

(c)   Could your conclusion in part (b) above have been different if α = 10% was used? What about if α = 1% was used? (No calculations required)
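
One way to check the test in part (b) is a one-sample z-test for a proportion, with the standard error computed under the null value of 20%. The sketch below assumes a lower-tailed test (alternative hypothesis: the proportion is less than 20%), as the wording of the question suggests. The variable names are illustrative.

```python
import math
from statistics import NormalDist

p0 = 0.20    # proportion under the null hypothesis
n = 120      # houses sampled
x = 15       # houses worth less than $500,000

p_hat = x / n

# Standard error uses the hypothesised proportion, not the sample proportion
std_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / std_error

# Lower-tailed p-value: probability of a z this small or smaller under H0
p_value = NormalDist().cdf(z)
print(round(z, 3), round(p_value, 4))

# Decision at the three significance levels mentioned in parts (b) and (c)
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(alpha, decision)
```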

QUESTION 5 (2+6+10 = 18 Marks)

The REIV would like to develop a linear regression model whereby they can predict the price of a house on the basis of readily available variables. The new graduate analyst has been given the task of constructing the regression model.

(a)    A partial correlation coefficient matrix is given below:

Correlation Coefficient, r    Price($'000)
Price($'000)                  1.000
Lot Size(sq m)                0.411
Age                           -0.364
Area (sq m)                   0.568
Bathrooms                     0.131
Bedrooms                      0.540

Based on this table only, the graduate analyst has decided to exclude the variable Bathrooms from his analysis as he has concluded that there is no relationship with Price.

You believe he should have used more information before making that decision: what extra information would you require? Explain.

(b)     Next, the graduate analyst decided to use the variable Area to construct a simple linear regression model for Price (in $’000). Below is part of his output:

Indep. X Variables   Coefficient   Standard Error   t-Statistic   p-Value   95% Conf. Lower   95% Conf. Upper
Intercept            365.622       73.729           4.9590        0.00000   219.617           511.626
Area (sq m)          1.96          0.262            7.4923        0.00000   1.442             2.478

(i)      Write down the regression equation in full.

(ii)      Interpret, from a practical point of view, the coefficients b0 and b1 in the regression model equation.
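
The fitted line in part (b) has the form Price = b0 + b1 × Area, with b0 and b1 read from the Coefficient column of the output. The sketch below evaluates it for a house of 200 square metres. That area is a hypothetical value chosen only for illustration, and `predict_price` is an illustrative helper name.

```python
b0 = 365.622   # intercept, in $'000 (from the output above)
b1 = 1.96      # change in price ($'000) per extra square metre of area


def predict_price(area_sq_m):
    """Predicted price in $'000 for a house of the given area."""
    return b0 + b1 * area_sq_m


example_area = 200   # hypothetical area, not taken from the data
print(predict_price(example_area))

# Adding one square metre raises the prediction by b1
difference = predict_price(example_area + 1) - predict_price(example_area)
print(round(difference, 3))
```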


(c)     Lastly, the graduate analyst has produced a multiple regression model for price using the independent variables Area, Street Appeal and Bedrooms.

Analysis of Variance, ANOVA   Degrees Freedom   Sum of Squares, SS   Mean Sq., MS   F-Ratio   p-Value
Regression                    3                 10189947             3396649        165.878   0.00000
Error                         116               2375301              20476
Total                         119               12565249
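
The entries of the ANOVA table are linked by standard identities: MS = SS / df, F = MS(Regression) / MS(Error), and R² = SS(Regression) / SS(Total). The sketch below recomputes these from the sums of squares and degrees of freedom in the table. It is offered as a consistency check only. The adjusted R² and the standard error of the estimate are additional quantities derived here; they are not shown in the output above.

```python
import math

# Values taken from the ANOVA table
ss_regression = 10189947
ss_error = 2375301
df_regression = 3
df_error = 116

# Totals; the sum of squares may differ from the table by rounding
ss_total = ss_regression + ss_error
df_total = df_regression + df_error

# Mean squares and the F-ratio
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_ratio = ms_regression / ms_error

# Goodness-of-fit summaries derived from the same figures
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (ss_error / df_error) / (ss_total / df_total)
std_error_estimate = math.sqrt(ms_error)   # in $'000, the units of Price

print(round(f_ratio, 3), round(r_squared, 4), round(adj_r_squared, 4))
print(round(std_error_estimate, 1))
```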