
2017/2018

SPRING

BS2506

INFERENTIAL STATISTICS, STATISTICAL MODELLING & SURVEY METHODS

1.     a)   Explain the advantages and disadvantages of non-parametric statistical tests.

(10 marks)

b)   A random sample of 500 adults was questioned regarding political affiliation and opinion on a tax reform bill. The responses are shown in the following contingency table:

 

                Favour    Indifferent    Opposed
Labour             138             83         64
Conservative        64             67         84

(i)        Using α = 0.01, test whether there is any evidence that political affiliation and opinion on the tax reform bill are associated.

(ii)       Calculate Cramer’s contingency coefficient and interpret it.

(iii)      Construct a 99% confidence interval estimate for the percentage of people who are in favour of the tax reform bill and interpret its meaning.

(23 marks)
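The calculations asked for in part (b) can be sketched numerically as below. This is only an illustrative check, assuming the usual Pearson chi-square test of independence, Cramér's V as the contingency coefficient, and a normal-approximation (Wald) interval for the proportion; the variable names are hypothetical. The closed-form critical value used here is valid only because the table has 2 degrees of freedom.

```python
import math
from statistics import NormalDist

# Observed counts from the contingency table
# (rows: Labour, Conservative; columns: Favour, Indifferent, Opposed).
observed = [[138, 83, 64],
            [64, 67, 84]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic: sum of (O - E)^2 / E, with E = row total * column total / n.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

rows, cols = len(observed), len(observed[0])
df = (rows - 1) * (cols - 1)

# For df = 2 the chi-square upper-tail probability is exp(-x / 2), so the
# critical value and p-value have closed forms (this shortcut holds only for df = 2).
alpha = 0.01
critical = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)

# Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1))).
cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

# 99% normal-approximation interval for the proportion in favour.
p_hat = col_totals[0] / n
z = NormalDist().inv_cdf(1 - alpha / 2)
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(f"chi2 = {chi2:.3f}, df = {df}, critical = {critical:.3f}, p = {p_value:.2e}")
print(f"Cramer's V = {cramers_v:.3f}")
print(f"99% CI for proportion in favour: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under these assumptions the test statistic is compared with the critical value: a statistic above it points to an association at the 1% level, and V indicates how strong that association is on a 0 to 1 scale.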

2       a) i) What is the nature of multicollinearity and what are its practical consequences?

ii) How can you detect and deal with multicollinearity?

(11 marks)

b) Your company has 20 retail outlets across Britain selling a similar range of products. Using last year’s data, a regression equation was developed relating Sales (in £10,000) to three independent variables. These variables are:

X1:   Floor space of the outlet (in square metres)

X2:   Size of the population in the catchment area (in thousands)

X3:   1 if the store is situated on a prime site location, 0 otherwise

Part of the regression results obtained are shown below.

Variables in equation

Variable    Coefficient (β̂)    SE(β̂)
Constant    16.39              2.635
X1          0.1751             0.0467
X2          0.2069             0.0398
X3          1.552              0.1829

For this model:

SSR = Regression Sum of Squares = 7500

SST = Total Sum of Squares = 8580

i)  Use the above results to write the regression model and interpret the meaning of the slope coefficients.

ii)      Explain what would happen if another location variable, X4 (0 if the store is located on a prime site, 1 otherwise), were added to the model.

iii)     Predict the sales of a new store with a floor space of 100 square metres, a catchment-area population of 75,000 and a prime site location.

(9 marks)

c)  At α=0.01 level of significance,

i)       Conduct a test to determine whether there is a significant relationship between sales and the three explanatory variables.

ii)       Determine which of the explanatory variables have significant regression coefficients. Which variable(s) would you consider eliminating?

(13 marks)
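The arithmetic behind parts (b) and (c) can be sketched as follows. This is a rough check rather than a model answer: it assumes the standard overall F statistic, (SSR / k) / (SSE / (n - k - 1)), and t statistics formed as coefficient divided by standard error; the function name `predict_sales` is hypothetical. The resulting statistics would still need to be compared against tabulated F(3, 16) and t(16) critical values at α = 0.01.

```python
# Values quoted in the question.
n, k = 20, 3  # number of outlets, number of explanatory variables
coef = {"Constant": 16.39, "X1": 0.1751, "X2": 0.2069, "X3": 1.552}
se = {"Constant": 2.635, "X1": 0.0467, "X2": 0.0398, "X3": 0.1829}
ssr, sst = 7500.0, 8580.0

# Overall F test: regression mean square over error mean square.
sse = sst - ssr
f_stat = (ssr / k) / (sse / (n - k - 1))
r_squared = ssr / sst

# t statistic for each coefficient: estimate divided by its standard error.
t_stats = {name: coef[name] / se[name] for name in coef}


def predict_sales(x1, x2, x3):
    """Predicted sales in units of 10,000 pounds (hypothetical helper)."""
    return coef["Constant"] + coef["X1"] * x1 + coef["X2"] * x2 + coef["X3"] * x3


# New store: 100 square metres, catchment of 75,000 (X2 is in thousands), prime site.
pred = predict_sales(100, 75, 1)

print(f"F = {f_stat:.3f}, R^2 = {r_squared:.4f}")
for name, t in t_stats.items():
    print(f"t({name}) = {t:.3f}")
print(f"Predicted sales = {pred:.4f} (x 10,000 pounds)")
```

Note that X2 is entered as 75, not 75,000, because the variable is measured in thousands.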

Q3 (a)     (i)    State the assumptions behind the classical linear regression model and explain briefly what each means and how to check them.

(ii)    For the following model, outline the method you would use to estimate the parameters.

Y = 0 XX

(14 marks)

(b)     A company has opened several outdoor ice-skating rinks and would like to know what factors affect attendance at the rinks. The manager believes that the following variables affect attendance.

X1:   Temperature

X2 :   Wind speed

X3  : 1 if weekend, 0 otherwise

X4 :   X1 X2

The following least squares regression was obtained from 30 days of data:

Ŷ = 250 + 4.8X1 - 30X2 + 1.3X3 + 35X4        R² = 0.72        (Model 1)

i)   What is the predicted attendance on a weekend if the temperature is 28 degrees Fahrenheit and wind speed is 12 miles per hour?

ii)   At the 5% level of significance, test to determine whether Model 1 is significant.

iii)  The coefficient of determination for the model which involves only the independent variables X1 and X2 is 0.52. Do the variables X3 and X4 in Model 1 contribute significantly to predicting the variation in attendance? Use a 5% significance level.

iv)  Compute the adjusted coefficient of determination for Model 1. Explain the difference between R² and the adjusted R².

(19 marks)
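The quantities requested for the ice-rink regression (Model 1) can be sketched as below. The sketch assumes that X4 is the product of temperature and wind speed, that the overall and partial F statistics are computed from the coefficients of determination in the usual way, and that the adjusted coefficient is 1 - (1 - R²)(n - 1)/(n - k - 1); the function name `predict_attendance` is hypothetical. Critical values would still come from F tables with (4, 25) and (2, 25) degrees of freedom.

```python
# Values quoted in the question.
n = 30                      # days of data
k_full, k_reduced = 4, 2    # explanatory variables in Model 1 and in the X1, X2 model
r2_full, r2_reduced = 0.72, 0.52


def predict_attendance(temp, wind, weekend):
    """Predicted attendance from Model 1 (hypothetical helper)."""
    x4 = temp * wind  # assumption: X4 is the interaction term X1 * X2
    return 250 + 4.8 * temp - 30 * wind + 1.3 * weekend + 35 * x4


# i) Weekend, 28 degrees Fahrenheit, wind speed of 12 miles per hour.
pred = predict_attendance(28, 12, 1)

# ii) Overall F statistic from the coefficient of determination.
df_error = n - k_full - 1
f_overall = (r2_full / k_full) / ((1 - r2_full) / df_error)

# iii) Partial F statistic for the joint contribution of X3 and X4.
f_partial = ((r2_full - r2_reduced) / (k_full - k_reduced)) / ((1 - r2_full) / df_error)

# iv) Adjusted coefficient of determination.
adj_r2 = 1 - (1 - r2_full) * (n - 1) / df_error

print(f"Predicted attendance = {pred:.1f}")
print(f"Overall F = {f_overall:.3f} on ({k_full}, {df_error}) df")
print(f"Partial F = {f_partial:.3f} on ({k_full - k_reduced}, {df_error}) df")
print(f"Adjusted R^2 = {adj_r2:.4f}")
```

The adjusted value is lower than the unadjusted one because it penalises each additional explanatory variable through the degrees of freedom.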

Q4 (a)     Propose a time series regression model for quarterly data (with 20 observations) that will account for both the linear trend and seasonal variations in the data.

From your proposed model, write down the forecast for each quarter of Year 6.

Explain, with the help of a diagram, what the quarterly dummy variables do.

(12 marks)

(b)     The following data represent the annual revenues (in billions of pounds) of a company over the past 20 years.

Year    Revenues
   1         5.2
   2         4.3
   3         5.0
   4         6.0
   5         7.1
   6         8.2
   7        12.7
   8        15.1
   9        17.8
  10        20.1
  11        21.0
  12        22.4
  13        25.0
  14        34.3
  15        34.6
  16        35.6
  17        37.1
  18        40.1
  19        45.1
  20        47.2

Forecast the revenues for the next two years using:

i)         Moving average, with K=3

ii)        Exponential smoothing with α = 1

iii)       Holt model with α = 0.9 and γ = 0.5 (you may use L19 = 44.80, T19 = 3.55)

iv)       The estimated linear trend equation

Yt = -2.97 + 2.4t,        t = 1, 2, …, 20

v)        The estimated quadratic trend model

Yt = 1.32 + 1.22t + 0.0558t²,        t = 1, 2, …, 20

(14 marks)
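The five forecasts for years 21 and 22 can be sketched as below. Several conventions are assumed here and may differ from the course text: the moving-average and exponential-smoothing forecasts are held flat beyond year 21, the smoothing recursion is initialised at the first observation, and the Holt updates take the standard form in which 0.9 smooths the level and the trend constant given as 0.5 smooths the trend. The helper `ses_level` is hypothetical.

```python
# Annual revenues (billions of pounds) for years 1 to 20, from the question.
revenues = [5.2, 4.3, 5.0, 6.0, 7.1, 8.2, 12.7, 15.1, 17.8, 20.1,
            21.0, 22.4, 25.0, 34.3, 34.6, 35.6, 37.1, 40.1, 45.1, 47.2]

# i) Moving average with K = 3: mean of the last three observations,
#    used as a flat forecast for both years (assumption).
moving_avg = sum(revenues[-3:]) / 3


def ses_level(series, alpha):
    """Final smoothed level, initialised at the first observation (assumption)."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# ii) Exponential smoothing with alpha = 1 reduces to the last observation.
exp_smooth = ses_level(revenues, 1.0)

# iii) Holt's method: update level and trend for year 20, then extrapolate.
alpha, gamma = 0.9, 0.5
l19, t19 = 44.80, 3.55
l20 = alpha * revenues[-1] + (1 - alpha) * (l19 + t19)
t20 = gamma * (l20 - l19) + (1 - gamma) * t19
holt = [l20 + h * t20 for h in (1, 2)]

# iv) and v) Fitted trend equations evaluated at t = 21 and t = 22.
linear = [-2.97 + 2.4 * t for t in (21, 22)]
quadratic = [1.32 + 1.22 * t + 0.0558 * t ** 2 for t in (21, 22)]

print(f"Moving average (both years): {moving_avg:.4f}")
print(f"Exponential smoothing (both years): {exp_smooth:.4f}")
print(f"Holt: {holt[0]:.4f}, {holt[1]:.4f}")
print(f"Linear trend: {linear[0]:.4f}, {linear[1]:.4f}")
print(f"Quadratic trend: {quadratic[0]:.4f}, {quadratic[1]:.4f}")
```

Only the Holt and trend-equation methods produce different forecasts for the two years under these conventions, since the other two carry no trend component.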

(c)     For the five forecasting methods in (b), the respective mean absolute errors (MAE) are:

MAE (Moving average) = 4.76

MAE (Exponential smoothing) = 2.48

MAE (Holt) = 1.8

MAE (Linear Trend Model) = 1.76

MAE (Quadratic trend model) = 1.36

Which of the five methods would you select for the purpose of forecasting? Discuss.

(7 marks)

Q5       a)         Give a brief account of the sources of error which can affect the survey process from survey design through to presentation of results.

(16 marks)

b)        Explain the cluster sampling method. Discuss two advantages of this technique.

(17 marks)