
Semester One 2020

ETF1100

Business Statistics

Question 1 (18 marks)

Exhibit 1 presents descriptive statistics for the age of the people who died in accidents. The first set covers the whole period from January 1989 to February 2020, the second set covers the period from January 1989 to December 2003, and the last column is for the period January 2004 to February 2020.

Exhibit 1

Statistic             Full Sample   1989-2003      2004-2020
Mean                  39.542616     37.64256177    42.07816773
Standard Error        0.0963247     0.126064411    0.14748453
Median                34            31             38
Mode                  18            18             18
Standard Deviation    21.796212     21.56438527    21.8456404
Sample Variance       475.07485     465.022712     477.2320043
Kurtosis              -0.619564     -0.524647304   -0.694967519
Skewness              0.5732953     0.659271235    0.473556606
Range                 110           108            110
Minimum               -9            -9             -9
Maximum               101           99             101
Sum                   2024661       1101459        923195
Count                 51202         29261          21940
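Several rows of Exhibit 1 are linked by simple identities, which can help when reading the table. The following is a minimal sketch, assuming the usual definitions (standard error = standard deviation / sqrt(count), variance = standard deviation squared, mean = sum / count, range = maximum - minimum). The variable names are illustrative and not part of the exam paper.

```python
import math

# Values copied from the Full Sample column of Exhibit 1.
mean, std_dev, variance = 39.542616, 21.796212, 475.07485
total, count = 2024661, 51202
minimum, maximum = -9, 101

# Assumed textbook identities linking the rows of the table.
standard_error = std_dev / math.sqrt(count)  # compare with 0.0963247
mean_from_sum = total / count                # compare with 39.542616
variance_from_sd = std_dev ** 2              # compare with 475.07485
value_range = maximum - minimum              # compare with 110

print(standard_error, mean_from_sum, variance_from_sd, value_range)
```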

(a)  First, we will focus on the mean, median and mode over the three periods:

(i) Compare the mean and median for the full sample. What do they suggest about the shape of the distribution of ages? Explain how you draw that conclusion about the shape based on how the mean and median are calculated. (3 marks)

(ii) Compare the means across the three periods. What does this suggest about trends over time? (2 marks)

(iii) What does the mode measure? Explain whether you think it is an informative measure in our case. (2 marks)

(b)  Look at the standard deviation in the table of summary statistics above.

(i) Describe in words how the standard deviation is calculated from the mean and the data values. (1 mark)

(ii) Using the full sample, interpret what the value of the standard deviation tells you about the spread of ages. (1 mark)

(iii) If the data were normally distributed, you would expect approximately 95% of the data to be within two standard deviations of the mean. Calculate that range for the full sample. Comment on the values you find and on whether they are likely to be accurate. If not, why not? (3 marks)
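The arithmetic behind the two-standard-deviation range in (b)(iii) can be sketched as follows. This is only the calculation, using the Full Sample mean and standard deviation from Exhibit 1. The variable names are illustrative, and the interpretation is left to the answer.

```python
# Values copied from the Full Sample column of Exhibit 1.
mean = 39.542616
std_dev = 21.796212

# Range expected to hold roughly 95% of the data if ages were normally distributed.
lower = mean - 2 * std_dev
upper = mean + 2 * std_dev

print(f"{lower:.2f} to {upper:.2f}")  # prints -4.05 to 83.14
```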

(c)   Refer back to Exhibit 1 and answer the following questions.

(i) The minimum value in Exhibit 1 for each of the samples is -9. This is not an actual age, which would be impossible, but an indicator for a missing value of age. How would including this in our calculations have changed the mean, median and the mode? (2 marks)

(ii) In Exhibit 1 it appears that the average age of road fatalities is increasing over time. How could a different proportion of missing values, i.e. -9, contribute to this result over the two periods examined? (2 marks)

(iii) We reexamine the data and find that there are only 85 observations with -9 for Age. Is including these values likely to have a large impact on the mean, median or mode? Provide some justification for your answer. (2 marks)
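One way to gauge the effect in (c)(iii) on the mean is to remove the coded values from the sum and the count and recompute. The sketch below assumes that all 85 observations coded -9 are included in the Full Sample Sum and Count of Exhibit 1. That is an assumption, as are the variable names.

```python
# Sum and Count copied from the Full Sample column of Exhibit 1.
total, count = 2024661, 51202

# Assumption: all 85 missing-age records (coded -9) are inside total and count.
n_missing, missing_code = 85, -9

mean_with = total / count
mean_without = (total - n_missing * missing_code) / (count - n_missing)
shift = mean_without - mean_with  # change in the mean, in years of age

print(round(mean_with, 3), round(mean_without, 3), round(shift, 3))
```

The median and mode cannot be recomputed from summary statistics alone, so the sketch covers only the mean.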

Question 2 (16 marks)

(a)  Exhibit 2 shows the age distribution of fatalities each year from 1989 to 2019.

Exhibit 2

Fatalities among Drivers by Age

Year    17 to 25    26 to 64    65 or older    Grand Total
2010    26.6%       57.4%       16.0%          100.0%
2011    23.2%       57.7%       19.0%          100.0%
2012    23.0%       57.5%       19.5%          100.0%
2013    21.5%       54.0%       24.4%          100.0%
2014    20.8%       55.2%       24.0%          100.0%
2015    20.4%       55.3%       24.3%          100.0%
2016    21.9%       56.9%       21.2%          100.0%
2017    20.8%       54.8%       24.4%          100.0%
2018    22.4%       53.8%       23.8%          100.0%
2019    18.2%       58.4%       23.4%          100.0%

(i) What does the 11.0% in the first row of the table mean? (1 mark)

(ii) Comment on the main trends for each age range that are shown in this table. (3 marks)

(b)  Exhibit 3 presents some related regression model output, with:

Dependent Variable:

•   The proportion of Fatalities aged 17-25 (these are the values in the second column of Exhibit 2).

Independent Variable:

•   Year = the values in the first column of Exhibit 2.
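To make the model concrete, the sketch below fits a least-squares line of the 17 to 25 share on Year. It uses only the ten rows of Exhibit 2 visible here (2010 to 2019), with shares in percent, so it is an illustration and will not reproduce the output in Exhibit 3. The variable names are hypothetical.

```python
# Rows visible in Exhibit 2 (2010-2019): share of driver fatalities aged 17 to 25, in percent.
years = list(range(2010, 2020))
share_17_25 = [26.6, 23.2, 23.0, 21.5, 20.8, 20.4, 21.9, 20.8, 22.4, 18.2]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(share_17_25) / n

# Ordinary least squares for a single regressor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, share_17_25))
s_xx = sum((x - x_bar) ** 2 for x in years)

slope = s_xy / s_xx                # change in the share per additional year
intercept = y_bar - slope * x_bar  # fitted share at Year = 0

print(round(slope, 3), round(intercept, 1))
```

Because Year takes values around 2000, the intercept is the fitted value at Year = 0, far outside the observed data.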

Exhibit 3

(i) When I got this result I thought the intercept value of 9.037 looked a bit strange. But then I thought about it and realised it is probably not a mistake. Why does the value look strange? What is the interpretation of this value? (2 marks)

(ii) It has been claimed that there is a clear decline in the proportion of fatalities among younger people (aged 17-25) over the last 30 years. Use the regression output to test this claim. Show all the steps of a formal hypothesis test, and interpret the conclusion you make. (4 marks)

(iii) The values in Exhibit 3 can be used to construct a 95% confidence interval for the coefficient on Year. What is the interval, and explain in everyday language what this interval means. (3 marks)

(iv) If you are asked for a 99% interval instead, will this interval be wider or narrower than the one in the output above? Explain your reasoning. (3 marks)
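For parts (ii) to (iv), the standard simple-regression formulas for the test statistic and the confidence interval are shown below. The notation is an assumption and does not come from the exam paper: b_1 is the estimated coefficient on Year, SE(b_1) is its standard error, n is the number of yearly observations, and 1 - alpha is the confidence level.

```latex
t = \frac{b_1 - 0}{\operatorname{SE}(b_1)},
\qquad
b_1 \pm t_{\alpha/2,\; n-2} \cdot \operatorname{SE}(b_1)
```

The critical value t_{alpha/2, n-2} is the only term that changes with the confidence level.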