ETF1100 Business Statistics Semester One 2020
Question 1 (18 marks)
Exhibit 1 presents descriptive statistics for the age of the people who died in accidents. The first column covers the whole period from January 1989 to February 2020, the second column covers the period from January 1989 to December 2003, and the last column covers the period from January 2004 to February 2020.
Exhibit 1
| | Full Sample | 1989-2003 | 2004-2020 |
|---|---|---|---|
| Mean | 39.542616 | 37.64256177 | 42.07816773 |
| Standard Error | 0.0963247 | 0.126064411 | 0.14748453 |
| Median | 34 | 31 | 38 |
| Mode | 18 | 18 | 18 |
| Standard Deviation | 21.796212 | 21.56438527 | 21.8456404 |
| Sample Variance | 475.07485 | 465.022712 | 477.2320043 |
| Kurtosis | -0.619564 | -0.524647304 | -0.694967519 |
| Skewness | 0.5732953 | 0.659271235 | 0.473556606 |
| Range | 110 | 108 | 110 |
| Minimum | -9 | -9 | -9 |
| Maximum | 101 | 99 | 101 |
| Sum | 2024661 | 1101459 | 923195 |
| Count | 51202 | 29261 | 21940 |
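As a reading aid for Exhibit 1, the short sketch below checks how the Standard Error row relates to the Standard Deviation and Count rows, assuming the usual definition of the standard error of the mean (standard deviation divided by the square root of the count). The function name `standard_error` is illustrative only and does not come from the exam.

```python
import math


def standard_error(std_dev: float, count: int) -> float:
    """Standard error of the mean: s / sqrt(n) (assumed definition)."""
    return std_dev / math.sqrt(count)


# Full-sample values copied from Exhibit 1.
full_sample_se = standard_error(21.796212, 51202)

# Should come out very close to the 0.0963247 reported in the table.
print(f"Full-sample standard error: {full_sample_se:.7f}")
```
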
(a) First, we will focus on the mean, median and mode over the three periods:
(i) Compare the mean and median for the full sample. What do they suggest about the shape of the distribution of ages? Explain how you draw that conclusion about the shape based on how the mean and median are calculated. (3 marks)
(ii) Compare the means across the three periods. What does this suggest about trends over time? (2 marks)
(iii) What does the mode measure? Explain whether you think it is an informative measure in our case. (2 marks)
(b) Look at the standard deviation in the table of summary statistics above.
(i) Describe in words how the standard deviation is calculated from the mean and the data values. (1 mark)
(ii) Using the full sample, interpret what the value of the standard deviation tells you about the spread of ages. (1 mark)
(iii) If the data were normally distributed, you would expect approximately 95% of the data to be within two standard deviations of the mean. Calculate that range for the full sample. Comment on the values you find and on whether they are likely to be accurate. If not, why not? (3 marks)
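The arithmetic behind part (b)(iii) can be sketched as follows, using the full-sample mean and standard deviation from Exhibit 1. This is only a minimal illustration of the "mean plus or minus two standard deviations" rule under the normality assumption stated in the question; the helper name `two_sd_range` is hypothetical.

```python
def two_sd_range(mean: float, std_dev: float) -> tuple[float, float]:
    """Return (mean - 2s, mean + 2s), the approximate 95% range under normality."""
    return mean - 2 * std_dev, mean + 2 * std_dev


# Full-sample values copied from Exhibit 1.
lower, upper = two_sd_range(39.542616, 21.796212)

print(f"Approximate 95% range: {lower:.2f} to {upper:.2f}")
```

Whether both ends of the range are plausible ages is the point the question asks you to comment on.
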
(c) Refer back to Exhibit 1 and answer the following questions.
(i) The minimum value in Exhibit 1 for each of the samples is -9. This is not an actual age, which would be impossible, but an indicator for a missing value of age. How would including this in our calculations have changed the mean, median and the mode? (2 marks)
(ii) In Exhibit 1 it appears that the average age of road fatalities is increasing over time. How could a different proportion of missing values, i.e. -9, in each of the two periods examined contribute to this result? (2 marks)
(iii) We reexamine the data and find that there are only 85 observations with -9 for Age. Is including these values likely to have a large impact on the mean, median or mode? Provide some justification for your answer. (2 marks)
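For part (c)(iii), one way to gauge the effect on the mean is to back the -9 codes out of the reported Sum and Count. The sketch below assumes that the full-sample Sum and Count in Exhibit 1 include all 85 of the -9 codes (the reported minimum of -9 suggests the codes were included, but the exam does not state this explicitly). The function name `mean_without_missing` is illustrative.

```python
def mean_without_missing(total: float, count: int, n_missing: int,
                         missing_code: float = -9) -> float:
    """Mean after removing n_missing observations coded as missing_code."""
    clean_total = total - n_missing * missing_code  # undo the -9 contributions
    clean_count = count - n_missing
    return clean_total / clean_count


# Full-sample Sum and Count copied from Exhibit 1.
reported_mean = 2024661 / 51202
clean_mean = mean_without_missing(2024661, 51202, 85)
shift = clean_mean - reported_mean

print(f"Reported mean: {reported_mean:.4f}")
print(f"Mean without the 85 missing codes: {clean_mean:.4f}")
print(f"Shift in the mean: {shift:.4f} years")
```

The same reasoning does not transfer directly to the median or mode, which depend on the ordering and frequency of values rather than on the sum.
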
Question 2 (16 marks)
(a) Exhibit 2 shows the age distribution of fatalities each year from 1989 to 2019.
Exhibit 2
Fatalities among Drivers by Age

| Year | 17 to 25 | 26 to 64 | 65 or older | Grand Total |
|---|---|---|---|---|
| 2010 | 26.6% | 57.4% | 16.0% | 100.0% |
| 2011 | 23.2% | 57.7% | 19.0% | 100.0% |
| 2012 | 23.0% | 57.5% | 19.5% | 100.0% |
| 2013 | 21.5% | 54.0% | 24.4% | 100.0% |
| 2014 | 20.8% | 55.2% | 24.0% | 100.0% |
| 2015 | 20.4% | 55.3% | 24.3% | 100.0% |
| 2016 | 21.9% | 56.9% | 21.2% | 100.0% |
| 2017 | 20.8% | 54.8% | 24.4% | 100.0% |
| 2018 | 22.4% | 53.8% | 23.8% | 100.0% |
| 2019 | 18.2% | 58.4% | 23.4% | 100.0% |
(i) What does the 11.0% in the first row of the table mean? (1 mark)
(ii) Comment on the main trends for each age range that are shown in this table. (3 marks)
(b) Exhibit 3 presents some related regression model output, with:
Dependent Variable:
• The proportion of Fatalities aged 17-25 (these are the values in the second column of Exhibit 2).
Independent Variable:
• Year = the values in the first column of Exhibit 2.
Exhibit 3
(i) When I got this result I thought the intercept value of 9.037 looked a bit strange. But then I thought about it and realised it is probably not a mistake. Why does the value look strange? What is the interpretation of this value? (2 marks)
(ii) It has been claimed that there is a clear decline in the proportion of fatalities among younger people (aged 17-25) over the last 30 years. Use the regression output to test this claim. Show all the steps of a formal hypothesis test, and interpret the conclusion you make. (4 marks)
(iii) The values in Exhibit 3 can be used to construct a 95% confidence interval for the coefficient on Year. State the interval and explain in everyday language what it means. (3 marks)
(iv) If you are asked for a 99% interval instead, will this interval be wider or narrower than the one in the output above? Explain your reasoning. (3 marks)