Semester One 2020

ETF1100

Business Statistics

Question 1 (18 marks)

Exhibit 1 presents descriptive statistics for the age of the people who died in accidents. The first column covers the whole period from January 1989 to February 2020, the second column covers the period from January 1989 to December 2003, and the last column covers the period from January 2004 to February 2020.

Exhibit 1

Statistic	Full Sample	1989-2003	2004-2020
Mean	39.542616	37.64256177	42.07816773
Standard Error	0.0963247	0.126064411	0.14748453
Median	34	31	38
Mode	18	18	18
Standard Deviation	21.796212	21.56438527	21.8456404
Sample Variance	475.07485	465.022712	477.2320043
Kurtosis	-0.619564	-0.524647304	-0.694967519
Skewness	0.5732953	0.659271235	0.473556606
Range	110	108	110
Minimum	-9	-9	-9
Maximum	101	99	101
Sum	2024661	1101459	923195
Count	51202	29261	21940

(a) First, we will focus on the mean, median and mode over the three periods:

(i) Compare the mean and median for the full sample. What do they suggest about the shape of the distribution of ages? Explain how you draw that conclusion about the shape based on how the mean and median are calculated. (3 marks)

(ii) Compare the means across the three periods. What does this suggest about trends over time? (2 marks)

(iii) What does the mode measure? Explain whether you think it is an informative measure in our case. (2 marks)
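As an illustration of how the mean, median and mode respond differently to a long right tail, here is a minimal Python sketch. The ages are invented for the example and are not taken from the exam data set.

```python
import statistics

# Hypothetical ages, invented for illustration only (not the exam data).
ages = [18, 18, 19, 22, 25, 30, 34, 45, 60, 85]

mean_age = statistics.mean(ages)      # uses every value, so large ages pull it up
median_age = statistics.median(ages)  # depends only on the middle of the ordering
mode_age = statistics.mode(ages)      # the single most frequent value

# Make the oldest person even older: only the mean reacts.
stretched = ages[:-1] + [101]
mean_stretched = statistics.mean(stretched)
median_stretched = statistics.median(stretched)

print(mean_age, median_age, mode_age)
print(mean_stretched, median_stretched)
```

In this toy sample the mean sits above the median, which is the same ordering Exhibit 1 reports for the full sample.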

(b) Look at the standard deviation in the table of summary statistics above.

(i) Describe in words how the standard deviation is calculated from the mean and the data values. (1 mark)

(ii) Using the full sample, interpret what the value of the standard deviation tells you about the spread of ages. (1 mark)

(iii) If the data were normally distributed, you would expect approximately 95% of the data to be within two standard deviations of the mean. Calculate that range for the full sample. Comment on the values you find and on whether they are likely to be accurate. If not, why not? (3 marks)
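The arithmetic behind part (b) can be sketched in Python as follows. The first half computes a sample standard deviation step by step on invented ages (the `n - 1` divisor is an assumption, chosen to match the "Sample Variance" label in Exhibit 1); the second half applies the two-standard-deviation rule to the full-sample mean and standard deviation copied from Exhibit 1.

```python
import math

# Step-by-step sample standard deviation on hypothetical ages
# (invented for illustration, not the exam data).
toy_ages = [18, 22, 34, 41, 60]
n = len(toy_ages)
toy_mean = sum(toy_ages) / n
squared_devs = [(a - toy_mean) ** 2 for a in toy_ages]  # squared distance from the mean
toy_variance = sum(squared_devs) / (n - 1)              # sample variance (n - 1 divisor)
toy_sd = math.sqrt(toy_variance)                        # back to the original units (years)

# Two-standard-deviation range using the full-sample values in Exhibit 1.
full_mean = 39.542616
full_sd = 21.796212
lower = full_mean - 2 * full_sd
upper = full_mean + 2 * full_sd

print(round(toy_sd, 2))
print(round(lower, 2), round(upper, 2))
```

The lower end of the range comes out negative, which is worth comparing with the values an age can actually take and with the skewness reported in Exhibit 1.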

(c) (i) The minimum value in Exhibit 1 for each of the samples is -9. This is not an actual age, which would be impossible, but an indicator for a missing value of age. How would including this in our calculations have changed the mean, median and the mode? (2 marks)

(ii) In Exhibit 1 it appears that the average age of road fatalities is increasing over time. How could a different proportion of missing values (coded -9) in the two periods examined contribute to this result? (2 marks)

(iii) We re-examine the data and find that there are only 85 observations with -9 for Age. Is including these values likely to have a large impact on the mean, median or mode? Provide some justification for your answer. (2 marks)
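One way to gauge the effect of the -9 codes on the mean is to strip them out of the full-sample Sum and Count in Exhibit 1 and recompute. The sketch below assumes that all 85 coded observations are included in those two totals; the exam text does not state this explicitly.

```python
# Full-sample totals copied from Exhibit 1.
total_age = 2024661
count = 51202

# Assumption: all 85 missing-age codes (-9) are included in the totals above.
n_missing = 85
missing_code = -9

mean_with_codes = total_age / count

# Remove the coded observations from both the sum and the count.
clean_total = total_age - n_missing * missing_code
clean_count = count - n_missing
mean_without_codes = clean_total / clean_count

shift = mean_without_codes - mean_with_codes
share_missing = n_missing / count

print(round(mean_with_codes, 4), round(mean_without_codes, 4))
print(round(shift, 4), round(share_missing, 5))
```

The same totals show how small a share of the sample the coded values represent, which is the key quantity when judging the impact on the median and mode as well.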

Question 2 (16 marks)

(a) Exhibit 2 shows the age distribution of fatalities each year from 1989 to 2019.

Exhibit 2

Fatalities among Drivers by Age
Year	17 to 25	26 to 64	65 or older	Grand Total

2010	26.6%	57.4%	16.0%	100.0%
2011	23.2%	57.7%	19.0%	100.0%
2012	23.0%	57.5%	19.5%	100.0%
2013	21.5%	54.0%	24.4%	100.0%
2014	20.8%	55.2%	24.0%	100.0%
2015	20.4%	55.3%	24.3%	100.0%
2016	21.9%	56.9%	21.2%	100.0%
2017	20.8%	54.8%	24.4%	100.0%
2018	22.4%	53.8%	23.8%	100.0%
2019	18.2%	58.4%	23.4%	100.0%

(i) What does the 11.0% in the first row of the table mean? (1 mark)

(ii) Comment on the main trends for each age range that are shown in this table. (3 marks)

(b) Exhibit 3 presents some related regression model output, with:

Dependent Variable:

• The proportion of Fatalities aged 17-25 (these are the values in the second column of Exhibit 2).

Independent Variable:

• Year = the values in the first column of Exhibit 2.

Exhibit 3

(i) When I got this result, I thought the intercept value of 9.037 looked a bit strange. But then I thought about it and realised it is probably not a mistake. Why does the value look strange? What is the interpretation of this value? (2 marks)

(ii) It has been claimed that there is a clear decline in the proportion of fatalities among younger people (aged 17-25) over the last 30 years. Use the regression output to test this claim. Show all the steps of a formal hypothesis test, and interpret the conclusion you make. (4 marks)

(iii) The values in Exhibit 3 can be used to construct a 95% confidence interval for the coefficient on Year. State the interval and explain in everyday language what it means. (3 marks)

(iv) If you were asked for a 99% interval instead, would this interval be wider or narrower than the one in the output above? Explain your reasoning. (3 marks)
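The mechanics of part (b) can be sketched in Python. Exhibit 3 itself is not reproduced here, so this example fits a simple regression to only the ten rows of Exhibit 2 that are shown (2010 to 2019), with the shares written as proportions; that scaling and the restricted sample are assumptions, and the resulting numbers should not be expected to match Exhibit 3. The critical values are the standard t-table entries for 8 degrees of freedom.

```python
import math

# Only the rows visible in Exhibit 2, as proportions (assumed scaling).
years = list(range(2010, 2020))
share_17_25 = [0.266, 0.232, 0.230, 0.215, 0.208,
               0.204, 0.219, 0.208, 0.224, 0.182]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(share_17_25) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, share_17_25))

slope = sxy / sxx                  # change in the share per year
intercept = y_bar - slope * x_bar  # fitted share at Year = 0, far outside the data

# Residual variance and the standard error of the slope.
residuals = [(y - y_bar) - slope * (x - x_bar)
             for x, y in zip(years, share_17_25)]
sse = sum(r ** 2 for r in residuals)
se_slope = math.sqrt(sse / (n - 2) / sxx)

# One-sided test of H0: slope = 0 against H1: slope < 0 at the 5% level.
t_stat = slope / se_slope
T_05_ONE_SIDED = 1.860  # t table, 8 df, 5% in one tail
reject_h0 = t_stat < -T_05_ONE_SIDED

# Two-sided confidence intervals for the slope.
T_95 = 2.306  # t table, 8 df, 2.5% in each tail
T_99 = 3.355  # t table, 8 df, 0.5% in each tail
ci_95 = (slope - T_95 * se_slope, slope + T_95 * se_slope)
ci_99 = (slope - T_99 * se_slope, slope + T_99 * se_slope)

print(slope, intercept, t_stat, reject_h0)
print(ci_95, ci_99)
```

Because the 99% interval uses a larger critical value with the same standard error, it is necessarily wider than the 95% interval; the same reasoning applies to the figures in Exhibit 3.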