Semester Two 2020

ETF1100

Business Statistics

Question 1

(a)  Let us begin by looking at some statistics on covid-19 for Australia and the United States shown in Exhibit 1. This data is for 2 October 2020.

Exhibit 1

Country          Population    Total Cases    Total Deaths    Total Cases per Million    Total Deaths per Million
Australia        25,499,881         27,136             894                  1,064.162                      35.059
United States   331,002,647      7,417,845         209,794                 22,410.229                     633.814

(i).  Is it useful to compare total cases in Australia with those in the United States? Explain your answer and how the countries could be better compared. (2 marks)

(ii).  With reference to Exhibit 1, outline how “Total Cases per Million” is calculated for Australia from the other data in the table. (2 marks)

(iii).  Explain what “Total Deaths per Million” measures and compare this statistic for Australia and the United States. (2 marks)

(iv).  A further important statistic, which is not shown in Exhibit 1, is the death rate per covid-19 case. Using the numbers in Exhibit 1, calculate this figure for Australia and the United States and compare the numbers. (2 marks)
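The arithmetic behind the per-million columns and the death rate per case can be sketched in a few lines of Python. This is only an illustration using the figures in Exhibit 1; the names `exhibit_1`, `per_million` and `death_rate_pct` are hypothetical helpers introduced here, and expressing the death rate as a percentage of recorded cases is an assumption about the intended definition.

```python
# Figures copied from Exhibit 1 (2 October 2020).
exhibit_1 = {
    "Australia": {"population": 25_499_881, "cases": 27_136, "deaths": 894},
    "United States": {"population": 331_002_647, "cases": 7_417_845, "deaths": 209_794},
}


def per_million(count, population):
    """Scale a raw count to a rate per one million residents."""
    return count / population * 1_000_000


def death_rate_pct(deaths, cases):
    """Deaths as a percentage of recorded cases (assumed definition)."""
    return deaths / cases * 100


summary = {}
for country, row in exhibit_1.items():
    summary[country] = {
        "cases_per_million": per_million(row["cases"], row["population"]),
        "deaths_per_million": per_million(row["deaths"], row["population"]),
        "death_rate_pct": death_rate_pct(row["deaths"], row["cases"]),
    }

for country, stats in summary.items():
    print(country, {name: round(value, 3) for name, value in stats.items()})
```

The two per-million values computed this way should agree with the last two columns of Exhibit 1, which is a quick check that the formula is the one used to build the table.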

(b)  In Exhibit 2 we outline some summary statistics across all countries, for a single day (2 October 2020), on total deaths and total deaths per million persons.

Exhibit 2

Statistic             total_deaths    total_deaths_per_million
Mean                   4899.516746                 126.5259139
Standard Error         1399.483337                 13.90254442
Median                         138                      37.485
Mode                             0                           0
Standard Deviation      20232.0959                 200.9867531
Sample Variance        409337704.6                 40395.67491
Kurtosis               61.68438692                 7.071985387
Skewness               7.319112388                 2.496156331
Range                       207808                    1237.551
Minimum                          0                           0
Maximum                     207808                    1237.551
Sum                        1023999                   26443.916
Count                          209                         209

(i).          How many countries are included in the data? (1 mark)

(ii).  Are the two variables skewed and if so in what direction? Provide reasons for your answers. (2 marks)

(iii).  What is the mode for total deaths? Interpret what this value means in the context of the data and whether you think it is informative. (2 marks)
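As a rough illustration of how the summary statistics in Exhibit 2 relate to one another, the sketch below reads the direction of skew from the sign of the skewness coefficient, checks whether the mean sits above the median, and recovers the number of observations from the sum and the mean. The names `exhibit_2`, `skew_direction` and `checks` are hypothetical, and the sign rule is a common convention rather than the only possible argument.

```python
# Summary figures copied from Exhibit 2.
exhibit_2 = {
    "total_deaths": {
        "mean": 4899.516746, "median": 138, "skewness": 7.319112388, "sum": 1023999,
    },
    "total_deaths_per_million": {
        "mean": 126.5259139, "median": 37.485, "skewness": 2.496156331, "sum": 26443.916,
    },
}


def skew_direction(skewness):
    """Read the direction of skew from the sign of the skewness coefficient."""
    if skewness > 0:
        return "right"
    if skewness < 0:
        return "left"
    return "none"


checks = {}
for name, stats in exhibit_2.items():
    checks[name] = {
        "direction": skew_direction(stats["skewness"]),
        # A mean above the median is the usual companion of right skew.
        "mean_above_median": stats["mean"] > stats["median"],
        # Because mean = sum / count, the count can be recovered as sum / mean.
        "implied_count": round(stats["sum"] / stats["mean"]),
    }

for name, result in checks.items():
    print(name, result)
```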

(c)  In Exhibit 3 we provide descriptive statistics across countries on 2 October 2020 for the variables population and GDP per capita.

Exhibit 3

Statistic              population    gdp_per_capita
Mean                  37083651.23       19284.98379
Standard Error        9880221.384       1459.349887
Median                    6871287        13031.5265
Mode                         #N/A              #N/A
Standard Deviation    142836703.6       19687.70634
Sample Variance       2.04023E+16       387605781.1
Kurtosis              81.75228927       4.107205715

(i).  What is the mean population and GDP per capita of the countries in our data? Also report the units of measurement for each of these values. (2 marks)

(ii).  The standard error for the population variable is 9880221.384 while the standard deviation is 142836703.6. Write a formula which shows the relationship between these two values. (1 mark)

(iii).  The value of the mode for GDP per capita is “#N/A”. Explain what this means and why you think this has occurred. (2 marks)

(iv).  Some countries have missing values for GDP per capita. How can we tell this? (1 mark)

(v).  How do you think these missing values bias mean GDP per capita in terms of accurately measuring the income level of countries in our data? (2 marks)
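The link between the standard error and the standard deviation raised in (c)(ii) can be checked numerically. The sketch below assumes the usual relationship, standard error = standard deviation / sqrt(n), and rearranges it to recover the number of observations behind each column of Exhibit 3. The function names `standard_error` and `implied_sample_size` are hypothetical.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)


def implied_sample_size(sd, se):
    """Rearrange se = sd / sqrt(n) to recover n = (sd / se) ** 2."""
    return (sd / se) ** 2


# Values copied from Exhibit 3.
n_population = implied_sample_size(sd=142_836_703.6, se=9_880_221.384)
n_gdp = implied_sample_size(sd=19_687.70634, se=1_459.349887)

print("implied n, population:    ", round(n_population))
print("implied n, gdp_per_capita:", round(n_gdp))
```

Under this assumption the population column implies about 209 observations while the GDP per capita column implies about 182, which is consistent with some countries having no recorded GDP per capita.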

(d)  Previously we saw that a number of countries had apparently recorded zero total deaths from covid-19. Exhibit 4 lists these countries and some additional statistics.

Exhibit 4

Location    Total Deaths    Total Deaths per Million    Population    GDP per Capita
Anguilla               0                           0         15002
Bhutan                 0                           0        771612          8708.597

(i).         What are some of the features of these countries? (2 marks)

(ii).  Data can be accurate or inaccurate. With reference to the data in Exhibit 4, identify a country where you think the report of zero deaths is more likely to be accurate and identify a country where you think it is more likely to be inaccurate. In both cases provide reasons for your answers. (2 marks)

Question 2

(a)  I have created two categorical variables using data from 2 October 2020. Each variable takes three values:

IncomeGroup:

Low:  if a country’s GDP per capita is below $5,000.

Middle:  if a country’s GDP per capita is from $5,000 to below $15,000.

High:  if a country’s GDP per capita is $15,000 or more.

CovidImpact:

Mild:  if a country’s total covid-19 cases per million was less than 500.

Moderate:  if a country’s total covid-19 cases per million was from 500 to less than 5,000.

Severe:  if a country’s total covid-19 cases per million was 5,000 or more.
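The two sets of thresholds above translate directly into a pair of classification rules. The following is a minimal sketch, assuming each input is a plain number; the function names `income_group` and `covid_impact` are hypothetical, and the definitions above do not say how countries with a missing GDP per capita are treated, so that case is deliberately left out.

```python
def income_group(gdp_per_capita):
    """Classify a country by GDP per capita using the IncomeGroup thresholds."""
    if gdp_per_capita < 5_000:
        return "Low"
    if gdp_per_capita < 15_000:
        return "Middle"
    return "High"


def covid_impact(cases_per_million):
    """Classify a country by total cases per million using the CovidImpact thresholds."""
    if cases_per_million < 500:
        return "Mild"
    if cases_per_million < 5_000:
        return "Moderate"
    return "Severe"


# Values taken from Exhibits 1 and 4.
print("Australia:", covid_impact(1_064.162))
print("United States:", covid_impact(22_410.229))
print("Bhutan:", income_group(8_708.597))
```

Because each test uses a strict "less than", the boundary values $5,000, $15,000, 500 and 5,000 fall into the higher category, matching the wording "from ... to below".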

Using these variables, I have constructed a pivot table shown in Exhibit 5. The pivot table reports the number of countries in each of the categories on 2 October 2020.

Exhibit 5

date: 2020-10-02

IncomeGroup    Mild    Moderate    Severe    Grand Total
High              8          31        43             82
Low              27          35        13             75
Middle           13          21        18             52
Grand Total      48          87        74            209

(i).  Which of the nine combinations of these two categorical variables is most common and how many countries does it include? (2 marks)

(ii).  What proportion of the countries in our data are middle income countries? (1 mark)

(iii).  If I note that 8/209 countries have high incomes and mild covid-19 impact, what sort of probability am I describing? (1 mark)

(iv).  Calculate the following three probabilities:

P( CovidImpact=Severe | IncomeGroup=High )

P( CovidImpact=Severe | IncomeGroup=Middle )

P( CovidImpact=Severe | IncomeGroup=Low ) (3 marks)

(v).  Give an intuitive definition of the concept of statistical independence. Using the results in the previous question, outline whether you think a country’s income level is independent from the severity with which it is affected by covid-19. (2 marks)
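The conditional probabilities in (iv) and the independence check in (v) follow mechanically from the pivot-table counts. The sketch below rests on an assumption about the layout of Exhibit 5: that its rows are the income groups (High, Low, Middle) and its columns are Mild, Moderate and Severe in that order, which matches the 8 high-income, mild-impact countries mentioned in (iii) and the row and column totals. The names `counts`, `p_severe_given` and `p_severe` are hypothetical.

```python
# Counts from Exhibit 5, under the ASSUMED layout:
# rows = IncomeGroup, columns = CovidImpact (Mild, Moderate, Severe).
counts = {
    "High":   {"Mild": 8,  "Moderate": 31, "Severe": 43},
    "Low":    {"Mild": 27, "Moderate": 35, "Severe": 13},
    "Middle": {"Mild": 13, "Moderate": 21, "Severe": 18},
}

total = sum(sum(row.values()) for row in counts.values())

# Conditional probability: restrict attention to one income group, then
# take the share of that group with severe impact.
p_severe_given = {
    group: row["Severe"] / sum(row.values()) for group, row in counts.items()
}

# Marginal probability of severe impact across all countries.
p_severe = sum(row["Severe"] for row in counts.values()) / total

print("total countries:", total)
for group, p in p_severe_given.items():
    print(f"P(Severe | {group}) = {p:.3f}")
print(f"P(Severe) = {p_severe:.3f}")
```

If income and severity were independent, each conditional probability would be close to the marginal P(Severe); comparing the printed values is one informal way to approach (v).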

(b)  Let us now investigate the covid-19 death rate. This is the ratio of persons who died from covid-19 relative to the number of persons who had the disease, expressed as a percentage.

Exhibit 6


(i).  In Exhibit 6 I used Excel to automatically construct a histogram of the covid-19 death rate across countries. Comment on three problems, or things that could be improved, with regard to this histogram. (3 marks)