Semester Two 2021

Exam Alternative Assessment Task

ETF1100

Business Statistics

SECTION A (3 marks)

The company has a dataset on more than 9,000 historical claims that they can use to investigate fraud. The data was obtained as follows:

•   Over a period of years, a simple random sample of claims was fully investigated for possible fraud.

•   The dataset shows the basic characteristics of these claims, along with a categorisation of whether fraud was detected in each case.

Question 1.

Suppose that the observations in the full sample of data were sorted by the claimant's age, with the oldest claimants first and the youngest last. If you had taken the first 7,000 claims in the data, what problems could this have introduced? [3 marks]

Taking the first 7,000 claims would give a sample biased towards older claimants, because the youngest claimants would be left out.

If age is related to fraud, this would lead to biased estimates of the overall level of fraud.
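
The effect can be illustrated with a small simulation. This is only a sketch on synthetic data: the age range, the fraud rates, and the assumption that claimants under 40 commit more fraud are all hypothetical and are not taken from the company's dataset.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Synthetic claims (hypothetical): ages 18-80, with fraud assumed more
# likely for claimants under 40. None of this comes from the real data.
N_CLAIMS, N_TAKEN = 9134, 7000
claims = []
for _ in range(N_CLAIMS):
    age = rng.randint(18, 80)
    fraud = rng.random() < (0.30 if age < 40 else 0.15)
    claims.append((age, fraud))


def mean_age(rows):
    return sum(age for age, _ in rows) / len(rows)


def fraud_rate(rows):
    return sum(fraud for _, fraud in rows) / len(rows)


# Sort oldest first and take the first 7,000, as in the question.
oldest_first = sorted(claims, key=lambda row: row[0], reverse=True)
first_7000 = oldest_first[:N_TAKEN]

# A simple random sample of the same size, for comparison.
random_7000 = rng.sample(claims, N_TAKEN)

for label, rows in [("all claims", claims),
                    ("first 7,000 (sorted)", first_7000),
                    ("random 7,000", random_7000)]:
    print(f"{label:22s} mean age {mean_age(rows):5.1f}  "
          f"fraud rate {fraud_rate(rows):.3f}")
```

Under these assumptions the sorted subset is older on average and understates the fraud rate, while the random subset stays close to the full-sample figures.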

SECTION B (6 marks)

First, we will look at the amounts claimed. Exhibit 1 shows the descriptive statistics for the Claim Amount for each claim (units = $). We show the statistics for the Full Sample, then for those where fraud was not detected (Fraud=0), and those where fraud was detected (Fraud=1).

Exhibit 1

Claim Amount          Full Sample     Fraud=0      Fraud=1
Mean                  8004.9457       8593.8       5703.63407
Standard Error        71.8931355      83.14785     123.902354
Median                5780            6618         4221
Mode                  2472            2472         2396
Standard Deviation    6870.96804      7091.006     5345.06197
Sample Variance       47210201.8      50282360     28569687.4
Kurtosis              13.8235219      12.75142     25.3476945
Skewness              3.03227935      2.928387     3.9921298
Range                 81427           81427        72311
Minimum               1898            1898         1918
Maximum               83325           83325        74229
Sum                   73117174        62502711     10614463
Count                 9134            7273         1861
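
Several rows of Exhibit 1 are linked by simple arithmetic, which gives a quick consistency check on the table. The sketch below uses only figures reported in the exhibit and recomputes the mean as Sum / Count and the standard error as Standard Deviation / sqrt(Count). The variable names are illustrative.

```python
import math

# Figures copied from Exhibit 1: (sum, count, standard deviation)
exhibit1 = {
    "Full Sample": (73117174, 9134, 6870.96804),
    "Fraud=0": (62502711, 7273, 7091.006),
    "Fraud=1": (10614463, 1861, 5345.06197),
}

checks = {}
for group, (total, count, std_dev) in exhibit1.items():
    mean = total / count                    # Mean = Sum / Count
    std_error = std_dev / math.sqrt(count)  # SE = SD / sqrt(n)
    checks[group] = (mean, std_error)
    print(f"{group:12s} mean {mean:10.2f}  standard error {std_error:8.2f}")

# The two fraud groups should add up to the full sample.
sum_matches = exhibit1["Fraud=0"][0] + exhibit1["Fraud=1"][0] == exhibit1["Full Sample"][0]
count_matches = exhibit1["Fraud=0"][1] + exhibit1["Fraud=1"][1] == exhibit1["Full Sample"][1]
print("Sums add up:", sum_matches, " Counts add up:", count_matches)
```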

Question 2.

For the case where Fraud=0, is the claim amount variable positively or negatively skewed? Explain how you reached your conclusion. [2 marks]

The claim amounts are positively skewed.

This can be seen because the mean (8593.8) is greater than the median (6618); the skewness coefficient (2.93) is also positive.

Question 3.

What is implied by the skewness you found for the case where Fraud=0 in the previous question? [2 marks]

A small number of claim amounts are very high compared with the rest of the claim amounts where Fraud=0.

Question 4.

Compare the mean and median for Fraud=0 with the ones for Fraud=1. What do you learn by comparing them? [2 marks]

Both the mean (5703.63 versus 8593.80) and the median (4221 versus 6618) are lower for Fraud=1, so fraudulent claims tend to be smaller than genuine (non-fraudulent) claims.
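
To put rough numbers on this comparison, the following sketch takes the means and medians from Exhibit 1 and expresses the Fraud=1 figures as a share of the Fraud=0 figures. The dictionary layout and names are just for illustration.

```python
# Mean and median claim amounts ($) from Exhibit 1
stats = {
    "Fraud=0": {"mean": 8593.8, "median": 6618},
    "Fraud=1": {"mean": 5703.63407, "median": 4221},
}

ratios = {}
for measure in ("mean", "median"):
    genuine = stats["Fraud=0"][measure]
    fraudulent = stats["Fraud=1"][measure]
    ratios[measure] = fraudulent / genuine
    print(f"{measure:6s}: Fraud=1 is {ratios[measure]:.0%} of Fraud=0 "
          f"(${genuine - fraudulent:,.0f} lower)")
```

On both measures the fraudulent claims come out at roughly two thirds of the size of the genuine ones.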

SECTION C (13 marks)

The pivot table in Exhibit 2 shows the number of claims according to two characteristics of the claim: (i) whether the accident took place on the weekend (“Weekend?”), and (ii) whether the car was reported to be driveable after the accident (“Driveable?”).

The pivot table in Exhibit 3 shows the proportion of claims where fraud was found for each combination of “Weekend?” and “Driveable?”.

Exhibit 2

              Driveable?
Weekend?      No        Yes       Total
No            770       6961      7731
Yes           120       1283      1403
Total         890       8244      9134

Exhibit 3

              Driveable?
Weekend?      No        Yes       All
No            0.104     0.202     0.191
Yes           0.167     0.279     0.269
All           0.112     0.214     0.204

Question 5.

Referring first to Exhibit 2, what proportion of claims took place on a weekday? [1 mark]

7731 / 9134 = 0.846 (84.6%)

Question 6.

Were more cars driveable after the accident or not driveable? Explain your answer. [2 marks]

More cars were driveable.

8244 of the cars were driveable and 890 of the cars were not driveable.

Question 7.

Using Exhibit 2, report the probabilities:

•   P( Weekend?=Yes | Driveable?=Yes )

•   P( Weekend?=Yes | Driveable?=No )

•   P( Weekend?=Yes )

Use these probabilities to decide whether the probability of a claim taking place on the weekend is independent of whether the car is driveable. Explain your answer. [5 marks]

P( Weekend?=Yes | Driveable?=Yes ) = 1283 / 8244 = 0.16

P( Weekend?=Yes | Driveable?=No ) = 120 / 890 = 0.13

P( Weekend?=Yes ) = 1403 / 9134 = 0.15

If a claim occurring on the weekend were independent of whether the car is driveable, we would expect these three probabilities to be equal.

We find they are somewhat different (0.16, 0.13 and 0.15), so we conclude that the two characteristics are not independent.
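
The three probabilities can be reproduced directly from the counts in Exhibit 2. The sketch below is one way to lay out the calculation (the variable names are illustrative). It also confirms that the overall probability is the weighted average of the two conditional ones.

```python
# Claim counts from Exhibit 2, keyed by (Weekend?, Driveable?)
counts = {
    ("No", "No"): 770,
    ("No", "Yes"): 6961,
    ("Yes", "No"): 120,
    ("Yes", "Yes"): 1283,
}

total = sum(counts.values())
driveable = counts[("No", "Yes")] + counts[("Yes", "Yes")]
not_driveable = counts[("No", "No")] + counts[("Yes", "No")]
weekend = counts[("Yes", "No")] + counts[("Yes", "Yes")]

# Conditional probabilities: restrict to one Driveable? column at a time.
p_weekend_given_driveable = counts[("Yes", "Yes")] / driveable
p_weekend_given_not_driveable = counts[("Yes", "No")] / not_driveable
# Marginal probability: all weekend claims over all claims.
p_weekend = weekend / total

print(f"P(Weekend?=Yes | Driveable?=Yes) = {p_weekend_given_driveable:.2f}")
print(f"P(Weekend?=Yes | Driveable?=No)  = {p_weekend_given_not_driveable:.2f}")
print(f"P(Weekend?=Yes)                  = {p_weekend:.2f}")

# Law of total probability: weighting the two conditional probabilities
# by the share of each Driveable? group recovers P(Weekend?=Yes).
recombined = (p_weekend_given_driveable * driveable / total
              + p_weekend_given_not_driveable * not_driveable / total)
```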

Question 8.

Turning to Exhibit 3, what is the combination of these two characteristics that produced the lowest likelihood of fraud? What is the chance of fraud taking place for that combination? [2 marks]

Weekend?=No and Driveable?=No.

The probability of fraud is 0.104 (10.4%).

Question 9.

What proportion of all claims resulted in fraud? [2 marks]

20.4% (0.204, the proportion for all claims in Exhibit 3)
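
As a cross-check, the same proportion can be recovered in two ways: from the counts in Exhibit 1 (claims with Fraud=1 divided by all claims), and as a count-weighted average of the four cell proportions in Exhibit 3, using the counts in Exhibit 2 as weights. A sketch, with illustrative variable names:

```python
# Route 1: counts from Exhibit 1 (Fraud=1 count over Full Sample count)
overall_from_counts = 1861 / 9134

# Route 2: weight each Exhibit 3 proportion by its Exhibit 2 count
cells = [  # (number of claims, proportion with fraud)
    (770, 0.104),    # Weekend?=No,  Driveable?=No
    (6961, 0.202),   # Weekend?=No,  Driveable?=Yes
    (120, 0.167),    # Weekend?=Yes, Driveable?=No
    (1283, 0.279),   # Weekend?=Yes, Driveable?=Yes
]
total = sum(count for count, _ in cells)
overall_weighted = sum(count * p for count, p in cells) / total

print(f"From Exhibit 1 counts:    {overall_from_counts:.3f}")
print(f"Weighted from Exhibit 3:  {overall_weighted:.3f}")
```

The two routes agree to three decimal places; any remaining difference comes from the Exhibit 3 proportions being rounded.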

Question 10.

What are the chances of fraud taking place for a claim that took place on the weekend and for which the car was driveable? [1 mark]

27.9%