ACTSC 432 Fall 2022 Assignment

Instructions

• Clearly label your code with the question it is being used to answer. Ensure your code is well-commented so that it is easy for the TA to read and grade your submission.

• You are encouraged to submit your answers in full as a PDF file generated by an R Markdown file (as was done with Problem Set 12). If you prefer not to use R Markdown, be sure to submit your R code and all solutions in a format that is easy for the TA to grade, e.g., you can copy your R code and its output to a Word file and then save it as a PDF file to submit along with your answers. Note that Crowdmark only accepts PDF, JPG, and PNG file formats.

• Perform all hypothesis tests at a 5% significance level and be sure to clearly state your null hypothesis and alternative hypothesis.

Question [53 marks]

The provided dataset carauto.csv contains the 2004–2005 claim experience of a group of automobile insurance policies. Download the file and use the command read.csv in R to read it. Note that each row has the following information aggregated over all policies having a given set of rating factors:

• duration: total number of policy years collected for the tariff cell;

• number: total number of claims recorded for the tariff cell;

• severity: average claim size recorded within the tariff cell (if no claims were made within the tariff cell, a fictitious value of 0 is recorded).

(a)  [3 marks] Using R, determine the number of tariff cells and rating factors included in the dataset and specify the number of categories belonging to each rating factor.

(b)  [2 marks] Identify the tariff cell having the largest duration and set it to be your base tariff cell. Use this as your base tariff cell for both your frequency model and severity model.

(c)  [4 marks] Model frequency using a GLM based on the relative Poisson distribution and a log-link function.  Comment on the overall fit of the model using the results of a hypothesis test that involves deviance.
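As a reminder of the quantity behind the deviance test in (c), the following is a sketch only. The notation is an assumption, since the assignment does not fix symbols: $w_i$ is the duration of tariff cell $i$, $y_i$ is its observed claim frequency (number divided by duration), $\hat{\mu}_i$ is its fitted frequency, $n$ is the number of tariff cells and $r$ is the number of estimated parameters. The dispersion parameter is 1 for the Poisson case, and a term with $y_i = 0$ uses the convention $y_i \ln y_i = 0$.

```latex
D = 2 \sum_{i=1}^{n} w_i \left[ y_i \ln\!\left( \frac{y_i}{\hat{\mu}_i} \right) - \left( y_i - \hat{\mu}_i \right) \right],
\qquad
D \overset{\text{approx.}}{\sim} \chi^2_{n-r} \ \text{ under } H_0 \text{ (the model is adequate)}.
```

Under this setup, $H_0$ would be rejected at the 5% level when $D$ exceeds the 95th percentile of the $\chi^2_{n-r}$ distribution.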

(d)  [5 marks] Propose a model that includes two simplifications to the model you found in (c). Show that you are statistically justified in making your simplifications using the results of a likelihood ratio test. (Removing a rating factor counts as one simplification, and combining two categories of a rating factor also counts as one simplification.)

(e)  [5 marks] Instead of using the model you found in (d), your manager would like to use the model you found in (c) but without the rating factor veh_body. Your manager is convinced this rating factor is not statistically significant to predict claim frequency and has asked whether you agree.

Answer their question using the results of a likelihood ratio test.
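The likelihood ratio tests in (d) and (e) reduce to comparing a difference in deviances against a chi-square distribution. The assignment itself is to be done in R; the Python sketch below only illustrates the arithmetic using the standard library. The function names `chi2_sf` and `likelihood_ratio_test` and all numbers in the usage lines are hypothetical and are not taken from carauto.csv.

```python
import math


def _gamma_p_series(a, x):
    # Series expansion of the regularized lower incomplete gamma P(a, x).
    term = 1.0 / a
    total = term
    n = a
    for _ in range(1000):
        n += 1.0
        term *= x / n
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_contfrac(a, x):
    # Continued fraction (modified Lentz) for the regularized upper
    # incomplete gamma Q(a, x); used only when x >= a + 1.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    if half_x < a + 1.0:
        return 1.0 - _gamma_p_series(a, half_x)
    return _gamma_q_contfrac(a, half_x)


def likelihood_ratio_test(dev_reduced, df_reduced, dev_full, df_full, alpha=0.05):
    """Compare nested GLMs; H0 is that the reduced (simpler) model is adequate.

    Assumes a dispersion parameter of 1 (Poisson). For a gamma model the
    deviances would first have to be scaled by an estimate of the dispersion.
    """
    stat = dev_reduced - dev_full      # drop in deviance from the extra parameters
    df = df_reduced - df_full          # number of parameters removed
    p_value = chi2_sf(stat, df)
    return stat, df, p_value, p_value < alpha


# Hypothetical deviances and residual degrees of freedom, for illustration only.
stat, df, p_value, reject = likelihood_ratio_test(105.0, 50, 99.0, 48)
print(stat, df, p_value, reject)
```

A rejection here would mean the removed terms are jointly significant, so the simplification is not supported at the chosen level.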

(f)  [3 + 3 + 3 = 9 marks] Using the model you found in part (d):

(i) Determine the estimate and 95% confidence interval of each multiplier.

(ii) Identify the tariff cell with the largest expected number of claims per year.  For a 2-year policy belonging to this tariff cell, determine the mean and variance of the number of claims.

(iii) Repeat (ii) but for the tariff cell with the smallest expected number of claims per year.
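For part (f), a multiplier is the exponential of a fitted coefficient under the log link, and a Wald interval can be formed on the log scale and then exponentiated. The Python sketch below illustrates that arithmetic, together with the mean and variance of a Poisson claim count over a given exposure. The function names and the coefficient, standard error, and rate used in the usage lines are hypothetical; the actual values would come from the fitted model in R.

```python
import math
from statistics import NormalDist

# Standard normal 97.5th percentile, giving a two-sided 95% interval.
Z_975 = NormalDist().inv_cdf(0.975)


def multiplier_ci(beta_hat, std_err, z=Z_975):
    """Return (estimate, lower, upper) for the multiplier exp(beta)."""
    return (
        math.exp(beta_hat),
        math.exp(beta_hat - z * std_err),
        math.exp(beta_hat + z * std_err),
    )


def claim_count_moments(rate_per_year, duration_years):
    """Mean and variance of the claim count for a Poisson policy.

    With expected frequency rate_per_year and exposure duration_years,
    the count is Poisson with mean rate * duration, so the variance
    equals the mean.
    """
    mean = rate_per_year * duration_years
    return mean, mean


# Hypothetical coefficient, standard error, and annual rate.
print(multiplier_ci(0.25, 0.08))
print(claim_count_moments(0.15, 2))
```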

(g)  [4 marks] Model severity using a GLM based on the gamma distribution and a log-link function. Comment on the overall fit of the model using the results of a hypothesis test that involves deviance.
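For the gamma case in (g), the corresponding quantity is sketched below under assumed notation that the assignment does not fix: $w_i$ is the number of claims in tariff cell $i$, $y_i$ is its observed average claim size, $\hat{\mu}_i$ is its fitted severity, and the sum runs over the $n$ cells with at least one claim, with $r$ estimated parameters. Unlike the Poisson case, the dispersion parameter $\phi$ is unknown, so the deviance is scaled by an estimate $\hat{\phi}$; which estimator to use (for example, the Pearson estimate) is left to the course convention.

```latex
D = 2 \sum_{i=1}^{n} w_i \left[ \frac{y_i - \hat{\mu}_i}{\hat{\mu}_i} - \ln\!\left( \frac{y_i}{\hat{\mu}_i} \right) \right],
\qquad
D^{*} = \frac{D}{\hat{\phi}} \overset{\text{approx.}}{\sim} \chi^2_{n-r} \ \text{ under } H_0 \text{ (the model is adequate)}.
```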

(h)  [6 marks] After your manager completes a comprehensive analysis of the severity model you found in part (g), they request that you remove the rating factors veh_body and veh_age. Are we statistically justified in simplifying the model in this way? Answer this question using the results of a likelihood ratio test.

(i)  [3 + 2 + 2 = 7 marks] Using the model you found in (h):

(i) Determine the estimate and 95% confidence interval of each multiplier.

(ii) Identify the tariff cell with the largest expected claim amount. State this expected claim amount.

(iii) Repeat (ii) but for the tariff cell with the smallest expected claim amount.

(j)  [4 + 4 = 8 marks] Using the multipliers you found in parts (f) and (i):

(i) Determine the estimate for the pure premium for a set of 80 independent 1-year policies of which:

• a quarter of the policies are in the tariff cell having veh_body = 1, veh_age = 2, gender = M and agecat = B,

• the remaining are in the tariff cell having veh_body = 1, veh_age = 1, gender = F and agecat = A.

(ii) For the same set of policies in (i), calculate the probability that this group of policyholders generates more than their expected number of claims in the next year.
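For part (j), the arithmetic can be sketched as follows, assuming the claim counts of the policies are independent Poisson variables, so that their total is again Poisson with the summed mean. The pure premium is taken here as expected frequency times expected severity, summed over policies. The function names and the frequencies and severities in the usage lines are hypothetical placeholders; the real inputs would be built from the multipliers found in (f) and (i).

```python
import math


def portfolio_pure_premium(cells):
    """Total pure premium for 1-year policies.

    cells: list of (n_policies, claims_per_year, mean_severity) tuples.
    """
    return sum(n * freq * sev for n, freq, sev in cells)


def poisson_sf(k, lam):
    """P(N > k) for N ~ Poisson(lam), with k a nonnegative integer."""
    term = math.exp(-lam)   # P(N = 0)
    cdf = term
    for j in range(1, k + 1):
        term *= lam / j     # P(N = j) from P(N = j - 1)
        cdf += term
    return 1.0 - cdf


def prob_exceeds_expected(cells):
    """P(total claims > expected total claims) for the whole group."""
    lam = sum(n * freq for n, freq, _ in cells)
    # N is an integer, so N > lam is the same event as N > floor(lam).
    return poisson_sf(math.floor(lam), lam)


# 80 policies: a quarter (20) in one cell, the remaining 60 in the other.
# The frequencies and severities below are made up for illustration.
cells = [(20, 0.20, 2000.0), (60, 0.10, 1500.0)]
print(portfolio_pure_premium(cells))
print(prob_exceeds_expected(cells))
```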