
MSCA31010: Linear & Non-Linear Models

Winter 2022 Assignment 4

The Homeowner_Claim_History.xlsx contains the claim history of 27,513 homeowner policies. The following table describes the eleven columns in the HOCLAIMDATA sheet.

Name	Description	Categories
policy	Policy Identifier
exposure	Duration a Policy is Exposed to Risk Measured in Portion of a Year
num_claims	Number of Claims in a Year
amt_claims	Total Claim Amount in a Year
f_primary_age_tier	Age Tier of Primary Insured	< 21, 21 - 27, 28 - 37, 38 - 60, > 60
f_primary_gender	Gender of Primary Insured	Female, Male
f_marital	Marital Status of Primary Insured	Not Married, Married, Un-Married
f_residence_location	Location of Residence Property	Urban, Suburban, Rural
f_fire_alarm_type	Fire Alarm Type	None, Standalone, Alarm Service
f_mile_fire_station	Distance to Nearest Fire Station	< 1 mile, 1 - 5 miles, 6 - 10 miles, > 10 miles
f_aoi_tier	Amount of Insurance Tier	< 100K, 100K - 350K, 351K - 600K, 601K - 1M, > 1M

In insurance ratemaking, Severity is the Total Claim Amount in a Year divided by the Number of Claims in a Year. In other words, Severity is the average dollar amount per claim. If a policy does not file any claims in a year, then its Severity is missing.
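The definition of Severity can be sketched in a few lines of plain Python. The function name `severity` and the three sample policies are hypothetical and are not taken from the workbook; `None` stands in for a missing value.

```python
from typing import Optional


def severity(amt_claims: float, num_claims: int) -> Optional[float]:
    """Average claim amount per claim; None (missing) when no claim was filed."""
    if num_claims == 0:
        return None
    return amt_claims / num_claims


# Hypothetical policies as (amt_claims, num_claims) pairs, for illustration only
policies = [(1200.0, 2), (0.0, 0), (450.0, 1)]
severities = [severity(amount, count) for amount, count in policies]

# Keep only positive, non-missing values, as the modeling questions require
usable = [s for s in severities if s is not None and s > 0]
```

The second policy filed no claims, so its Severity is missing and it drops out of `usable`.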

Unless otherwise stated, please provide all numeric answers rounded to the seventh decimal place.

Question 1 (50 points)

(a) (10 points) Generate horizontal boxplots of Total Claim Amount in a Year grouped by each of the seven categorical predictors f_primary_age_tier, f_primary_gender, f_marital, f_residence_location, f_fire_alarm_type, f_mile_fire_station, and f_aoi_tier.

(b) (10 points) Assume that Severity follows a Gamma distribution. Train a Gamma model with the logarithm link function. The target variable is Severity (use only positive and non-missing values for analyses). The predictors are the seven categorical predictors. The model will include the Intercept term. Enter predictors into the model using the Forward Selection method with an entry threshold of 0.05. What is the estimate for the Shape parameter?
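For reference, one common way to write the Gamma model with the logarithm link is sketched below. The parameterization (mean \mu_i and shape \alpha, so that the variance is \mu_i^2 / \alpha) is an assumption; the course materials may use a different but equivalent form. The last line is the usual deviance (likelihood-ratio) statistic that compares a candidate model (subscript 1) with the current model (subscript 0) during Forward Selection.

```latex
\begin{aligned}
f(y_i \mid \mu_i, \alpha) &= \frac{1}{\Gamma(\alpha)}
    \left(\frac{\alpha}{\mu_i}\right)^{\alpha} y_i^{\alpha - 1}
    \exp\!\left(-\frac{\alpha\, y_i}{\mu_i}\right), \qquad y_i > 0 \\
\log \mu_i &= \beta_0 + \mathbf{x}_i^{\top} \boldsymbol{\beta} \\
\ell(\boldsymbol{\beta}, \alpha) &= \sum_{i=1}^{n} \left[
    \alpha \log\frac{\alpha}{\mu_i} + (\alpha - 1) \log y_i
    - \frac{\alpha\, y_i}{\mu_i} - \log \Gamma(\alpha) \right] \\
G^2 &= 2\,(\ell_1 - \ell_0) \;\overset{\text{approx.}}{\sim}\; \chi^2_{\,df_1 - df_0}
\end{aligned}
```

Under this reading, a predictor enters the model when the significance of G^2 falls below the entry threshold.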

(c) (10 points) Provide the Step Summary table. The table should contain (1) Step Number, (2) Model Degrees of Freedom, (3) Model Log-Likelihood, (4) Deviance Chi-Squares, (5) Deviance Degrees of Freedom, and (6) Deviance Significance. Show the Significance in scientific notation with seven decimal places (the .7E format).
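A minimal sketch of the Significance formatting follows. It assumes the requested notation means scientific notation with seven digits after the decimal point, which is the `.7E` format specifier in Python. The helper name `format_significance` and the sample values are hypothetical.

```python
def format_significance(p_value: float) -> str:
    """Render a p-value in scientific notation with seven decimal places."""
    return format(p_value, ".7E")


# Hypothetical significance values, for illustration only
examples = [format_significance(p) for p in (0.05, 1.234e-12)]
```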

(d) (10 points) Assess the final model goodness-of-fit using (1) Root Mean Squared Error, (2) Relative Error, (3) Mean Absolute Proportion Error, and (4) Pearson Correlation. What are the values of these metrics?
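The four metrics can be sketched in plain Python as shown below. The definitions are assumptions where the assignment does not spell them out: in particular, Relative Error is taken here as the sum of squared errors divided by the sum of squared deviations of the observed values from their mean, so check it against the course definition. The function name and the sample numbers are hypothetical.

```python
import math


def goodness_of_fit(observed, predicted):
    """Return RMSE, Relative Error, MAPE, and Pearson correlation as a dict."""
    n = len(observed)
    errors = [y - p for y, p in zip(observed, predicted)]
    mean_y = sum(observed) / n
    mean_p = sum(predicted) / n

    sse = sum(e * e for e in errors)
    ss_y = sum((y - mean_y) ** 2 for y in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cross = sum((y - mean_y) * (p - mean_p) for y, p in zip(observed, predicted))

    return {
        "rmse": math.sqrt(sse / n),
        # Assumed definition: squared error relative to total variation
        "relative_error": sse / ss_y,
        # Requires strictly positive observed values, as the question states
        "mape": sum(abs(e) / y for e, y in zip(errors, observed)) / n,
        "pearson": cross / math.sqrt(ss_y * ss_p),
    }


# Hypothetical observed and predicted Severity values
metrics = goodness_of_fit([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
```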

(e) (10 points) Identify any poorly predicted observations. First, plot the predicted versus the observed Severity. Second, together in a single chart frame, plot the Simple Residuals, the Pearson Residuals, the Deviance Residuals, and the Absolute Proportion Errors versus the observed Severity. Label the axes of these two charts accordingly. To receive full credit, generate your charts with proper dimensions (e.g., length and width) and resolution (e.g., dpi).
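The four per-observation quantities for a Gamma model can be sketched as follows. These are the standard unscaled forms, using the Gamma variance function V(mu) = mu^2; some texts additionally scale the Pearson and Deviance Residuals by the square root of the Shape parameter, so treat the scaling as an assumption. The function name is hypothetical.

```python
import math


def gamma_residuals(y: float, mu: float) -> dict:
    """Residuals for one observation with observed y > 0 and predicted mu > 0."""
    simple = y - mu
    # Pearson residual: simple residual over sqrt(V(mu)) = mu
    pearson = simple / mu
    # Gamma unit deviance; it is non-negative up to rounding error
    unit_deviance = 2.0 * ((y - mu) / mu - math.log(y / mu))
    # Deviance residual carries the sign of the simple residual
    deviance = math.copysign(math.sqrt(max(unit_deviance, 0.0)), simple)
    # Absolute proportion error is relative to the observed value
    ape = abs(simple) / y
    return {"simple": simple, "pearson": pearson, "deviance": deviance, "ape": ape}


# Hypothetical cases: an under-prediction and an over-prediction
under = gamma_residuals(200.0, 100.0)
over = gamma_residuals(50.0, 100.0)
```

Note that the Pearson Residual divides by the prediction while the Absolute Proportion Error divides by the observation, so the two can rank poorly predicted observations differently.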

Question 2 (50 points)

(a) (20 points) Train a Multi-Layer Perceptron neural network. The target variable is Severity (use only positive and non-missing values for analyses). The predictors are the seven categorical predictors. Perform a naïve grid search to select the best network structure. For each of the Hyperbolic Tangent and Rectified Linear Unit activation functions, try 1 to 10 layers and 1 to 5 neurons per layer, with the same number of neurons in every layer. Provide a table that shows your grid search results. The table should contain (1) the activation function type, (2) the number of layers, (3) the common number of neurons per layer, (4) the total number of neurons, and (5) the mean absolute proportion error.
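The grid itself can be enumerated as sketched below; no model is trained here. The dictionary keys are hypothetical. The `hidden_layer_sizes` tuple and the activation names `"tanh"` and `"relu"` follow the form accepted by scikit-learn's `MLPRegressor`, which is only one possible choice of library and is an assumption, not something the assignment prescribes.

```python
from itertools import product

activations = ["tanh", "relu"]

grid = []
for activation, n_layers, n_neurons in product(activations, range(1, 11), range(1, 6)):
    grid.append(
        {
            "activation": activation,
            "n_layers": n_layers,
            "n_neurons": n_neurons,
            # Every layer has the same width, so the total is a simple product
            "total_neurons": n_layers * n_neurons,
            # One entry per hidden layer, e.g. (3, 3) for two layers of three
            "hidden_layer_sizes": (n_neurons,) * n_layers,
        }
    )
```

This yields 2 x 10 x 5 = 100 candidate structures, each of which would then be fitted and scored by its Mean Absolute Proportion Error.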

(b) (10 points) Recommend the network structure that yields the lowest Mean Absolute Proportion Error. In the case of ties, choose the network with the fewest total neurons.
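The selection rule with its tie-break can be sketched as a single `min` over a two-part key. The result rows and their error values below are made up for illustration and are not actual grid search output.

```python
def pick_best(results):
    """Lowest MAPE wins; ties are broken by the smaller total neuron count."""
    return min(results, key=lambda row: (row["mape"], row["total_neurons"]))


# Hypothetical grid search rows, not real results
results = [
    {"activation": "tanh", "n_layers": 4, "n_neurons": 3, "total_neurons": 12, "mape": 0.75},
    {"activation": "relu", "n_layers": 2, "n_neurons": 3, "total_neurons": 6, "mape": 0.75},
    {"activation": "relu", "n_layers": 1, "n_neurons": 1, "total_neurons": 1, "mape": 0.90},
]
best = pick_best(results)
```

Because the comparison is exact, two errors that differ only in the last floating-point digit are not treated as a tie; rounding the errors to seven decimal places first is one way to handle that.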

(c) (10 points) Assess the final model goodness-of-fit using (1) Root Mean Squared Error, (2) Relative Error, (3) Mean Absolute Proportion Error, and (4) Pearson Correlation. What are the values of these metrics?

(d) (10 points) Identify any poorly predicted observations. First, plot the predicted versus the observed Severity. Second, together in a single chart frame, plot the Simple Residuals, the Pearson Residuals, and the Absolute Proportion Errors versus the observed Severity. Label the axes of these two charts accordingly. To receive full credit, generate your charts with proper dimensions (e.g., length and width) and resolution (e.g., dpi).