MSCA31010: Linear & Non-Linear Models

Winter 2022 Assignment 4

The Homeowner_Claim_History.xlsx contains the claim history of 27,513 homeowner policies.  The following table describes the eleven columns in the HOCLAIMDATA sheet.

Name                 | Description                                                        | Categories
---------------------+--------------------------------------------------------------------+---------------------------------------------------
policy               | Policy Identifier                                                  |
exposure             | Duration a Policy is Exposed to Risk Measured in Portion of a Year |
num_claims           | Number of Claims in a Year                                         |
amt_claims           | Total Claim Amount in a Year                                       |
f_primary_age_tier   | Age Tier of Primary Insured                                        | < 21, 21 - 27, 28 - 37, 38 - 60, > 60
f_primary_gender     | Gender of Primary Insured                                          | Female, Male
f_marital            | Marital Status of Primary Insured                                  | Not Married, Married, Un-Married
f_residence_location | Location of Residence Property                                     | Urban, Suburban, Rural
f_fire_alarm_type    | Fire Alarm Type                                                    | None, Standalone, Alarm Service
f_mile_fire_station  | Distance to Nearest Fire Station                                   | < 1 mile, 1 - 5 miles, 6 - 10 miles, > 10 miles
f_aoi_tier           | Amount of Insurance Tier                                           | < 100K, 100K - 350K, 351K - 600K, 601K - 1M, > 1M

In insurance ratemaking, the Total Claim Amount in a Year divided by the Number of Claims in a Year is called the Severity.  In other words, Severity is the average dollar amount per claim.  If a policy does not file any claims in a year, then its Severity is missing.
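
As an illustration of the definition above, the following is a minimal sketch of how Severity could be derived from the num_claims and amt_claims columns.  The arrays hold made-up toy values, not rows from the workbook, and the use of NumPy and the variable names `has_claim` and `usable` are assumptions of this sketch.

```python
import numpy as np

# Toy values for illustration only; they are not taken from the assignment data.
num_claims = np.array([0, 1, 2, 4])
amt_claims = np.array([0.0, 1200.0, 3000.0, 10000.0])

# Severity = amt_claims / num_claims, left missing (NaN) when no claim was filed.
severity = np.full(num_claims.shape, np.nan)
has_claim = num_claims > 0
severity[has_claim] = amt_claims[has_claim] / num_claims[has_claim]

# Rows that qualify for modeling: positive, non-missing Severity.
usable = has_claim & (severity > 0)
print(severity)
print(usable)
```

The `usable` mask corresponds to the "positive and non-missing" filter that both questions ask for.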

Unless otherwise stated, please provide all numeric answers rounded to the seventh decimal place.


Question 1 (50 points)

(a) (10 points) Generate horizontal boxplots of Total Claim Amount in a Year grouped by each of the seven categorical predictors f_primary_age_tier, f_primary_gender, f_marital, f_residence_location, f_fire_alarm_type, f_mile_fire_station, and f_aoi_tier.

(b) (10 points) For these analyses, assume that Severity follows a Gamma distribution.  Train a Gamma model with the logarithm link function.  The target variable is Severity (use only positive and non-missing values).  The predictors are the seven categorical predictors.  The model includes the Intercept term.  Enter predictors into the model using the Forward Selection method with an entry threshold of 0.05.  What is the estimate of the Shape parameter?
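
The fitting step inside each forward-selection round can be sketched as follows.  This is only a sketch under stated assumptions: it uses plain NumPy and iteratively reweighted least squares rather than any particular course library, the function name `fit_gamma_log` is hypothetical, the design matrix is assumed to carry the intercept in its first column followed by dummy-coded predictors, and the Shape value returned is the moment (Pearson) estimate, which may differ from a maximum-likelihood estimate.  The data are synthetic.

```python
import numpy as np

def fit_gamma_log(X, y, n_iter=50, tol=1e-10):
    """Fit a Gamma GLM with log link by iteratively reweighted least squares.

    X must already contain the intercept column (first) and dummy-coded
    predictors.  With the log link and Gamma variance V(mu) = mu^2 the IRLS
    weights are all 1, so each step is an ordinary least-squares fit to the
    working response.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu             # working response
        beta_new, *_ = np.linalg.lstsq(X, z, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    deviance = 2.0 * np.sum((y - mu) / mu - np.log(y / mu))
    # Moment (Pearson) estimate of the dispersion; shape = 1 / dispersion.
    dispersion = np.sum(((y - mu) / mu) ** 2) / (len(y) - X.shape[1])
    return beta, deviance, 1.0 / dispersion

# Synthetic check (made-up data, not the homeowner file): one 3-level predictor.
rng = np.random.default_rng(0)
group = rng.integers(0, 3, size=5000)
X = np.column_stack([np.ones(5000), group == 1, group == 2]).astype(float)
true_beta = np.array([7.0, 0.5, -0.3])
true_shape = 2.0
y = rng.gamma(true_shape, np.exp(X @ true_beta) / true_shape)

beta, deviance, shape = fit_gamma_log(X, y)
print(beta, deviance, shape)
```

Forward selection would call such a fit once per candidate predictor in each round and admit the candidate with the smallest significance value, provided it is below the entry threshold.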

(c) (10 points) Provide the Step Summary table.  The table should contain (1) Step Number, (2) Model Degrees of Freedom, (3) Model Log-Likelihood, (4) Deviance Chi-Squares, (5) Deviance Degrees of Freedom, and (6) Deviance Significance.  Show the Significance in .E7 scientific notation.  
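
For the formatting requirement, one reading of ".E7" is scientific notation with seven digits after the decimal point.  The sketch below assumes that reading, and also assumes the Deviance Chi-Square of a step is twice the gain in log-likelihood; all numbers are hypothetical and serve only to show the arithmetic and the format string.

```python
# Hypothetical log-likelihoods and degrees of freedom of two nested models.
llk_previous, df_previous = -1234.5, 1
llk_current, df_current = -1220.0, 3

# Deviance test statistic and its degrees of freedom for this step.
deviance_chisq = 2.0 * (llk_current - llk_previous)
deviance_df = df_current - df_previous

# Hypothetical significance value, shown with seven digits after the decimal point.
p_value = 0.000012345678912
formatted = f"{p_value:.7E}"
print(deviance_chisq, deviance_df, formatted)
```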

(d) (10 points) Assess the final model goodness-of-fit using (1) Root Mean Squared Error, (2) Relative Error, (3) Mean Absolute Proportion Error, and (4) Pearson Correlation.  What are the values of these metrics?
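
The four metrics can be sketched as below.  The definitions are assumptions of this sketch and should be checked against the course notes: Relative Error is taken as the sum of squared errors divided by the sum of squared deviations of the observed values from their mean, and Mean Absolute Proportion Error as the mean of |observed - predicted| / observed.  The function name `fit_metrics` and the toy numbers are hypothetical.

```python
import numpy as np

def fit_metrics(y, pred):
    """Return RMSE, Relative Error, MAPE and Pearson correlation as a dict."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = y - pred
    return {
        "RMSE": np.sqrt(np.mean(resid ** 2)),
        "RELERR": np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2),
        "MAPE": np.mean(np.abs(resid) / y),      # y is positive by construction
        "PEARSON": np.corrcoef(y, pred)[0, 1],
    }

# Toy observed and predicted Severity values, for illustration only.
metrics = fit_metrics([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
for name, value in metrics.items():
    print(f"{name}: {value:.7f}")
```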

(e) (10 points) Identify any poorly predicted observations.  First, plot the predicted versus the observed Severity.  Second, together in a single chart frame, plot the Simple Residuals, the Pearson Residuals, the Deviance Residuals, and the Absolute Proportion Errors versus the observed Severity.  Label the axes of these two charts accordingly.  To receive full credit, generate your charts with proper dimensions (e.g., length and width) and resolution (e.g., dpi).
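
The four quantities to plot can be computed as in the following sketch.  It assumes the standard Gamma forms with variance function V(mu) = mu^2 and leaves the Pearson and Deviance residuals unscaled by the Shape parameter; the course convention may apply such a scaling.  The function name `gamma_residuals` and the toy values are hypothetical.

```python
import numpy as np

def gamma_residuals(y, mu):
    """Simple, Pearson, Deviance residuals and Absolute Proportion Errors."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    simple = y - mu
    pearson = simple / mu                                  # sqrt(V(mu)) = mu
    unit_dev = 2.0 * ((y - mu) / mu - np.log(y / mu))      # unit deviance, >= 0
    deviance = np.sign(simple) * np.sqrt(np.maximum(unit_dev, 0.0))
    ape = np.abs(simple) / y
    return simple, pearson, deviance, ape

# Toy observed and predicted Severity values, for illustration only.
simple, pearson, deviance, ape = gamma_residuals([2.0, 5.0], [1.0, 5.0])
print(simple, pearson, deviance, ape)
```

Each array can then be plotted against the observed Severity.  In matplotlib, for example, `plt.subplots(figsize=(width, height), dpi=...)` sets both the chart dimensions and the resolution.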


Question 2 (50 points)

(a) (20 points) Train a Multi-Layer Perceptron neural network.  The target variable is Severity (use only positive and non-missing values).  The predictors are the seven categorical predictors.  Perform a naïve grid search to select the best network structure.  For each of the Hyperbolic Tangent and Rectified Linear Unit activation functions, try 1 to 10 layers and a common number of neurons per layer from 1 to 5.  Provide a table that shows your grid search results.  The table should contain (1) the activation function type, (2) the number of layers, (3) the common number of neurons per layer, (4) the total number of neurons, and (5) the mean absolute proportion error.
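
The grid described above has 2 x 10 x 5 = 100 candidate structures.  The sketch below only enumerates them; the training step is left out.  The labels "tanh" and "relu" and the key `hidden_layer_sizes` follow the argument names of scikit-learn's MLPRegressor, but the choice of that library is an assumption of this sketch, and `build_grid` is a hypothetical helper.

```python
from itertools import product

def build_grid():
    """Enumerate every (activation, layers, neurons-per-layer) combination."""
    grid = []
    for activation, n_layers, n_neurons in product(
        ("tanh", "relu"), range(1, 11), range(1, 6)
    ):
        grid.append({
            "activation": activation,
            "n_layers": n_layers,
            "neurons_per_layer": n_neurons,
            # Same neuron count repeated once per hidden layer.
            "hidden_layer_sizes": (n_neurons,) * n_layers,
            "total_neurons": n_layers * n_neurons,
        })
    return grid

grid = build_grid()
print(len(grid))  # 100
```

Each entry would be trained once and its Mean Absolute Proportion Error recorded alongside the other four columns of the results table.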

(b) (10 points) Recommend the best network structure, which is the one that yields the lowest Mean Absolute Proportion Error.  In the case of ties, choose the network with the fewest total neurons.
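
The selection rule, including the tie-break, amounts to a two-key minimum.  In this sketch the result rows are made-up values, and rounding the error to seven decimals before comparing is an assumption borrowed from the rounding rule stated earlier in the assignment.

```python
# Made-up grid search results, for illustration only.
results = [
    {"activation": "tanh", "n_layers": 2, "neurons_per_layer": 3,
     "total_neurons": 6, "mape": 0.8123456},
    {"activation": "relu", "n_layers": 1, "neurons_per_layer": 4,
     "total_neurons": 4, "mape": 0.7500000},
    {"activation": "relu", "n_layers": 3, "neurons_per_layer": 5,
     "total_neurons": 15, "mape": 0.7500000},
]

# Lowest error first; among equal errors, the fewest total neurons wins.
best = min(results, key=lambda r: (round(r["mape"], 7), r["total_neurons"]))
print(best)
```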

(c) (10 points) Assess the final model goodness-of-fit using (1) Root Mean Squared Error, (2) Relative Error, (3) Mean Absolute Proportion Error, and (4) Pearson Correlation.  What are the values of these metrics?

(d) (10 points) Identify any poorly predicted observations.  First, plot the predicted versus the observed Severity.  Second, together in a single chart frame, plot the Simple Residuals, the Pearson Residuals, and the Absolute Proportion Errors versus the observed Severity.  Label the axes of these two charts accordingly.  To receive full credit, generate your charts with proper dimensions (e.g., length and width) and resolution (e.g., dpi).