Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit


Assignment 3 - Data Warehouse Modeling

The objective of this assignment is to learn how to design a data warehouse for risk analysis of auto insurance.   

Total Marks: 100 points

Mark distribution:

1

Obtain domain knowledge

10 points

2

Build star schemar

40 points

3

Sample data

20 points

4

Identify data sourse

10 points

5

Possible analytical use

10 points

6

Report on learning experience

10 points

To do this assignment, you need to learn basic knowledge about auto insurance. You can visit web sites such as

Insurance Bureau of Canada (IBC)  http://www.ibc.ca/on/auto/buying-auto-insurance, Ratehub https://www.ratehub.ca/insurance/best-car-insurance-quote,  and Desjardins Insurance https://www.desjardinsagents.com/insurance/auto/coverage-options,).  Insurance means that insurance companies take over risks from customers by collecting premiums from customers. Risk management is very important for the insurance industry. Insurers consider many available quantifiable factors in developing profiles of high and low insurance risks. The possibility and consequence of risks determine insurance premiums. Generally, insurance policies involving factors with greater risk of claims are charged at a higher rate. With much information at hand, insurers can evaluate risk of insurance policies at much higher accuracy. To this end, insurers collect a vast amount of information about policyholders and insured objects in order to determine the insurance premium.  To facilitate business operation, an insurance company create, store, and manipulate information in its database including driver information (such as gender, age, education, occupation, license, driving experience, previous driving records), vehicle information (such as vehicle type, manufacturer, model, year of make), insurance policy information (such as coverage and premium), accident information (such as date, location, type of accident, cause of accident), and claim information (such as vehicle repair cost,  medical expenses for injuries, property damage liability, etc.). Once these data stored in operational database are extracted and transferred to a data warehouse, statistical methods and data mining techniques can be used to analyze insurance policy risks and costs (see https://otlablog.com/fsco-rate-filings-profit-transparency-lacking/ ).

You are asked to build a data warehouse to facilitate auto insurance risk analysis. To do so, you need to perform the following tasks:

1. Obtain domain knowledge (10 poimnts). To obtain domain knowledge on auto insurance through literature review. You may also learn from insurance companies through their web site or personal contact. Report what were the sources you have used for information gathering and what you have learned on the factors that may affect the risk of auto insurance. Justify why a data warehouse is necessary to collect and store analytical data for risk analysis of auto insurance.

2. Build star schema (40 points). The dataware house data is attracted and transformed from the operationl database. Reference (but not limiterd to) the following relational schema for auto-insurance operational database.  You need to design a star schema for the data warehouse that can facilitate auto insurance risk analysis. Using ERDPlus to draw a star schema,  identify the facts, the dimensions, the attributes, and the attribute hierarchy. Copy the star schema drawing into your report. Explain the meaning and reason of your design.  For instance, what  measurement is used in your fact table, why you include paticual dimensions and attributes in your star schema

 

3. Sample data (20 points). For the fact table and dimension tables you have designed, provide sample data to illustrate the content of each table. To get some idea of attributes and possible values you may reference the sample forms attached in this assignment such as the insurance quote request form, the insurance policy form and the incident report form. You may also collect sample views by yourself.  

 

 

4. Data source of data warehose (10 points). Identify the internal and external sources of data will be used to load into your data warehouse and discuss the challenges for the building of the data warehouse. For instance, the internal source could be the insurance company’s insurance processing system, the external source could be government police office system etc.

5. Possible analytical use (10 points)  Suggest and discuss possible risk analysis questions using this data warehouse for an auto insurance company. For instance, anaylizing insurance claim cost for different types of damage such as collision or injury

6. Learning report (10 points) Report how many hours you spent on this assignment and what you have learned from doing this assignment. If you used Generative AI to assist you to do the assignment, explain how did you use it, and what are the results generated by using AI. If you have two people work together, report how you collaborated and how much time each of you spent. Put both of your name and student ID on the report.

Note:

1. The assignment can be done individually or collaboratively in a team with only two members.

2. For teamwork, each member is expected to contribute equally and will receive the same mark. The assignment only needs to be submitted by one of the team members. A front-page should be used to indicate the assignment number, the team member names and student IDs.

3. The report should be saved in Microsoft word file. The file name should be ASS3+section number + your mac id. For instance, if you are in section C1 and your mac id is stevej, then your file name will be ASS3C1stevej

4. Please submit your assignment to the dropbox of Avenue. Assignment should be submitted before the due date indicated in Avenue.

Assignment 3 – Data Warehouse Modeling Rubric (Total: 100 points)

1. Obtain Domain Knowledge (10 points)

· Excellent (9–10 pts):
Comprehensive review of auto insurance domain; multiple credible sources cited; clear explanation of risk factors; strong justification for why a data warehouse is necessary.

· Good (7–8 pts):
Adequate review with some sources; identifies key risk factors; justification is reasonable but lacks depth.

· Fair (5–6 pts):
Limited sources; basic discussion of risk factors; justification is weak or incomplete.

· Poor (0–4 pts):
Minimal or no domain knowledge; missing justification for data warehouse.


2. Build Star Schema (40 points)

· Excellent (36–40 pts):
Well-designed star schema using ERDPlus; clear identification of fact table, dimensions, attributes, and hierarchies; diagram included; strong explanation of design choices and measurements.

· Good (28–35 pts):
Schema mostly correct; minor issues in dimensions or hierarchies; diagram provided; explanation adequate.

· Fair (20–27 pts):
Basic schema with noticeable errors; incomplete diagram or unclear reasoning.

· Poor (0–19 pts):
Incorrect or missing schema; no diagram or explanation.


3. Sample Data (20 points)

· Excellent (18–20 pts):
Complete sample data for fact and dimension tables; realistic values; well-formatted tables.

· Good (14–17 pts):
Sample data provided for most tables; values reasonable but limited variety.

· Fair (10–13 pts):
Minimal sample data; lacks clarity or realism.

· Poor (0–9 pts):
Missing or irrelevant sample data.


4. Identify Data Sources (10 points)

· Excellent (9–10 pts):
Clearly identifies internal and external sources; discusses challenges in detail.

· Good (7–8 pts):
Sources identified; challenges mentioned but not fully explored.

· Fair (5–6 pts):
Limited discussion of sources; challenges vague.

· Poor (0–4 pts):
Missing or incorrect sources; no discussion of challenges.


5. Possible Analytical Use (10 points)

· Excellent (9–10 pts):
Multiple relevant analytical questions; clear link to risk analysis; demonstrates understanding of business value.

· Good (7–8 pts):
Some relevant questions; connection to risk analysis is adequate.

· Fair (5–6 pts):
Few or generic questions; weak link to risk analysis.

· Poor (0–4 pts):
Missing or irrelevant analytical uses.


6. Learning Report (10 points)

· Excellent (9–10 pts):
Detailed reflection on time spent, learning outcomes, and AI usage (if applicable); clear collaboration report for team work.

· Good (7–8 pts):
Reflection provided; some details missing or vague.

· Fair (5–6 pts):
Minimal reflection; lacks specifics on time or learning.

· Poor (0–4 pts):
Missing or incomplete report.