
CSI4142 Fundamentals of Data Science

Midterm Examination 2020

1.    Identify one (1) measure.                                                                                                                 (2)

The cost per individual, i.e. $ per individual. (This will be different for adults and minors.) 2 marks

(or)

A measure (adult (y/n)) is also OK. 2 marks

2.    Identify one (1) role-playing dimension.                                                                                         (2)

Date: start date, end date, purchase date 2 marks

(or)

City/Location: origin and destination 2 marks
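A role-playing dimension can be sketched as one physical Date table exposed through one view per role. The following is only a minimal illustration using SQLite from Python; the table name `date_dim`, the view names, and the column names are assumptions and do not come from the exam.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single physical Date dimension (hypothetical, reduced to two columns).
CREATE TABLE date_dim (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL
);
-- One view per role: the same rows under role-specific names.
CREATE VIEW purchase_date AS
    SELECT date_key AS purchase_date_key, full_date AS purchase_date FROM date_dim;
CREATE VIEW trip_start_date AS
    SELECT date_key AS trip_start_date_key, full_date AS trip_start_date FROM date_dim;
CREATE VIEW trip_end_date AS
    SELECT date_key AS trip_end_date_key, full_date AS trip_end_date FROM date_dim;
""")

# Illustrative row only; every role view sees it.
conn.execute("INSERT INTO date_dim VALUES (20200301, '2020-03-01')")

role_views = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name"
    )
]
```

The same pattern would apply to City/Location, with one view for the origin and one for the destination.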

3.    Identify one concept hierarchy, other than the one in the Date dimension.                          (2)

Resort → City → Country

4.    Provide the attributes you would include in the Visitor dimension of your conceptual design and explain how you would incorporate demographic details into this design.                            (4)

This question tests insight, since there is an interplay between visitors and members.

Visitor (Visitor-key, Last Name, First Name, Age, Gender, Nationality, …) 2 marks

Next, we have additional information for the persons who purchased the tickets. For these, we can add occupation, home address, postal code, lifestyle preferences, and so on.

If we have the postal codes, we can use (or buy) demographic data as obtained from sources such as Statistics Canada.

Otherwise, these fields will be left NULL.

Some of the aspects noted above 2 marks

5.    Provide the SQL statement to create the Fact table.                                                                       (6)

Purchase-Date-Key, Trip-Start-Date-Key, Trip-End-Date-Key: 3 dates are role-playing 1 mark if some date

Visitor-Key: all visitor details (include demographics) 1 mark

Member-Key: we may want to model members separately. It is OK if this is not done, but the student needs to realise the difficulty.

Destination-Key: details about the resort 1 mark

Origin-Key: the city where the trip starts (note one may live in Ottawa, but fly from Montreal) 1 mark

Airplane-Key: details about the flight (in and out) 0.5 mark

Transport-Key: details about the ground transport (in and out) 0.5 mark

$Price: measure per individual 0 marks, since already asked above
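The keys listed above could be assembled into a CREATE TABLE statement along the following lines. This is a sketch only, run through SQLite from Python so that it can be checked. The table name `trip_fact`, the dimension table names, the column types, and the NULL/NOT NULL choices are all assumptions. Hyphens in the key names are replaced by underscores to give valid SQL identifiers. No primary key is declared because the marking scheme does not specify the grain.

```python
import sqlite3

# Hypothetical fact table built from the keys in the marking scheme.
FACT_DDL = """
CREATE TABLE trip_fact (
    purchase_date_key   INTEGER NOT NULL REFERENCES date_dim(date_key),
    trip_start_date_key INTEGER NOT NULL REFERENCES date_dim(date_key),
    trip_end_date_key   INTEGER NOT NULL REFERENCES date_dim(date_key),
    visitor_key         INTEGER NOT NULL REFERENCES visitor_dim(visitor_key),
    member_key          INTEGER REFERENCES member_dim(member_key),  -- optional: members may be modelled separately
    destination_key     INTEGER NOT NULL REFERENCES resort_dim(resort_key),
    origin_key          INTEGER NOT NULL REFERENCES city_dim(city_key),
    airplane_key        INTEGER REFERENCES airplane_dim(airplane_key),
    transport_key       INTEGER REFERENCES transport_dim(transport_key),
    price               NUMERIC NOT NULL  -- measure: $ per individual
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(FACT_DDL)

# Read the column names back to confirm that the statement parsed.
columns = [row[1] for row in conn.execute("PRAGMA table_info(trip_fact)")]
```

The three date keys all reference the same Date dimension, which is what makes that dimension role-playing.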

6.    Handling missing values.                                                                                                                 (4)

This question tests insight. Here, we are looking at how best to handle missing values. We could, e.g., impute values, use the average per age group, and so on, or leave the field blank.

We do have the postal code of all customers, so the best way would be to use demographic data based on the census. In this way, we can get an estimate for this value. An alternative is to look at property prices, since we do have the person’s address.
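The options above can be sketched in a few lines of Python. The answer does not name the missing attribute, so `income` is used here purely as a hypothetical placeholder. The records, postal areas, and census figures are made-up illustrative values, not real data. The sketch prefers a census-style estimate by postal area, falls back to the age-group average, and otherwise leaves the value blank.

```python
from statistics import mean

# Hypothetical visitor records; None marks a missing value.
visitors = [
    {"id": 1, "age_group": "18-34", "postal_area": "K1N", "income": 50000.0},
    {"id": 2, "age_group": "18-34", "postal_area": "K1N", "income": None},
    {"id": 3, "age_group": "35-54", "postal_area": "H3A", "income": None},
    {"id": 4, "age_group": "18-34", "postal_area": "K2P", "income": 70000.0},
    {"id": 5, "age_group": "18-34", "postal_area": "K2P", "income": None},
]

# Hypothetical census-style estimates keyed by postal area (made-up numbers).
census_estimate = {"K1N": 58000.0, "H3A": 64000.0}


def impute(records, census):
    """Fill missing income: census estimate first, then the age-group mean."""
    observed = {}
    for rec in records:
        if rec["income"] is not None:
            observed.setdefault(rec["age_group"], []).append(rec["income"])
    group_mean = {group: mean(values) for group, values in observed.items()}

    filled = []
    for rec in records:
        rec = dict(rec)  # copy, so the input list is left unchanged
        rec["imputed"] = rec["income"] is None
        if rec["imputed"]:
            # Stays None (blank) if neither source has an estimate.
            rec["income"] = census.get(
                rec["postal_area"], group_mean.get(rec["age_group"])
            )
        filled.append(rec)
    return filled


result = {rec["id"]: rec for rec in impute(visitors, census_estimate)}
```

Keeping an `imputed` flag next to the value lets later analysis tell estimates apart from observed data.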