Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

INFS3200 Advanced Database Systems

Prac 2: Data Warehousing (5%)

Semester 1, 2023

Due time: 4:00pm, Friday, 21 April 2023 (Week 8)

Submission: Submit your work online (INFS3200 Course Website)

1. Introduction

1.1 Learning objectives

Learn how to create a cube and its dependent components in Oracle OLAP. The tasks

include:

To identify and build the dimensions,

To define measures (both stored and calculated facts).

Learn how to map an OLAP model to source data and load data into the dimensions. Learn how to view an OLAP dataset. The tasks include:

To query through hierarchies of dimensions,

To perform DW operations such as roll-up, drill-down, and pivot.

Understand how data is organized and stored in dimensions and data cubes,

Learn how to query a data cube using SQL queries.

Understand how materialized views are used in cube query optimization.

1.2 Assessment

This Prac carries 5 marks for 5% assessment of four (4) tasks.

Marking Scheme:

1 mark: Complete the cube creation in Task 1.

1 mark: Maintain the cube successfully in Task 2.

2 marks: Complete Task 3, each bullet point is worth 1 mark.

1 mark: Complete Task 4.

You need to provide the screenshots of the data cubes you have created as well as the results of your OLAP operations. So the submission must include your screenshots, SQL queries and necessary descriptions in your submitted document. Please make sure your screenshots contain your student ID as the proof of originality.

Put all your content in a word/pdf document or leave scripts in separate files and pack all files into a zip/rar package. The file name should contain your name and student ID. Please format your document nicely to help tutor’s marking process. A poorly formatted document may receive a reduced mark. Submit your work to the Blackboard Course Website site by 4:00pm, Friday, 21 April 2023.

Late Penalties (from ECP):

Where an assessment item is submitted after the original deadline, without an approved extension, a late penalty will apply. The late penalty shall be 10% of the maximum possible markfor the assessment item will be deducted per calendar day (or part thereof) late, up to a maximum of seven (7) days. After seven days, no marks will be awardedfor the item. A day is a 24-hour blockfrom the assessment item due time. Negative marks will not be awarded.

2. Preparation: Data Warehouse Setup

2.1 System overview

Oracle OLAP: Oracle OLAP is a multidimensional analytic engine embedded in Oracle database system. It supports online analytics based on data warehousing techniques. In Oracle OLAP, a cube provides a convenient way of collecting stored and calculated measures with similar characteristics, including dimensionality, aggregation rules and so on. A particular analytic workspace (AW) may contain more than one cube, and each cube may describe a different  dimensional  shape.  Multiple  cubes  in  the  same  AW  may  share  one  or  more dimensions (Fact constellation, See Week 04 Lecture Notes).

Analytic Workspace Manager (AWM): In Oracle OLAP, AWM is an easy-to-use GUI tool for creating, developing, and managing multidimensional data in an Oracle data warehouse . Please refer to more online materials at: https://www.oracle.com/database/technologies/olap- 12101b-readme.html .

2.2 Description of the Data Warehouse

You  are  given  a  Data  Warehouse,  namely  OLAPTRAIN  which  has  a star schema. OLAPTRAIN was sourced from a transactional database system, which contains data for a fictional electronics store. The followings are the description of the tables (including four dimension tables and one fact table), intended for analysis that have gone through the ETL (Extraction, Transformation, and Loading of heterogeneous data) process:

Table

Description

CHANNELS

This table contains distribution channels for customerspurchases.

CUSTOMERS

This shows who purchased products, and where products were sold for Geographical dimensions of locations.

PRODUCTS

This table contains product categories sold by the company.

TIMES

This table contains time periods when products were sold.

SALES_FACT

This table stores purchases in dollars, quantity and unit price, by the channel of distribution, product item, day, and customer.

2.3 Data import

Download the P2.rar” package from Assessment on Blackboard course website and extract it to a local directory, for example “c:\app\P2\” . The given package contains the AWMsoftware, the OLAPTRAIN schema, cube templates and query scripts used in the following tasks.

Before analyzing the data, we need firstly import the OLAPTRAIN schema into the Oracle database. Open a Command Prompt by searching command prompt’ in the search window (Please run it as administrator to avoid privilege issues), shown as below, and complete the following steps:

/*Enter the directory that has the installation files, in my case it is “cd c:\app\P2\olaptrain_install”*/

> cd YOUR_P2_FOLDER\olaptrain_install

/*Login to SQL*Plus*/

> sqlplus sys as sysdba

Enter password: Password1!

/*Set a system parameter to avoid a future error when creating users*/

SQL> alter session set "_ORACLE_SCRIPT"=true;

/*Run the OLAPTRAIN installation script*/

SQL> @install_olaptrain_student

/*Enter the install directory and password*/

Directory: YOUR_P2_FOLDER\olaptrain_install

Password: w

StudentID: S1234567

Note that the directory should be the same as above and you can choose your own password  for user OLAPTRAIN_S1234567” (Please DO NOT include ‘-‘ or other special characters  in your password), which will be the main user throughout this practical. Please change          “S1234567” to your student ID and make sure “S” is upper case, this is crucial. The processes are shown below:

2.4 Troubleshooting:

A successful installation should end up with 6 errors, shown as follows:

However, if the installation ends quickly with the following messages, it is usually caused by specifying an incorrect directory.

2.5 Connect to data warehouse using AWM

Click the awm.bat” in awm122010_Standalone” folder to open the AWM (if there is no response after a click, please check if you have Java JDK installed in your system, Java 8 recommended). Create a new connection to the Oracle database and set the Description as your  student  ID  and  the Connection Information as localhost:1521:orcl’ .  Click  the connection and log in using the username OLAPTRAIN_S1234567” and the password you set.

Part 1: Create Logical Data Model

1. Understand the data

Before designing the data model, it is highly suggested to first understand the OLAPTRAIN schema. Use SQL developer to connect to the database and check the  following tables: channels, customers, products, times and sales_fact. The connection name should include your student ID, like DW_S1234567” .

After examining the OLAPTRAIN schema, we need to identify the dimensions, hierarchies and attributes of our data model. In addition, we define various measures based on business interests.

(1) Identifying Dimensions

Using the source data tables as the primary input, the following dimensions are identified as requirements for the OLAP data model:

Channel

Geography

Product

Time

Besides, each of the dimensions contains hierarchical structure, for example, in CHANNELS table, shown as follows, we can identify four hierarchies: the actual channels (channel_name/channel_key) -> channel type (channel_type) -> channel class

(class_name/class_key) -> ∅ (not chosen in group-by query, refer to Tutorial5-Q2, here we name it as all_channel).

(2) Identifying Measures

The measures are defined based on common business interests, each of which is equivalent to an SQL aggregation query. The measures include both stored and calculated measures. Stored measures are facts acquired from the fact table directly, while the calculated measures require complicated calculations over one or multiple facts. In this dataset, we focus on the following measures:

Stored Measures

Sales

Quantity

Calculated Measures

Sales Year-to-Date

Sales Year-to-Date Prior Year

The measures will be defined during the creation of cubes, which will be introduced later.


2. Create analytic workspace

Right-click Analytic Workspaces and select Create Analytic Workspace to create a new analytic workspace under the name of your student ID.

3. Create dimensions using templates

We provide templates for all four dimensions so that you do not need to define them manually. The template feature in Analytic Workspace Manager saves the definition of the OLAP data objects as an XML file. Using a saved template, you can create a new analytic workspace, dimension, cube, and measure exactly like an existing object, with or without mappings. Templates do not include the data, only the definition of the object.

In order to import a template, right-click the Dimensions folder, then select Create Dimension from Template. The templates are stored in the templates folder in the extracted folder.

Import Channel, Geography, Product and Time dimensions in the  same way. Check the settings of these dimensions and make sure all members are mapped to the data source (no modification needed). For example, in Channel dimension, the template contains the following settings:

Click the Dimensions folder and select Channel. In the General tab, we can see the dimension name (Channel) and the dimension type (User Dimension).


In the Levels tab, we can see three levels are defined, which correspond to three of the four possible dimensions mentioned above (channel type not included):

ALL_CHANNELS =

CLASS = channel class

CHANNEL = the actual channels

Under the Hierarchies in Channel dimension, we can see the SALES_CHANNEL hierarchy. These three levels are ordered in the following way, which is consistent with our understanding of the data:

Under the Attribute dialogue, we can see that channel type has been defined as an attribute attached to CHANNEL level, instead of being another hierarchy level, which means it is out of the business interest and cannot be queried on.

After defining a dimension, those defined components should also be mapped to the existing tables  and  views  in  Oracle  Database.  Specifically,  the Member attributes  in  the  OLAP dimension should be mapped to the key columns in the dimension tables, while the attribute columns should also be mapped to the appropriate OLAP dimension attributes.

Click the Mappings in Channel dimension and make sure the source columns (left) appears in the mapping pane (right)  correctly.  Note  that  there  is  no  such  source  data  column  for ALL_CHANNELS level, therefore, for "All/Total" hierarchy levels, the descriptions are typed manually:

Perform the same inspection to the other imported dimensions and understand how they are defined.

4. Create Cube

After all dimensions are defined, we are ready to create our cube. Right-click the Cubes folder, then click Create Cube. In the General tab of the Create Cube window, specify the following:

Name: SALES_CUBE

Use the Add button (>) to select dimensions in the following order:

o CHANNEL

o TIME

o GEOGRAPHY

o PRODUCT

Notes: The order in which the dimensions are listed in a cube may affect performance because it determines the way the data is stored on disk.

Next, select the Storage tab. The Storage tab helps you manage the data compression strategy. By default, we choose Use compression, and then enable the Sparse option for all dimensions, as shown below:

Finally, click Create to finish the dialogue.


5. Create measures

You can create two types of measures in a cube: Stored (or Base) measures, and Calculated measures. Every measure that belongs to a particular cube shares the characteristics that were defined for the cube.

(1) Stored Measures

Stored measures store the facts collected about your business. When you create stored measures in your OLAP data model, you will map them to the source data just like what you have done with dimensions.

(2) Calculated Measures

One of the powerful features of the Oracle OLAP technology is the ability to efficiently and easily generate business calculations of data held in the database. In any OLAP implementation, the number of calculated measures greatly exceeds the number of stored measures.

OLAP calculated measures are derived from stored measures or other calculated measures. These calculations are computed dynamically when users query the data. Calculations are automatically exposed as columns in a cube view, just like the facts.

According to our design, we define two measures, i.e., Sales and Quantity, and two calculated measures. Two measures are created as follows:

And we create the first calculated measures by entering or selecting the following:

Name = SALES_YTD (the Label and Description fields are auto-filled) Calculation Type = Period to Date

In the Calculation inputs section, select the following:

First hyperlink = Ancestor At Level

Second hyperlink = TIME.CALENDAR_YEAR

Finally, the window should look like this:


Similarly, we create a YTD calculation for the previous year. This measure facilitates year-to- year comparisons. The settings are shown below:

Name = SALES_YTD_PY

Calculation Type = Parallel Period

In the Calculation inputs section, select the following:

Second hyperlink = SALES_YTD

Fifth hyperlink = TIME.CALENDAR_YEAR

6. Map the cube

Same as the dimension mapping, we need to map our cube to the existing data source. In a data cube, we need to map the following fields:

The stored measures that are defined within the cube.

The lowest level of detail for each dimension hierarchy.

The Join Condition field. This field associates the foreign key column from the fact

table to the primary key column from the dimension table.

Note that the mapping is done by dragging the corresponding column from the source columns (left) to the correct spot in the mapping pane (right). A Join condition can be achieved by dragging both joining columns to that slot and the “=” will be added automatically. Please DO NOT type those values manually as it will cause unexpected problems in the future. Eventually, the mapping result should look like the follows. Click Apply to complete the mapping.

Task 1: Complete all steps introduced above until you successfully create and map the cube. Take three screenshots of the cube you have created, including the mapping, the definition of two calculated measures. Include those screenshots in your document submission.

NOTE: Please make sure your student ID appears in every screenshot you take to show the originality of your work. This rule applies to ALL screenshots taken throughout this prac    unless specified. Screenshots without student ID may be regarded as invalid and receive       reduced marks. An example screenshot of the mapping should look like the follows:



Part 2: Load and View Cube Data

The Maintenance Wizard loads and aggregates the data in a single step. We can load all mapped objects in the analytic workspace, or individual dimensions and measures. We can also choose to run the job immediately, enter it in the Oracle Job Queue, or save it as a SQL script. The materialized views, calculated measures are preprocessed during this maintenance phase. After the maintenance, we can use SQL queries to retrieve the result. Also, we can view the cube data using operators like Roll-up, Drill-down and Pivot.

1. Maintain the cube

The cube maintenance will load data to the cube. By default, the dimensions of that cube are also processed before the cube. If you have already loaded dimension data, you can specify only to load measure data.

Task 2: We start to maintain the cube. It will take quite a while to finish the maintenance. Please take a screenshot of your maintenance result, same as below, and include it in your document.

2. View the cube data


Click the View Data SALES_CUBE to start the Data Viewer.


Task 3: Now you can play with this tool to explore the data. Meanwhile, you are required to achieve a few goals:

Perform  roll-up,  drill-down  and  pivot  operations  in  Data  Viewer,  respectively

(Lecture5-P36, Tutorial4-Q2). Each operation includes three parts: (1) the screenshot of the view before the operation, (2) a sentence describing which operation you are performing on which columns/levels; (3) the screenshot of the view after the operation. Therefore, there are 6  screenshots and 3  lines  of description in total. Include the screenshots and descriptions in your document.

Adjust your Data Viewer window until it is identical (the value can be different) to the

following two views (Hint: your viewer windows must be identical to get full mark and you may need help from the Query Builder, shown below). Take a screenshotfor each view you made and put them in your document.



Part 3: Understand Data Warehouse Design Mechanism

We have learnt how to create dimensions and cubes using Oracle OLAP and AWM. Now, we will try to understand the mechanism behind. Specifically, there is NO mark in this part, but it will help you understand the whole process of how data cubes are built, queried and managed by Oracle OLAP, which will make the subsequent tasks easier to follow. Therefore, this part is optional but recommended. Please follow the instructions below.

1. View the dimension tables and fact table

Connect to the olaptrain schema via SQL Developer using your OLAPTRAIN_S1234567” (Once  again,  S1234567  refers  to  your  student  number)  user,  and  then  check  the  tables “CHANNELS” and “SALES_FACT” .

Just like what we have learnt from the lecture and tutorial, the dimension table “CHANNELS” contains dimension key (Channel_Key), hierarchy (Channel –> Class -> All Channels) and attribute