INFS3200 Advanced Database Systems Prac 2
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
INFS3200 Advanced Database Systems
Prac 2: Data Warehousing (5%)
Semester 1, 2023
Due time: 4:00pm, Friday, 21 April 2023 (Week 8)
Submission: Submit your work online (INFS3200 Course Website)
1. Introduction
1.1 Learning objectives
Learn how to create a cube and its dependent components in Oracle OLAP. The tasks
include:
To identify and build the dimensions,
To define measures (both stored and calculated facts).
Learn how to map an OLAP model to source data and load data into the dimensions. Learn how to view an OLAP dataset. The tasks include:
To query through hierarchies of dimensions,
To perform DW operations such as roll-up, drill-down, and pivot.
Understand how data is organized and stored in dimensions and data cubes,
Learn how to query a data cube using SQL queries.
Understand how materialized views are used in cube query optimization.
1.2 Assessment
This Prac carries 5 marks for 5% assessment of four (4) tasks.
Marking Scheme:
1 mark: Complete the cube creation in Task 1.
1 mark: Maintain the cube successfully in Task 2.
2 marks: Complete Task 3, each bullet point is worth 1 mark.
1 mark: Complete Task 4.
You need to provide the screenshots of the data cubes you have created as well as the results of your OLAP operations. So the submission must include your screenshots, SQL queries and necessary descriptions in your submitted document. Please make sure your screenshots contain your student ID as the proof of originality.
Put all your content in a word/pdf document or leave scripts in separate files and pack all files into a zip/rar package. The file name should contain your name and student ID. Please format your document nicely to help tutor’s marking process. A poorly formatted document may receive a reduced mark. Submit your work to the Blackboard Course Website site by 4:00pm, Friday, 21 April 2023.
Late Penalties (from ECP):
“Where an assessment item is submitted after the original deadline, without an approved extension, a late penalty will apply. The late penalty shall be 10% of the maximum possible markfor the assessment item will be deducted per calendar day (or part thereof) late, up to a maximum of seven (7) days. After seven days, no marks will be awardedfor the item. A day is a 24-hour blockfrom the assessment item due time. Negative marks will not be awarded.
2. Preparation: Data Warehouse Setup
2.1 System overview
Oracle OLAP: Oracle OLAP is a multidimensional analytic engine embedded in Oracle database system. It supports online analytics based on data warehousing techniques. In Oracle OLAP, a cube provides a convenient way of collecting stored and calculated measures with similar characteristics, including dimensionality, aggregation rules and so on. A particular analytic workspace (AW) may contain more than one cube, and each cube may describe a different dimensional shape. Multiple cubes in the same AW may share one or more dimensions (Fact constellation, See Week 04 Lecture Notes).
Analytic Workspace Manager (AWM): In Oracle OLAP, AWM is an easy-to-use GUI tool for creating, developing, and managing multidimensional data in an Oracle data warehouse . Please refer to more online materials at: https://www.oracle.com/database/technologies/olap- 12101b-readme.html .
2.2 Description of the Data Warehouse
You are given a Data Warehouse, namely OLAPTRAIN which has a star schema. OLAPTRAIN was sourced from a transactional database system, which contains data for a fictional electronics store. The followings are the description of the tables (including four dimension tables and one fact table), intended for analysis that have gone through the ETL (Extraction, Transformation, and Loading of heterogeneous data) process:
Table |
Description |
CHANNELS |
This table contains distribution channels for customers’ purchases. |
CUSTOMERS |
This shows who purchased products, and where products were sold for Geographical dimensions of locations. |
PRODUCTS |
This table contains product categories sold by the company. |
TIMES |
This table contains time periods when products were sold. |
SALES_FACT |
This table stores purchases in dollars, quantity and unit price, by the channel of distribution, product item, day, and customer. |
2.3 Data import
Download the “P2.rar” package from Assessment on Blackboard course website and extract it to a local directory, for example “c:\app\P2\” . The given package contains the AWMsoftware, the OLAPTRAIN schema, cube templates and query scripts used in the following tasks.
Before analyzing the data, we need firstly import the OLAPTRAIN schema into the Oracle database. Open a Command Prompt by searching ‘command prompt’ in the search window (Please run it as administrator to avoid privilege issues), shown as below, and complete the following steps:
/*Enter the directory that has the installation files, in my case it is “cd c:\app\P2\olaptrain_install”*/
> cd YOUR_P2_FOLDER\olaptrain_install
/*Login to SQL*Plus*/
> sqlplus sys as sysdba
Enter password: Password1!
/*Set a system parameter to avoid a future error when creating users*/
SQL> alter session set "_ORACLE_SCRIPT"=true;
/*Run the OLAPTRAIN installation script*/
SQL> @install_olaptrain_student
/*Enter the install directory and password*/
Directory: YOUR_P2_FOLDER\olaptrain_install
Password: w
StudentID: S1234567
Note that the directory should be the same as above and you can choose your own password for user “OLAPTRAIN_S1234567” (Please DO NOT include ‘-‘ or other special characters in your password), which will be the main user throughout this practical. Please change “S1234567” to your student ID and make sure “S” is upper case, this is crucial. The processes are shown below:
2.4 Troubleshooting:
A successful installation should end up with 6 errors, shown as follows:
However, if the installation ends quickly with the following messages, it is usually caused by specifying an incorrect directory.
2.5 Connect to data warehouse using AWM
Click the “awm.bat” in “awm122010_Standalone” folder to open the AWM (if there is no response after a click, please check if you have Java JDK installed in your system, Java 8 recommended). Create a new connection to the Oracle database and set the Description as your student ID and the Connection Information as ‘localhost:1521:orcl’ . Click the connection and log in using the username “OLAPTRAIN_S1234567” and the password you set.
Part 1: Create Logical Data Model
1. Understand the data
Before designing the data model, it is highly suggested to first understand the OLAPTRAIN schema. Use SQL developer to connect to the database and check the following tables: channels, customers, products, times and sales_fact. The connection name should include your student ID, like “DW_S1234567” .
After examining the OLAPTRAIN schema, we need to identify the dimensions, hierarchies and attributes of our data model. In addition, we define various measures based on business interests.
(1) Identifying Dimensions
Using the source data tables as the primary input, the following dimensions are identified as requirements for the OLAP data model:
Channel
Geography
Product
Time
Besides, each of the dimensions contains hierarchical structure, for example, in CHANNELS table, shown as follows, we can identify four hierarchies: the actual channels (channel_name/channel_key) -> channel type (channel_type) -> channel class
(class_name/class_key) -> ∅ (not chosen in group-by query, refer to Tutorial5-Q2, here we name it as all_channel).
(2) Identifying Measures
The measures are defined based on common business interests, each of which is equivalent to an SQL aggregation query. The measures include both stored and calculated measures. Stored measures are facts acquired from the fact table directly, while the calculated measures require complicated calculations over one or multiple facts. In this dataset, we focus on the following measures:
Stored Measures
Sales
Quantity
Calculated Measures
Sales Year-to-Date
Sales Year-to-Date Prior Year
The measures will be defined during the creation of cubes, which will be introduced later.
2. Create analytic workspace
Right-click Analytic Workspaces and select Create Analytic Workspace to create a new analytic workspace under the name of your student ID.
3. Create dimensions using templates
We provide templates for all four dimensions so that you do not need to define them manually. The template feature in Analytic Workspace Manager saves the definition of the OLAP data objects as an XML file. Using a saved template, you can create a new analytic workspace, dimension, cube, and measure exactly like an existing object, with or without mappings. Templates do not include the data, only the definition of the object.
In order to import a template, right-click the Dimensions folder, then select Create Dimension from Template. The templates are stored in the templates folder in the extracted folder.
Import Channel, Geography, Product and Time dimensions in the same way. Check the settings of these dimensions and make sure all members are mapped to the data source (no modification needed). For example, in Channel dimension, the template contains the following settings:
Click the Dimensions folder and select Channel. In the General tab, we can see the dimension name (Channel) and the dimension type (User Dimension).
In the Levels tab, we can see three levels are defined, which correspond to three of the four possible dimensions mentioned above (channel type not included):
ALL_CHANNELS = ∅
CLASS = channel class
CHANNEL = the actual channels
Under the Hierarchies in Channel dimension, we can see the SALES_CHANNEL hierarchy. These three levels are ordered in the following way, which is consistent with our understanding of the data:
Under the Attribute dialogue, we can see that channel type has been defined as an attribute attached to CHANNEL level, instead of being another hierarchy level, which means it is out of the business interest and cannot be queried on.
After defining a dimension, those defined components should also be mapped to the existing tables and views in Oracle Database. Specifically, the Member attributes in the OLAP dimension should be mapped to the key columns in the dimension tables, while the attribute columns should also be mapped to the appropriate OLAP dimension attributes.
Click the Mappings in Channel dimension and make sure the source columns (left) appears in the mapping pane (right) correctly. Note that there is no such source data column for ALL_CHANNELS level, therefore, for "All/Total" hierarchy levels, the descriptions are typed manually:
Perform the same inspection to the other imported dimensions and understand how they are defined.
4. Create Cube
After all dimensions are defined, we are ready to create our cube. Right-click the Cubes folder, then click Create Cube. In the General tab of the Create Cube window, specify the following:
Name: SALES_CUBE
Use the Add button (>) to select dimensions in the following order:
o CHANNEL
o TIME
o GEOGRAPHY
o PRODUCT
Notes: The order in which the dimensions are listed in a cube may affect performance because it determines the way the data is stored on disk.
Next, select the Storage tab. The Storage tab helps you manage the data compression strategy. By default, we choose Use compression, and then enable the Sparse option for all dimensions, as shown below:
Finally, click Create to finish the dialogue.
5. Create measures
You can create two types of measures in a cube: Stored (or Base) measures, and Calculated measures. Every measure that belongs to a particular cube shares the characteristics that were defined for the cube.
(1) Stored Measures
Stored measures store the facts collected about your business. When you create stored measures in your OLAP data model, you will map them to the source data just like what you have done with dimensions.
(2) Calculated Measures
One of the powerful features of the Oracle OLAP technology is the ability to efficiently and easily generate business calculations of data held in the database. In any OLAP implementation, the number of calculated measures greatly exceeds the number of stored measures.
OLAP calculated measures are derived from stored measures or other calculated measures. These calculations are computed dynamically when users query the data. Calculations are automatically exposed as columns in a cube view, just like the facts.
According to our design, we define two measures, i.e., Sales and Quantity, and two calculated measures. Two measures are created as follows:
And we create the first calculated measures by entering or selecting the following:
Name = SALES_YTD (the Label and Description fields are auto-filled) Calculation Type = Period to Date
In the Calculation inputs section, select the following:
First hyperlink = Ancestor At Level
Second hyperlink = TIME.CALENDAR_YEAR
Finally, the window should look like this:
Similarly, we create a YTD calculation for the previous year. This measure facilitates year-to- year comparisons. The settings are shown below:
Name = SALES_YTD_PY
Calculation Type = Parallel Period
In the Calculation inputs section, select the following:
Second hyperlink = SALES_YTD
Fifth hyperlink = TIME.CALENDAR_YEAR
6. Map the cube
Same as the dimension mapping, we need to map our cube to the existing data source. In a data cube, we need to map the following fields:
The stored measures that are defined within the cube.
The lowest level of detail for each dimension hierarchy.
The Join Condition field. This field associates the foreign key column from the fact
table to the primary key column from the dimension table.
Note that the mapping is done by dragging the corresponding column from the source columns (left) to the correct spot in the mapping pane (right). A Join condition can be achieved by dragging both joining columns to that slot and the “=” will be added automatically. Please DO NOT type those values manually as it will cause unexpected problems in the future. Eventually, the mapping result should look like the follows. Click Apply to complete the mapping.
Task 1: Complete all steps introduced above until you successfully create and map the cube. Take three screenshots of the cube you have created, including the mapping, the definition of two calculated measures. Include those screenshots in your document submission.
NOTE: Please make sure your student ID appears in every screenshot you take to show the originality of your work. This rule applies to ALL screenshots taken throughout this prac unless specified. Screenshots without student ID may be regarded as invalid and receive reduced marks. An example screenshot of the mapping should look like the follows:
Part 2: Load and View Cube Data
The Maintenance Wizard loads and aggregates the data in a single step. We can load all mapped objects in the analytic workspace, or individual dimensions and measures. We can also choose to run the job immediately, enter it in the Oracle Job Queue, or save it as a SQL script. The materialized views, calculated measures are preprocessed during this maintenance phase. After the maintenance, we can use SQL queries to retrieve the result. Also, we can view the cube data using operators like Roll-up, Drill-down and Pivot.
1. Maintain the cube
The cube maintenance will load data to the cube. By default, the dimensions of that cube are also processed before the cube. If you have already loaded dimension data, you can specify only to load measure data.
Task 2: We start to maintain the cube. It will take quite a while to finish the maintenance. Please take a screenshot of your maintenance result, same as below, and include it in your document.
2. View the cube data
Click the View Data SALES_CUBE to start the Data Viewer.
Task 3: Now you can play with this tool to explore the data. Meanwhile, you are required to achieve a few goals:
Perform roll-up, drill-down and pivot operations in Data Viewer, respectively
(Lecture5-P36, Tutorial4-Q2). Each operation includes three parts: (1) the screenshot of the view before the operation, (2) a sentence describing which operation you are performing on which columns/levels; (3) the screenshot of the view after the operation. Therefore, there are 6 screenshots and 3 lines of description in total. Include the screenshots and descriptions in your document.
Adjust your Data Viewer window until it is identical (the value can be different) to the
following two views (Hint: your viewer windows must be identical to get full mark and you may need help from the Query Builder, shown below). Take a screenshotfor each view you made and put them in your document.
Part 3: Understand Data Warehouse Design Mechanism
We have learnt how to create dimensions and cubes using Oracle OLAP and AWM. Now, we will try to understand the mechanism behind. Specifically, there is NO mark in this part, but it will help you understand the whole process of how data cubes are built, queried and managed by Oracle OLAP, which will make the subsequent tasks easier to follow. Therefore, this part is optional but recommended. Please follow the instructions below.
1. View the dimension tables and fact table
Connect to the olaptrain schema via SQL Developer using your “OLAPTRAIN_S1234567” (Once again, S1234567 refers to your student number) user, and then check the tables “CHANNELS” and “SALES_FACT” .
Just like what we have learnt from the lecture and tutorial, the dimension table “CHANNELS” contains dimension key (Channel_Key), hierarchy (Channel –> Class -> All Channels) and attribute
2023-04-24
Data Warehousing