Assessment 3 - Business Analytics Research Project


Assessment 3 - Business Analytics Research Project (Due on 3rd Nov 2023, 11.30pm)

Part 1: Data Mining, Machine Learning and Text Analytics (50%)

If a solution is too difficult to describe, provide screenshot(s) to demonstrate your solution.

Exercise 1: Data Mining and Machine Learning (25%)

You work with the PVA_PARTITION data set in this exercise. It contains data that represent charitable donations made to a veterans’ organization. The data represent the results of a mail campaign to solicit donations. Solicitations involve sending a small gift to an individual and include a request for a donation. The data set contains the following information:

.  a flag to indicate respondents to the appeal (Target Gift Flag) and the dollar amount of their donations (Target Gift Amount)

.  respondents’ PVA promotion and giving history

.  demographic data of the respondents

1.   Using SAS Visual Analytics

a.   Sign in to SAS Visual Analytics.

b.   Select Explore and Visualize Data to begin accessing and exploring the data.

c.   Select the PVA_PARTITION data source.

d.   Select the Data pane on the left of the canvas (if it is not open).

1)   Which level of the Status Category 96NK variable has the highest count?

2)   Does the variable Age contain any missing values? If so, how many?

3)   What is the average of Target Gift Amount?

e.   Change Target Gift Flag from a measure to a category. It is a binary indicator that represents a response to a mailing, where 1 indicates that customers did respond.

1)   How are responders and non-responders distributed in the


2)   How many females responded to the campaign?                                 

f.   Save the report. Click (Menu) and select Save As. Save the report in My Folder > Analytics Toolbox with the name Exercise 1. Click Save.
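The exploration questions in steps d and e come down to a frequency count, a missing-value count, a mean, and a cross-tabulation. The following is a minimal Python sketch of those four calculations on a made-up five-row sample, which may help when sanity-checking what SAS Visual Analytics displays. The column names (status_category, age, gender, target_gift_flag, target_gift_amount) and every value are assumptions for illustration only; they are not the actual PVA_PARTITION columns or data. The mean is taken over non-missing amounts, which is also an assumption about how missing values are treated.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy rows: names and values are illustrative only,
# not taken from PVA_PARTITION. None marks a missing value.
rows = [
    {"status_category": "A", "age": 62, "gender": "F",
     "target_gift_flag": 1, "target_gift_amount": 10.0},
    {"status_category": "S", "age": None, "gender": "M",
     "target_gift_flag": 0, "target_gift_amount": None},
    {"status_category": "A", "age": 45, "gender": "F",
     "target_gift_flag": 0, "target_gift_amount": None},
    {"status_category": "F", "age": None, "gender": "F",
     "target_gift_flag": 1, "target_gift_amount": 20.0},
    {"status_category": "A", "age": 71, "gender": "M",
     "target_gift_flag": 1, "target_gift_amount": 15.0},
]

# d.1) Level of a category variable with the highest count.
level_counts = Counter(r["status_category"] for r in rows)
top_level, top_count = level_counts.most_common(1)[0]

# d.2) Number of missing values in a measure.
missing_age = sum(1 for r in rows if r["age"] is None)

# d.3) Average of a measure, skipping missing values (assumption).
avg_gift = mean(
    r["target_gift_amount"] for r in rows
    if r["target_gift_amount"] is not None
)

# e.1) Distribution of responders (1) and non-responders (0).
flag_counts = Counter(r["target_gift_flag"] for r in rows)

# e.2) Cross-tabulation cell: females who responded.
female_responders = sum(
    1 for r in rows
    if r["gender"] == "F" and r["target_gift_flag"] == 1
)

print(top_level, top_count, missing_age, avg_gift,
      dict(flag_counts), female_responders)
```

Treating the flag as a category, as step e requires, is what makes the responder counts and the gender cross-tabulation meaningful; as a measure it would simply be summed or averaged.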

Continue to work with the PVA_PARTITION data set to train a neural network model. The model aims to classify those customers who made a donation.

2.   Training a Neural Network Model in SAS Visual Data Mining and Machine Learning

a.   Open your saved report, Exercise 1, which was created above.

b.   Select the Data pane on the left of the canvas and open the PVA_PARTITION data source.

If you have not done so already, in the Measure column, right-click Target Gift Flag and select Category.

c.   Create a new page.

d.   Add a neural network to the canvas.

e.   Disable auto-refresh on the menu bar (if not done already).

f.   Add Target Gift Flag as the response.

g.   Under Predictors, click Add. In the Add Data Items window, select all predictor variables except for these five:

.  Control Number

.  Demographic Cluster

.  Partition

.  Target Gift Amount

.  Target Gift Amount with Zero (In all, you add 24 predictors.)

h.   Create the neural network model by clicking Refresh or enabling auto-refresh.

.  How many observations are used by the algorithm?

.  Why are not all observations used by the algorithm?

.  What is the misclassification rate for the model created with default settings?

i.   Select the Options pane on the right and change Optimization Method to SGD. Do you see any improvement in the misclassification rate?

j.   Perform honest assessment and examine the results.

1)   Select the Data pane on the left of the canvas and set the Partition variable as a new partition.

2)   Select the Roles pane on the right of the canvas and assign the Partition variable under the Partition ID role. Refresh the model and note the validation misclassification rate.

3)   Select the Options pane and change the L2 regularization parameter value to 0.001. Under Hidden Layers, change the Number of Hidden Layers property to 2. Do these changes result in any improvement in the validation misclassification rate?

4)   Examine the validation cumulative lift chart. What can you determine about the top 10% (percentile) of the data? How does this model compare to the Best model?
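The two assessment statistics used in the steps above, misclassification rate and cumulative lift, can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of how SAS Visual Data Mining and Machine Learning computes or bins its charts: the ten observations, the scores, the 0.5 classification cutoff, and the function names are all hypothetical.

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def cumulative_lift(actual, scores, fraction):
    """Response rate in the top `fraction` of observations, ranked by
    score, divided by the overall response rate."""
    if not actual or len(actual) != len(scores):
        raise ValueError("inputs must be non-empty and of equal length")
    overall_rate = sum(actual) / len(actual)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no responders")
    # Rank observations from highest to lowest predicted score.
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    return top_rate / overall_rate


# Hypothetical validation data: 1 = responded, 0 = did not respond.
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.35, 0.05]

# Assumed cutoff of 0.5 to turn scores into predicted classes.
predicted = [1 if s >= 0.5 else 0 for s in scores]

rate = misclassification_rate(actual, predicted)
lift_top_10 = cumulative_lift(actual, scores, 0.10)
lift_top_20 = cumulative_lift(actual, scores, 0.20)

print(f"misclassification rate: {rate:.2f}")
print(f"cumulative lift, top 10%: {lift_top_10:.2f}")
print(f"cumulative lift, top 20%: {lift_top_20:.2f}")
```

A cumulative lift above 1 at a given percentile means the model's highest-scored observations contain proportionally more responders than the data as a whole. Computing both statistics on the validation partition rather than the training partition is what makes the assessment "honest".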

Exercise 2: Text Analytics (25%)

In this exercise, you will need to load your own dataset Z_MOVIES_DATA onto SAS Viya for Learners. You can download the dataset from the MBAS905 Moodle Site under Assessment 3. The dataset is a file named “Z_MOVIE_DATA.csv”, which is a comma-delimited file in UTF-8 format. Please follow the instructions given in “KB article VFL Individual Data Upload.pdf” to upload the dataset onto SAS Viya for Learners.

Once you have successfully uploaded the dataset, you can then proceed to work out the following exercise:

The Z_MOVIES_DATA dataset contains seven variables. The text variable is description. The variable Made_Money can be used as a category variable. Set the variable title as a display variable.
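Before uploading, it can be worth confirming that the CSV file really has seven columns and includes the three variables named above. The following is a minimal Python sketch of such a check, run here on an in-memory sample. Only description, Made_Money, and title come from this brief; the other four column names (col4 to col7), the sample row, and the function name check_header are hypothetical placeholders.

```python
import csv
import io

# Variables named in the exercise text.
REQUIRED = {"description", "Made_Money", "title"}


def check_header(text_stream, expected_columns=7):
    """Return (column_count_ok, missing_required_names) for a CSV stream."""
    reader = csv.reader(text_stream)
    header = next(reader)
    missing = sorted(REQUIRED - set(header))
    return len(header) == expected_columns, missing


# Hypothetical sample: col4..col7 stand in for the unnamed variables.
sample = io.StringIO(
    "title,description,Made_Money,col4,col5,col6,col7\n"
    'Example Title,"A short, made-up plot summary.",1,a,b,c,d\n'
)

count_ok, missing = check_header(sample)
print("seven columns:", count_ok, "| missing required:", missing)
```

For the real file, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the download was saved. The csv module handles quoted commas inside the description text, which a plain split on commas would not.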

Using Build Models (Model Studio), create a Text Analytics project as indicated in the following New Project window:


In this exercise, you will need to identify and name at least four movies that are categorized as Thriller and another four movies that are categorized as Comedy. You need to show the node windows that display these results. Provide no fewer than six (6) screenshots for your Exercise 2 submission. You should also provide a brief explanation of what you have observed in each screenshot. You may find the file CommonConcept.txt to be useful in this exercise.

Part 2: Report/Essay (50%)

You are to write a short report/essay (maximum 3000 words):

“Write a report/essay on what you would do as a Business Analyst and Manager of the Analytics Department in developing a set of guidelines and policies for data privacy, data security, data breach prevention and ethics for your organization. Pay particular attention to Advanced AI, Internet of Things (IoT), Edge Analytics, and robotics technologies used in business analytics and intelligent information analysis, in one of the following businesses/industries.”

You can choose one (or more) of the following industries as the background in supporting your report/essay:

.    Online eCommerce

.    Retail Business (Bricks and Mortar)

.    Human Resources Management

.    Banking

.    Financial Management

.    Insurance

.    Risk Management

.    Automation

.    Automotive

.    Manufacturing

.    Transport Logistics

.    Supply Chain Management

.    Hospitality

.    Utility (e.g. electricity, water, gas)

.    Energy (e.g. hydro, coal, solar, wind, biomass, gas, nuclear)

.    Real Estate

.    Building and Construction

.    Infrastructure Management

.    Healthcare

.    Education

You can also propose an industry that is not listed above.