COMM3501 Quantitative Business Analytics
A4 Individual Assignment (40%)
Due date: Monday 7th August 2023, 12:00 PM (noon) week 11
1. Assignment overview
In this assessment, you will analyse a dataset with an emphasis on practical business analytics and develop authentic outputs. The task aims to enhance your problem-solving skills in real-world scenarios. It is also intended to develop your skills in research, critical thinking and problem solving, your data analysis and programming skills, and your ability to communicate your ideas and solutions concisely and coherently.
2. Assignment scenario
You are an analyst at a data analytics consulting firm. Your manager has tasked you with providing a report and an interactive webapp to an American client. The client is a major U.S. wireless telecommunications company which provides cellular telephone service. They require assistance in developing a statistical model to predict customer churn, establishing a target customer profile for implementing a proactive churn-management program, and rolling the solution out to their customer-facing call centres.
These days, the telecommunications industry faces fierce competition in satisfying its customers. Churn is a marketing term for a current customer deciding to take their business elsewhere, which in the current context means switching from one mobile service provider to another. As with many other sectors, churn is an important issue for the wireless telecommunications industry. For this client, the role of the desired churn model is not only to accurately predict customer churn, but also to understand customer behaviours.
3. Assignment details
3.1. Task details
Your main tasks will involve: data manipulation and cleaning; statistical modelling; writing a technical report; developing and hosting a webapp. Your client also wants a non-technical description of the characteristics of customers that churned, to assist in the development of a risk-management strategy, i.e., a proactive churn-management program.
In your report, your manager wants you to include: some details on your data manipulation, cleaning, and descriptive analysis; a brief summary and comparison of the models you fitted; a detailed description of your selected model/s and interpretation of the results; your main findings, recommendations and conclusions; a short description of your webapp and how to access it.
The client is familiar with machine learning. All your modelling results should be included, mostly in an appendix to the report.
In addition, among the 10,000 customers in the eval_data.csv evaluation dataset, you must identify the 3,000 customers who you believe are most likely to churn.
See the submission details section and marking criteria section for more information.
3.2. Data Description
The data provides details of 30,000 customers in the training dataset, and 10,000 customers in the evaluation dataset:
1. training_data.csv
2. eval_data.csv
The datasets can be downloaded from the Moodle website in the Assessments section.
For each of the observations in the training dataset, there is information on 44 attributes
describing the customer care service details, customer demography and personal details, etc. These are described below.
Similar, but not identical, datasets are provided here. You may also wish to have a look at the following analysis based on the Kaggle datasets to give you an idea: Churn Prediction (weblink). This analysis is just a brief example and is not based on your datasets. Different and more variables may be of interest for your analysis. Extra readings are given in the Resources section.
3.2.1. training_data.csv (Training dataset)
This dataset provides insights about the customers and whether they are churned customers.
| Variable Name | Description |
| --- | --- |
| CustomerID | A unique ID assigned to each customer/subscriber |
| Churn | Is churned? (categorical: “no”, “yes”) |
| MonthlyRevenue | Mean monthly revenue for the company |
| MonthlyMinutes | Mean monthly minutes of use |
| TotalRecurringCharge | Mean total recurring charges (recurring billing) |
| OverageMinutes | Mean overage minutes of use |
| RoamingCalls | Mean number of roaming calls |
| DroppedCalls | Mean number of dropped voice calls |
| BlockedCalls | Mean number of blocked voice calls |
| UnansweredCalls | Mean number of unanswered voice calls |
| CustomerCareCalls | Mean number of customer care calls |
| ThreewayCalls | Mean number of three-way calls |
| OutboundCalls | Mean number of outbound voice calls |
| InboundCalls | Mean number of inbound voice calls |
| DroppedBlockedCalls | Mean number of dropped or blocked calls |
| CallForwardingCalls | Mean number of call forwarding calls |
| CallWaitingCalls | Mean number of call waiting calls |
| MonthsInService | Months in service |
| ActiveSubs | Number of active subscriptions |
| ServiceArea | Communications service area |
| Handsets | Number of handsets issued |
| CurrentEquipmentDays | Number of days of the current equipment |
| AgeHH1 | Age of first household member |
| AgeHH2 | Age of second household member |
| ChildrenInHH | Presence of children in household (yes or no) |
| HandsetRefurbished | Handset is refurbished (yes or no) |
| HandsetWebCapable | Handset is web capable (yes or no) |
| TruckOwner | Subscriber owns a truck (yes or no) |
| RVOwner | Subscriber owns a recreational vehicle (yes or no) |
| BuysViaMailOrder | Subscriber buys via mail order (yes or no) |
| RespondsToMailOffers | Subscriber responds to mail offers (yes or no) |
| OptOutMailings | Subscriber opted out of mailings (yes or no) |
| OwnsComputer | Subscriber owns a computer (yes or no) |
| HasCreditCard | Subscriber has a credit card (yes or no) |
| RetentionCalls | Number of calls previously made to retention team |
| RetentionOffersAccepted | Number of previous retention offers accepted |
| ReferralsMadeBySubscriber | Number of referrals made by subscriber |
| IncomeGroup | Income group |
| OwnsMotorcycle | Subscriber owns a motorcycle (yes or no) |
| MadeCallToRetentionTeam | Customer has made call to retention team (yes or no) |
| CreditRating | Credit rating category |
| PrizmCode | Living area |
| Occupation | Occupation category |
| MaritalStatus | Married (yes, no, or unknown) |
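As a rough illustration of how the yes/no fields listed above might be prepared for modelling, the sketch below maps them to 1/0 using only Python's standard library. The helper name `encode_yes_no` and the sample row are hypothetical, and the exact spelling, capitalisation and missing-value codes used in training_data.csv are assumptions that should be checked against the actual file.

```python
# Hypothetical lookup: assumes the raw fields contain "yes"/"no" in some capitalisation.
YES_NO_MAP = {"yes": 1, "no": 0}

def encode_yes_no(value):
    """Return 1 for 'yes', 0 for 'no', and None for unknown or missing values."""
    if value is None:
        return None
    return YES_NO_MAP.get(str(value).strip().lower())

# Made-up example row using column names from the table above.
row = {"Churn": "yes", "ChildrenInHH": "No", "MaritalStatus": "unknown"}
encoded = {name: encode_yes_no(value) for name, value in row.items()}
```

Keeping "unknown" as a separate missing marker, rather than forcing it to 0, leaves the later choice of imputation or a dedicated category open.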
3.2.2. eval_data.csv (Evaluation dataset)
The evaluation dataset comprises 10,000 current customers. From these 10,000 customers, select the 3,000 who you believe are most likely to churn. This evaluation dataset has the same format as the training dataset but doesn’t include the column Churn. The true values for the column Churn will be released after the due date of the assignment.
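One possible way to make the selection is to rank customers by their predicted churn probability and keep the highest-ranked ones. The sketch below assumes a fitted model has already produced a probability for each CustomerID; the function names, the toy probabilities and the hit-rate measure are illustrative assumptions, since the brief does not state how predictive accuracy will be scored.

```python
import heapq

def select_top_k(churn_probabilities, k):
    """Return the k CustomerIDs with the highest predicted churn probability.

    churn_probabilities maps CustomerID -> predicted probability of churn.
    """
    return heapq.nlargest(k, churn_probabilities, key=churn_probabilities.get)

def hit_rate(selected_ids, true_churn):
    """Share of selected customers who actually churned (true_churn maps ID -> 'yes'/'no')."""
    hits = sum(1 for cid in selected_ids if true_churn[cid] == "yes")
    return hits / len(selected_ids)

# Made-up probabilities for five customers; a fitted model would supply real ones.
probs = {"C1": 0.10, "C2": 0.80, "C3": 0.45, "C4": 0.95, "C5": 0.30}
top2 = select_top_k(probs, 2)
```

For the evaluation dataset the same call would use `k=3000`, and `hit_rate` could be computed once the true Churn values are released.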
3.3. Software
You may choose which software package or program to use, e.g., R or Python. The code enabling you to perform most of the computing can be found in the course learning activities.
3.4. Resources
- Extra information on the original dataset and on the context can be found here: link 1 and link 2
- Data manipulation with R with the ‘dplyr’ package (weblink)
- Tidy data in R (weblink)
- Exploratory Data Analysis with R (weblink)
- Data visualisation in R with ggplot2 for fancy plots (weblink)
- He and Garcia (2009), for strategies for dealing with imbalanced data in classification problems
- Yadav and Roychoudhury (2018), for some strategies to deal with missing attribute values in R (available on Moodle)
- If you are interested in using R Markdown, here is a guide for creating PDF documents (weblink)
- For any code-related questions, google.com or stackoverflow.com are pretty helpful!
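To make the missing-value topic above concrete, here is a minimal sketch of median imputation for a single numeric column, written in Python with the standard library only. It is one simple strategy among many and is not claimed to be the approach recommended in the readings; the function name `impute_median` and the sample values are hypothetical.

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to impute from, return a copy unchanged
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Made-up MonthlyMinutes values with two gaps.
monthly_minutes = [120.0, None, 300.0, 80.0, None]
imputed = impute_median(monthly_minutes)
```

Whichever strategy is used, the fill value should be estimated on the training data and then reused for the evaluation data, so that both datasets are treated consistently.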
3.5. Marking criteria
You will be assessed against the following criteria:
1. Data manipulation, cleaning, and descriptive analysis
2. Modelling
3. Recommendations and discussion
4. Report writing
5. Webapp development
6. Predictive accuracy
The mark allocation and details for each marking criterion are given below and in the rubric. The materials you submit should be your own. Familiarise yourself with the UNSW policies for plagiarism before submitting.
3.5.1. Criteria 1-3
There are potentially multiple valid approaches to this task, so you must choose an approach that is both justifiable and justified.
You may also wish to engage in extra research beyond the course content. Please feel free to do so. Although the marks for each component of the assignment are capped, innovations are encouraged.
Any assumptions must be clearly identified and justified, if used. Sufficient details, e.g.,
calculations and results, must be provided. Include an appendix to the report for non-essential but useful results; however, the appendix will not be directly assessed. Ensure that the body of your report is self-contained and addresses all marking criteria.
3.5.2. Criteria 4
Communication of quantitative results in a concise and easy-to-understand manner is a skill that is vital in practice. As such, marks will be given for report writing. To maximize your marks for this component, you may wish to consider issues such as: table size/readability, figure axes/formatting, text readability, grammar/spelling, page layout, and referencing of external sources.
Include a brief introduction section in your report.
A maximum page limit of 8 pages is applicable to the main body of the report. This limit includes
tables and graphs, but excludes the cover page, table of contents, references, and any appendices. There is no limit to the length of the appendix. Exceeding the page limit will attract a proportional penalty to the overall assignment mark. Your report must be a self-contained document (i.e., not multiple files), with all pages in portrait format.
Consider how the overall look, feel and readability of your document is affected by choices like
margin size, line and paragraph spacing, typeface/font, and text size. If in doubt, don’t stray too far
from the defaults in your word processor / typesetting program, or use something like the following settings: margins of 2.54cm for each edge, 1.15 line spacing, Calibri size 11 text.
3.5.3. Criteria 5
Your webapp must be able to accept user input (data values) for some set of customer characteristics, and, based on these input values, your app must output whether a customer is likely to churn or not. Choose and include customer characteristics consistent with your modelling conclusions. You may also wish to provide relevant text and visual (data) output conditional on the predicted churn probability, in line with the client’s aims. Recall that the webapp will be used by customer-facing call centres in order to implement a proactive churn-management program.
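The core logic behind such an app can be sketched independently of any web framework: take the input values, compute a churn probability, and turn it into a message. The example below assumes a logistic-regression-style model; the coefficients, the intercept, the chosen input variables and the 0.5 threshold are all made up for illustration and would be replaced by whatever your own modelling supports.

```python
import math

# Hypothetical coefficients, for illustration only; real values come from a fitted model.
INTERCEPT = -1.0
COEFFICIENTS = {
    "MonthsInService": -0.02,
    "CurrentEquipmentDays": 0.002,
    "RetentionCalls": 0.5,
}

def churn_probability(inputs):
    """Logistic-model probability of churn for one customer's input values."""
    score = INTERCEPT + sum(COEFFICIENTS[name] * inputs[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-score))

def churn_label(inputs, threshold=0.5):
    """Message the app could display to a call-centre operator."""
    return "likely to churn" if churn_probability(inputs) >= threshold else "unlikely to churn"

# Made-up customer entered through the app's input form.
customer = {"MonthsInService": 10, "CurrentEquipmentDays": 400, "RetentionCalls": 1}
label = churn_label(customer)
```

In a hosted app, the input widgets would populate the `customer` dictionary, and the probability could also drive any text or plots shown alongside the label.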
Your webapp must be hosted publicly online and be directly accessible with a hyperlink included in your report. You are recommended to host it for free on http://www.shinyapps.io/, but alternatives exist. Supporting material, explaining how to develop and