ACC7011 Individual Assignment – Business Analytics


Case study using software applications – 40%

This case study counts towards 40% of the module grade, as mentioned in the module outline.

Submission deadline: as shown in Canvas.

Please follow the submission guidelines (provided below at the end) carefully.

DataAnytime is a data analytics firm and has established itself as a leading analytics provider in Belfast since 2016. It has served a wide range of customers over the years, including many manufacturing firms, marketing firms, trading firms, charities, and even political parties.

DataAnytime is well recognized in the Belfast region for its specialization in gaining insights from raw data.

Helen started working for DataAnytime just over a month ago and had an important meeting last week with the CEO of her firm, David. During the meeting, the CEO provided an overview of the kind of work DataAnytime has been engaged in over the years. He also discussed with Helen some of the jobs DataAnytime has recently received from its customers. The jobs varied in nature, but all involved analyzing data and providing key insights from it. An existing customer, QuietTime, specializes in conducting workshops and organizing conferences in quiet and peaceful areas across the world, and has asked DataAnytime whether it could provide some insights into the noise level in various countries. QuietTime wants this information so that it can identify potential areas where it could organize workshops and conferences in 2023.

Another customer, a marketing firm by the name of AlwaysText, is examining how companies provide information in their annual reports. AlwaysText is interested mainly in the non-financial and textual information in annual reports that may not necessarily be reflected in the financial disclosures. AlwaysText has made a name for itself by analysing the sentiment contained in textual information in annual reports and by exposing the practice of ‘green-washing’, whereby companies make bold claims and promises regarding their commitment to the environment but take very little meaningful action to support such claims.

Yet another company by the name of Movie247 specializes, as the name might suggest, in making award-winning movies. Movie247 made several movies in 2022 and plans to nominate some of them for the highly acclaimed Noscar award under various categories. Given that Movie247 can self-nominate only 10 movies for the award, it wants to make sure that the movies it submits for consideration have the highest chance of winning the Noscar award.

Given the various tasks DataAnytime had to perform soon, David asked Helen if she could provide some key information and insights to help the customers. David also handed her a USB drive with a folder called “Assignment_Data.zip” that contains relevant data for the tasks. “Good luck!”, said David, “but remember, having the data in structured and unstructured form is one thing, but making it ready for analysis is a different matter altogether”. Helen responded by saying that she would try her best.

Requirement Q 1)

It is understood that the noise levels produced by various factories in different countries are denoted by ‘Noise’ in the file StataFile.dta. To help QuietTime choose the right place to organize seminars and conferences across the world, use StataFile.dta and Country.csv to:

i. Create a box plot for noise level by Country 15 Marks

ii. Comment on the box plot 15 Marks

iii. Create an interactive chart by Nation 10 Marks

Provide code that can be replicated.

Marking Scheme:

Box plot by Nation – 15 Marks (an excellent box plot would contain a title [2 marks], axis labels [2 marks], mean values, median values, quartiles [7 marks], and outliers [2 marks] in a well-presented [2 marks] format)

Commentary – 15 Marks

Interactive chart – 10 Marks

Failure to provide working code will nullify the respective marks above.
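As an illustration of the quantities named in the marking scheme (mean, median, quartiles, and outliers), the following is a minimal sketch using only the Python standard library. It assumes that outliers are identified with the common 1.5 × IQR rule and that quartiles are computed with the "inclusive" method; neither convention is specified in this brief. The country names and readings are hypothetical and are not taken from StataFile.dta or Country.csv.

```python
import statistics


def box_plot_summary(values):
    """Return the statistics that a box plot displays for one group of readings."""
    data = sorted(values)
    # Quartiles using the 'inclusive' method (an assumption; plotting tools differ slightly).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences under the 1.5 x IQR convention (an assumption, not stated in the brief).
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "mean": statistics.mean(data),
        "median": median,
        "q1": q1,
        "q3": q3,
        "outliers": outliers,
    }


# Hypothetical noise readings, NOT taken from the assignment data.
noise_by_country = {
    "CountryA": [50, 52, 54, 56, 58],
    "CountryB": [40, 41, 42, 43, 90],
}

for country, readings in noise_by_country.items():
    print(country, box_plot_summary(readings))
```

Computing these values by hand can be a useful cross-check when commenting on a box plot, since a single extreme reading (such as the 90 in the hypothetical CountryB data) pulls the mean well above the median.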

Q 2) (You should answer either 2a or 2b; not both)


2a) AlwaysText wants to conduct sentiment analyses of the annual reports of various companies to better understand how annual reports are written. As a first step, use all the PDF files on the USB drive to perform the following tasks:

i. Count the number of words in each of the PDF files using Python. For this purpose, if you wish, you can assume that any character separated by a single space is a word. 15 Marks

ii. Use Python to create a CSV file named "AnnualReport.csv" clearly exhibiting the number of words used in each year's annual report, as shown in the following format. 5 Marks


Show the code you used to create the above CSV file. Provide the resulting CSV file along with the code.

Marking Scheme:

Counted words – 15 Marks (for the purpose of this task, any character separated by a space can be taken as a word)

CSV file in the above format – 5 Marks

Failure to provide working code will nullify the respective marks above.
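The counting and CSV-writing steps can be sketched as follows, using only the Python standard library. The sketch assumes the text of each report has already been extracted from its PDF file (extraction itself needs a PDF library and is not shown). The column headers `Year` and `WordCount` and the sample text are hypothetical, since the required format is not reproduced here. Note that `str.split()` treats any run of whitespace as one separator, which is slightly more lenient than splitting on a single space.

```python
import csv
import io


def count_words(text):
    """Count whitespace-separated tokens in already-extracted text."""
    # split() with no argument splits on any run of whitespace and drops empty strings.
    return len(text.split())


def write_word_counts(counts_by_year, out_file):
    """Write one row per year to an open text file or file-like object."""
    writer = csv.writer(out_file)
    writer.writerow(["Year", "WordCount"])  # hypothetical headers
    for year in sorted(counts_by_year):
        writer.writerow([year, counts_by_year[year]])


# Hypothetical extracted text, NOT taken from the assignment PDFs.
extracted_text = {
    2021: "We reduced emissions this year",
    2020: "Revenue grew  strongly",
}
counts = {year: count_words(text) for year, text in extracted_text.items()}

# An in-memory buffer keeps the sketch side-effect free.
buffer = io.StringIO()
write_word_counts(counts, buffer)
print(buffer.getvalue())
```

To produce an actual file, the same function could be called with `open("AnnualReport.csv", "w", newline="")` in place of the in-memory buffer.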


2b) A tedious and time-demanding process (so-called data pre-processing) is necessary to convert raw real-world data into a well-refined form for analytics algorithms (Kotsiantis, Kanellopoulos, & Pintelas, 2006). In this context, critically discuss the steps involved in making raw data ready for data analytics. 20 Marks

Q 3)

To help Movie247 select the movies to nominate so that its chance of winning a Noscar award is maximized, use the movies.csv file to perform the following tasks:

i. Build a Decision Tree model to predict the likelihood of a movie winning a Noscar award. 15 Marks

ii. Build a Random Forest model to predict the likelihood of a movie winning a Noscar award. 15 Marks

iii. Present the accuracy of the Decision Tree model using a confusion matrix. 5 Marks

iv. Present the accuracy of the Random Forest model using a confusion matrix. 5 Marks

Marking Scheme:

Building Decision Tree model – 15 Marks

Building Random Forest model – 15 Marks

Confusion matrix and accuracy of Decision Tree – 5 Marks

Confusion matrix and accuracy of Random Forest – 5 Marks

Failure to provide working code will nullify the respective marks above.
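To show how accuracy relates to a confusion matrix for a two-class problem, here is a minimal sketch using only the Python standard library. The labels are hypothetical (1 = won a Noscar, 0 = did not) and are not taken from movies.csv. In practice the predicted labels would come from a fitted Decision Tree or Random Forest model, which this sketch does not build.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Tally the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # predicted a win, and the movie won
        elif a != positive and p == positive:
            fp += 1  # predicted a win, but the movie did not win
        elif a == positive and p != positive:
            fn += 1  # predicted no win, but the movie won
        else:
            tn += 1  # predicted no win, and the movie did not win
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def accuracy(cm):
    """Share of all predictions that were correct."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total if total else 0.0


# Hypothetical labels, NOT taken from movies.csv.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)
print("Accuracy:", accuracy(cm))
```

With these hypothetical labels, six of the eight predictions are correct, so the accuracy is 0.75. The same calculation applies to either model, which makes the two confusion matrices directly comparable.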

Submission guidelines

You should upload the following files in Canvas by the submission deadline:

· “ACC7011_answersheet.docx” (this answer sheet is provided in Canvas; do not change the name of the file and do not edit the locked spaces)

· One interactive map named “Interactive.html” for Q1 iii

· One CSV file named “AnnualReport.csv” (only if you answer 2a instead of 2b)

All required code must be pasted properly within the given space in the provided answer sheet ACC7011_answersheet.docx. Please do not submit files with names other than those specified above.

Plagiarism is treated very seriously; click here for more information on plagiarism.

If your ability to do the assignment is affected due to personal circumstances (e.g. health issues, family issues, etc.), follow the guidelines provided on Exceptional Circumstances.

Any questions that arise subsequently will be addressed on the FAQ page. You should check this page regularly, and again prior to submission.