COMP810: Data Warehouse and Big Data 2023 Semester 2


COMP810: Data Warehouse and Big Data

Big Data Research Report Guideline

(Weight: 50%)

2023 Semester 2

Assignment Ver 4.1

1. Assessment Overview

This assessment delves into the realm of big data through comprehensive research and practical applications utilising the techniques imparted in this course. Please follow these steps:

1.   Students can work individually or in groups (two members max). Find a teammate if you want to work in a group. You can join the same group with your teammate via Canvas - People. The marking rubric from Section 9 applies to Individual and Group projects equally.

2.   Choose a Big Data-related topic from the use cases provided in Section 2 OR be creative and come up with your own novel topic.

3.   Delve into academic literature: You are required to read and critically analyse a minimum of eight peer-reviewed articles on your chosen topic.

4.   Through your research, identify at least two key themes. Explore these themes further and then draw comparisons across the papers you have studied. Themes may include, but are not limited to, the following:

•   Different approaches or algorithms to solve a specific problem.

•    Scientific findings derived from experimentation.

•   Varied perspectives on a given issue.

•   Pros and cons of a particular method, technique, or theory.

•   The impacts or effects of a challenge.

•   Challenges and opportunities within the field.

5.   Customise this work - Introduce your unique ideas or proposed approaches, suggest ways to extend existing work, and share your perspectives on big data-related issues.

6.   Put theory into practice - Select and download an online dataset to conduct preliminary analysis. Endeavour to apply the techniques taught in this course. If writing a program, you may choose any language you are comfortable with.

7.   Think like a computer scientist - Write pseudo-code and illustrate how you would apply MapReduce to solve the problem stated in your report.

Presentation and Length of the Report

•   Presentation matters: Compile your findings, analyses, and insights into a well-structured report. Use either LaTeX or Word to create a minimum 6-page document (including references) following the two-column IEEE proceedings format.

•   Remember, the main goal of this assignment is to develop a deeper understanding of big data and its techniques. Emphasis is placed on both theoretical knowledge and practical skills, along with the development of critical thinking.

•   For recommended citation style, see Rubric in Section 9.

2. Big Data Report Topics

You can pick one of the following topics OR come up with your own. The details of some of the following topics will be explained in the slides of "Introduction to Big Data".

Post-disaster Analytics: This topic focuses on the utilisation of big data techniques to analyse post-disaster data, enabling a more efficient response and aiding in recovery efforts. It may involve analysing damage, predicting outcomes, and optimising resource allocation.

News Analytics: It involves the collection, filtering, and analysis of news articles or media. The insights derived can be used in various sectors, such as finance (to predict stock movement), politics (to gauge public sentiment), or even disaster management (to monitor situations).

ChatGPT and Big Data: Investigate how ChatGPT can be used to analyse and interpret big data. Can it provide unique insights or aid in the data analysis process?

AI for Predictive Analytics: Look into how artificial intelligence is shaping predictive analytics. How do machine learning models improve the accuracy and efficiency of predictions?

Privacy Issues in Big Data: Big data inherently involves collecting and analysing large amounts of information, raising significant privacy concerns. How can these concerns be mitigated while still allowing for effective data analysis?

Big Data in Healthcare: How is big data being used in healthcare, and what potential does it have for the future? This could include predictive modelling, patient risk identification, and disease outbreak tracking.

Big Data and Climate Change: Can big data help in the fight against climate change? Look at how data analysis can aid in tracking environmental changes, predicting future trends, and informing climate policy.

Product Recommendation: This topic involves creating systems that recommend products to users based on their previous activity, preferences, and behaviours. These recommendations are often powered by machine learning algorithms and are prevalent in e-commerce and media streaming platforms.

Social Influence Diffusion Analysis: This topic focuses on understanding how information, behaviours, or trends spread across social networks. It is especially relevant in marketing, public health, and political campaigns.

Customer Churn Analysis: This is about predicting which customers are likely to stop using a product or service. By identifying these customers in advance, companies can take action to retain them, often through targeted offers or improved service.

Customer Segmentation: It involves dividing a company's customers into groups that reflect similarities among customers in each group. The goal is to identify the right marketing strategy for each segment to maximise value for both the customer and the company.

Sentiment Analysis: This deals with the use of natural language processing to identify and extract subjective information from source materials. It's often used in social media monitoring, allowing companies to track sentiment towards their brand and products.

Anomaly Detection in Online Social Networks: This topic revolves around detecting patterns in social network data that do not conform to expected behaviour. These "anomalies" could be signs of fraudulent activity, cyberbullying, or other significant events.

Market Basket Analysis: A modelling technique based upon the theory that buying a certain group of items makes you more (or less) likely to buy another group of items. It is commonly used in retail to identify items purchased together, aiding in-store layout and product placement.
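As a rough illustration of the "more (or less) likely" idea behind Market Basket Analysis, the following minimal Python sketch computes the standard support, confidence, and lift measures for a rule such as bread -> milk. The transactions and the function names are hypothetical placeholders made up for this example; they are not taken from any course dataset.

```python
# Toy transactions: each set is one hypothetical shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's baseline support.

    Above 1: buying the antecedent makes the consequent more likely.
    Below 1: it makes the consequent less likely.
    """
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

In this toy data the lift of bread -> milk is below 1, so buying bread makes milk slightly less likely than its baseline rate.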

3. Layout and Guidelines

The Big Data Research Report should include the following sections:

Title: The title should be concise, specific, and indicative of the work you've done in your report. It should ideally not be longer than 15 words.

Abstract: The abstract is a summary of your entire report, ideally 150-200 words long. It provides an overview of your chosen topic, the methods you have used for data analysis, the main findings, and your conclusions.

Keywords: Select five relevant keywords that accurately represent your report. These should be specific to your topic, method, and findings and help others find your work when searching for similar research.

Introduction: This section gives the background information on your topic, stating the motivation behind your research and why it is important. It should present your research question and briefly explain your approach to answering it.

Related Work: This section provides a comparative analysis of at least 8 existing research works from peer-reviewed sources related to your topic. Highlight the existing gaps in the context of the big data environment. Blogs, e.g., Medium, and pre-print articles, e.g., those from arXiv, are not peer-reviewed publications but can be cited in your report.

Your Opinion: This section should showcase your unique perspective on the topic. You might propose new ideas or models, suggest alternative approaches to the problem, or discuss how the existing work can be extended. Your opinion associated with big data should be included. Introduce these comments appropriately. Any information from previously published works (journal article, book chapter, etc.) MUST be cited.

Data and Fundamental Statistical Analysis: This section must describe the dataset involved in your report (e.g., from Google Dataset Search) as well as the fundamental statistical analysis you've conducted using big data techniques. The dataset source (weblink) must be included.

Your preliminary analysis can be conducted (a) using existing tools or (b) by writing your own program. It is strongly recommended that you apply the techniques discussed in this course. You can use any programming language of your preference for your data analysis.

Include relevant figures and captions (as well as citations) in your report to substantiate your findings.
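For option (b), a fundamental statistical analysis can start from a handful of descriptive statistics. The sketch below is only an illustration in Python using the standard-library `statistics` module; the `values` list and the `describe` helper are hypothetical stand-ins for one numeric column of whatever dataset you choose.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Placeholder numbers standing in for a column loaded from your dataset.
values = [12.0, 15.5, 9.0, 15.5, 20.0, 11.0]
summary = describe(values)

for name, value in summary.items():
    print(f"{name}: {value}")
```

A real analysis would load the column from the downloaded file and report these figures alongside the dataset source.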

Aggregation and Visualisations: In this part, you should aggregate your data, present relevant findings, and visualise these findings in a meaningful way. Graphs, charts, and other data visualisations for conveying complex information are strongly recommended.

Remember to tailor the aggregation and visualisations to the specific domain and dataset that align with your topic.
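To make the aggregation step concrete, here is a minimal Python sketch that groups rows by a category, sums a numeric field, and prints a plain-text bar chart. The `rows` data, the field names `region` and `sales`, and both helper functions are assumptions invented for this illustration. In a report you would normally replace the text chart with a proper plot from the charting tool of your choice.

```python
from collections import defaultdict

# Hypothetical records; in practice these come from your dataset.
rows = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 80},
    {"region": "North", "sales": 60},
    {"region": "East", "sales": 40},
]


def aggregate_sum(rows, key, value):
    """Group rows by `key` and sum the `value` field within each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def text_bar_chart(totals, unit=20):
    """Render totals as text bars, largest first; one '#' per `unit`."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    lines = [
        f"{label:<6} {'#' * (total // unit)} {total}"
        for label, total in ordered
    ]
    return "\n".join(lines)


totals = aggregate_sum(rows, "region", "sales")
chart = text_bar_chart(totals)
print(chart)
```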

MapReduce Pseudo-code with Explanations: Provide pseudo-code for a MapReduce algorithm that would be suitable for analysing your dataset. This section must include the pseudo-code together with an explanation of how it works and why you chose it.
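For orientation only, the shape of a MapReduce job can be sketched as three steps: map, shuffle, and reduce. The Python below simulates those steps in a single process on the classic word-count problem. It is not a distributed implementation and not a template for your own problem; the function names and the sample `records` are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for raw in record.lower().split():
        word = raw.strip(".,!?")
        if word:
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a total."""
    return key, sum(values)


def run_mapreduce(records):
    """Run map, shuffle, and reduce sequentially over all records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


records = ["Big data needs big tools.", "Data tools scale"]
counts = run_mapreduce(records)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle, so your explanation should state what the keys and values are for your own dataset.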

Conclusion and Future Issues: Summarise your key findings and conclusions. Discuss potential future issues, implications of your research, and suggestions for future studies in your area. Cite appropriately.

References: Include all the sources you've cited in your report, listed in the correct citation format. This demonstrates the breadth of your research. Publications from the last five years are preferred.

4. Regulations of Using GAI Tools - ChatGPT

What Students Can Do with Generative AI Tools Like ChatGPT:

Idea Generation: Students can use ChatGPT to brainstorm ideas or get suggestions to choose their report topic. This tool can be beneficial in providing different perspectives on a subject. Students are required to search Google Scholar to verify any idea that is given by the GAI.

Research Assistance: ChatGPT can provide summaries and explanations of complex concepts, which can be beneficial for students in their initial stages of research. It can also suggest further reading or resources to deepen their understanding. However, all the suggested readings need to be verified by the student.

Technical Support: Students can use ChatGPT to troubleshoot and seek solutions to technical problems they might encounter during their projects. By posing their issues in the form of questions, ChatGPT can offer potential solutions or direct students towards relevant resources.

Proofreading & Editing: Students can use ChatGPT to check their grammar, punctuation, and spelling. It can provide suggestions for rephrasing sentences or improving the flow of the report.

What Students Should NOT Do with Generative AI Tools Like ChatGPT:

Plagiarize: Students are not allowed to use the output from ChatGPT without giving proper attribution or without adding their own understanding and analysis. Academic integrity is paramount; directly copying from an AI output is considered plagiarism. Specifically, students are prohibited from utilising ChatGPT to produce paragraphs that are then directly copied into their reports.

Sole Source of Information: ChatGPT should not be the sole source of information for a report. Students should corroborate any facts or information from ChatGPT with other reliable academic sources, e.g., Google Scholar.

Over-reliance: DO NOT over-rely on ChatGPT to provide ideas. The tool should be used as an aid and not as a substitute for your own critical thinking and writing skills.

Unverified Information: Students should not accept the information provided by ChatGPT at face value. It's crucial to verify information using reputable, peer-reviewed sources, as GAI tools can make mistakes or have outdated information.

5. What to submit

•   A single PDF file. Minimum 6 pages using two-column IEEE proceedings format.

•   Your submission should be named with your family name and student ID, e.g., Smith_12345. If you work in a group of two students, please include both students' info, e.g., Smith_12345_Li_54321.

6. When to submit

•   Due date: The end of Week 7 (Friday, 11:59 PM)

7. How to submit

•   All assessments should be submitted through Canvas.

•   NOTE: If you are using any material or figures in the assessment that is not your own, do remember to cite/reference the source.

•   All assessments will be assessed through the Turnitin system. In case of plagiarism, the University policy against plagiarism will apply.

8. Important Notes

•   Late penalty: Late submissions are accepted for a maximum of 24 hours after the due date. In this case, a 5% late penalty will be applied.

•   Incorrect Paper Format: 10% penalty if the format of the paper doesn't conform to the required two-column IEEE proceedings format.

•   Penalty for Failure to Meet the Required Paper Length: 5-5.5 pages: 5%; 4-5 pages: 10%; 3-4 pages: 20%; 1-3 pages: 50%.

•   Late submission with SCA: no penalty if the assignment is submitted within the SCA period.

•   Plagiarism: If the assignment is classified as plagiarism (the overall similarity score is above 25%), all the items will be given 0. Do check your similarity report immediately after submission. Revise the report and resubmit it if it doesn't meet the requirement. If a particular section has a very high similarity score, a penalty for that section will be applied. Copying data analysis from online resources, e.g., Kaggle, will be considered plagiarism.

•   Plagiarism with ChatGPT: If we suspect a significant portion of the report has been generated by ChatGPT, we reserve the right to require the students to present and discuss the big data report in person through an interview format.

•   IMPORTANT: If you intend to lodge a Special Consideration Application, please hold your assignment submission. If the student does submit an SCA along with the assignment before the due date, the late submission will NOT be graded.

9. Big Data Research Report Rubric


Report Components and Requirements




(60% - 80%)






Title, Abstract, Keywords

The title summarises the main idea of the study and gives a reasonable scope of the research.

The abstract should be one paragraph (200-250 words), and it should cover the whole theme of the report.

The title reflects the main idea of the study. It contains the fewest possible words to adequately describe the research paper's content and purpose.

The scope of the research is reasonably described based on the title.

The length of the abstract is adequate. State clearly the question being asked in 200-250 words.

Highlight the most important findings with enough information to understand the research.

States the major findings and conclusions.

The title reflects the main idea of the study but gives a broad scope.

The abstract is written in a scientific style but is relatively well organised and concise.

The abstract includes a concise summary of questions and findings.

The title is not tailored to the student's research.

The abstract does not give an overview that leads directly to the reader being able to state the study's major findings.

The abstract's length appears inadequate, i.e., too long or too short.

The abstract is not written in a scientific style. Even some references are included in the abstract.



The introduction part should introduce the topic and its importance. It should also include at least one diagram related to the topic. One page is sufficient for the introduction section.

The introduction captivates, clearly stating the main topic and outlining the paper's structure.

The background is logically organised, explaining the specific reasons for the study.

The introduction adequately states the main topic and previews the paper's structure.

The background is fairly organised, although the motivation for the study could be stronger.

Reasons for the study

The introduction presents the main topic but fails to preview the paper's structure.

The background is somewhat disorganised, and

The introduction lacks a clear presentation of the main topic, and the structure of the paper is not outlined.

The background is disorganised, and