CSMDM21 Data Analytics and Mining


Department of Computer Science

Summative Coursework Set Front Page

Module Title: Data Analytics and Mining

Module Code: CSMDM21

Type of Assignment: coursework

Individual / Group Assignment: Individual

Weighting of the Assignment: 100%

Page limit/Word count: Maximum 16 pages

- Excluding the front page of information and the references.

- Including figures, diagrams, graphs, and tables.

- Times New Roman, 12 pt, 1.15 line spacing.

- A table of contents and an abstract/introduction are not required.

- The report should be clearly structured, with a separate section (with appropriate subsections) for each task and a final conclusion.

Expected hours spent for this assignment: 40 hours

Items to be submitted on-line through Blackboard Learn:

1. report.pdf

2. workflow_group.knar: the exported KNIME workflow group containing the data files and workflow(s).

Work to be submitted on-line via Blackboard Learn by: 2023 November 24 (Friday) 12:00 noon

Work will be marked and returned by: 2023 December 15 (Friday) 12:00 noon


Major Coursework (100% of module assessment)

This assignment should be carried out using KNIME, the data analytics and mining platform presented in the module.

In this assignment, you are required to build and test classification models on a customer dataset to predict the target variable, customer segmentation.

- Data: customer.csv (available on Blackboard)

- Description: customer_description.txt (available on Blackboard)

Task 1: Data Understanding and Preprocessing (30%)

Construct a KNIME workflow to understand the data characteristics and quality, and report and discuss your findings. Based on this data understanding, identify and discuss the required data preprocessing steps, and perform them in the KNIME workflow.

Task 2: Classification (30%)

Next, extend the existing KNIME workflow or construct another one to build at least two classification models on the dataset, experimenting with at least two different algorithms and/or hyperparameter settings. You may use any classification algorithm. In the report, describe the adopted algorithms, and discuss and justify your selection of algorithms and parameters. You may use experiments to support your discussions and justifications.

Task 3: Model Evaluation (30%)

Finally, extend the KNIME workflow to evaluate the trained models using appropriate performance measures and evaluation methods. In the report, describe the adopted performance measures and evaluation methods, discuss and justify your selection of them, present and analyse the results, and discuss the reliability of the results.

Report Quality (10%)

The report should be clearly structured, with a separate section/subsection for each task/subtask, a final conclusion and references. For each task, solutions should be described with justifications/discussions. KNIME workflow images, including the relevant node configurations, should be presented in the report. Results should be presented in the report with analysis and discussion. Sections, figures and tables must be numbered. References should follow a suitable academic format (https://www.reading.ac.uk/library/finding-info/guides/lib-citing-references.aspx).

Figure 1: How to export a KNIME workflow group

Front page of the submission (report)

Module Code:

Assignment report Title:

Date of submission:

Actual hrs spent for the assignment:

We will use information about how long you spent on the assignment when we review and balance coursework between modules for later years. An exact answer is not necessary, but please try to give a reasonable approximation.

Assessment Classifications:

The table below shows what is typically expected of the work to obtain a given mark. Each part of the assignment is marked according to the following criteria.

Classification Range

Typically the work should meet these requirements

Distinction (≥70%)

Outstanding/excellent work with correct results, a good presentation of the workflows, code and results, and a critical analysis of the results. Outstanding work will present fully automated solutions based on advanced techniques:

- all parts of the assignment completed correctly,

- comprehensive discussions,

- helpful & precise comments,

- deep & insightful analysis,

- excellent & compelling presentation of the work.

Merit (60-69%)

Good work with mostly correct results and good discussions: most work has been carried out correctly. The presentation is good, well structured, clear and complete with respect to the work done.

Pass (50-59%)

Achievement of the minimum requirements with little discussion: a significant part of the assignment is missing and/or has only partially correct results. The presentation is, in general, accurate and complete, though it may lack some clarity and quality.

Fail (<50%)

Incomplete solutions to a limited part of the assignment, with very little or no discussion. Most tasks have not been carried out with sufficient accuracy. Results may not be correct or technically sound. The presentation is not accurate/complete and lacks clarity.

Marking Scheme:

Task 1: Data Understanding and Preprocessing (30%)

- [10 marks] Workflow performing data understanding and preprocessing.

- [20 marks] Reporting of each data understanding and preprocessing step, the findings and the corresponding discussions.

Task 2: Classification (30%)

- [10 marks] Workflow performing the classification task.

- [5 marks] Descriptions of the adopted algorithms and parameters.

- [15 marks] Discussions and justifications of your selection of algorithms and parameters.

Task 3: Model Evaluation (30%)

- [6 marks] Workflow performing model evaluation.

- [6 marks] Descriptions of performance measures, discussions, and justifications.

- [6 marks] Descriptions of evaluation methods, discussions, and justifications.

- [12 marks] Results, analysis and discussions.

Report Quality (10%)

- [10 marks] Report structure, conclusion, references, quality of figures and tables.