

COMP4131 Data Modelling and Analysis

Coursework Name: Coursework 2
Weight: 75%
Deliverable: Academic paper (report), a Jupyter Notebook (code), and the dataset used.

This assignment requires you to work individually.

You will need to analyse a data set using all the data modelling and analysis steps you have learnt to create and compare your trained models.

You will write up your work as an academic paper of 6 to 8 pages (including references and diagrams), comparing and analysing the results of your data modelling and analysis pathway as stated in this coursework specification.

Coursework Deliverable Requirements:

1. Paper Submission
• The paper must be formatted using the IEEE conference template (available at https://www.ieee.org/conferences/publishing/templates.html).
• Include all sections as required by the assignment brief (e.g., abstract, methodology, results, and discussion).
2. Code Submission
• Submit a single, well-documented Jupyter Notebook (.ipynb) containing all code used for your analysis/modeling.
• Ensure the notebook executes without errors when the dataset is placed in the same folder as the notebook.
• Important: Use relative paths (e.g., pd.read_csv("dataset.csv")) to load the dataset. We will not debug path-related issues.
3. Dataset Submission
• Include the raw dataset file(s) (e.g., .csv, .json, .xlsx) in your submission.
• The dataset must be the exact version used in your code.
4. Reproducibility Requirement
• Your submission will be tested by placing the notebook and dataset in the same folder and running all cells from scratch.
• Penalties apply if the code fails due to:
▪ Missing/inaccessible dataset files.
▪ Hardcoded absolute paths (e.g., C:/Users/name/dataset.csv).
▪ Unclear instructions or dependencies not listed in the notebook.
5. Additional Notes
• List all Python library dependencies in a README.txt or at the top of the notebook (e.g., pip install pandas numpy matplotlib).
• Use markdown cells to explain key steps and assumptions.
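A possible first notebook cell that covers the relative-path, dependency, and reproducibility requirements above is sketched below. It assumes a CSV file named dataset.csv (the example name used above) and adds scikit-learn to the dependency list as an assumption; the seed value 42 and the helper name load_dataset are arbitrary illustrative choices, not required names.

```python
# Suggested first cell of Name_ID.ipynb (a sketch, not a required template).
# Dependencies (install once, e.g. from a terminal):
#   pip install pandas numpy matplotlib scikit-learn
import random
from pathlib import Path

import numpy as np
import pandas as pd

# Fix random seeds so that rerunning all cells gives the same results.
RANDOM_SEED = 42
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Relative path: the dataset is expected in the same folder as the notebook.
DATA_FILE = Path("dataset.csv")


def load_dataset(path: Path = DATA_FILE) -> pd.DataFrame:
    """Load the dataset from a relative path and fail with a clear message."""
    path = Path(path)
    if path.is_absolute():
        raise ValueError(f"Use a relative path, not an absolute one: {path}")
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found in {Path.cwd()}; place it next to the notebook."
        )
    return pd.read_csv(path)
```

Calling df = load_dataset() in the next cell then reads the file from the notebook's folder; Jupyter normally starts the kernel in that folder, which is why a relative path is sufficient.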
Format
A single zip file (Name_ID.zip) containing the paper as a .docx or PDF file (IEEE conference format, 6 to 8 pages including references and diagrams), the Jupyter Notebook (.ipynb), the dataset file(s), and an optional README.txt.

Submit your academic paper following this naming convention:

Name_ID.docx or Name_ID.pdf.

For example: XiaomingChen_20712345.docx

Submit your Jupyter Notebook file following this naming convention:

Name_ID.ipynb.

For example: XiaomingChen_20712345.ipynb

Zip the following files:

• XiaomingChen_20712345.docx
• XiaomingChen_20712345.ipynb
• dataset.csv (or other relevant format)
• (Optional) README.txt for setup instructions.

Submit your zip file following this naming convention: Name_ID.zip.

For example: XiaomingChen_20712345.zip
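If you prefer to build the zip from Python rather than by hand, the following sketch uses the standard-library zipfile module. The helper build_submission_zip and its parameters are hypothetical; it checks the naming conventions above before writing and stores every file at the top level of the archive.

```python
import zipfile
from pathlib import Path


def build_submission_zip(name, student_id, paper, notebook, data_files,
                         readme=None, out_dir="."):
    """Bundle the deliverables into Name_ID.zip (hypothetical helper)."""
    prefix = f"{name}_{student_id}"
    paper, notebook = Path(paper), Path(notebook)

    # Check the naming conventions before anything is written.
    if paper.stem != prefix or paper.suffix not in {".docx", ".pdf"}:
        raise ValueError(f"Paper must be named {prefix}.docx or {prefix}.pdf")
    if notebook.name != f"{prefix}.ipynb":
        raise ValueError(f"Notebook must be named {prefix}.ipynb")

    files = [paper, notebook, *(Path(f) for f in data_files)]
    if readme is not None:
        files.append(Path(readme))

    zip_path = Path(out_dir) / f"{prefix}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in files:
            # arcname drops the folder part, so every file sits at the top level
            zf.write(file, arcname=file.name)
    return zip_path
```

For example, build_submission_zip("XiaomingChen", "20712345", "XiaomingChen_20712345.docx", "XiaomingChen_20712345.ipynb", ["dataset.csv"], readme="README.txt") would write XiaomingChen_20712345.zip in the current folder.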

Issue Date: 16 March 2026

Submission Date: 6 May 2026, by 5.00 PM.

Submission Mechanism: Via Moodle.

Late Policy: The standard University of Nottingham late submission policy applies, i.e. a 5% deduction of the total mark for every 24 hours late (including weekends and holidays).
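As a worked illustration of the deduction, the sketch below assumes that "total mark" means the CW2 mark out of 100 (so each 24-hour period costs 5 marks), that any part of a 24-hour period counts as a full period, and that the mark cannot fall below zero. These interpretations are assumptions and are not stated in the policy above; the function name is hypothetical.

```python
import math
from datetime import datetime

DEADLINE = datetime(2026, 5, 6, 17, 0)  # 6 May 2026, 5.00 PM (local time)
PENALTY_PER_PERIOD = 5.0  # assumption: 5% of a mark out of 100


def late_adjusted_mark(raw_mark, submitted_at, deadline=DEADLINE):
    """Apply a 5%-per-24-hours deduction (assumption: part periods count in full)."""
    late_seconds = (submitted_at - deadline).total_seconds()
    if late_seconds <= 0:
        return raw_mark  # on time: no deduction
    periods = math.ceil(late_seconds / (24 * 3600))
    return max(0.0, raw_mark - PENALTY_PER_PERIOD * periods)
```

Under these assumptions, a raw mark of 70 submitted at 6.00 PM on 6 May becomes 65, and the same work submitted at 5.00 PM on 8 May (48 hours late) becomes 60.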

Feedback Date: 12 June 2026

Feedback Mechanism: Feedback will be provided via Moodle.

Instructions

For this coursework assignment, you will work individually to analyse a dataset of your choice. You may use any publicly available dataset from online sources such as the UCI Machine Learning Repository, Kaggle, or other reputable platforms. Your task is to apply the data modelling and analysis techniques you have learned in the course to preprocess the data, train models, and compare the performance of your trained models with state-of-the-art methods. Your work should demonstrate a clear understanding of the data modelling pipeline, from data preparation to evaluation, and include a critical analysis of your results.

You will write your work up as an academic paper, comparing and analysing your results from the different stages of the data preprocessing, analysis, and modelling pathway. You will also need to compare with different models or state-of-the-art methods.

You will need to present your paper in an IEEE format using a template from here: https://www.ieee.org/conferences/publishing/templates.html

Your paper should be between 6 and 8 pages (including tables, diagrams, and references as appropriate) and submitted as a .docx or PDF file. Tables and figures should add value to the writing.

Paper Structure and Mark Allocation

Your paper should be organised according to the following structure. The mark allocation for each section is shown in brackets.

1. Title and Abstract (3%)

Provide a clear title and a concise abstract summarising the research problem, dataset used, modelling approaches, and key findings.

2. Introduction (5%)

Introduce the dataset and the research problem being investigated. Clearly state the research question(s) and the objectives of the analysis.

3. Literature Review (5%)

Review relevant work from existing research that applies similar data analysis or modelling techniques, particularly studies that use the same or similar datasets. Highlight key methods and findings from the literature.

4. Methodology (20%)

Describe the proposed methodology for your data analysis and modelling pipeline. This should include:

• Data preprocessing and preparation steps
• Feature engineering or transformation methods
• Model selection and justification
• Any enhancements or modifications to standard approaches
• Provide clear justification for your methodological choices.
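Preprocessing and modelling steps of this kind are commonly chained with a scikit-learn Pipeline, so that imputation, scaling, and encoding are learnt from the training data only and applied identically at prediction time. The sketch below is one possible arrangement, not a required design; the helper build_pipeline, the column names, the median and most-frequent imputation strategies, and the logistic regression model are illustrative assumptions, and the six-row DataFrame is toy data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline(numeric_cols, categorical_cols, model):
    """Chain preprocessing and a model (hypothetical helper)."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # robust to outliers
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Toy data for illustration only; replace with your own dataset.
toy = pd.DataFrame({
    "age": [25, np.nan, 40, 35, 52, 29],
    "city": ["A", "B", "A", np.nan, "B", "A"],
    "label": [0, 1, 0, 1, 1, 0],
})
features = toy[["age", "city"]]
pipe = build_pipeline(["age"], ["city"], LogisticRegression())
pipe.fit(features, toy["label"])
print(pipe.predict(features))
```

Because the whole pipeline is one estimator, it can be passed directly to cross-validation or tuning functions without leaking test-fold statistics into preprocessing.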

5. Experimental Results and Analysis (20%)

Present the experimental process and results. This should include:

• Exploratory data analysis and visualisations
• Experimental settings and model configurations
• Performance evaluation of the trained models
• You are expected to implement and compare multiple modelling approaches.
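One way to compare multiple approaches on equal terms is to evaluate them on the same cross-validation folds with the same metric. The sketch assumes scikit-learn and uses its bundled breast cancer dataset purely as a stand-in for your chosen dataset; the three models and the balanced-accuracy metric are illustrative choices. A majority-class baseline is included to show how much each model improves on trivial prediction.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset bundled with scikit-learn; replace with your own X and y.
X, y = load_breast_cancer(return_X_y=True)

models = {
    "Majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# The same shuffled, stratified folds are used for every model.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:<25} balanced accuracy = {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting the mean and standard deviation across folds, rather than a score from a single split, gives a fairer basis for a comparison table in the results section.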

6. Discussion (15%)

Provide a detailed analysis of your results. Interpret the findings and discuss possible reasons for the observed performance. Where possible, compare your results with findings from existing studies referenced in the literature review.

7. Conclusion and Future Work (10%)

Summarise the main findings of your work and reflect on the effectiveness of the proposed approaches. Suggest possible improvements or directions for future research.

8. References (2%)

All sources must be properly cited using an appropriate academic referencing style (i.e., IEEE format).

Code Submission

In addition to the written report, you must submit a single Jupyter Notebook (.ipynb) containing all code used in the analysis.

The notebook should:

• Be clearly organised with well-commented code and explanatory markdown cells
• Demonstrate the complete data modelling and analysis workflow, including:
▪ Data preparation and preprocessing
▪ Exploratory data analysis (EDA)
▪ Feature extraction or transformation
▪ Model training and evaluation
• Allow the results presented in the paper to be fully reproducible
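For the EDA step, a compact per-column summary is often a useful starting point before plotting. The helper eda_summary below is a hypothetical pandas sketch; it reports each column's data type, missing values, and number of distinct values.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing values, and distinct values."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })
    # Columns with the most missing values first
    return summary.sort_values("n_missing", ascending=False)
```

Calling eda_summary(df) alongside df.describe() and a few histograms (for example df.hist(), which requires matplotlib) gives a quick overview of data quality before preprocessing decisions are made.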

Your submission will be evaluated based on whether the notebook can be successfully executed to reproduce the reported results.

The aim of this coursework is to provide practical experience in working with a real-world dataset and applying the full data modelling and analysis pipeline, from initial data preparation through to model development and evaluation.

Assessment Criteria

The total marks for CW2 will be out of 100 and scaled to represent the 75% weighting.
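As a worked example of the scaling, the sketch below records the section weights listed in the assessment criteria table (which sum to 100) and converts a CW2 mark into its 75% contribution. The names SECTION_WEIGHTS and module_contribution are hypothetical.

```python
# Section weights (%) taken from the assessment criteria table.
SECTION_WEIGHTS = {
    "Title and Abstract": 3,
    "Introduction": 5,
    "Literature Review": 5,
    "Methodology": 20,
    "Experimental Results and Analysis": 20,
    "Discussion": 15,
    "Conclusion and Future Work": 10,
    "References": 2,
    "Python Code Implementation": 20,
}
assert sum(SECTION_WEIGHTS.values()) == 100  # the CW2 mark is out of 100


def module_contribution(cw2_mark, weighting=0.75):
    """Scale a CW2 mark (out of 100) by the 75% coursework weighting."""
    return cw2_mark * weighting


print(module_contribution(68))  # 51.0
```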

The main assessment criteria for the paper are:

| Section | Weight (%) | Criteria |
| --- | --- | --- |
| Title and Abstract | 3 | The title and abstract clearly reflect the content of the paper. The abstract concisely summarises the problem, dataset, methodology, and key findings. |
| Introduction | 5 | The dataset and research problem are clearly introduced. The dataset is appropriately described, and the research question(s) are clearly stated and relevant to the context of the dataset. |
| Literature Review | 5 | Relevant and recent research papers are identified and discussed. The approaches and key findings of existing studies using similar datasets or methods are clearly summarised. |
| Methodology | 20 | Appropriate methods are selected for data preprocessing, analysis, and modelling. The methodological choices are clearly explained and justified. Any enhancements, modifications, or innovative aspects of the proposed approach are clearly described. |
| Experimental Results and Analysis | 20 | The techniques are implemented correctly and the experimental process is clearly presented. Multiple approaches are implemented and compared where appropriate. Results are presented clearly using suitable tables, charts, or visualisations. |
| Discussion | 15 | Results are interpreted critically and discussed in depth. The performance of different approaches is compared and analysed, and findings are linked to the research question and literature where appropriate. |
| Conclusion and Future Work | 10 | The work is clearly summarised and the main findings are highlighted. Limitations of the study are acknowledged, and reasonable suggestions for future improvements or research directions are provided. |
| References | 2 | Relevant academic references are included and cited correctly using an appropriate referencing style (e.g., IEEE). |
| Python Code Implementation | 20 | The submitted Jupyter Notebook is well structured, clearly commented, and easy to follow. Variable and function names are meaningful and consistent. The code demonstrates the full data modelling workflow, including data preprocessing, exploratory data analysis, model training, and evaluation. Evidence of appropriate modelling practices (e.g., preprocessing, parameter tuning, or model comparison) is expected. The code should reproduce the results reported in the paper. |
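Parameter tuning, one of the modelling practices mentioned above, can be sketched with scikit-learn's GridSearchCV. The example assumes scikit-learn, again uses the bundled breast cancer dataset as a placeholder, and the SVM parameter grid is an illustrative choice. The held-out test set is used only once, after tuning, so that the reported test score is not optimistically biased by the search.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder dataset; replace with your own X and y.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that is never seen during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="balanced_accuracy",
)
search.fit(X_train, y_train)  # scaling is refitted inside each CV fold

print("Best parameters:", search.best_params_)
print(f"CV balanced accuracy:   {search.best_score_:.3f}")
print(f"Test balanced accuracy: {search.score(X_test, y_test):.3f}")
```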

Academic Integrity and Responsible Use of AI

Students must ensure that all submitted work is their own original work. Academic integrity is a fundamental principle of the University of Nottingham, and all coursework must comply with the University of Nottingham Ningbo China (UNNC) Academic Integrity Policy and the School of Computer Science AI Use Policy.

Plagiarism, collusion, or copying work from other students or external sources is considered a serious academic offence and may result in a mark deduction, a mark of zero, or further disciplinary action in accordance with University regulations.

If you use external sources such as academic papers, datasets, documentation, or online resources, you must clearly acknowledge and cite them appropriately.

Students must also ensure that their coursework does not overlap with work submitted for other modules or previous projects. In particular, you must not reuse the same problem formulation, dataset, or a substantial portion of work that has already been submitted for assessment in another module or project. Double submission (submitting the same or substantially similar work for multiple assessments) is strictly prohibited and will be treated as a breach of academic integrity.

The School of Computer Science recognises that Artificial Intelligence tools (e.g., generative AI systems) may be used in some contexts. However, students must ensure that any use of such tools complies with the UNNC and School of Computer Science AI Use Policy. Any permitted use of AI tools must be transparent and properly acknowledged, and students remain fully responsible for the accuracy, originality, and integrity of the submitted work.

Your coursework should demonstrate independent thinking, appropriate methodological design, and your own implementation of the data analysis and modelling pipeline.

If you are unsure whether your work complies with the academic integrity or AI use policies, you should consult the module convenor before submitting your work.