AIM OF THE ASSIGNMENT

To provide a deeper understanding of appropriate methodological approaches to processing and analysing noisy data, and to encourage appreciation of the challenges involved in data analysis.

LEARNING OUTCOMES

  • Understand the fundamentals of Python well enough to use various big data technologies
  • Understand how classical statistical techniques are applied in modern data analysis
  • Understand the potential applications of data analysis tools to various problems, and appreciate their limitations
  • Understand the challenges and complexity of data analysis

THE BRIEF

Provide a brief report on the analysis of an open dataset. Example datasets are available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.html) or Kaggle (https://www.kaggle.com/datasets). There are some restrictions on the dataset that can be selected (see below). You can focus your report on one aspect of the dataset or on multiple aspects; the main objective is to find some interesting questions or problems to answer.

The following criteria will be used when marking your assignment:

  • Introduction to the dataset 10%
  • Identification and description of key challenge(s) or problem(s) to be addressed 10%
  • This challenge/problem is to be addressed using the following:
    • Summary statistics (including figures) for data being analysed 20%
    • Description, rationale, application and findings from one unsupervised analysis method 20%
    • Description, rationale, application and findings from one other analysis method 20%
  • Reflection on methods used for analysis 10%
  • Structure, presentation, and proper citation of references 10%

RESTRICTIONS ON DATASETS

You must use a different dataset from the one used in your original submission for this module. Any dataset that comes bundled with scikit-learn or Seaborn (e.g. the Iris dataset) is also not allowed. To avoid any misunderstanding about which dataset is being used, you must obtain explicit consent from the class lecturer before proceeding.

SUBMISSION

The report should be 3000 words (+/- 10%, i.e. 2700 to 3300 words), including references, and must be in PDF format. All code used for the analysis must also be submitted; if it is not, the submission will be considered incomplete. The code and the report should be submitted together as a zip file. The standard university penalty for late submissions applies. Any extension must be requested in advance of the submission deadline. There are two submission deadlines for this assignment, as outlined below.