RESIT ASSIGNMENT - CS989: Big Data Fundamentals
AIM OF THE ASSIGNMENT
To provide a deeper understanding of appropriate methodological approaches to processing and analysing noisy data, and to encourage an appreciation of the challenges involved in data analysis.
LEARNING OUTCOMES
Understanding of the fundamentals of Python to enable the use of various big data technologies; understanding of how classical statistical techniques are applied in modern data analysis; understanding of the potential application of data analysis tools to various problems and appreciation of their limitations; understanding of the challenges and complexity of data analysis.
THE BRIEF
Provide a brief report on the analysis of an open dataset. Example datasets are available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.html) or Kaggle (https://www.kaggle.com/datasets). There are some restrictions on the dataset that can be selected (see below). You can focus your report on one aspect of the dataset or on multiple aspects; the main objective is to find some interesting questions or problems to answer.
The following criteria will be used when marking your assignment:
RESTRICTIONS ON DATASETS
You must use a different dataset to the one used in your original submission for this module. Any dataset that comes bundled with scikit-learn or Seaborn (e.g. the Iris dataset) is also not allowed. To ensure there are no misunderstandings regarding the dataset used, explicit consent to use a dataset must be given by the class lecturer before proceeding.
SUBMISSION
The report should be 3000 words (+/- 10%), including references, and must be in PDF format. All code used for the analysis must also be submitted; if it is not, the submission will be considered incomplete. Both the code and the report should be submitted as a zip file. The standard university penalty for late submissions will be applied. Any extensions should be requested in advance of the submission deadline. There are two submission deadlines for this assignment as outlined below.
2019-05-26
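As an illustration of the word limit stated under SUBMISSION, the short Python sketch below works out the range implied by 3000 words (+/- 10%) and checks a piece of text against it. The function names are hypothetical, the report text is assumed to be already available as a plain string, and counting whitespace-separated tokens is only a rough approximation; the brief does not specify how the official word count is measured.

```python
TARGET_WORDS = 3000        # word count stated in the brief
TOLERANCE_PERCENT = 10     # the +/- 10% allowance stated in the brief


def word_limits(target=TARGET_WORDS, tolerance_percent=TOLERANCE_PERCENT):
    """Return the (minimum, maximum) word counts implied by the tolerance."""
    # Integer arithmetic avoids floating-point rounding surprises.
    lower = target * (100 - tolerance_percent) // 100
    upper = target * (100 + tolerance_percent) // 100
    return lower, upper


def count_words(text):
    """Approximate word count: number of whitespace-separated tokens."""
    return len(text.split())


def within_limits(text):
    """Check whether the approximate word count falls inside the allowed range."""
    lower, upper = word_limits()
    return lower <= count_words(text) <= upper


if __name__ == "__main__":
    lower, upper = word_limits()
    print(f"Acceptable range: {lower} to {upper} words")
```

Under these assumptions the acceptable range is 2700 to 3300 words, with references counted towards the total.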