CS982/CS989: Big Data Technologies - Coursework
Published: 2023-10-25
CS982: Big Data Technologies - Coursework
CS989: Big Data Fundamentals - Coursework
Part I - 70%
AIM OF THE ASSIGNMENT
To provide a deeper understanding of appropriate methodological approaches to processing and analysing noisy data; and to encourage appreciation of the challenges involved in data analysis.
LEARNING OUTCOMES
To understand the fundamentals of Python to enable the use of various big data technologies; to understand how classical statistical techniques are applied in modern data analysis; to understand the potential application of data analysis tools for various problems and appreciate their limitations; to understand the challenges and complexity of data analysis.
TEAM
Students can work on the coursework and submit it individually. However, group work is allowed, and a group can be formed of a maximum of 3 students. Please use the activity “Coursework Group Selection” on MyPlace to select your group, even if you are going to work alone. More information is available on MyPlace. If you are working in a group, all students must work together and equally on all the questions, and the work cannot be divided between the students.
SUBMISSION
The report should be 2500 words (+/- 10%, i.e. 2250 to 2750 words), excluding the front cover, table of contents, list of figures/tables, and appendices. The document must be in PDF format. All code used for the analysis must also be submitted; if it is not, the submission will be considered incomplete and a late penalty will be applied until all components of the assessment are submitted. More details will be available on the submission page on MyPlace. Both the code and the report must be submitted through MyPlace; submissions made in any other way will not be accepted.
EXTENSION
Any extension should be requested in advance of the submission deadline, with a valid reason. Assessments submitted after the deadline without an approved extension will be subject to penalties on a sliding percentage scale: 10% for the first 24 hours, and 5% for each additional day. Penalties are applied to late submissions for up to four days; assessments submitted more than four days after the deadline will receive a mark of zero.
All extensions must be requested through MyPlace. In addition, for any extension longer than 3 days, you must add a self-certificate on Pegasus to support the extension; otherwise the request for an extension will be rejected.
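As a rough illustration of the sliding penalty scale described in this section, the sketch below computes a penalised mark. It rests on assumptions the brief does not state: the penalty is deducted as percentage points from the raw mark, any part of a day counts as a full day, and the function name `apply_late_penalty` is hypothetical.

```python
import math


def apply_late_penalty(raw_mark: float, hours_late: float) -> float:
    """Return the mark after the late penalty (hypothetical helper)."""
    if hours_late <= 0:
        return raw_mark  # submitted on time: no penalty

    # Assumption: any part of a day is counted as a full day.
    days_late = math.ceil(hours_late / 24)

    if days_late > 4:
        return 0.0  # more than four days late: mark of zero

    # 10% for the first 24 hours, then 5% for each additional day.
    # Assumption: the percentages are deducted as percentage points.
    penalty = 10 + 5 * (days_late - 1)
    return max(raw_mark - penalty, 0.0)


if __name__ == "__main__":
    for hours in (0, 5, 30, 96, 97):
        print(hours, apply_late_penalty(70, hours))
```

Under these assumptions the deduction grows from 10 points on the first day to 25 points on the fourth, after which the mark drops to zero. The official policy on MyPlace takes precedence over this reading.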
DEADLINE
Submission date: 12:00 noon, Monday October 30th, 2023
THE BRIEF
Provide a brief report on the analysis of an open dataset. There are some restrictions on the dataset that can be selected (see “DATASET RULES” below). You can focus your report on one aspect of the dataset or on multiple aspects; the main objective is to find some interesting questions or problems to answer.
The following criteria will be used when marking your assignment:
• Identification and description of key challenge(s) or problem(s) to be addressed 10%
• Introduction to the dataset 10%
• The challenge(s)/problem(s) is (are) to be addressed using the following:
o Summary statistics (including figures) for data being analysed 20%
o Description, rationale, application and findings from only one unsupervised analysis method covered in the module 20%
o Description, rationale, application and findings from only one supervised analysis method covered in the module 20%
• Reflection on methods used for analysis 10%
• Structure, presentation, and proper citation of references 10%
DATASET RULES
Example datasets are available on:
The UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php
Kaggle website: https://www.kaggle.com/datasets
You can also select a dataset from other sources, but make sure that the dataset is public and that you have the right to access and analyse the dataset and to share the results.
However, you cannot select a dataset that falls under one of the rules below. Submitted projects that use one of these datasets will receive a mark of zero.
Datasets packaged with Scikit-Learn
Boston house-prices dataset
Iris dataset
Diabetes dataset
Digits dataset
Linnerud dataset
Wine dataset
Breast cancer Wisconsin dataset
For more information: https://scikit-learn.org/stable/datasets/toy_dataset.html
Datasets packaged with Seaborn
anscombe.csv: Anscombe dataset
attention.csv: Attention dataset
brain_networks.csv: Brain networks dataset
car_crashes.csv: 538 car crash dataset
diamonds.csv: Diamonds dataset
dots.csv: Dots dataset
exercise.csv: Exercise dataset
flights.csv: Flights dataset
fmri.csv: fMRI dataset
gammas.csv: Gammas dataset (fake fMRI data)
iris.csv: Iris dataset
mpg.csv: MPG dataset
planets.csv: Planets dataset
tips.csv: Tips dataset
titanic.csv: Titanic dataset
For more information: https://github.com/mwaskom/seaborn-data
Datasets that we have seen during lecture/lab sessions.
Part II
AIM OF THE ASSIGNMENT
Self-assessment activities aim to involve the learners in evaluating the outcome of their work. Students are encouraged to be a realistic judge of their own performance.
LEARNING OUTCOMES
Objectively reflect on and critically evaluate their own progress and skill development; discern how to improve their performance; develop critical reviewing skills.
TEAM
This is individual work, and students are expected to self-assess their work individually. Group work is not allowed for this part of the coursework.
SUBMISSION
The self-assessment must be submitted through MyPlace using the self-assessment activity.
DEADLINE
Submission date: 12:00 noon, Monday November 6th, 2023, or one week after the approved extension date for the first part.
THE BRIEF
The self-assessment activity allows you to self-assess your report using the "Example Marking scheme" file and provide an expected mark for the coursework with a justification.
• Expected mark: you are expected to give a final expected mark for the first part on a scale of 1-100. This mark is then used in a formula to calculate the first half of your mark for Part II, as explained below. We denote by SM your mark from the self-assessment and by MM the mark you receive from the marker:
o If SM is greater than or equal to MM, the mark for this part is calculated as: (1 - (SM - MM) / (100 - MM)) * 100
o If SM is less than MM, the mark for this part is calculated as: (1 - (MM - SM) / MM) * 100
The table below shows a small example of the different possibilities, with the result scaled to this part's 50% weighting, so that 0 is the lowest and 50 is the highest mark you can get on this part. 50%
MM | SM | Mark
60 | 100 | 0
60 | 92 | 10
60 | 68 | 40
60 | 60 | 50
60 | 54 | 45
60 | 48 | 40
60 | 12 | 10
60 | 0 | 0
• Justification: Your justification for the expected mark should be based on the "Example Marking scheme" provided on MyPlace. 50%
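The expected-mark calculation above can be sketched in a few lines of Python. This is an illustrative reading of the brief, not an official calculator: the function name `expected_mark_score` and the `weight` parameter are hypothetical, and scaling the bracketed fraction to the 0-50 range is an assumption inferred from the example table rather than from the multipliers printed in the formulas.

```python
import math


def expected_mark_score(sm: float, mm: float, weight: float = 50.0) -> float:
    """Score for the expected-mark half of Part II, from 0 to `weight`.

    sm: the student's self-assessed mark; mm: the marker's mark.
    Assumption: the fraction from the brief's formulas is scaled to
    `weight` (50), which reproduces the example table.
    """
    if sm >= mm:
        if mm >= 100:
            return weight  # sm == mm == 100: exact match, avoid 0/0
        fraction = 1 - (sm - mm) / (100 - mm)  # over-estimated (or exact)
    else:
        fraction = 1 - (mm - sm) / mm  # under-estimated; mm > 0 here
    return fraction * weight


# Rows from the example table in the brief: (MM, SM, Mark).
TABLE = [
    (60, 100, 0), (60, 92, 10), (60, 68, 40), (60, 60, 50),
    (60, 54, 45), (60, 48, 40), (60, 12, 10), (60, 0, 0),
]

if __name__ == "__main__":
    for mm, sm, mark in TABLE:
        score = expected_mark_score(sm, mm)
        # Compare with a tolerance because of floating-point rounding.
        assert math.isclose(score, mark, abs_tol=1e-9)
        print(f"MM={mm} SM={sm} -> {score:.1f}")
```

The penalty is asymmetric by design: over-estimating is measured against the room left above MM (100 - MM), while under-estimating is measured against MM itself, so with MM = 60 an over-estimate of 8 marks costs the same as an under-estimate of 12.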