CS982/CS989: Big Data Technologies - Coursework

Published: 2023-10-25

CS982: Big Data Technologies - Coursework

CS989: Big Data Fundamentals - Coursework

Part I - 70%

AIM OF THE ASSIGNMENT

To provide a deeper understanding of appropriate methodological approaches to processing and analysing noisy data, and to encourage appreciation of the challenges involved in data analysis.

LEARNING OUTCOMES

To understand the fundamentals of Python to enable the use of various big data technologies; to understand how classical statistical techniques are applied in modern data analysis; to understand the potential application of data analysis tools for various problems and appreciate their limitations; to understand the challenges and complexity of data analysis.

TEAM

Students can work on the coursework and submit it individually. However, group work is allowed, and a group can have a maximum of 3 students. Please use the activity “Coursework Group Selection” on MyPlace to select your group, even if you are going to work alone. More information is available on MyPlace. If you are working in a group, all students must work together and equally on all the questions; the work cannot be divided between the students.

SUBMISSION

The report should be 2500 words (+/- 10%), excluding the front cover, table of contents, list of figures/tables, and appendices. The document must be in PDF format. All code used for the analysis must also be submitted; if it is not, the submission will be considered incomplete and a late penalty will be applied until all components of the assessment are submitted. More details will be available on the submission page on MyPlace. Both the code and the report must be submitted through MyPlace; submissions made in any other way will not be accepted.

EXTENSION

Any extension should be requested in advance of the submission deadline, with a valid reason. Assessments submitted after the deadline without an approved extension will be subject to penalties on a sliding percentage scale: 10% for the first 24 hours, and 5% for each additional day. Penalties apply to assessments submitted up to four days late; assessments submitted more than four days after the deadline will receive a mark of zero.

All extensions must be requested through MyPlace. In addition, for any extension longer than 3 days, you must add a self-certificate on Pegasus to support the extension; otherwise the request for an extension will be rejected.
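The sliding late-penalty scale described in this section can be sketched in Python. This is only an illustration, not an official calculator: the function name `penalised_mark` is hypothetical, and the sketch assumes that the penalty is deducted as percentage points from the raw mark, that any part of a day counts as a full day, and that "four days" means 96 hours after the deadline. The brief does not state any of these details explicitly.

```python
import math


def penalised_mark(raw_mark: float, hours_late: float) -> float:
    """Apply the sliding late penalty to a raw mark out of 100.

    Assumptions (not stated in the brief): the penalty is in percentage
    points, part days are rounded up, and a mark cannot go below zero.
    """
    if hours_late <= 0:
        return raw_mark  # submitted on time: no penalty

    days_late = math.ceil(hours_late / 24)  # part days count as full days
    if days_late > 4:
        return 0.0  # more than four days late: mark of zero

    # 10 points for the first 24 hours, 5 points for each additional day
    penalty = 10 + 5 * (days_late - 1)
    return max(0.0, raw_mark - penalty)


if __name__ == "__main__":
    for hours in (0, 5, 30, 96, 97):
        print(hours, penalised_mark(60, hours))
```

Under these assumptions, a report worth 60 that is submitted 30 hours late falls in the second day and loses 15 points.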

DEADLINE

Submission date: 12:00 noon, Monday October 30th, 2023

THE BRIEF

Provide a brief report on the analysis of an open dataset. There are some restrictions on the dataset that can be selected (see “DATASET RULES” below). You can focus your report on one aspect of the dataset or on multiple aspects; the main objective is to find some interesting questions or problems to answer.

The following criteria will be used when marking your assignment:

•    Identification and description of key challenge(s) or problem(s) to be addressed          10%

•    Introduction to the dataset                                         10%

•    The challenge(s)/problem(s) is (are) to be addressed using the following:

o Summary statistics (including figures) for the data being analysed                           20%

o Description, rationale, application and findings from only one unsupervised analysis method covered in the module          20%

o Description, rationale, application and findings from only one supervised analysis method covered in the module          20%

•    Reflection on methods used for analysis                                            10%

•    Structure, presentation, and proper citation of references                                 10%

DATASET RULES

Example datasets are available on:

The UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php

Kaggle website: https://www.kaggle.com/datasets

You can also select a dataset from other sources, but make sure that the dataset is public and that you have the right to access and analyse the dataset and to share the results.

However, you cannot select a dataset that falls into one of the categories below. Projects submitted on one of these datasets will receive a mark of zero.

Datasets packaged with Scikit-Learn

Boston house-prices dataset

Iris dataset

Diabetes dataset

Digits dataset

Linnerud dataset

Wine dataset

Breast cancer Wisconsin dataset

For more information: https://scikit-learn.org/stable/datasets/toy_dataset.html

Datasets packaged with Seaborn

anscombe.csv: Anscombe dataset

attention.csv: Attention dataset

brain_networks.csv: Brain networks dataset

car_crashes.csv: 538 car crash dataset

diamonds.csv: Diamonds dataset

dots.csv: Dots dataset

exercise.csv: Exercise dataset

flights.csv: Flights dataset

fmri.csv: fMRI dataset

gammas.csv: Gammas (fake fMRI) dataset

iris.csv: Iris dataset

mpg.csv: MPG dataset

planets.csv: Planets dataset

tips.csv: Tips dataset

titanic.csv: Titanic dataset

For more information: https://github.com/mwaskom/seaborn-data

Datasets that we have seen during lecture/lab sessions.

Part II

AIM OF THE ASSIGNMENT

Self-assessment activities aim to involve the learners in evaluating the outcome of their work. Students are encouraged to be realistic judges of their own performance.

LEARNING OUTCOMES

Objectively reflect on and critically evaluate their own progress and skill development; discern how to improve their performance; develop critical reviewing skills.

TEAM

This is individual work, and students are expected to self-assess their work individually. Group work is not allowed for this part of the coursework.

SUBMISSION

The self-assessment must be submitted through MyPlace using the self-assessment activity.

DEADLINE

Submission date: 12:00 noon, Monday November 6th, 2023 or one week after the approved date of the extension for the first part.

THE BRIEF

The self-assessment activity allows you to self-assess your report using the "Example Marking scheme" file and provide an expected mark for the coursework with a justification.

•    Expected mark: you are expected to give a final expected mark for the first part on a scale of 1-100. This mark is then used in a formula to calculate the first half of your mark for Part II, as explained below. We denote by SM your mark from the self-assessment and by MM the mark you will receive from the marker:

o If SM is greater than or equal to MM, the mark for this part is calculated as: (1 - (SM - MM) / (100 - MM)) * 50

o If SM is less than MM, the mark for this part is calculated as: (1 - (MM - SM) / MM) * 50

The table below shows a small example of the different possibilities, where 0 is the lowest and 50 is the highest mark you can get on this part:                                       50%

MM     SM     Mark

60     100    0

60     92     10

60     68     40

60     60     50

60     54     45

60     48     40

60     12     10

60     0      0
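The expected-mark calculation can be sketched as a short Python function that reproduces the rows of the example table. This is a minimal sketch rather than the official marking script: the function name `self_assessment_mark` is hypothetical, the scaling factor of 50 is taken from the example table and the stated 0 to 50 range, and the handling of MM = 100 (where the first formula would divide by zero) is an assumption.

```python
def self_assessment_mark(sm: float, mm: float) -> float:
    """Return the expected-mark component (0 to 50) of Part II.

    sm: mark the student gave themselves (SM)
    mm: mark given by the marker (MM)
    """
    if not (0 <= sm <= 100 and 0 <= mm <= 100):
        raise ValueError("SM and MM must both lie between 0 and 100")

    if sm >= mm:
        if mm == 100:
            # Assumption: SM == MM == 100 is a perfect estimate, so the
            # full 50 is awarded instead of dividing by zero.
            return 50.0
        # Over-estimate: gap measured relative to the room above MM
        return (1 - (sm - mm) / (100 - mm)) * 50

    # Under-estimate: gap measured relative to MM (MM > 0 here)
    return (1 - (mm - sm) / mm) * 50


if __name__ == "__main__":
    # (MM, SM) pairs from the example table
    rows = [(60, 100), (60, 92), (60, 68), (60, 60),
            (60, 54), (60, 48), (60, 12), (60, 0)]
    for mm, sm in rows:
        print(mm, sm, round(self_assessment_mark(sm, mm), 2))
```

The closer SM is to MM, the closer the result is to 50. Over-estimates are scaled by the distance from MM to 100, and under-estimates by the distance from MM to 0.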

•    Justification: Your justification for the expected mark should be based on the "Example Marking scheme" provided on MyPlace.                  50%