Data Modelling and Analysis

COMP4030

Coursework 2022 CW2 Brief

Assessment Name	Coursework 2 – Data Analysis Study	Weight	75%
Description and Deliverable(s)	This assignment requires you to work in a pair. You will need to analyse a data set using all the data science steps you have learnt to create and compare classification models. You will write your work up as a joint academic paper with a coursework partner, comparing and analysing your results at every stage of the data analysis and modelling pathway (6 to 8 pages including references and diagrams) as stated in this coursework specification. The paper should be submitted as a PDF, using the IEEE template for formatting. The code should be submitted as an R script.
Release Date	Tuesday 1st March 2022
Submission Date	Monday 9th May 2022 by 3pm
Late Policy (University of Nottingham default will apply, if blank)	Work submitted after the deadline will be subject to a penalty of 5 marks (the standard 5% absolute) for each late working day out of the total 100 marks. Late submission deadline is Friday 13 May 2022. Submissions after this date will only be accepted through the extenuating circumstances process.
Feedback Mechanism and Date	Written feedback in Moodle on the 6th of June 2022

Instructions

For this coursework assignment you are required to work in pairs to analyse a data set (select one from the three provided or find one of your own choice) using all the data science steps you have learnt to create and compare classification models.

You will write your work up as a joint academic paper with your coursework partner, comparing and analysing your results at every stage of the data analysis and modelling pathway.

You will need to present your paper in an IEEE format using a template from here:

https://www.ieee.org/conferences/publishing/templates.html

Your paper should be between 6 and 8 pages long (including tables, diagrams and references as appropriate) and submitted as a PDF. The tables and diagrams should add value to the writing. Diagrams are preferable to tables.

Your paper should be organised into 8 parts:

1. Title and Abstract (2.5%)

2. Introduction to the data set and research question(s) (5%)

3. Literature Review – covering a few key methods adopted by other researchers who used this or a similar dataset (5%)

4. Methodology – including a justification for your selected approaches for data analysis and pre-processing and data classification. (10%)

5. Results from each of the stages – data analysis, pre-processing and classification (20%). Please note that each partner in the pair should use a different approach for each stage.

6. Discussion - comparing your results with your partner's and with results from previous research on the dataset, as noted in your literature review (25%)

7. Conclusions and recommendations for future research (10%)

8. References (2.5%)

Code Submission

Please include all your code as an R script which can be run to generate your results (20%; each person in the pair will be marked individually on this), submitted as a separate file in addition to the paper.
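One way to make a script regenerate the same results on every run is to fix the random seed before any random step, such as a train/test split. The sketch below is only an illustration under stated assumptions, not a required structure: the seed value, the 70/30 split, the helper name `stratified_train_idx` and the toy labels are all hypothetical.

```r
# Fix the random seed so every run of the script gives the same split.
set.seed(4030)  # arbitrary value chosen for illustration

# Hypothetical helper: row indices for a stratified training sample.
# `labels` is the class column; `p` is the assumed training proportion.
stratified_train_idx <- function(labels, p = 0.7) {
  by_class <- split(seq_along(labels), labels)
  picked <- lapply(by_class, function(i) {
    i[sample.int(length(i), round(p * length(i)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Toy labels standing in for a real class column (not from any of the data sets).
labels <- factor(rep(c("a", "b"), times = c(60, 40)))
train_idx <- stratified_train_idx(labels)
test_idx <- setdiff(seq_along(labels), train_idx)

# Class counts in the training part keep the original class proportions.
print(table(labels[train_idx]))
```

If both partners start from the same seed and the same split, differences in their results can be attributed to their different approaches rather than to different samples.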

The ultimate aim of this coursework is to give you first-hand experience of working with a relatively large, real data set, from the first stages of data description and exploratory data analysis to the later stages of knowledge extraction and classification/prediction.

Please note that you need to include a contributions section in the paper to clearly specify which person worked on what aspects of the paper.

Datasets

You can choose to work on one of the following datasets:

1. Wine Data Set

https://search.r-project.org/CRAN/refmans/HDclassif/html/wine.html

Data Set Information:

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

Format: A data frame with 178 observations on the following 14 variables:

Class The class vector; the three different cultivars of wine are represented by the integers 1 to 3.

V1 Alcohol

V2 Malic acid

V3 Ash

V4 Alkalinity of ash

V5 Magnesium

V6 Total phenols

V7 Flavanoids

V8 Nonflavanoid phenols

V9 Proanthocyanins

V10 Color intensity

V11 Hue

V12 OD280/OD315 of diluted wines

V13 Proline
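Because the columns are labelled V1 to V13, plots and tables are easier to read once the constituent names are attached. The sketch below shows one possible way to do this. It assumes the HDclassif package is installed and that the data frame uses the column names V1 to V13 as listed above; the helper `rename_constituents` and the vector `constituents` are hypothetical names introduced for illustration.

```r
# Lookup from the documented column codes to the constituent names listed above.
constituents <- c(
  V1 = "Alcohol", V2 = "Malic acid", V3 = "Ash", V4 = "Alkalinity of ash",
  V5 = "Magnesium", V6 = "Total phenols", V7 = "Flavanoids",
  V8 = "Nonflavanoid phenols", V9 = "Proanthocyanins",
  V10 = "Color intensity", V11 = "Hue",
  V12 = "OD280/OD315 of diluted wines", V13 = "Proline"
)

# Hypothetical helper: rename any column whose name appears in the lookup.
# Columns not in the lookup (for example the class column) are left unchanged.
rename_constituents <- function(df, lookup = constituents) {
  hit <- names(df) %in% names(lookup)
  names(df)[hit] <- unname(lookup[names(df)[hit]])
  df
}

# Runs only if HDclassif is available (an assumption about your setup).
if (requireNamespace("HDclassif", quietly = TRUE)) {
  data(wine, package = "HDclassif")
  wine_named <- rename_constituents(wine)
  str(wine_named)  # compare with the documented 178 observations of 14 variables
}
```

Some of the labels contain spaces or a slash, so they need backticks in formulas; passing the labels through `make.names()` first is an alternative if syntactic names are preferred.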

2. Breast Cancer Wisconsin (Diagnostic) Data Set

https://search.r-project.org/CRAN/refmans/mlbench/html/BreastCancer.html

Data Set Information:

The objective is to identify each of a number of benign or malignant classes. Samples arrive periodically as Dr. Wolberg reports his clinical cases. The database therefore reflects this chronological grouping of the data. This grouping information appears immediately below, having been removed from the data itself. Each variable except for the first was converted into 11 primitive numerical attributes with values ranging from 0 through 10. There are 16 missing attribute values. See cited below for more details.

Format: A data frame with 699 observations on 11 variables, one being a character variable, 9 being ordered or nominal, and 1 target class.

[,1]	Id	Sample code number
[,2]	Cl.thickness	Clump Thickness
[,3]	Cell.size	Uniformity of Cell Size
[,4]	Cell.shape	Uniformity of Cell Shape
[,5]	Marg.adhesion	Marginal Adhesion
[,6]	Epith.c.size	Single Epithelial Cell Size
[,7]	Bare.nuclei	Bare Nuclei
[,8]	Bl.cromatin	Bland Chromatin
[,9]	Normal.nucleoli	Normal Nucleoli
[,10]	Mitoses	Mitoses
[,11]	Class	Class
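Before modelling, it can help to confirm that the loaded data match the documented format (699 observations, 11 variables, 16 missing values). The sketch below is one possible check. It assumes the mlbench package is installed; the helper `describe_data` is a hypothetical name introduced for illustration.

```r
# Hypothetical helper: summarise a data frame against its documented format.
describe_data <- function(df, target) {
  list(
    n_obs = nrow(df),                     # number of observations
    n_vars = ncol(df),                    # number of variables, target included
    n_missing = sum(is.na(df)),           # total missing attribute values
    class_counts = table(df[[target]])    # observations per class
  )
}

# Runs only if mlbench is available (an assumption about your setup).
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(BreastCancer, package = "mlbench")
  info <- describe_data(BreastCancer, target = "Class")
  print(info)

  # Id is a sample code number rather than a measurement, so this sketch
  # separates it from the nine candidate predictors.
  predictors <- BreastCancer[, setdiff(names(BreastCancer), c("Id", "Class"))]
}
```

How the missing values are then handled (for example, removing rows or imputing) is a pre-processing decision to justify in the Methodology section.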

3. Pima Indians Diabetes Dataset

https://search.r-project.org/CRAN/refmans/hhcartr/html/pima.html

Description

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.