ACS61013 Coursework 1
2022
Assignment due date: Hand in by 11pm on the 11th of November; this coursework makes up 25% of your total module mark. Submit your report on Blackboard as a PDF file. Also include your Orange file (.ows) and your MATLAB code as part of your submission.
Unfair Means: The assignment should be completed individually. You should not discuss the assignment with other students and should not work together in completing the assignment. The assignment must be wholly your own work. Any suspicions of the use of unfair means will be investigated and may lead to penalties. See http://www.shef.ac.uk/ssid/exams/plagiarism for more information.
Penalties for Late Submission: Late submissions will incur the usual penalties of a 5% reduction in the mark for every working day (or part thereof) that the assignment is late and a mark of zero for submissions more than 5 working days late.
Extenuating Circumstances: If you have any extenuating circumstances (medical or special circumstances) that might have affected your performance on the assignment, please follow the guidance at https://www.sheffield.ac.uk/ssid/forms/circs
Help: This assignment briefing and the lecture notes provide all the information that is required to complete this assignment. It is not expected that you should need to ask further questions. However, if you need clarification on the assignment, please discuss the issue with me after a lab, or email mej.oyekan@sheffield.ac.uk.
Specific assignment information and instructions
The challenge: You have been provided with a data set containing the prices (the target) and other features of almost 54,000 diamonds.
In addition to your domain analysis, see the Appendix below to understand the dataset provided to you. Your task is to develop various machine-learning models as presented in the table below.
Tools to use: The majority of the MATLAB code you need to complete the assignment is available from the various lab sessions. If you are comfortable using Python, you are free to use it. You are also free to use Orange for various aspects of the coursework as required.
Tasks and Mark Scheme: The aim of this coursework is to design, implement and evaluate an effective machine-learning pipeline for predicting diamond prices. The specific tasks and corresponding mark scheme are given in the table below. It is up to you how you approach this problem, design a solution and write up your results. For each task, the mark within the grade boundary will be based on your description in your report and results.
Task/Assessment Description | Mark Range | Level of achievement |
Task 1: Conduct a domain analysis and present your findings as related to the domain of the coursework. Discuss what you have learnt from your domain analysis and how it will support other parts of your coursework. | 0-15% | 1 |
Task 2: Achieve level 1 as well as conduct data cleaning and pre-processing. Discuss how you used your understanding of the domain from level 1 to support this task. | 15-30% | 2 |
Task 3: Achieve the previous levels plus discuss the steps taken in feature engineering and in preventing bias in the dataset used to train the machine learning algorithms. Answer the following questions: Which data features are more correlated with each other, and why do you think they are? Which three variables correlate most closely with the target price column? Using your knowledge of the domain (hint: use your domain analysis as well), explain why. | 30-45% | 3 |
Task 4: Achieve all the previous levels as well as: Apply a regression machine learning methodology to predict the diamond price. Apply a decision tree methodology to predict the diamond price based on three price classes: LOW, MEDIUM and HIGH. | 45-65% | 4 |
Task 5: Achieve all the previous levels plus discuss how you applied cross validation techniques in the machine learning pipeline. | 65-80% | 5 |
Task 6: Achieve all the previous levels as well as discuss how effective your pipeline is at preventing overfitting and underfitting through the application of learning curves and classification evaluation metrics as appropriate. | 80-100% | 6 |
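Task 4 asks for three price classes but the brief does not say where the class boundaries lie. The following is a minimal sketch of one possible labelling step, assuming equal-frequency (tercile) cut points; that choice, the function name `price_classes` and the sample prices are illustrative assumptions, not part of the brief.

```python
from statistics import quantiles


def price_classes(prices):
    """Label each price LOW, MEDIUM or HIGH using tercile cut points.

    Assumption: equal-frequency binning. The brief does not prescribe
    thresholds, so other schemes (e.g. fixed price bands) are equally valid.
    """
    # quantiles(..., n=3) returns the two cut points that split the data into thirds
    low_cut, high_cut = quantiles(prices, n=3)
    labels = []
    for price in prices:
        if price <= low_cut:
            labels.append("LOW")
        elif price <= high_cut:
            labels.append("MEDIUM")
        else:
            labels.append("HIGH")
    return labels


# Hypothetical prices for illustration only, not rows from the dataset
sample_prices = [326, 500, 950, 2400, 5300, 18823]
sample_labels = price_classes(sample_prices)
print(list(zip(sample_prices, sample_labels)))
```

Equal-frequency bins keep the three classes balanced, which tends to make classification metrics easier to interpret. If cross validation is used, it may be safer to compute the cut points on the training portion only.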
Technical Report and code
Write up your results in a technical report of no more than 15 pages. Make sure your report has a table of contents, sections, discussion and conclusions.
You must create MATLAB (or Python) code and an Orange pipeline design for your solution(s), and use both to support your report. Make sure you provide comments in your code as well as instructions on how to run it. Hand in your report (.pdf) and software (Orange and MATLAB) via Blackboard by 11pm on the 11th of November 2022. This coursework makes up 25% of your total module mark.
Appendix
Features | Description |
ID | An identifier number |
Price | Price in US dollars ($326 to $18,823) |
carat | Weight of the diamond (0.2 to 5.01) |
cut | Quality of the cut (Fair, Good, Very Good, Premium, Ideal) |
color | Diamond colour, from J (worst) to D (best) |
clarity | A measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)) |
x | Length in mm (0 to 10.74) |
y | Width in mm (0 to 58.9) |
z | Depth in mm (0 to 31.8) |
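The appendix lists ordered categories for cut, color and clarity, and a minimum of 0 mm for x, y and z. As a hedged illustration of how that information might feed a pre-processing step, the sketch below maps each category to an ordinal rank and drops rows with a non-positive dimension. The helper names and sample rows are hypothetical; the intermediate colour grades between J and D are assumed to follow alphabetical order, and treating a zero dimension as a missing measurement is an assumption rather than an instruction from the brief.

```python
# Category orders, worst to best. Cut and clarity are copied from the appendix;
# the colour grades between J and D are an assumption (alphabetical order).
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def ordinal_rank(value, order):
    """Return the 0-based position of value in order (0 = worst)."""
    return order.index(value)


def encode_row(row):
    """Replace the three categorical features with their ordinal ranks."""
    return {
        "cut": ordinal_rank(row["cut"], CUT_ORDER),
        "color": ordinal_rank(row["color"], COLOR_ORDER),
        "clarity": ordinal_rank(row["clarity"], CLARITY_ORDER),
    }


def has_valid_dimensions(row):
    """Assumption: a physical diamond cannot measure 0 mm along any axis."""
    return all(row[axis] > 0 for axis in ("x", "y", "z"))


# Hypothetical rows for illustration only, not taken from the dataset
sample_rows = [
    {"cut": "Ideal", "color": "D", "clarity": "IF", "x": 4.0, "y": 4.1, "z": 2.5},
    {"cut": "Fair", "color": "J", "clarity": "I1", "x": 0.0, "y": 6.2, "z": 3.9},
]
clean_rows = [row for row in sample_rows if has_valid_dimensions(row)]
encoded = [encode_row(row) for row in clean_rows]
print(encoded)
```

Ordinal ranks preserve the worst-to-best ordering, which one-hot encoding would discard. Whether to drop or impute rows with zero dimensions is a design choice worth justifying in the report.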
2022-11-05