DTS002TC Essentials of Big Data
Coursework Resit (Individual Assessment)
Due: 5 pm China time (UTC+8, Beijing) on Friday, 28 July 2023
Weight: 100%
Maximum score: 100 marks (100 individual marks)
Assessed learning outcomes:
A. Develop a global perspective on the sources and uses of big data.
B. Engage critically with the technical challenges of data acquisition and management.
C. Develop an understanding of the industrial and commercial applications of big data.
D. Demonstrate an awareness of the quantitative problems posed by the analysis of big data.
E. Demonstrate the ability to write code to obtain numerical solutions to mathematical problems.
F. Demonstrate the ability to display computational results in tabulated or graphical forms.
Assessment tasks:
Overview
Predicting the price of second-hand cars is a typical application of big data analysis. Investigate the current use of big data technology in price prediction and write a research report on it. Then implement a price prediction application according to the task description.
Submission format instructions
Submit two files, ‘car_price.m’ and ‘ID_report.pdf’, via Learning Mall Online to the correct drop box.
· car_price.m: contains the MATLAB code, which must run directly.
· ID_report.pdf: contains the big data report, together with the MATLAB code, outputs, and figures.
Data introduction
Download the raw data sets "car_train.csv" and "car_test.csv" from LMO. Each data set has 4 columns of car data. The meaning of each column is as follows.
Column Index | Value name | Value type | Instance
1 | Name: The brand of the car | Text | Chevrolet Trax
2 | Year: production year | Date | 2018
3 | Miles: mileage driven so far | Int | 41946
4 | Price: current sales price | Int | 16990
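As a rough illustration of loading data with this layout, the MATLAB sketch below writes a two-row stand-in file and reads it back with readtable, assigning each column to its own variable. The header names (Name, Year, Miles, Price), the temporary file name, and the presence of a header row are assumptions; the real "car_train.csv" may differ.

```matlab
% Sketch only: header names and the temporary file are assumptions.
% Build a two-row stand-in for car_train.csv so the example is self-contained.
sample = table(["Chevrolet Trax"; "Acura ILX"], [2018; 2018], ...
               [41946; 32516], [16990; 23990], ...
               'VariableNames', {'Name', 'Year', 'Miles', 'Price'});
csvPath = fullfile(tempdir, 'car_sample.csv');   % hypothetical file name
writetable(sample, csvPath);

% Read the file back and assign each column to its own variable.
cars  = readtable(csvPath, 'TextType', 'string');
name  = cars.Name;    % string column
year  = cars.Year;    % numeric column
miles = cars.Miles;   % numeric column
price = cars.Price;   % numeric column
```

If the real file has no header row, readtable would need the 'ReadVariableNames' option set to false and the columns would be accessed by position instead.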
Task 1 Big Data Report (60 Marks)
1. Discuss typical application cases of big data in the car market. (do not exceed 300 words) (20 marks)
2. Investigate the big data technologies relevant to car market analysis and prediction. (do not exceed 300 words) (20 marks)
3. Which big data technology would you recommend to better predict trends in the second-hand car market and increase market consumption? (do not exceed 300 words) (20 marks)
Task 2 Car price prediction (40 Marks)
1. Read the "car_train.csv" data and assign each column to a variable. (5 marks)
2. Correct abnormal data to a unified format; for example, the year value "20173" must be changed to "2017". (5 marks)
3. Find the most expensive car in each brand and print the result in the command window. (5 marks)
4. Use the KNN algorithm in the MATLAB algorithm toolbox to train the price prediction model. (5 marks)
5. Read the "car_test.csv" data and assign each column to a variable. (5 marks)
6. Correct abnormal data to a unified format; for example, the year value "201860" must be changed to "2018". (5 marks)
7. Use the trained KNN model to predict the price of each car in the test data set. (5 marks)
8. Save the test data and results to the "car_result.csv" file, using a data structure similar to "car_train.csv". (5 marks)
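The year clean-up and the KNN prediction steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the four training rows and the query car are made-up sample values, "abnormal" years are assumed to be valid years with extra trailing digits, and KNN is done as a k = 2 nearest-neighbour average with knnsearch from the Statistics and Machine Learning Toolbox. The brief does not say which toolbox function is expected; fitcknn from the same toolbox is another option.

```matlab
% Made-up sample rows standing in for car_train.csv (Year, Miles, Price).
trainYear  = [2018; 20173; 2016; 201860];   % includes malformed years
trainMiles = [41946; 63012; 80000; 30000];
trainPrice = [16990; 24590; 12000; 18000];

% Clean-up: keep only the first four digits of over-long year values.
numDigits = floor(log10(trainYear)) + 1;
trainYear = floor(trainYear ./ 10.^(numDigits - 4));

% Standardise the features so that mileage does not dominate the distance.
X     = [trainYear, trainMiles];
mu    = mean(X);
sigma = std(X);
Xz    = (X - mu) ./ sigma;

% KNN regression: average the prices of the k nearest training cars.
k      = 2;                       % assumed neighbour count
testX  = [2018, 40000];           % hypothetical test car (Year, Miles)
testXz = (testX - mu) ./ sigma;
idx    = knnsearch(Xz, testXz, 'K', k);
predictedPrice = mean(trainPrice(idx), 2);
```

With these sample values, the two nearest neighbours are the 2018 cars, so the prediction is the mean of their prices. The same clean-up and scaling (using the training mean and standard deviation) would be applied to the test data before prediction.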
Sample Output
>> Acura ILX 2018 32516 23990
...
Chevrolet Volt 2017 63012 24590
...
car_result.csv
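One way to produce lines in the format shown above is sketched below. It assumes that "brand" means the full value of the Name column, which is what the sample output suggests. The first and third rows are taken from the sample output; the second row is made up so that one group has more than one car.

```matlab
% Sample rows: rows 1 and 3 come from the sample output, row 2 is made up.
names  = {'Acura ILX'; 'Acura ILX'; 'Chevrolet Volt'};
years  = [2018; 2017; 2017];
miles  = [32516; 50000; 63012];
prices = [23990; 19990; 24590];

% Group rows by name, then pick the highest-priced row in each group.
[groupId, groupNames] = findgroups(names);
bestRows = zeros(numel(groupNames), 1);
for g = 1:numel(groupNames)
    rows = find(groupId == g);          % row indices belonging to group g
    [~, best] = max(prices(rows));      % position of the top price in the group
    bestRows(g) = rows(best);
    r = bestRows(g);
    fprintf('%s %d %d %d\n', names{r}, years(r), miles(r), prices(r));
end
```

If the brand is instead meant to be only the first word of the name, the grouping key would need to be extracted from the names before calling findgroups.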
Marking Criteria
The following criteria will be used to assess the Coursework Resit assignment.
1 Marking Criteria of Task 1
> Outstanding:
Report format is consistent throughout, including heading styles, fonts, and margins; figures, tables, and diagrams are effectively interpreted and discussed; writing flows smoothly from one idea to another; information is presented in a logical and interesting way; all information is located in the appropriate section.
> Appropriate:
Report format is generally consistent; figures, tables, and diagrams are properly interpreted; sentences are structured and words are chosen to communicate ideas clearly; information is presented in a logical manner; information is located in the appropriate section.
> Needs Improvement:
Report format is inconsistent; figures, tables, and diagrams are poorly interpreted and discussed; sentence structure and/or word choice sometimes interfere with clarity; information is hard to follow as there is very little continuity; many items are in the wrong section.
> Hard to Understand:
Report format is inconsistent; figures, tables, and diagrams are not used effectively; sentence structure and word choice make reading and understanding difficult; the sequence of information is difficult to follow; appropriate sections are lacking and many items are in the wrong section.
> No submission or Missing Section:
No submission, or a section of the discussion is missing from the report.
2 Marking Criteria of Task 2
> Outstanding:
Correct output, correct variable type usage, good naming rules, good memory control, strong semantics and readability.
> Appropriate:
Correct output, correct variable type usage, good naming rules, poor memory control, poor semantics and readability.
> Needs Improvement:
Correct output, good naming rules, wrong variable type usage, poor memory control, poor semantics and readability.
> Hard to Understand:
Correct output, poor naming rules, wrong variable type usage, poor memory control, poor semantics and readability.
> No submission or Missing Section:
No submission, or a section is missing from the code or the report.
Area | Basis of marking | Marks
Task1.1 | Be able to give typical cases of financial analysis from the perspective of big data. Clear logical structure and language expression. ·Outstanding: 15 - 20 ·Appropriate: 10 - 14 ·Needs improvement: 6 - 9 ·Hard to understand: 1 - 5 ·No submission or missing section: 0 | 20
Task1.2 | Be able to classify technologies, especially the difference between stock analysis and stock prediction. Clear logical structure and language expression. ·Outstanding: 15 - 20 ·Appropriate: 10 - 14 ·Needs improvement: 6 - 9 ·Hard to understand: 1 - 5 ·No submission or missing section: 0 | 20
Task1.3 | The technical method given comprehensively considers the research contents of 1 and 2, and has strong feasibility. Clear logical structure and language expression. ·Outstanding: 15 - 20 ·Appropriate: 10 - 14 ·Needs improvement: 6 - 9 ·Hard to understand: 1 - 5 ·No submission or missing section: 0 | 20
Task2.1 | Code quality and implementation results ·Outstanding: 5 ·Appropriate: 4 ·Needs improvement: 3 ·Hard to understand: 2 ·No submission or missing section: 0 | 5
Task2.2 | Code quality and implementation results ·Outstanding: 5 ·Appropriate: 4 ·Needs improvement: 3 ·Hard to understand: 2 ·No submission or missing section: 0 | 5
Task2.3 | Code quality and implementation results ·Outstanding: 5 ·Appropriate: 4 ·Needs improvement: 3 ·Hard to understand: 2 ·No submission or missing section: 0 | 5
Task2.4 | Code quality and implementation results ·Outstanding: 5 ·Appropriate: 4 ·Needs improvement: 3 ·Hard to understand: 2 ·No submission or missing section: 0 | 5
Task2.5 | Code quality and implementation results ·Outstanding: 5 ·Appropriate: 4 ·Needs improvement: 3 ·Hard to understand: 2 ·No submission or missing section: 0 | 5
Task2.6 | Code quality and implementation results ·Outstanding: 5 ·Appropriate: 4 ·Needs improvement: 3 ·Hard to understand: 2 ·No submission or missing section: 0 | 5
Task2.7 | Code quality and implementation results ·Outstanding: 5 ·Appropriate: 4 ·Needs improvement: 3 ·Hard to understand: 2 ·No submission or missing section: 0 | 5
Task2.8 | Code quality and implementation results ·Outstanding: 5 ·Appropriate: 4 ·Needs improvement: 3 ·Hard to understand: 2 ·No submission or missing section: 0 | 5
Overall mark | | 100
2023-07-26