
DTS002TC Essentials of Big Data

Coursework Resit (Individual Assessment)

Due: 5 pm China time (UTC+8, Beijing) on Friday, 28 July 2023

Weight: 100%

Maximum score: 100 marks (100 individual marks)

Assessed learning outcomes:

A. Develop a global perspective on the sources and uses of big data.

B. Engage critically with the technical challenges of data acquisition and management.

C. Develop an understanding of the industrial and commercial applications of big data.

D. Demonstrate an awareness of the quantitative problems posed by the analysis of big data.

E. Demonstrate the ability to write code to obtain numerical solutions to mathematical problems.

F. Demonstrate the ability to display computational results in tabulated or graphical forms.

Assessment tasks:

Overview

Price prediction for second-hand cars is a typical big data analysis application. Investigate the current application of big data technology in price prediction and write a research report on it. Then implement a price prediction application according to the task description.

Submission format instructions

You need to submit two documents, ‘car_price.m’ and ‘ID_report.pdf’, via Learning Mall Online to the correct drop box.

· car_price.m: contains the MATLAB code, which must run directly.

· ID_report.pdf: contains the big data report together with the MATLAB code, outputs, and figures.

Data introduction

You need to download the raw data sets "car_train.csv" and "car_test.csv" from LMO. Each data set includes 4 columns of car data. The meaning of the data in each column is as follows.

Column Index | Value name                   | Value type | Instance
1            | Name: The brand of the car   | Text       | Chevrolet Trax
2            | Year: production year        | Date       | 2018
3            | Miles: mileage driven so far | Int        | 41946
4            | Price: current sales price   | Int        | 16990
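As a rough illustration of the four-column layout above, the following MATLAB sketch reads a small CSV file into a table and assigns each column to its own variable. It assumes the file has a header row with the names Name, Year, Miles, and Price, which the brief does not confirm. To stay runnable without the real "car_train.csv", it first writes a three-row sample file built from the example values in this brief. The variable names (carTable, carName, and so on) are hypothetical.

```matlab
% Write a tiny sample file so the sketch runs without the real data.
% ASSUMPTION: the real CSV has a header row named Name,Year,Miles,Price.
sampleFile = [tempname '.csv'];
fid = fopen(sampleFile, 'w');
fprintf(fid, 'Name,Year,Miles,Price\n');
fprintf(fid, 'Chevrolet Trax,2018,41946,16990\n');
fprintf(fid, 'Acura ILX,2018,32516,23990\n');
fprintf(fid, 'Chevrolet Volt,2017,63012,24590\n');
fclose(fid);

% Read the file; text columns are imported as string arrays.
carTable = readtable(sampleFile, 'TextType', 'string');

% Assign each column to its own variable.
carName  = carTable.Name;    % string column
carYear  = carTable.Year;    % numeric column
carMiles = carTable.Miles;   % numeric column
carPrice = carTable.Price;   % numeric column

delete(sampleFile);          % remove the temporary sample file
```

For the actual task, the temporary file would be replaced by the path to "car_train.csv". If that file has no header row, the column names would have to be supplied separately.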

Task 1 Big Data Report (60 Marks)

1. Discuss typical application cases of big data in the car market. (Do not exceed 300 words.) (20 marks)

2. Investigate the research technologies relevant to big data in the car market and in price prediction. (Do not exceed 300 words.) (20 marks)

3. Which big data technology would you recommend to better predict trends in the second-hand car market and to increase market consumption? (Do not exceed 300 words.) (20 marks)

Task 2 Car price prediction (40 Marks)

1. Read the "car_train.csv" data and assign each column to a variable. (5 marks)

2. Correct abnormal data to a unified format. For example, the year value "20173" must be changed to "2017". (5 marks)

3. Find the most expensive car of each brand and print the result in the Command Window. (5 marks)

4. Use the KNN algorithm in the MATLAB algorithm toolbox to train the price prediction model. (5 marks)

5. Read the "car_test.csv" data and assign each column to a variable. (5 marks)

6. Correct abnormal data to a unified format. For example, the year value "201860" must be changed to "2018". (5 marks)

7. Use the trained KNN model to predict the price of each car in the test data set. (5 marks)

8. Save the test data and the results to the file "car_result.csv", with a data structure similar to that of "car_train.csv". (5 marks)
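The cleaning, training, prediction, and saving steps listed above could be sketched in MATLAB as follows. This is one possible approach under stated assumptions, not a model answer:

· The cleaning rule keeps the first four digits of a year, which matches the two examples "20173" and "201860".
· The features are the year and mileage, and k = 1 is an arbitrary choice.
· fitcknn (Statistics and Machine Learning Toolbox) is a classifier, so it treats every distinct price as a class label and can only return prices seen in training. A regression-style alternative would average the prices of the neighbours returned by knnsearch.
· The toy rows reuse example values from this brief. The test mileages are made up for illustration.

```matlab
% Toy training rows (one year deliberately malformed as 20173).
trainYear  = [2018; 2018; 20173];
trainMiles = [41946; 32516; 63012];
trainPrice = [16990; 23990; 24590];

% Toy test rows (hypothetical mileages; one year malformed as 201860).
testName  = ["Acura ILX"; "Chevrolet Volt"];
testYear  = [201860; 2017];
testMiles = [32000; 63500];

% ASSUMED cleaning rule: drop every digit after the fourth one.
% Valid only for positive years with at least four digits.
keepFourDigits = @(y) floor(y ./ (10 .^ max(0, floor(log10(y)) - 3)));
trainYear = keepFourDigits(trainYear);   % 20173  -> 2017
testYear  = keepFourDigits(testYear);    % 201860 -> 2018

% Train a 1-nearest-neighbour model on standardised [year, miles] features.
knnModel = fitcknn([trainYear trainMiles], trainPrice, ...
    'NumNeighbors', 1, 'Standardize', true);

% Predict a price for every test row.
predictedPrice = predict(knnModel, [testYear testMiles]);

% Save the test data and predictions with the same four-column layout.
resultTable = table(testName, testYear, testMiles, predictedPrice, ...
    'VariableNames', {'Name', 'Year', 'Miles', 'Price'});
resultFile = fullfile(tempdir, 'car_result.csv');  % temp folder for the sketch
writetable(resultTable, resultFile);
```

In a real submission, the toy arrays would be replaced by the columns read from "car_train.csv" and "car_test.csv", and resultFile would point to "car_result.csv" in the working folder.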

Sample Output


>> Acura ILX  2018   32516 23990

...

Chevrolet Volt 2017   63012 24590

...

car_result.csv
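Output in the format shown above (one line per brand with name, year, miles, and price) could be produced along the following lines. The sketch assumes that the brand is the first word of the Name column, for example "Chevrolet" in "Chevrolet Trax", which the brief does not state explicitly. The toy rows reuse example values from this brief, and all variable names are hypothetical.

```matlab
% Toy rows; real data would come from the cleaned training set.
% Names are held as a cell array of char vectors (use cellstr(...) to
% convert if they were read as a string array).
carName  = {'Chevrolet Trax'; 'Acura ILX'; 'Chevrolet Volt'};
carYear  = [2018; 2018; 2017];
carMiles = [41946; 32516; 63012];
carPrice = [16990; 23990; 24590];

% ASSUMPTION: the brand is the first whitespace-delimited word of the name.
carBrand = regexp(carName, '^\S+', 'match', 'once');

% Group rows by brand; brandList holds the unique brands in sorted order.
[groupId, brandList] = findgroups(carBrand);

topRow = zeros(numel(brandList), 1);     % row index of the priciest car per brand
for k = 1:numel(brandList)
    rows = find(groupId == k);           % rows belonging to brand k
    [~, best] = max(carPrice(rows));     % first maximum wins on ties
    topRow(k) = rows(best);
    fprintf('%s %d %d %d\n', carName{topRow(k)}, carYear(topRow(k)), ...
        carMiles(topRow(k)), carPrice(topRow(k)));
end
```

With these three toy rows, the loop prints the Acura ILX line followed by the Chevrolet Volt line, because the Volt is priced above the Trax.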

Marking Criteria

The following criteria will be used to assess the Coursework Resit assignment.

1 Marking Criteria of Task 1

> Outstanding:

Report format is consistent throughout, including heading styles, fonts, and margins; figures, tables, and diagrams are effectively interpreted and discussed; writing flows smoothly from one idea to another; information is presented in a logical and interesting way; all information is located in the appropriate section.

> Appropriate:


Report format is generally consistent; figures, tables, and diagrams are properly interpreted; sentences are structured and words are chosen to communicate ideas clearly; information is presented in a logical manner; information is located in the appropriate section.

> Needs Improvement:

Report format is inconsistent; figures, tables, and diagrams are poorly interpreted and discussed; sentence structure and/or word choice sometimes interfere with clarity; information is hard to follow as there is very little continuity; many items are in the wrong section.

> Hard to Understand:

Report format is inconsistent; figures, tables, and diagrams are not used effectively; sentence structure and word choice make reading and understanding difficult; the sequence of information is difficult to follow; appropriate sections are lacking and many items are in the wrong section.

> No submission or Missing Section:

No submission or missing section of the discussion in the report.

2 Marking Criteria of Task 2

> Outstanding:

Correct output, correct variable type usage, good naming rules, good memory control, strong semantics and readability.

> Appropriate:

Correct output, correct variable type usage, good naming rules, poor memory control, poor semantics and readability.

> Needs Improvement:

Correct output, good naming rules, wrong variable type usage, poor memory control, poor semantics and readability.

> Hard to Understand:

Correct output, poor naming rules, wrong variable type usage, poor memory control, poor semantics and readability.

> No submission or Missing Section:

No submission or missing section, including code and report.

Area | Basis of marking | Marks
Task1.1 | Be able to give typical cases of car market applications from the perspective of big data. Clear logical structure and language expression. Outstanding: 15 - 20; Appropriate: 10 - 14; Needs improvement: 6 - 9; Hard to understand: 1 - 5; No submission or missing section: 0 | 20
Task1.2 | Be able to classify technologies, especially the difference between car market analysis and price prediction. Clear logical structure and language expression. Outstanding: 15 - 20; Appropriate: 10 - 14; Needs improvement: 6 - 9; Hard to understand: 1 - 5; No submission or missing section: 0 | 20
Task1.3 | The technical method given comprehensively considers the research contents of 1 and 2, and has strong feasibility. Clear logical structure and language expression. Outstanding: 15 - 20; Appropriate: 10 - 14; Needs improvement: 6 - 9; Hard to understand: 1 - 5; No submission or missing section: 0 | 20
Task2.1 | Code quality and implementation results. Outstanding: 5; Appropriate: 4; Needs improvement: 3; Hard to understand: 2; No submission or missing section: 0 | 5
Task2.2 | Code quality and implementation results. Outstanding: 5; Appropriate: 4; Needs improvement: 3; Hard to understand: 2; No submission or missing section: 0 | 5
Task2.3 | Code quality and implementation results. Outstanding: 5; Appropriate: 4; Needs improvement: 3; Hard to understand: 2; No submission or missing section: 0 | 5
Task2.4 | Code quality and implementation results. Outstanding: 5; Appropriate: 4; Needs improvement: 3; Hard to understand: 2; No submission or missing section: 0 | 5
Task2.5 | Code quality and implementation results. Outstanding: 5; Appropriate: 4; Needs improvement: 3; Hard to understand: 2; No submission or missing section: 0 | 5
Task2.6 | Code quality and implementation results. Outstanding: 5; Appropriate: 4; Needs improvement: 3; Hard to understand: 2; No submission or missing section: 0 | 5
Task2.7 | Code quality and implementation results. Outstanding: 5; Appropriate: 4; Needs improvement: 3; Hard to understand: 2; No submission or missing section: 0 | 5
Task2.8 | Code quality and implementation results. Outstanding: 5; Appropriate: 4; Needs improvement: 3; Hard to understand: 2; No submission or missing section: 0 | 5
Overall mark | | 100