DATA 201 Assignment 1

Total marks: 20

Due date: 11:59pm, Friday 04 August 2022.

Submit code and outputs in a single Jupyter notebook file.

The aim of this assignment is to develop a machine learning model that predicts house prices using the information in the file data.csv. A description of the data is given in the file description.pdf.

Requirements:

-     Use root mean square error (RMSE) as the evaluation metric. [2 marks]

-     Load the dataset, determine the target column, remove irrelevant variables (if any), and use the function train_test_split with random_state=0 to split the data into two sets: a training set (80%) and a test set (20%). [3 marks]

-     Explore the training set to gain insights. [2 marks]

-     Select one machine learning model, train it, optimise it (e.g., add pre-processing transformers, perform hyper-parameter tuning, etc.), and estimate its performance. [9 marks]

-     Test the final model on the test set, and report the RMSE and at least two other evaluation metrics (e.g., mean absolute percentage error (MAPE), R2-score, etc.). [3 marks]

-     Include a discussion at the end of your notebook (about what you have learnt, difficulties, what has worked and what has not, future directions, etc.). [1 mark]

Notes:

-     Write your name and student ID at the beginning of your notebook. After completing your work, use the menu item Kernel => Restart & Run All in Jupyter, then submit your notebook file.

-     You can use any public Python package.

-     You do not have to follow the requirements above in the order listed.

-     Use your own assumptions and judgement if you are unsure about any information in the dataset, but remember to mention them in your discussion.

-     Try to write functions for all data transformations you apply, try feature engineering (e.g., creating new features), and try to automate all the steps as much as possible (e.g., using Pipeline and data transformers, etc.). You may earn bonus marks for this; however, your total mark will not exceed 20.
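
The last note can be illustrated with a minimal sketch, assuming scikit-learn, pandas and NumPy are installed. It is not a model answer: the synthetic data stands in for data.csv, and the column names (area, rooms, district, price), the helper add_area_per_room, the choice of Ridge and the small alpha grid are all hypothetical choices made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Synthetic stand-in for data.csv; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.uniform(50, 250, n),
    "rooms": rng.integers(1, 7, n),  # integers from 1 to 6
    "district": rng.choice(["north", "south", "east"], n),
})
df["price"] = 1000 * df["area"] + 5000 * df["rooms"] + rng.normal(0, 10000, n)

# Separate the target, then hold out 20% of the rows as the test set.
X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def add_area_per_room(frame):
    """Feature engineering step: add a (hypothetical) area-per-room ratio."""
    out = frame.copy()
    out["area_per_room"] = out["area"] / out["rooms"]
    return out


numeric_cols = ["area", "rooms", "area_per_room"]
categorical_cols = ["district"]

# Impute and scale numeric columns; one-hot encode categorical columns.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# The whole workflow lives in one Pipeline, so it is refitted per CV fold.
model = Pipeline([
    ("features", FunctionTransformer(add_area_per_room)),
    ("preprocess", preprocess),
    ("regressor", Ridge()),
])

# Tune on the training set only, scoring by (negated) RMSE.
search = GridSearchCV(
    model,
    param_grid={"regressor__alpha": [0.1, 1.0, 10.0]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the held-out test set.
pred = search.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
mape = float(np.mean(np.abs((y_test - pred) / y_test)))  # as a fraction
r2 = float(r2_score(y_test, pred))
print(f"best alpha: {search.best_params_['regressor__alpha']}")
print(f"RMSE: {rmse:.1f}  MAPE: {mape:.3f}  R2: {r2:.3f}")
```

Keeping the feature engineering, pre-processing and regressor in a single Pipeline means that cross-validation refits every step on each training fold, which avoids leaking information from validation rows into the transformers. With the real dataset, the DataFrame would instead come from pd.read_csv("data.csv"), and the column lists would follow description.pdf.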