Lab 5

(Deadline 18:00 10/01/2020)

Task

You will be provided with a machine learning benchmark dataset (details below). The task focuses on the implementation and critical analysis of multiple regression methods. You should implement, evaluate, and analyse each of the following algorithms:

● Nearest neighbour

● Linear regression

● Regression forest

● Gaussian process

The results should be presented as a report in a PDF document, with a limit of 3000 words. Please also include your code (Jupyter notebook and/or plain .py files).

Mark scheme

This project is worth 60% of your marks for the unit:

● 10 marks for the method,

● 20 marks for designing and validating on a toy problem (see below),

● 20 marks for the experiments,

● 10 marks for the analysis,

for a total of 60 marks.

If a piece of work is submitted after the submission date (and no extension has been explicitly granted by the Director of Studies), the maximum possible mark will be 40% of the full mark. If work is submitted more than five working days after the submission date, the student will receive zero marks.

Data set

Included on Moodle is the SARCOS data set, a regression problem where the task is to predict the torque of one motor of a robotic arm given the physical state of its joints: specifically, the position, velocity and acceleration for each of 7 degrees of freedom (21 features in total).

The data set is provided as one CSV file; you are responsible for splitting it into training and test sets, and for holding out data for hyperparameter learning. Each row is an exemplar, and each column a feature. Your task is to predict the last column (#22) given the first 21 columns. Be aware that you will not want to use all of the exemplars when training a Gaussian process: the cost of exact Gaussian process training grows cubically with the number of training exemplars, so it quickly becomes too slow.
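
As a rough illustration of the splitting described above, the following is a minimal sketch using NumPy. It is not a prescribed approach: the function names `split_dataset` and `subsample`, the 60/20/20 split, the subset size of 200, and the file name in the comment are all assumptions made for the example, and a random array stands in for the real data so the sketch runs on its own.

```python
import numpy as np


def split_dataset(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the rows, then split into train / validation / test sets.

    Each returned set is an (X, y) pair: X holds every column except the
    last, and y holds the last column (the torque to predict).
    The fractions are illustrative defaults, not values from the brief.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data.shape[0])
    n_test = int(round(test_frac * len(idx)))
    n_val = int(round(val_frac * len(idx)))

    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]

    def to_xy(rows):
        return data[rows, :-1], data[rows, -1]

    return to_xy(train_idx), to_xy(val_idx), to_xy(test_idx)


def subsample(X, y, n_max, seed=0):
    """Return at most n_max randomly chosen exemplars (e.g. for the GP)."""
    if X.shape[0] <= n_max:
        return X, y
    rng = np.random.default_rng(seed)
    keep = rng.choice(X.shape[0], size=n_max, replace=False)
    return X[keep], y[keep]


# Stand-in for the real data: 1000 rows, 21 features plus 1 target column.
# With the actual file you might instead load it with something like
#   data = np.loadtxt("sarcos.csv", delimiter=",")   # hypothetical file name
data = np.random.default_rng(42).normal(size=(1000, 22))

(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_dataset(data)

# Smaller training subset for the Gaussian process (size chosen arbitrarily).
gp_X, gp_y = subsample(train_X, train_y, n_max=200)
```

The validation set here is intended for hyperparameter selection, with the test set kept untouched until the final evaluation. Fixing the seed makes the split reproducible, which helps when comparing the four methods on identical data.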