GENERAL INSTRUCTIONS

1. Assignment 2 consists of three tasks, which together are worth 100 marks. The figure in square brackets [] denotes the number of marks available for that task or part of the task.

2. Programs may be written in MATLAB or Python and should follow a professional style (e.g., appropriate comments, consistent indentation, meaningful variable names).

3. Please make sure that your programs run without errors and that your PDF report is typeset on a computer.

4. Plagiarism is not permitted; all submissions will be strictly checked with Turnitin.

5. You are expected to complete all the tasks.

6. Partial marks may be awarded depending on the degree of completeness and clarity of your answers.

Task 1 Classification [35 Marks]

1. Using the MNIST database available at http://yann.lecun.com/exdb/mnist/, select two classification algorithms and implement them to achieve high accuracy (above 90%). [15 marks]

2. Describe the techniques used in your classification algorithms, including data preparation, feature reduction, and training tricks. [10 marks]

3. Analyse other techniques that could be applied to your classification algorithms to improve model performance in terms of accuracy, efficiency, and storage. [10 marks]
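
As a rough illustration of the Task 1 workflow (data preparation, feature reduction, two classifiers), the sketch below may be a useful starting point. It is not a model answer. It assumes scikit-learn is installed, and it uses scikit-learn's small bundled 8x8 digits set (load_digits) as a stand-in for MNIST so that it runs offline; the function name evaluate_classifiers, the two chosen algorithms, and all hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_classifiers(features, labels, random_state=0):
    """Train two classifiers on a stratified split and return test accuracies."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels,
        random_state=random_state,
    )
    # Two example algorithms; any other pair could be substituted here.
    models = {
        "logistic_regression": LogisticRegression(max_iter=2000),
        "k_nearest_neighbours": KNeighborsClassifier(n_neighbors=3),
    }
    accuracies = {}
    for name, model in models.items():
        # Data preparation (scaling) and feature reduction (PCA keeping
        # 95% of the variance) are fitted on the training split only.
        pipeline = make_pipeline(
            StandardScaler(), PCA(n_components=0.95), model
        )
        pipeline.fit(x_train, y_train)
        accuracies[name] = pipeline.score(x_test, y_test)
    return accuracies


# ASSUMPTION: load_digits is a small stand-in dataset, not MNIST itself.
digits = load_digits()
digit_accuracies = evaluate_classifiers(digits.data, digits.target)
for name, accuracy in digit_accuracies.items():
    print(f"{name}: {accuracy:.3f}")
```

For the actual task, the same function could be called with the MNIST images flattened to 784-element rows; results on MNIST will differ from those on the stand-in data.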

Task 2 Support vector machine (SVM) and principal component analysis (PCA) [40 Marks]

1. Using the iris.data file available on ICE, split the data into a training set and a validation set, and implement the SVM algorithm (based on public packages or libraries) to classify the types of iris, achieving an accuracy of at least 90%. [10 marks]

2. Using the same iris.data, reduce the feature dimension by applying PCA and extract the first, second, and third principal components. [10 marks]

3. Using each of the first, second, and third principal components extracted in Task 2.2 individually, train an SVM model to classify the types of iris, and compare the resulting accuracies. [10 marks]

4. For each combination of the extracted first, second, and third principal components, train an SVM model to classify the types of iris, and then compare their accuracies. [10 marks]
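
One possible way to organise the Task 2 experiments is sketched below. It is a hedged outline rather than a reference solution. It assumes scikit-learn is installed and loads the copy of the iris data bundled with scikit-learn (load_iris) in place of the iris.data file from ICE; the 70/30 split, the RBF kernel, and the helper name svm_accuracy are illustrative assumptions.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: load_iris stands in for the iris.data file provided on ICE.
iris = load_iris()
x_train, x_val, y_train, y_val = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target,
    random_state=0,
)

# Fit the scaler and PCA on the training set only, then project both sets
# onto the first three principal components.
scaler = StandardScaler().fit(x_train)
pca = PCA(n_components=3).fit(scaler.transform(x_train))
pc_train = pca.transform(scaler.transform(x_train))
pc_val = pca.transform(scaler.transform(x_val))


def svm_accuracy(component_indices):
    """Train an SVM on the chosen components; return validation accuracy."""
    columns = list(component_indices)
    model = SVC(kernel="rbf", C=1.0, gamma="scale")
    model.fit(pc_train[:, columns], y_train)
    return model.score(pc_val[:, columns], y_val)


# Single components (size 1) and every combination of sizes 2 and 3.
component_subsets = [
    subset for size in (1, 2, 3) for subset in combinations(range(3), size)
]
subset_accuracies = {
    subset: svm_accuracy(subset) for subset in component_subsets
}
for subset, accuracy in subset_accuracies.items():
    label = "+".join(f"PC{index + 1}" for index in subset)
    print(f"{label}: {accuracy:.3f}")
```

Subsets of size 1 correspond to Task 2.3 and subsets of size 2 or 3 to Task 2.4; the printed accuracies depend on the split and the SVM settings chosen.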

Task 3 Clustering [25 Marks]

1. Specify the number of clusters (e.g., 1, 2, 3, 4, 5, and 6) and apply the k-means algorithm (based on public packages or libraries) to cluster the iris.data. [10 marks]

2. Apply PCA to reduce the feature dimension, then combine the first, second, and third principal components as the input to the k-means algorithm (based on public packages or libraries) to cluster the iris.data. [15 marks]
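
The two clustering parts of Task 3 could be set up along the following lines. This is a minimal sketch under stated assumptions, not a prescribed method: it assumes scikit-learn is installed, uses the bundled load_iris data in place of the iris.data file, standardises the features before clustering, and compares cluster counts by inertia (within-cluster sum of squares). The helper name kmeans_inertias is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: load_iris stands in for the iris.data file provided on ICE.
iris = load_iris()
scaled_features = StandardScaler().fit_transform(iris.data)

# First three principal components, used together as k-means input.
principal_components = PCA(n_components=3).fit_transform(scaled_features)


def kmeans_inertias(features, cluster_counts=range(1, 7)):
    """Run k-means for each cluster count and return the inertia per count."""
    inertias = {}
    for n_clusters in cluster_counts:
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        model.fit(features)
        inertias[n_clusters] = model.inertia_
    return inertias


raw_inertias = kmeans_inertias(scaled_features)
pca_inertias = kmeans_inertias(principal_components)
for n_clusters in raw_inertias:
    print(
        f"k={n_clusters}: all features {raw_inertias[n_clusters]:.2f}, "
        f"PC1-PC3 {pca_inertias[n_clusters]:.2f}"
    )
```

Inertia is only one way to compare cluster counts; because the true iris labels are known, the cluster assignments (model.labels_) could also be compared against them.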