Cardiff School of Computer Science and Informatics
Coursework Assessment Pro-forma
Module Code: CMT311
Module Title: Principles of Machine Learning
Lecturer: Dr Xianfang Sun
Assessment Title: Coursework 2
Assessment Number: 2/2
Date Set: 18 January 2021
Submission Date and Time: 5 February 2021 at 9:30am
Return Date: 26 February 2021
This assignment is worth 50% of the total marks available for this module. If coursework is submitted late (and where there are no extenuating circumstances):
1. If the assessment is submitted no later than 24 hours after the deadline, the mark for the assessment will be capped at the minimum pass mark;
2. If the assessment is submitted more than 24 hours after the deadline, a mark of 0 will be given for the assessment.
Your submission must include the official Coursework Submission Cover sheet, which can be found here:
https://docs.cs.cf.ac.uk/downloads/coursework/Coversheet.pdf
Submission Instructions
Complete the questions as described in the Assignment section, and upload the files listed in the following table onto Learning Central in the CMT311 module Assessment Coursework 2 section by 5 February 2021 at 9:30am.
Description | | Type | Name
Cover sheet | Compulsory | One PDF (.pdf) file | [student number].pdf
Answers to all questions | Compulsory | One PDF (.pdf) file | CMT311_2_[student number].zip
Any deviation from the submission instructions above (including the number and types of files submitted) will lead to the marks being capped at 50%.
Staff reserve the right to invite students to a meeting to discuss coursework submissions.
Assignment
Answer all five questions below. The number of marks available for each question (or part) is indicated in brackets [-]. There are 100 marks available in total for this assignment.
Question 1. Linear Regression
The ERM problem of linear regression with respect to the maximum absolute error loss function can be cast as a linear program. Show the mathematical derivation for writing this problem, namely,
as a linear program. [15 marks]
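As a reminder of the general technique rather than a model answer, a minimax objective is usually linearised by introducing one auxiliary variable that upper-bounds every absolute residual. The sketch below assumes the objective is $\min_w \max_i |\langle w, x_i \rangle - y_i|$ over $m$ training examples $(x_i, y_i)$ with $x_i \in \mathbb{R}^d$. The symbols $w$, $x_i$, $y_i$, $m$, $d$ and the auxiliary variable $t$ are illustrative notation, not taken from the assignment text.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^d,\; t \in \mathbb{R}} \quad & t \\
\text{subject to} \quad & \langle w, x_i \rangle - y_i \le t, && i = 1, \dots, m, \\
& y_i - \langle w, x_i \rangle \le t, && i = 1, \dots, m.
\end{aligned}
```

Under this assumption the objective and all $2m$ constraints are linear in $(w, t)$, and at an optimum $t$ equals the largest absolute residual.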
Question 2. Support Vector Machine
Suppose we are using a linear SVM on the dataset shown below:
For each of the following cases, draw the decision boundary of the linear SVM, indicate the support vectors, and justify your answers in a few sentences:
a) The whole dataset is used as the training set, [5 marks]
b) The data point ((2, 2), ‘+’) is excluded from the training set, [5 marks]
c) The data point ((4, 3), ‘-’) is excluded from the training set. [5 marks]
Question 3. Decision Tree
Consider the following training set, where each example pairs a vector of four binary features with a binary label:
((1 1 1 1), 1)
((0 1 0 1), 1)
((0 1 1 0), 1)
((1 1 0 0), 1)
((0 1 1 1), 0)
((1 0 1 0), 0)
((1 0 0 1), 0)
((0 0 0 0), 0)
a) Suppose we run the ID3 algorithm up to depth 3. Assume that the subroutine used to measure the quality of each feature is based on information gain, and that if two features get the same score, one of them is picked arbitrarily. Show that the training error of the resulting decision tree is at least 1⁄8. [8 marks]
b) Find a decision tree of depth 3 that attains zero training error. [7 marks]
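The information-gain score assumed in part a) can be checked numerically. The following is a minimal sketch, not a prescribed method: it assumes entropy is measured in bits, that features are indexed from zero, and that the helper names `entropy` and `information_gain` are hypothetical choices made for illustration.

```python
from collections import Counter
from math import log2

# Training set from Question 3: (feature vector, label).
TRAINING_SET = [
    ((1, 1, 1, 1), 1),
    ((0, 1, 0, 1), 1),
    ((0, 1, 1, 0), 1),
    ((1, 1, 0, 0), 1),
    ((0, 1, 1, 1), 0),
    ((1, 0, 1, 0), 0),
    ((1, 0, 0, 1), 0),
    ((0, 0, 0, 0), 0),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels; 0 for an empty list."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(examples, feature):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in examples if x[feature] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain


# Score every feature at the root (zero-based feature indices, an assumption).
gains = [information_gain(TRAINING_SET, j) for j in range(4)]
best_feature = max(range(4), key=lambda j: gains[j])
```

Calling `information_gain` on the subsets produced by each split gives the scores ID3 would compare at deeper levels of the tree.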
Question 4. Neural Networks
Deep learning is a machine learning method based on deep neural networks. Describe three popular techniques that are used in deep learning but not in traditional neural network learning, and explain why they improve learning performance over traditional neural network learning techniques. [15 marks]
Question 5. Case Study
Suppose you are required to develop a software system to predict the stock market and to provide operation instructions for investors to maximise their stock gains. The software should have the following functions:
1) It can use all available stock market information, both quantitative (numerical values) and qualitative (text or symbolic information).
2) It can give investors guidance on when to buy or sell shares, and how many, to maximise their gains within a limited period, given the total amount of available funds.
Select one machine learning model from the following: linear predictor, support vector machine, decision tree, and neural network. Justify your selection by comparing the advantages and disadvantages of the different models. Describe how to combine the selected model and the stock market information to generate operation instructions. (The answer to this question should take less than one page of text, approximately 500 words.) [40 marks]
Learning Outcomes Assessed
1. Describe basic principles underlying machine learning.
2. Assess the key concepts and algorithms widely used in machine learning.
3. Apply basic algorithms to toy examples.
4. Critically reflect and evaluate different approaches for learning.
5. Determine a suitable machine learning approach given an application.
Criteria for assessment
Credit will be awarded against the following criteria.
● Correctness of technical answers. Do the answers correctly address the requirements of each task? [30%]
● Appropriate use of the concepts covered in class. Do the answers show an understanding of the basic concepts? [30%]
● Quality, clarity and conciseness of justifications [40%]
Distinction (70-100%): excellent understanding of all relevant concepts; no technical mistakes; clear and concise explanations and justification.
Merit (60-69%): good understanding of all relevant concepts; no technical mistakes; explanations address key points, but are missing detail.
Pass (50-59%): sufficient understanding of all relevant concepts; no or only minor technical errors; limited explanations.
Fail (0-49%): lack of understanding of all relevant concepts; major errors; missing or unclear explanations.
Feedback and suggestions for future learning
Feedback on your coursework will address the above criteria. Feedback and marks will be returned on 26 February 2021 via Learning Central. This will be supplemented with oral feedback on request.
2021-01-26