G53MLE-E1 Question 1: Foundations of Machine Learning Autumn 2016
(a) (i) Give the formula for a multivariate linear regressor with a 3rd-degree polynomial and 2 variables. (ii) See the data in the table below of paired values x1 and x2. If the goal is to predict x2 from x1, what type of linear regressor would be used here? Give its general formula (you don’t need to find the actual values of any weights or parameters). (iii) Give the formula for the generalised linear basis function. (iv) Why would you want to introduce non-linear basis functions in linear regression?
x1 | x2
---+------
 0 |   2
 5 |  29.5
10 | 107
15 | 234.5
20 | 412
25 | 639.5
30 | 917
(8 marks)
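As a numerical illustration (not part of the required exam answer), the relationship in the table above can be recovered with a polynomial least-squares fit; the choice of `numpy.polyfit` and the fitted degree are illustrative assumptions:

```python
import numpy as np

# Paired values from the table above.
x1 = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)
x2 = np.array([2, 29.5, 107, 234.5, 412, 639.5, 917])

# Least-squares fit of a 2nd-degree polynomial x2 = w2*x1^2 + w1*x1 + w0.
w2, w1, w0 = np.polyfit(x1, x2, deg=2)
print(round(w2, 3), round(w1, 3), round(w0, 3))  # -> 1.0 0.5 2.0
```

The fit is exact here (the residuals are at floating-point level), which is consistent with the data having been generated by a polynomial of the input variable.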
(b) Consider a Linear Discriminant Analysis classifier with two input parameters, each with a weight to learn – one weight per parameter. Two different ways of learning the weights of this algorithm are brute-force search and gradient descent. Describe how each works by (i) giving pseudocode and the formula for the update of gradient descent and (ii) pseudocode for brute-force search, clearly addressing the range of values tested. (iii) Explain for both gradient descent and brute-force search what the biggest drawback is in using them. (iv) Include a sketch illustrating how gradient descent works for a quadratic error function. Include in your sketch a visualisation of the termination criterion.
(14 marks)
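The two search strategies asked for in (i) and (ii) can be sketched in a few lines. This is a minimal illustration on a 1-D quadratic error surface; the learning rate, tolerance, and search range are illustrative assumptions:

```python
import numpy as np

def error(w):                          # quadratic error surface, minimum at w = 3
    return (w - 3.0) ** 2

# (i) Gradient descent: repeatedly apply w <- w - lr * dE/dw.
def gradient_descent(w0=0.0, lr=0.1, tol=1e-8, max_iter=10_000):
    w = w0
    for _ in range(max_iter):
        step = lr * 2.0 * (w - 3.0)    # analytic gradient of error()
        w -= step
        if abs(step) < tol:            # termination criterion: tiny update
            break
    return w

# (ii) Brute-force search: evaluate every value in a fixed range at a fixed
# resolution and keep the best. Cost grows with range size and resolution.
def brute_force(lo=-10.0, hi=10.0, step=0.01):
    grid = np.arange(lo, hi, step)
    return grid[np.argmin(error(grid))]

print(round(gradient_descent(), 3), round(brute_force(), 3))
```

Both find the minimum near w = 3; the brute-force answer is only as precise as its grid spacing, which hints at its main drawback in higher dimensions.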
(c) Explain why in Data Mining one is never able to rely purely on machine-measurable objective functions. Use in your explanation the criteria for interesting patterns.
(6 marks)
(a) (i) Draw a set of 2-dimensional linearly separable data points with binary labels, and clearly indicate the decision boundary found by a maximum-margin classifier, and at least one decision boundary that does not obtain a maximum margin but still represents a perfect classifier. (ii) Explain how it’s possible to obtain such a non-maximum margin solution, using the concept of a loss function.
(6 marks)
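For part (ii), the loss-function point can be demonstrated with a perceptron: its loss is zero as soon as every point is on the correct side of the boundary, so training stops at the first perfect separator it reaches, which is generally not the maximum-margin one. The toy data points below are illustrative assumptions:

```python
import numpy as np

# Toy linearly separable 2-D data with binary labels (+1 / -1).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
b = 0.0
for _ in range(100):                   # epochs
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:     # misclassified (or on the boundary)
            w += yi * xi               # perceptron update
            b += yi
            errors += 1
    if errors == 0:                    # perceptron loss is zero: stop here
        break

# Every point is now correctly classified, but the margin was never optimised.
assert all(yi * (w @ xi + b) > 0 for xi, yi in zip(X, y))
```

A hinge loss with margin maximisation (as in an SVM) would instead keep adjusting the boundary until the margin is maximal.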
Question 2: Artificial Neural Networks and Deep Learning [overall 33 marks]
Below is a training set of eruption durations and times to next eruption for two types of geyser eruption of The Old Faithful. This will be used for Question 2(a).
X1: Eruption duration (min) | X2: Time to next eruption (min) | Y: Eruption type
3.6 | 79 | 1
1.8 | 54 | 2
3.3 | 74 | 1
2.3 | 62 | 3
4.5 | 85 | 1
2.9 | 55 | 2
4.7 | 88 | 1
3.6 | 85 | 1
2.0 | 51 | 3
4.4 | 85 | 3
1.8 | 54 | 2
(a) Draw a diagram of an ANN’s topology that can learn this pattern based on the given data, using a single hidden layer with three steerable units plus any additional necessary nodes. Name all relevant elements using indices that indicate source and target layer numbers, and ensure you account for biases. Initialise all weights to 1.
(6 marks)
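The forward pass of the topology described (2 inputs, one hidden layer of 3 units, biases, all weights initialised to 1) can be sketched numerically. The sigmoid activation and the 3-unit output layer (one unit per eruption type) are illustrative assumptions, not stated in the question:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight matrix indices indicate source and target layers, e.g. W12 maps
# layer 1 (inputs) to layer 2 (hidden). All weights initialised to 1.
W12 = np.ones((3, 2)); b2 = np.ones(3)   # input (1) -> hidden (2), 3 units
W23 = np.ones((3, 3)); b3 = np.ones(3)   # hidden (2) -> output (3)

x = np.array([3.6, 79.0])                # first training example (X1, X2)
h = sigmoid(W12 @ x + b2)                # hidden-layer activations
out = sigmoid(W23 @ h + b3)              # one output per class (assumed)
print(out.shape)  # -> (3,)
```

With all weights equal, every hidden unit computes the same value, which is why random (non-identical) initialisation is used in practice.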
(b) Design a Deep-Learning architecture using Convolutional Neural Networks as one of a number of components for the task of classifying images into images that contain cats and images that don’t. Name the different layers/components of the network; explain what function they have, and provide the cost function used in the output layer to inform back-propagation. Use at least one convolutional layer, one dimensionality reduction layer, and any other layers you deem necessary.
(12 marks)
(c) Explain how the ReLU revolutionised Deep Learning, by relating it to the concept of the vanishing gradient. Sketch the activation function of the ReLU and compare this with at least one other activation function to illustrate your explanation.
(11 marks)
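The vanishing-gradient contrast in (c) can be checked numerically: the sigmoid's derivative peaks at 0.25 and shrinks towards 0 for large |z|, while the ReLU's derivative stays exactly 1 for any positive input. The sample points below are illustrative:

```python
import numpy as np

def sigmoid_grad(z):
    # Derivative of the logistic sigmoid: s(z) * (1 - s(z)), at most 0.25.
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return float(z > 0)

for z in (0.0, 5.0, 10.0):
    print(z, sigmoid_grad(z), relu_grad(z))
```

Multiplying many sigmoid derivatives (each at most 0.25) across layers shrinks the back-propagated gradient exponentially; the ReLU's unit gradient on its active side avoids this.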
(d) Consider a CNN that applies a single channel convolutional layer with a 3x3 kernel to a 10x10 input image, followed by a MaxPool layer. How many weighted connections come out of this network?
(4 marks)