COMP90051 Statistical Machine Learning 2022 Semester 1 – Final Exam
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
COMP90051 Statistical Machine Learning
Final Exam
Semester 1, 2022
Total marks: 120
Students must attempt all questions
Section A: Short Answer Questions [40 marks]
Answer each of the questions in this section as briefly as possible. Expect to answer each question in 1-3 lines.
Question 1: [40 marks]
(a) The Adagrad optimiser is a variant of stochastic gradient descent with a dynamic learning rate .
Explain how the dynamic learning rate operates, and why this is important during optimisation of machine learning models. [5 marks]
(b) Both convolutional networks (CNNs) and recurrent networks (RNNs) can be applied to sequence
inputs. Explain the key benefit that RNNs have over CNNs for this type of input. [5 marks]
(c) Explain in words how Attention can be used to allow for neural models to process dynamic sized inputs. [5 marks]
(d) Can the independence assumptions of all directed probabilistic graphical models be represented by undirected probabilistic graphical models? Why or why not? [5 marks]
(e) Gradient descent is typically preferred over coordinate descent . Given this, why is coordinate ascent
use in the Expectation Maximisation algorithm? [5 marks]
(f) Weak duality guarantees that a primal optimum always upper bounds a dual optimum . How can
we tell that the support vector machine’s primal and dual optima are always equal? [5 marks]
(g) Why are both maximum- likelihood estimators and maximum a posteriori estimators both asymp-
totically efficient? [5 marks]
(h) A model family F’s growth function SF(m) could potentially be 2m, growing exponentially larger
with increasing sample size m. Why would this be bad news for the PAC bound with growth function for F? [5 marks]
Section B: Method & Calculation Questions [60 marks]
In this section you are asked to demonstrate your conceptual understanding of methods that we have studied in this subject, and your ability to perform numeric and mathematical calculations.
Question 2: The Elimination Algorithm [14 marks]
Consider the following directed probabilistic graphical model (PGM) with Boolean-valued random variables
(a) What is the reconstructed graph (aka moral graph) of this PGM, for alphabetical elimination ordering
leaving just H . I.e., A,B,C,D,E,F,G? [8 marks]
(b) What is the maximum clique size in the reconstructed graph? [3 marks]
(c) Write the runtime complexity of the elimination algorithm , eliminating the nodes in the given order, in terms of the maximum clique size. [3 marks]
Question 3: Vapnik-Chervonenkis Dimension [16 marks]
The two parts of this question are related to the VC dimension but are otherwise unrelated and can be answered separately.
(a) Consider an input domain of five points x1 , . . . ,x5 , and binary classifier family F defined by the
table of dichotomies given below. Calculate VC(F) and show a corresponding shattered set of points. [8 marks]
1 |
2 |
3 |
4 |
5 |
0 |
0 |
0 |
0 |
1 |
1 |
0 |
1 |
0 |
1 |
0 |
1 |
0 |
1 |
0 |
0 |
1 |
1 |
0 |
1 |
0 |
0 |
0 |
1 |
0 |
1 |
0 |
1 |
1 |
1 |
1 |
1 |
0 |
0 |
1 |
0 |
1 |
1 |
1 |
1 |
(b) For any t ∈ [0, 1], define the function It on x ∈ [0, 1] to be
It (x) =
What is the Vapnik- Chervonenkis dimension of the family F = {It : t ∈ [0, 1]} and why? [4 marks]
(c) Referring to the functions defined in the previous part, what is the Vapnik- Chervonenkis dimension of family F\ = F ∪ {1 − It : t ∈ [0, 1]} and why? (Hint: function 1 − It outputs 0 on inputs x ≥ t, and 1 otherwise .) [4 marks]
Question 4: Support Vector Machines [16 marks]
Assume that the following figure is an illustration of the decision boundary learned by a linear soft-margin SVM on a two-class data set, with points from one class labelled with blue circles and points from the other class labelled with red squares.
The Lagrangian function of the soft-margin SVM is:
L(w,b,λ,β,ξ) = + C 工(n)ξi +工(n) λigi(w,b,ξ) +工(n)βi(−ξi)
i=1 i=1 i=1
where
gi(w,b,ξ) = 1 − ξi− y(i)(wT x(i) + b)
and f(x) = wT x + b. Here λi , βi are Lagrange multipliers and ξi is the slack variable of the ith data point, C is the slack penalty.
Given the seven points A, B, C, D, E, F, and G as shown in the figure,
(a) Identify the training point(s) with λi = 0 (by index A–G). [4 marks]
(b) Identify the training point(s) with ξi 0 (by index A–G). [6 marks]
(c) For each training point identified in the previous part, is ξi < 0, 0 < ξi < 1, 1 < ξi < 2, or is ξi > 2? [6 marks]
Question 5: Convolutional Neural Networks [14 marks]
Consider a simple Convolutional Neural Network for processing 3 × 3 images, with a single filter of size 2 × 2, applied with no padding and stride of 1 × 1. The next step is global max-pooling, which takes the filter output (size 2 × 2) to produce a single output.
1. What mathematical function is defined by the network, relating its output y to its input x1,1 ,x1,2 , . . . ,x3,3 ? You will need to introduce new variables for the model’s parameters. [6 marks]
2. Explain how a loss gradient at the output is back-propagated through the network, and show the mathematical form of the loss gradient with respect to the model parameters. [8 marks]
Section C: Design and Application Questions [20 marks]
In this section you are asked to demonstrate that you have gained a high-level understanding of the methods and algorithms covered in this subject, and can apply that understanding. Answers should be about 1 page in length for the question.
Question 6: Online Content Testing [20 marks]
Your task is to design an automatic content recommendation system for a major newspaper “TechNews” to use to select which headlines to use for articles on its online front page, and which pictures to display for these front page article previews. TechNews has very many users, and serves very many different articles across different categories (like local news, international news, opinion pieces, investment, etc.). TechNews wants to encourage user engagement with their platform, so that users keep returning to the news service (driving more subscription and ad revenue). TechNews management therefore will judge the performance of its content recommendation system by cumulative clicks . Complicating matters, different kinds of users typically click on different kinds of articles according to their tastes in article categories, types of headlines, and picture previews, etc. Your goal is to recommend headlines and pictures to display to maximise the total number of clicks, where each article comes with a number of different available headlines and picture previews to choose from.
(a) Formulate the content recommendation task as a contextual multi- armed bandit problem . Explain,
what your arms are, what rounds correspond to, and what are your rewards . [6 marks]
(b) What context vectors will you use based on what data TechNews might possess? [6 marks]
(c) Comment on whether the typical bandit assumption of independent and identically distributed re- wards per arm is appropriate. Does context make this assumption more or less realistic? [4 marks]
(d) Describe what data TechNews should collect, so that you could efficiently evaluate the effectiveness of a huge number of bandit learners. [4 marks]
2022-11-29