
January 2019

COMP361101

Machine Learning

Useful Formulas

    Entropy of a set S with elements from C classes: $H(S) = -\sum_{i=1}^{C} p_i \log p_i$.

    Gini impurity of a set S with elements from C classes: $G(S) = 1 - \sum_{i=1}^{C} p_i^2$.

    Gini/Information gain: $G(S, F) = M(S) - \sum_{f \in \mathrm{values}(F)} \frac{|S_f|}{|S|} M(S_f)$, where M is either the Gini impurity or the Entropy.

    Pseudoinverse of matrix $\Phi$: $\Phi^{\dagger} = (\Phi^T \Phi)^{-1}\Phi^T$.

    Sarsa update: $Q_{k+1}(s, a) = Q_k(s, a) + \alpha\bigl(R_{t+1} + \gamma\, Q_k(s', a') - Q_k(s, a)\bigr)$.

    Q-learning update: $Q_{k+1}(s, a) = Q_k(s, a) + \alpha\bigl(R_{t+1} + \gamma \max_{a'} Q_k(s', a') - Q_k(s, a)\bigr)$.
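For reference, the formulas above translate directly into code. The following is a minimal sketch (Python with NumPy assumed; the function names are illustrative only and not part of the module material):

```python
import numpy as np

def entropy(p):
    """Entropy H(S) = -sum_i p_i log p_i, for class proportions p (log base 2 here)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # convention: 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

def gini(p):
    """Gini impurity G(S) = 1 - sum_i p_i^2."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def gain(parent, children, weights, impurity=gini):
    """Gini/information gain M(S) - sum_f (|S_f|/|S|) M(S_f).

    `parent` and each element of `children` are class-proportion vectors;
    `weights` are the fractions |S_f|/|S| of points sent to each child.
    """
    return impurity(parent) - sum(w * impurity(c) for w, c in zip(weights, children))

def pseudoinverse(Phi):
    """(Phi^T Phi)^{-1} Phi^T, valid when Phi has full column rank."""
    Phi = np.asarray(Phi, dtype=float)
    return np.linalg.inv(Phi.T @ Phi) @ Phi.T

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstraps from the action actually taken in s_next."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy TD update: bootstraps from the greedy action in s_next."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```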

Question 1

a.  Are the two classes depicted below linearly separable in two dimensions? Explain why.

 [2 marks]

b.   Given the dataset: {<1,1,0>, <1,2,0>, <2,1,1>, <4,3,1>}, where the last element of each point is the class, classify the point <2,2> using the 3-nearest-neighbour algorithm. Justify your answer. [3 marks]
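As a reference for part (b), a minimal k-nearest-neighbour classifier is sketched below (Python; Euclidean distance and majority voting are assumed, since the question does not name a distance measure). It is illustrative only, not the required written justification.

```python
import numpy as np
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify 2-D `query` by majority vote among its k nearest training points.

    `train` is a list of (x, y, label) triples; Euclidean distance is assumed.
    """
    nearest = sorted(train, key=lambda p: np.hypot(p[0] - query[0], p[1] - query[1]))
    votes = Counter(label for _, _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Data and query point from the question:
data = [(1, 1, 0), (1, 2, 0), (2, 1, 1), (4, 3, 1)]
print(knn_classify(data, (2, 2), k=3))
```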

c.   Given the dataset: {<1,1>, <2, 1>, <3,3>, <4,3>, <5,3>}, compute the new position of the cluster centres <1,0> and <4,4> after one step of k-means. Justify your answer. [5 marks]
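For part (c), one k-means iteration consists of an assignment step (each point goes to its nearest centre) followed by an update step (each centre moves to the mean of its assigned points). A minimal sketch (Python with NumPy; data and initial centres taken from the question):

```python
import numpy as np

def kmeans_step(points, centres):
    """One iteration of k-means: assignment step, then centre-update step."""
    points = np.asarray(points, dtype=float)
    centres = np.asarray(centres, dtype=float)
    # Assignment: index of the nearest centre for every point (Euclidean distance).
    dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    assignment = np.argmin(dists, axis=1)
    # Update: each centre becomes the mean of its assigned points
    # (left unchanged if no point is assigned to it).
    new_centres = np.array([
        points[assignment == k].mean(axis=0) if np.any(assignment == k) else centres[k]
        for k in range(len(centres))
    ])
    return new_centres, assignment

pts = [(1, 1), (2, 1), (3, 3), (4, 3), (5, 3)]
print(kmeans_step(pts, [(1, 0), (4, 4)]))
```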

d.   Derive the update rule of the output neurons of a multi-layer perceptron according to the algorithm Backpropagation of Errors. [5 marks]
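For orientation on part (d), a sketch of the standard chain-rule derivation, assuming a sum-of-squares error and sigmoid output activations (other error functions or activations change the $\delta_k$ term); here $y_k$ is the output of output unit $k$, $t_k$ its target, $z_j$ the output of hidden unit $j$, $w_{jk}$ the connecting weight, and $\eta$ the learning rate:

```latex
% Assumptions: E = (1/2) \sum_k (y_k - t_k)^2,  y_k = \sigma(a_k),  a_k = \sum_j w_{jk} z_j.
\[
\frac{\partial E}{\partial w_{jk}}
  = \frac{\partial E}{\partial y_k}\,
    \frac{\partial y_k}{\partial a_k}\,
    \frac{\partial a_k}{\partial w_{jk}}
  = (y_k - t_k)\, y_k (1 - y_k)\, z_j
  \;\equiv\; \delta_k z_j,
\qquad
\Delta w_{jk} = -\eta\, \delta_k z_j .
\]
```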

e.  A Support Vector Machine solves the following optimisation problem:

$$\min_{\mathbf{w}} \|\mathbf{w}\|^2 \quad \text{s.t.} \quad t_n(\mathbf{w}^T \mathbf{x}_n) \ge 1 \quad \forall\, \mathbf{x}_n \in X$$

Explain what is achieved by adding a set of variables $\xi_n$ to the formulation as follows, and why they need to appear in both the constraints and the objective function:

$$\min_{\mathbf{w}} \|\mathbf{w}\|^2 + c \sum_n \xi_n \quad \text{s.t.} \quad t_n(\mathbf{w}^T \mathbf{x}_n) \ge 1 - \xi_n \quad \forall\, \mathbf{x}_n \in X$$

[5 marks] [total: 20 marks]

Question 2

a.   Consider the data set S = {<T, T, 0>, <F, T, 0>, <T, F, 1>, <T, F, 0>, <F,F,1>, <F,F,0>},  where each data point has two binary features A and B (whose values are either true (T) or false (F)), and the third value is the class. What is the first feature that the CART algorithm would split on? Justify your answer. [4 marks]

b.   Given the following dataset: {<-1,2>, <0,1>, <1,2>, <2,5>}, where the first component of each vector is the input, and the second component the corresponding desired output, compute the least-squares solution to the linear regression problem for the function:

$y = w_0 + w_1 x$ [8 marks]
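Part (b) can be checked against the pseudoinverse formula from the Useful Formulas list, $\mathbf{w} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}$, where the n-th row of the design matrix $\Phi$ is $(1, x_n)$. A minimal sketch (Python with NumPy; data taken from the question):

```python
import numpy as np

# Inputs x and target outputs t from the question.
x = np.array([-1.0, 0.0, 1.0, 2.0])
t = np.array([2.0, 1.0, 2.0, 5.0])

# Design matrix with a bias column: row n is (1, x_n).
Phi = np.column_stack([np.ones_like(x), x])

# Least-squares weights (w0, w1) via the pseudoinverse.
w = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ t
print(w)
```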

c.  Given the following dataset: {<0,-2,1>, <1,0,1>, <-1,3,1>, <2,3,0>, <-1,2,0>}, where the last element of each vector is its class, construct a multi-layer perceptron that classifies the dataset, and draw its diagram (nodes and edges with corresponding weights). [8 marks] [total: 20 marks]

Question 3

a.   Describe the difference between an on-policy and an off-policy reinforcement learning algorithm. Provide an example of each method, and an MDP where they converge to different value functions. [5 marks]

b.   Describe the difference between Monte Carlo and Temporal Difference algorithms. What could be one reason to prefer Temporal Difference methods? [3 marks]

c.   Compute the state-action value functions obtained by Sarsa and Q-learning for the MDP in the following figure, under an ε-greedy policy with ε = 0.2. The edges of the graph are actions, labelled with their name, probability, and immediate reward when non-zero. The nodes are states, labelled with their name. For this MDP, γ=0.5.

[Figure: MDP transition diagram; the states include 3 and 4, with action edges labelled e.g. "a, 0.5" and "a, 1, -10".]

[12 marks] [total: 20 marks]