
January 2018

COMP361101

Machine Learning

Useful Formulas

Sigmoid function: σ(x) = 1 / (1 + e^(−x)).
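A minimal Python sketch of the sigmoid above (the function name and test values are illustrative, not part of the paper):

import math

def sigmoid(x):
    # Logistic sigmoid: maps any real input into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0), sigmoid(4.0), sigmoid(-4.0))  # 0.5, close to 1, close to 0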

Entropy of a set S with elements from C classes: H(S) = −∑_{i=1}^{C} p_i log p_i.

Gini impurity of a set S with elements from C classes: G(S) = 1 − ∑_{i=1}^{C} p_i².
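A minimal sketch of these two impurity measures, assuming the class proportions p_i are given as a list and base-2 logarithms are used (both assumptions, not stated in the paper):

import math

def entropy(p):
    # H(S) = -sum_i p_i * log2(p_i), skipping zero-probability classes.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    # G(S) = 1 - sum_i p_i^2.
    return 1.0 - sum(pi * pi for pi in p)

# A pure set has zero impurity; a 50/50 split is maximally impure.
print(entropy([1.0]), gini([1.0]))
print(entropy([0.5, 0.5]), gini([0.5, 0.5]))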

Information gain: G(S, F) = H(S) − ∑_{f ∈ values(F)} (|S_f| / |S|) H(S_f).
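A sketch of the information gain computation, assuming each example is a (feature dictionary, label) pair; the function names and the example data are made up for illustration:

import math
from collections import Counter

def entropy_of(labels):
    # Entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    # examples: list of (feature_dict, label) pairs; feature: the key to split on.
    labels = [y for _, y in examples]
    gain = entropy_of(labels)
    for value in {x[feature] for x, _ in examples}:
        subset = [y for x, y in examples if x[feature] == value]
        gain -= (len(subset) / len(examples)) * entropy_of(subset)
    return gain

# Hypothetical data in which feature "F1" perfectly predicts the class.
data = [({"F1": "T", "F2": "T"}, 1), ({"F1": "T", "F2": "F"}, 1),
        ({"F1": "F", "F2": "T"}, 0), ({"F1": "F", "F2": "F"}, 0)]
print(information_gain(data, "F1"), information_gain(data, "F2"))  # 1.0 and 0.0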

Classification with the dual SVM formulation: y(x) = ∑_{n=1}^{N} λ_n t_n x^T x_n + w_0.
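A sketch of this decision function, assuming the multipliers λ_n, the targets t_n in {−1, +1}, the support vectors x_n and the bias w_0 are already known; all numbers in the example call are illustrative:

def svm_dual_predict(x, lambdas, targets, support_vectors, w0):
    # y(x) = sum_n lambda_n * t_n * (x . x_n) + w0; the sign of y(x) gives the class.
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))
    return sum(l * t * dot(x, xn)
               for l, t, xn in zip(lambdas, targets, support_vectors)) + w0

print(svm_dual_predict((1.0, 2.0), [0.5, 0.5], [+1, -1], [(0.0, 1.0), (2.0, 0.0)], 0.1))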

Sarsa update: Q_{k+1}(s, a) = Q_k(s, a) + α (R_{t+1} + γ Q_k(s′, a′) − Q_k(s, a)).

Q-learning update: Q_{k+1}(s, a) = Q_k(s, a) + α (R_{t+1} + γ max_{a′} Q_k(s′, a′) − Q_k(s, a)).
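A sketch contrasting the two updates above, with the value function Q stored as a dictionary; the state and action names, alpha and gamma in the example are illustrative:

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap on the action a_next actually taken in s_next.
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    # Off-policy: bootstrap on the greedy (maximising) action in s_next.
    best = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

# Example with a two-state, two-action table, alpha = 0.1, gamma = 0.5.
Q = {(s, a): 0.0 for s in ("s1", "s2") for a in ("A", "B")}
sarsa_update(Q, "s1", "A", 1.0, "s2", "B", 0.1, 0.5)
q_learning_update(Q, "s1", "A", 1.0, "s2", ("A", "B"), 0.1, 0.5)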

Question 1

a.   Describe the difference between generative and discriminative classification methods. [2 marks]

b.   Given the dataset {<1,1,0>, <3,3,0>, <3,1,1>, <1,3,1>}, where the last element of each point is the class, classify the point <0,0> using the nearest-neighbour algorithm. Justify your answer. [3 marks]
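If it helps to check an answer to (b), a small sketch of 1-nearest-neighbour on the given points, assuming Euclidean distance:

data = [((1, 1), 0), ((3, 3), 0), ((3, 1), 1), ((1, 3), 1)]
query = (0, 0)

def sq_dist(p, q):
    # Squared Euclidean distance (the square root does not change the ordering).
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# 1-nearest-neighbour: take the class of the closest training point.
point, label = min(data, key=lambda d: sq_dist(d[0], query))
print(point, label)  # (1, 1) 0, so <0,0> would be assigned class 0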

c.   Given the dataset {<0,1>, <0,2>, <1,1>, <1,2>, <4,0>, <4,-1>}, compute the new positions of the cluster centres <0,0> and <5,0> after one step of k-means. Justify your answer. [5 marks]
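For (c), a sketch of one assignment-and-update step of k-means on the given points, again assuming Euclidean distance:

points = [(0, 1), (0, 2), (1, 1), (1, 2), (4, 0), (4, -1)]
centres = [(0, 0), (5, 0)]

def sq_dist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# Assignment step: attach every point to its closest centre.
clusters = {c: [] for c in centres}
for p in points:
    clusters[min(centres, key=lambda c: sq_dist(p, c))].append(p)

# Update step: move each centre to the mean of its assigned points.
new_centres = [tuple(sum(coords) / len(members) for coords in zip(*members))
               for members in clusters.values()]
print(new_centres)  # [(0.5, 1.5), (4.0, -0.5)]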

d.   Derive the update rule of the output neurons of a multi-layer perceptron according to the algorithm Backpropagation of Errors. [5 marks]

e.   Derive the update rule of the inner neurons of a multi-layer perceptron according to the algorithm Backpropagation of Errors. [5 marks] [total: 20 marks]
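For reference when checking the two derivations above, the standard results, assuming a sum-of-squares error, sigmoid activations, targets t_k and learning rate η (this notation is an assumption, not taken from the paper):

\[
\delta_k = (y_k - t_k)\, y_k (1 - y_k), \qquad \Delta w_{jk} = -\eta\, \delta_k z_j \quad \text{(output neuron } k\text{, hidden activation } z_j\text{)}
\]
\[
\delta_j = z_j (1 - z_j) \sum_k w_{jk}\, \delta_k, \qquad \Delta w_{ij} = -\eta\, \delta_j x_i \quad \text{(hidden neuron } j\text{, input } x_i\text{)}
\]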

Question 2

a.   A neuron has the activation function:

y = g(∑_i w_i x_i).

Which one of the following two functions can be used as the function g( ) for the neurons of a feed-forward neural network, so that the weights can be learned with the algorithm backpropagation of errors? Justify your choice.

f1(x) = −1 if x < 0, 1 if x ≥ 0

f2(x) = x [2 marks]
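One worked observation that may help with the choice: backpropagation needs an activation that is differentiable with a useful gradient, and

\[
f_1'(x) = 0 \ \text{for all } x \neq 0 \ (\text{undefined at } x = 0), \qquad f_2'(x) = 1 \ \text{for all } x.
\]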

b.   Comment on the ability of the following Support Vector Machines to classify the dataset in the figure: hard-margin SVM, soft-margin SVM, hard-margin SVM with kernel functions.

[3 marks]

c.   Consider the data set S = {, , , , , , }, where each data point has two binary features F1 and F2 (whose values are either true (T) or false (F)), and the third value is the class. What is the first feature that the ID3 algorithm would split on? Justify your answer. [5 marks]

d.   Given the following dataset: <1,1,0>, <1,-1,0>, <3,-1,0>, <-1,-1,1>, construct a neural network with a step activation function able to separate the classes. The step function is defined as:

[10 marks] [total: 20 marks]
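For the previous part, a small sketch that checks one candidate single-neuron solution against the four points; the 0/1 step activation and the chosen weights are assumptions for illustration, not the required answer:

def step(x):
    # Assumed step activation: 1 if the input is non-negative, else 0.
    return 1 if x >= 0 else 0

def neuron(x, w, b):
    # Single neuron: weighted sum of the inputs plus a bias, passed through the step.
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

data = [((1, 1), 0), ((1, -1), 0), ((3, -1), 0), ((-1, -1), 1)]
w, b = (-1.0, 0.0), -0.5  # one hypothetical choice of weights and bias

print(all(neuron(x, w, b) == label for x, label in data))  # True for this choice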

Question 3

a.   Considering the following classification methods: Neural Networks, Support Vector Machines, and Decision Trees, describe one advantage of each method with respect to the others. [3 marks]

b.   A reinforcement learning agent can execute two actions, A and B. It explores with a policy that chooses A with probability 0.5 and B with probability 0.5. The optimal policy is to always choose A. Would the agent's value function converge to the same values if it learned with Sarsa as if it learned with Q-learning? Justify your answer. [2 marks]
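For reference on the question above: with a fixed exploration policy π (here the 50/50 policy), Sarsa and Q-learning converge, under the usual step-size conditions, to the fixed points of different equations:

\[
\text{Sarsa:}\; Q^{\pi}(s,a) = \mathbb{E}\!\left[R_{t+1} + \gamma\, Q^{\pi}(S_{t+1}, A_{t+1})\right],
\qquad
\text{Q-learning:}\; Q^{*}(s,a) = \mathbb{E}\!\left[R_{t+1} + \gamma \max_{a'} Q^{*}(S_{t+1}, a')\right].
\]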

c.   A robot can execute the following actions: goto(R), which moves it to room R; pickup, which picks up a cup of coffee; and putdown, which puts down the cup. The action pickup can only be executed if the cup is in the same room as the robot, while the action putdown can only be executed if the robot is holding the cup. The robot can perceive which room it is in, where objects are in the room, and whether or not there is a person in the room. It receives a reward of +1000 for putting down a cup of coffee in room A when there is a person in it, and a reward of 0 for every other action. Consider the representation for the state space in which each state is a vector of the following values: . For instance, a possible state is . Is this representation Markovian? Why? If you think it is Markovian, propose a non-Markovian alternative, while if you think it is not Markovian, propose a Markovian one. [3 marks]

d.   Compute the state-action value functions obtained by Sarsa and Q-learning for the MDP in the following figure, under an ε-greedy policy with ε = 0.2. The edges of the graph are actions, labelled with their name, probability, and immediate reward when non-zero. The nodes are states, labelled with their name. For this MDP, γ = 0.5.

[12 marks] [total: 20 marks]