
COM 2004

DEPARTMENT OF COMPUTER SCIENCE

DATA DRIVEN COMPUTING

AUTUMN SEMESTER 2014–2015

Answer THREE questions.

All questions carry equal weight. Figures in square brackets indicate the percentage of available marks allocated to each part of a question.

1.  a)     A Gaussian distribution whose mean is 100 and whose standard deviation is 10 is used to generate N random samples. The samples are then used to estimate the mean, µ, and variance, S, of the Gaussian using the formulae below, where both sums run over the N samples xi:

µ = (1/N) Σ xi

S = (1/N) Σ (xi − µ)²

(i)    Use  a series of sketches to show  how the distribution of the  mean estimate changes as the value of N increases.                 [20%]

(ii)    Use a sketch to show how the distribution of the  variance estimate changes as the value of N increases.                        [20%]
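As an illustrative sketch (not part of the original question), the behaviour asked about in parts (i) and (ii) can be simulated: draw many samples of size N, compute the divide-by-N estimates each time, and summarise how their distributions change with N.

```python
import random
import statistics

# Simulation of parts (i) and (ii): repeated samples of size N from a
# Gaussian with mean 100 and standard deviation 10.
random.seed(0)

def summarise(n, trials=2000):
    """Return (spread of the mean estimates, average variance estimate)
    over repeated samples of size n."""
    means, variances = [], []
    for _ in range(trials):
        xs = [random.gauss(100, 10) for _ in range(n)]
        mu = sum(xs) / n
        means.append(mu)
        variances.append(sum((x - mu) ** 2 for x in xs) / n)
    return statistics.stdev(means), statistics.mean(variances)

for n in (5, 50, 500):
    sd_mean, avg_var = summarise(n)
    # The mean estimate tightens around 100 (spread ~ 10/sqrt(N)); the
    # divide-by-N variance estimate also tightens around 100, but sits
    # below it for small N because it is biased low.
    print(f"N={n:3d}  spread of mean estimate={sd_mean:5.2f}  "
          f"average variance estimate={avg_var:6.2f}")
```

The trial count of 2000 is an arbitrary choice for the sketch; the qualitative behaviour is what the question's sketches should show.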

b)      Alice has two routes for travelling to work.  Travel times on these routes are normally distributed.

-      Route A takes on average 60 minutes with a standard deviation of 20 minutes.
-      Route B takes on average 70 minutes with a standard deviation of 10 minutes.

Work starts at precisely 9:00 and Alice does not want to be late.

(i)    Which route should Alice take if leaving home at 8:00? Justify your answer.    [10%]

(ii)    Which route should Alice take if leaving home at 7:30? Justify your answer.    [10%]

(iii)    At what departure time would both routes be equally likely to get Alice to work by 9:00?                        [10%]
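The three sub-parts can be checked numerically with the standard normal CDF, Φ(z) = 0.5 (1 + erf(z/√2)); the sketch below (minutes as units) is illustrative, not part of the paper.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_on_time(available, mean, sd):
    """P(travel time <= available minutes) for a N(mean, sd^2) route."""
    return phi((available - mean) / sd)

# Leaving at 8:00 -> 60 minutes available.
print(p_on_time(60, 60, 20))   # Route A: Phi(0)  = 0.5
print(p_on_time(60, 70, 10))   # Route B: Phi(-1), about 0.159 -> take A

# Leaving at 7:30 -> 90 minutes available.
print(p_on_time(90, 60, 20))   # Route A: Phi(1.5), about 0.933
print(p_on_time(90, 70, 10))   # Route B: Phi(2.0), about 0.977 -> take B

# Equal probability: (t - 60)/20 = (t - 70)/10  =>  t = 80 minutes,
# i.e. leaving at 7:40 makes both routes equally likely to arrive by 9:00.
print(p_on_time(80, 60, 20), p_on_time(80, 70, 10))
```

The key observation is that the route with the larger standard deviation is preferable when the available time equals its mean, but the lower-variance route wins when there is plenty of slack.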

c)     A process is generating independent uniformly distributed random numbers, x

x ~ U(a, b)

You observe the sequence of samples: 5.1, 10.2, 8.4 and 3.1. Using maximum-likelihood parameter estimation, what would be the best estimates for the parameters a and b? Explain your reasoning. [30%]
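For a uniform U(a, b) the likelihood of n in-range samples is (1/(b−a))ⁿ, and zero if any sample falls outside [a, b]; it is therefore maximised by the tightest interval containing the data. A quick check of this reasoning:

```python
# Maximum-likelihood estimates for U(a, b): the likelihood (1/(b-a))^n
# grows as the interval shrinks, but drops to zero if any sample is
# excluded, so the optimum is the tightest interval covering the data.
samples = [5.1, 10.2, 8.4, 3.1]
a_hat, b_hat = min(samples), max(samples)
print(a_hat, b_hat)  # 3.1 10.2
```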

2.  a)      Consider two classes, ω1 and ω2, each containing two points as follows,

ω1     =   {(2, 1)T , (1, 1)T }

ω2     =   {(0, 0)T , (1, 0)T }

Consider a linear classifier with the discriminant function g(x) = w1x1 + w2x2 + w0, so the decision boundary is given by g(x) = 0.

Using initial weights of w1 = 1, w2 = 1 and w0 = −2 and a learning rate of 0.5, run the perceptron learning algorithm by hand until all the points are correctly classified (show all your workings).                                  [50%]
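The hand working can be checked against a runnable sketch of the perceptron rule. The sample order and the convention that a point exactly on the boundary counts as misclassified are assumptions of this sketch; samples are augmented with a constant 1 so the bias w0 is updated alongside w1 and w2.

```python
def perceptron(data, w, eta=0.5, max_epochs=100):
    """data: list of (x1, x2, label) with label +1 for class w1, -1 for w2.
    w: initial weights [w1, w2, w0]. Repeats passes over the data until a
    full pass produces no updates."""
    for _ in range(max_epochs):
        changed = False
        for x1, x2, label in data:
            g = w[0] * x1 + w[1] * x2 + w[2]
            # Misclassified if not strictly on its own class's side.
            if label * g <= 0:
                w = [w[0] + eta * label * x1,
                     w[1] + eta * label * x2,
                     w[2] + eta * label * 1.0]
                changed = True
        if not changed:
            return w
    return w

data = [(2, 1, +1), (1, 1, +1), (0, 0, -1), (1, 0, -1)]
print(perceptron(data, [1.0, 1.0, -2.0]))  # converges to [1.0, 1.5, -2.0]
```

With this sample order only two updates are needed: (1, 1) lies exactly on the initial boundary and triggers the first correction, then (1, 0) triggers the second.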

b)      Plot the points given in 2 (a) in feature space and plot the decision boundaries given by the initial and final weights.                             [10%]

c)      Consider two classes, ω1 and ω2, each containing two points as follows,

ω1     =   {(1, 0)T , (1, 2)T }

ω2     =   {(0, 1)T , (2, 1)T }

When using this data the perceptron learning algorithm will never terminate regardless of what values are chosen for the starting weights. Explain why.           [10%]

d)      Draw a two layer perceptron that can correctly classify the points in 2 (c).                    [30%]
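One possible network (an illustrative construction, not the only valid answer): the ω1 points both have x1 = 1 while the ω2 points have x1 = 0 or 2, so the class-ω1 region can be carved out with two parallel boundaries on x1 alone. Hidden unit h1 fires when x1 > 0.5, h2 fires when x1 > 1.5, and the output fires when h1 is on but h2 is off.

```python
def step(v):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if v > 0 else 0

def two_layer(x1, x2):
    # x2 is unused: for this particular data the classes are separable
    # using x1 alone, between two thresholds.
    h1 = step(x1 - 0.5)   # fires for x1 > 0.5
    h2 = step(x1 - 1.5)   # fires for x1 > 1.5
    return step(h1 - h2 - 0.5)  # 1 => class w1, 0 => class w2

for point, expected in [((1, 0), 1), ((1, 2), 1), ((0, 1), 0), ((2, 1), 0)]:
    print(point, two_layer(*point), expected)
```

Drawing this as a network gives two hidden units with weights (1, 0) and biases −0.5 and −1.5, feeding an output unit with weights (1, −1) and bias −0.5.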

3.  a)     Use pseudocode to show how samples are classified using a K-nearest neighbour classifier.                             [30%]
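The pseudocode asked for above can be checked against a small runnable version: sort the training data by distance to the query, take the K closest, and return the most frequent label among them. The labelled points from question 2 (a) are reused here purely as demonstration data.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (3.8+)

def knn_classify(query, training, k):
    """training: list of (point, label) pairs; query: a point tuple.
    Returns the majority label among the k nearest training points."""
    neighbours = sorted(training, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [((2, 1), 'w1'), ((1, 1), 'w1'), ((0, 0), 'w2'), ((1, 0), 'w2')]
print(knn_classify((2, 2), training, 1))  # nearest point is (2, 1) -> w1
print(knn_classify((0, 0), training, 3))  # two of three neighbours -> w2
```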

b)     Consider a K-nearest neighbour classifier that uses a Euclidean distance measure, K = 1, and the following samples as training data,

ω1     =   {(0, 0)T , (2, 0)T }

ω2     =   {(−2, 2)T , (−2, −2)T , (0, 2)T , (0, −2)T }

A point is selected uniformly at random from the region defined by −2 ≤ x1 ≤ 2, −2 ≤ x2 ≤ 2. What is the probability that the point is classified as belonging to class ω1? [Hint: start by sketching the decision boundary.] [40%]
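The answer can be checked by Monte Carlo simulation. Note that the minus signs in the printed training data appear to have been lost in extraction; the sketch below assumes ω2 = {(−2, 2), (−2, −2), (0, 2), (0, −2)}, which matches the spacing of the original and the symmetric sampling region.

```python
import random
from math import dist

# Monte Carlo sketch of part (b): sample points uniformly from the
# square and count how often the nearest training sample belongs to w1.
random.seed(1)
w1_pts = [(0, 0), (2, 0)]
w2_pts = [(-2, 2), (-2, -2), (0, 2), (0, -2)]  # assumed signs, see above

def nearest_is_w1(p):
    d1 = min(dist(p, q) for q in w1_pts)
    d2 = min(dist(p, q) for q in w2_pts)
    return d1 < d2

n = 100_000
hits = sum(nearest_is_w1((random.uniform(-2, 2), random.uniform(-2, 2)))
           for _ in range(n))
print(hits / n)
```

Under this sign assumption the estimate settles near 0.5, consistent with working out the area of the ω1 region from the perpendicular bisectors.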

c)      Consider the distance measure

d(x1, x2) = 1 − cos(θ)

where θ is the angle between vectors x1 and x2 .

Two separate 1-nearest neighbour classifiers are constructed, one using the Euclidean distance and the other using the distance measure defined above. The following labelled samples are provided as training data,

ω1     =   {(1, 0)T , (0, 1)T }

ω2     =   {(1, 1)T }

Sketch the region of feature space in which the two classifiers would be in agreement. [30%]
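A numerical sketch can help check the region: cosine distance 1 − cos(θ) depends only on a point's direction from the origin, not its length, so the two classifiers can disagree where a point's direction and its position favour different training samples. The probe points below are illustrative choices.

```python
from math import dist, hypot

train = [((1, 0), 'w1'), ((0, 1), 'w1'), ((1, 1), 'w2')]

def euclid(p, q):
    return dist(p, q)

def cosine(p, q):
    """1 - cos(angle between p and q), for nonzero vectors."""
    dot = p[0] * q[0] + p[1] * q[1]
    return 1.0 - dot / (hypot(*p) * hypot(*q))

def nn_label(p, d):
    """Label of the training point nearest to p under distance d."""
    return min(train, key=lambda tl: d(p, tl[0]))[1]

for p in [(3, 0), (5, 5), (2, 0.7)]:
    print(p, nn_label(p, euclid), nn_label(p, cosine))
```

For example (3, 0) and (5, 5) get the same label from both classifiers, but (2, 0.7) points within 22.5° of the x1-axis (so the cosine classifier picks (1, 0), class ω1) while lying closer to (1, 1) in Euclidean distance (class ω2).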

4.  a)     Give an account of the different ways in which distances between clusters can be defined.                                                                  [30%]
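Three standard definitions, single linkage (closest pair), complete linkage (farthest pair) and average linkage (mean over all cross-cluster pairs), can be sketched directly; the two example clusters are arbitrary demonstration data.

```python
from itertools import product
from math import dist

def single_link(A, B):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(a, b) for a, b in product(A, B))

def complete_link(A, B):
    """Distance between the farthest pair of points across the clusters."""
    return max(dist(a, b) for a, b in product(A, B))

def average_link(A, B):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(a, b) for a, b in product(A, B)) / (len(A) * len(B))

A = [(0, 0), (1, 0)]
B = [(3, 0), (4, 0)]
print(single_link(A, B), complete_link(A, B), average_link(A, B))  # 2.0 4.0 3.0
```

A further common choice, not coded here, is the distance between cluster centroids.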

b)      Run the agglomerative clustering algorithm on the following data and write down the cluster membership after each iteration.

(1, 1), (2, 1), (2, 4), (4, 4), (5, 0), (5, 5)

Your algorithm must use the minimum Euclidean point-to-point distance to measure the distance between clusters and you must iterate the algorithm until only one cluster remains. [30%]
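The hand-run iterations can be checked against a runnable sketch of the procedure: start with one cluster per point, repeatedly merge the two clusters with the smallest minimum point-to-point Euclidean distance, and record the membership after each merge.

```python
from math import dist

def single_link(A, B):
    """Minimum point-to-point Euclidean distance between two clusters."""
    return min(dist(a, b) for a in A for b in B)

def agglomerate(points):
    """Single-linkage agglomerative clustering; returns the list of
    cluster memberships after each merge."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under single linkage.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_link(clusters[ij[0]],
                                              clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        history.append([list(c) for c in clusters])
    return history

data = [(1, 1), (2, 1), (2, 4), (4, 4), (5, 0), (5, 5)]
for step, clusters in enumerate(agglomerate(data), 1):
    print(step, clusters)
```

Six points give five merges; the first merge joins (1, 1) and (2, 1), the closest pair at distance 1.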

c)      Draw a dendrogram to illustrate the hierarchical clusterings generated in 4 (b). [10%]

d)     The data in 4 (b) is to be clustered into two clusters using the k-means clustering algorithm. By providing specific examples, show how different initialisations of the cluster centres can lead to different results.                       [30%]
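A sketch of Lloyd's k-means algorithm illustrates the point: the two initialisations below are illustrative choices, and on this data they converge to different partitions.

```python
from math import dist

def kmeans(points, centres, iters=20):
    """Lloyd's algorithm: assign each point to its nearest centre, move
    each centre to the mean of its group, and repeat. Returns the final
    groups. iters is ample for this small data set."""
    centres = [tuple(c) for c in centres]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # Empty groups keep their old centre.
        centres = [tuple(sum(c) / len(g) for c in zip(*g)) if g
                   else centres[i] for i, g in enumerate(groups)]
    return groups

data = [(1, 1), (2, 1), (2, 4), (4, 4), (5, 0), (5, 5)]
# Initialising at the bottom-left and top-right corners of the data...
print(kmeans(data, [(1, 1), (5, 5)]))
# ...versus the bottom-left and bottom-right gives a different split.
print(kmeans(data, [(1, 1), (5, 0)]))
```

The first initialisation groups (5, 0) with the bottom-left points; the second pulls it into the right-hand cluster, showing how the converged result depends on where the centres start.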