

COM 2004

DEPARTMENT OF COMPUTER SCIENCE

DATA DRIVEN COMPUTING

AUTUMN SEMESTER 2014–2015



1.  a)     A Gaussian distribution whose mean is 100 and whose standard deviation is 10 is used to generate N random samples. The samples are then used to estimate the mean, µ, and variance, S, of the Gaussian using the formulae below.

µ = (1/N) ∑ᵢ₌₁ᴺ xᵢ

S = (1/N) ∑ᵢ₌₁ᴺ (xᵢ − µ)²

(i)    Use a series of sketches to show how the distribution of the mean estimate changes as the value of N increases.                                                                 [20%]

(ii)    Use a sketch to show how the distribution of the variance estimate changes as the value of N increases.                                                                                     [20%]
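The behaviour the sketches should capture can be checked with a small simulation (not part of the question; the trial count and seed below are arbitrary choices). The spread of the sample mean should shrink like σ/√N, i.e. roughly 10, 3.2 and 1.0 for N = 1, 10 and 100.

```python
import random
import statistics

random.seed(1)

def mean_estimates(n, trials=3000, mu=100.0, sigma=10.0):
    """Sample means of `trials` independent samples of size n."""
    return [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
            for _ in range(trials)]

for n in (1, 10, 100):
    ests = mean_estimates(n)
    print(f"N={n}: mean of estimates {statistics.fmean(ests):6.2f}, "
          f"spread {statistics.stdev(ests):5.2f}")
```

The same experiment applied to the variance formula would show the variance estimate both narrowing and, for small N, being biased low (the 1/N formula above is the biased estimator).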

b)     Alice has two routes for travelling to work.  Travel times on these routes are normally distributed.

-      Route A takes on average 60 minutes with a standard deviation of 20 minutes.

-      Route B takes on average 70 minutes with a standard deviation of 10 minutes.

Work starts at precisely 9:00 and Alice does not want to be late.

(i)   Which route should Alice take if leaving home at 8:00? Justify your answer.   [10%]

(ii)   Which route should Alice take if leaving home at 7:30? Justify your answer.   [10%]

(iii)   At what departure time would both routes be equally likely to get Alice to work by 9:00?                                                                                                                    [10%]
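A quick way to check answers to (i)–(iii) is to compare P(travel time ≤ minutes available) for each route. The sketch below (helper name is my own) uses the standard normal CDF written via `math.erf`; the two routes tie exactly where their z-scores coincide.

```python
from math import erf, sqrt

def prob_on_time(available, mean, sd):
    """P(normally distributed travel time <= minutes available)."""
    z = (available - mean) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF

for leave, budget in (("8:00", 60), ("7:30", 90), ("7:40", 80)):
    a = prob_on_time(budget, 60, 20)   # Route A
    b = prob_on_time(budget, 70, 10)   # Route B
    print(f"leave {leave}: P(A on time)={a:.3f}, P(B on time)={b:.3f}")
```

Setting (t − 60)/20 = (t − 70)/10 gives the budget at which both routes are equally likely to be on time.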

c)     A process is generating independent uniformly distributed random numbers, x:

x ~ U(a, b)

You observe the sequence of samples: 5.1, 10.2, 8.4 and 3.1. Using maximum-likelihood parameter estimation, what would be the best estimates for the parameters a and b? Explain your reasoning.

[30%]
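The reasoning the question asks for can be summarised in two lines of code: the uniform likelihood (1/(b − a))ᴺ is zero unless [a, b] covers every sample, and it grows as b − a shrinks, so the tightest covering interval maximises it.

```python
samples = [5.1, 10.2, 8.4, 3.1]
# Likelihood (1/(b-a))**N is zero if any sample falls outside [a, b]
# and increases as the interval narrows, so the ML estimates are the
# sample extremes.
a_hat, b_hat = min(samples), max(samples)
print(a_hat, b_hat)  # 3.1 10.2
```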

2.  a)      Consider two classes, ω1 and ω2, each containing two points as follows,

ω1     =   {(−2, 1)ᵀ, (−1, 1)ᵀ}

ω2     =   {(0, 0)ᵀ, (1, 0)ᵀ}

Consider a linear classifier with the discriminant function g(x) = w1x1 + w2x2 + w0. Using initial weights of w1 = 1, w2 = 1 and w0 = −2 and a learning rate of 0.5, run the perceptron learning algorithm by hand until all the points are correctly classified (show all your workings).                                                                                                           [50%]
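This is no substitute for the hand workings the question requires, but a short sketch of the sequential update rule can be used to verify the final weights. It assumes ω1 is the positive class, a g(x) > 0 decision rule, and sample-by-sample updates; these conventions differ between textbooks, so check them against the course notes.

```python
def perceptron(data, w, eta=0.5, max_epochs=100):
    """Sequential perceptron learning.
    data: list of (x1, x2, label) with label +1 or -1.
    w: initial weights (w1, w2, w0); returns learned weights."""
    w = list(w)
    for _ in range(max_epochs):
        errors = 0
        for x1, x2, t in data:
            g = w[0] * x1 + w[1] * x2 + w[2]
            if (1 if g > 0 else -1) != t:
                # nudge the boundary towards the misclassified point
                w[0] += eta * t * x1
                w[1] += eta * t * x2
                w[2] += eta * t  # bias term uses augmented input 1
                errors += 1
        if errors == 0:
            break
    return w

# Question 2(a) data, with omega_1 labelled +1 (an assumption)
data = [(-2, 1, 1), (-1, 1, 1), (0, 0, -1), (1, 0, -1)]
print(perceptron(data, (1, 1, -2)))
```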

b)      Plot the points given in 2 (a) in feature space and plot the decision boundaries given by the initial and final weights.                                                                                             [10%]

c)      Consider two classes, ω1 and ω2, each containing two points as follows,

ω1     =   {(1, 0)ᵀ, (1, 2)ᵀ}

ω2     =   {(0, 1)ᵀ, (2, 1)ᵀ}

When using this data the perceptron learning algorithm will never terminate, regardless of what values are chosen for the starting weights. Explain why.                                  [10%]

d)      Draw a two-layer perceptron that can correctly classify the points in 2 (c).                   [30%]

3.  a)      Use pseudocode to show how samples are classified using a K-nearest neighbour classifier.                                                                                                                                [30%]
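As a rough guide to what the pseudocode should express, one possible Python rendering is below (the training-set format and label strings are my own choices; vote and distance ties are broken by list order).

```python
from collections import Counter
from math import dist

def knn_classify(x, training, k):
    """Classify x by majority vote among the k training points
    nearest to x under Euclidean distance.
    training: list of (point, label) pairs."""
    nearest = sorted(training, key=lambda item: dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Tiny example using the data from part (b), labels 'w1'/'w2'
train = [((0, 0), 'w1'), ((2, 0), 'w1'),
         ((-2, 2), 'w2'), ((-2, -2), 'w2'), ((0, 2), 'w2'), ((0, -2), 'w2')]
print(knn_classify((1, 0), train, 1))
```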

b)      Consider a K-nearest neighbour classifier that uses a Euclidean distance measure, K = 1, and the following samples as training data,

ω1     =   {(0, 0)ᵀ, (2, 0)ᵀ}

ω2     =   {(−2, 2)ᵀ, (−2, −2)ᵀ, (0, 2)ᵀ, (0, −2)ᵀ}

A point is selected uniformly at random from the region defined by −2 < x1 < 2, −2 < x2 < 2. What is the probability that the point is classified as belonging to class ω1? [Hint: start by sketching the decision boundary.]

[40%]
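The geometric answer can be sanity-checked by Monte Carlo simulation (seed and trial count are arbitrary); this complements, rather than replaces, the boundary sketch the hint asks for.

```python
import random
from math import dist

random.seed(2)

# Training data from part (b): label 1 for omega_1, 2 for omega_2
train = [((0, 0), 1), ((2, 0), 1),
         ((-2, 2), 2), ((-2, -2), 2), ((0, 2), 2), ((0, -2), 2)]

def nearest_label(x):
    """1-NN decision: label of the closest training point."""
    return min(train, key=lambda item: dist(x, item[0]))[1]

trials = 200_000
hits = sum(nearest_label((random.uniform(-2, 2), random.uniform(-2, 2))) == 1
           for _ in range(trials))
print(hits / trials)  # estimate of P(classified as omega_1)
```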

c)      Consider the distance measure

d(x1, x2) = 1 − cos(θ)

where θ is the angle between vectors x1 and x2.

Two separate 1-nearest neighbour classifiers are constructed, one using the Euclidean distance and the other using the distance measure defined above. The following labelled samples are provided as training data,

ω1     =   {(1, 0)ᵀ, (0, 1)ᵀ}

ω2     =   {(1, 1)ᵀ}

Sketch the region of feature space in which the two classifiers would be in agreement.

[30%]

4.  a)     Give an account of the different ways in which distances between clusters can be defined.                                                                                                                                [30%]

b)      Run the agglomerative clustering algorithm on the following data and write down the cluster membership after each iteration.

(1, 1), (2, 1), (2, 4), (4, 4), (5, 0), (5, 5)

Your algorithm must use the minimum Euclidean point-to-point distance to measure the distance between clusters, and you must iterate the algorithm until only one cluster remains.

[30%]
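A minimal single-linkage sketch for checking the hand-run iterations (the list representation and tie-breaking by lowest index are my own choices):

```python
from math import dist

def single_linkage(points):
    """Agglomerative clustering using minimum point-to-point
    (single-linkage) distance; returns membership after each merge."""
    clusters = [[p] for p in points]
    history = [[tuple(c) for c in clusters]]
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-linkage distance
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(dist(a, b)
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
        history.append([tuple(c) for c in clusters])
    return history

for step in single_linkage([(1, 1), (2, 1), (2, 4), (4, 4), (5, 0), (5, 5)]):
    print(step)
```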

c)      Draw a dendrogram to illustrate the hierarchical clusterings generated in 4 (b).

[10%]

d)     The data in 4 (b) is to be clustered into two clusters using the k-means clustering algorithm. By providing specific examples, show how different initialisations of the cluster centres can lead to different results.                                                                               [30%]
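The effect can be demonstrated with a bare-bones Lloyd's-algorithm sketch; the two initialisations below are arbitrary examples, and assignment ties are broken towards the lower-indexed centre.

```python
from math import dist

def kmeans(points, centres, iters=20):
    """Lloyd's algorithm from a fixed initialisation; returns the
    final assignment of points to centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            # assign each point to its nearest current centre
            nearest = min(range(len(centres)),
                          key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # recompute each centre as the mean of its assigned points
        centres = [tuple(sum(v) / len(g) for v in zip(*g)) if g else c
                   for g, c in zip(groups, centres)]
    return groups

pts = [(1, 1), (2, 1), (2, 4), (4, 4), (5, 0), (5, 5)]
print(kmeans(pts, [(1, 1), (5, 5)]))  # one initialisation
print(kmeans(pts, [(1, 5), (5, 1)]))  # another converges to a different split
```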