
COMP3003 MACHINE LEARNING

LAB PRACTICAL P4: KMEANS CLUSTERING

1. Generate dataset

· Generate a 2D dataset of two uncorrelated, slightly overlapping clusters with 1000 points per cluster (2000 points in total).

· For the first 1000 points, use a mean of [-4 -1] and a standard deviation of 0.75 in each dimension, with zero covariance between the dimensions.

· For the remaining 1000 points, use a mean of [3 4] and a standard deviation of 2.0 in each dimension, with zero covariance between the dimensions.

· Plot all the points in 2D, using a distinct colour for each set of points.

· The results should look something like this:
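To generate and plot the data described above, a minimal Matlab sketch follows. It assumes each dataset is stored as a 2×1000 matrix with one point per column, which matches the concatenation hint in step 2; the names dataNormal1 and dataNormal2 come from that hint, while the seed, colours and labels are illustrative choices.

```matlab
% Sketch (assumption): one point per column, so each dataset is 2 x 1000.
rng(1);                          % fix the random seed so results are repeatable
nPerCluster = 1000;

mu1 = [-4; -1];  sigma1 = 0.75;  % cluster 1: mean and per-dimension std
mu2 = [ 3;  4];  sigma2 = 2.0;   % cluster 2: mean and per-dimension std

% Zero covariance means each dimension is drawn independently, so scaling
% standard-normal samples by sigma and shifting by the mean is sufficient.
dataNormal1 = mu1 + sigma1 * randn(2, nPerCluster);   % 2 x 1000
dataNormal2 = mu2 + sigma2 * randn(2, nPerCluster);   % 2 x 1000

figure; hold on;
plot(dataNormal1(1, :), dataNormal1(2, :), 'r.');
plot(dataNormal2(1, :), dataNormal2(2, :), 'b.');
hold off; axis equal;
xlabel('x_1'); ylabel('x_2');
legend('Cluster 1: mean [-4 -1], std 0.75', 'Cluster 2: mean [3 4], std 2.0');
title('Generated dataset');
```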

2. Concatenate the training datasets and plot

· Concatenate the two datasets into a single dataset.

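· HINT: In Matlab, if each dataset is stored as a 2×1000 matrix (one point per column), you can write

TrainData = [dataNormal1 dataNormal2];

If the points are instead stored as rows (1000×2), concatenate vertically: TrainData = [dataNormal1; dataNormal2];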

· Plot the result. Note that the information about which distribution each point came from is no longer directly available, so all the points can only be shown in the same colour.
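A short sketch of the concatenation and single-colour plot, again assuming one point per column. The size check is an illustrative addition: it catches the orientation mistake early, since horizontally concatenating two 1000×2 matrices would silently give a 1000×4 matrix instead of 2×2000.

```matlab
% Sketch (assumption): one point per column, as in the generation step.
rng(1);
dataNormal1 = [-4; -1] + 0.75 * randn(2, 1000);   % 2 x 1000
dataNormal2 = [ 3;  4] + 2.0  * randn(2, 1000);   % 2 x 1000

TrainData = [dataNormal1 dataNormal2];            % 2 x 2000
assert(isequal(size(TrainData), [2 2000]), 'Expected one point per column');

% Cluster identity is no longer stored, so every point gets the same colour.
figure;
plot(TrainData(1, :), TrainData(2, :), 'k.');
axis equal; xlabel('x_1'); ylabel('x_2');
title('Combined training data (unlabelled)');
```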

3. Implement KMeans from first principles

· Implement the KMeans algorithm yourself in Matlab, from first principles.

· PLEASE DO NOT use the built-in Matlab kmeans function to do this!

· Training KMeans iterates over the dataset, alternating two steps: assign each data point to its nearest cluster centre, then move each centre to the mean of the points assigned to it.

· Initially, you can simply run a fixed number of iterations to get it working quickly.

· Then think of better ways to decide when the iterative KMeans training should terminate.

· Try calculating the total distance of all the data points from their assigned cluster centres on each iteration, and plot it (similar to the plot below).

 

· Think about how the number of data points reassigned to a different cluster centre changes as the algorithm iterates. What does this tell you?
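One possible first-principles sketch of the training loop (Lloyd's algorithm), assuming the 2×N one-point-per-column layout used above. It stops as soon as no point changes cluster, with MaxIterations = 100 as a hypothetical safety cap, and on each iteration it records the sum of squared point-to-centre distances and the number of reassigned points. The squared distance is used because it is the quantity KMeans actually minimises; summing unsquared distances would also give a usable plot. Initialising the centres from randomly chosen data points is one common choice, not the only one, and the names totalDist and nReassigned are illustrative.

```matlab
% Sketch (assumption): Lloyd's algorithm, one point per column, no built-in kmeans.
rng(1);
TrainData = [[-4; -1] + 0.75 * randn(2, 1000), [3; 4] + 2.0 * randn(2, 1000)];  % 2 x 2000

K = 2;                        % number of clusters to look for
MaxIterations = 100;          % hypothetical safety cap on the number of iterations
N = size(TrainData, 2);

% Initialise the centres with K distinct, randomly chosen data points.
centres = TrainData(:, randperm(N, K));          % 2 x K
labels = zeros(1, N);                            % 0 = not yet assigned
totalDist = zeros(1, MaxIterations);             % objective per iteration
nReassigned = zeros(1, MaxIterations);           % points that changed cluster

for iter = 1:MaxIterations
    % Assignment step: squared Euclidean distance from every point to every centre.
    D = zeros(K, N);
    for k = 1:K
        D(k, :) = sum((TrainData - centres(:, k)).^2, 1);
    end
    [minDist, newLabels] = min(D, [], 1);        % nearest centre for each point

    totalDist(iter) = sum(minDist);              % sum of squared distances
    nReassigned(iter) = sum(newLabels ~= labels);
    labels = newLabels;

    % Update step: move each centre to the mean of its assigned points.
    for k = 1:K
        members = TrainData(:, labels == k);
        if ~isempty(members)                     % keep an empty cluster's centre unchanged
            centres(:, k) = mean(members, 2);
        end
    end

    % Terminate once no point changes cluster: the centres can no longer move.
    if nReassigned(iter) == 0
        break;
    end
end
totalDist = totalDist(1:iter);
nReassigned = nReassigned(1:iter);

figure;
subplot(1, 2, 1); plot(1:iter, totalDist, 'o-');
xlabel('Iteration'); ylabel('Sum of squared distances'); title('KMeans objective');
subplot(1, 2, 2); plot(1:iter, nReassigned, 'o-');
xlabel('Iteration'); ylabel('Points reassigned'); title('Reassignments');
```

Because each step can only lower (or keep) the sum of squared distances, the left-hand curve should never increase, and the reassignment count falling to zero is a natural stopping signal.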

4. Run your KMeans function on the data and plot the results

· Run your algorithm and label the training data.

· How many clusters should you be looking for?

· Plot the results. They should look something like the figure below (in this example, MaxIterations was set to 10, and the positions of the final centroids are also marked).

 

· Interpret the results. Comment on the form of the decision boundary.
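A sketch of this step, assuming the one-point-per-column layout and K = 2. It runs a compact version of the loop for a fixed MaxIterations = 10 (mirroring the example above), colours the points by label, marks the final centroids, and draws the decision boundary by labelling a grid of points with their nearest centroid. The helper sqDist, the grid range and resolution, and the colours are illustrative choices, not part of the handout.

```matlab
% Sketch (assumption): K = 2, fixed 10 iterations, one point per column.
rng(1);
TrainData = [[-4; -1] + 0.75 * randn(2, 1000), [3; 4] + 2.0 * randn(2, 1000)];
K = 2;  MaxIterations = 10;
N = size(TrainData, 2);

% Hypothetical helper: squared distance from every column of X to every column of C (K x size(X, 2)).
sqDist = @(X, C) sum(X.^2, 1) + sum(C.^2, 1)' - 2 * (C' * X);

centres = TrainData(:, randperm(N, K));
for iter = 1:MaxIterations
    [~, labels] = min(sqDist(TrainData, centres), [], 1);      % assignment step
    for k = 1:K                                                  % update step
        if any(labels == k)
            centres(:, k) = mean(TrainData(:, labels == k), 2);
        end
    end
end
[~, labels] = min(sqDist(TrainData, centres), [], 1);          % labels for the final centres

% Classify a grid of points by nearest centre to reveal the decision boundary.
[gx, gy] = meshgrid(linspace(-8, 10, 200), linspace(-6, 12, 200));
[~, gridLabels] = min(sqDist([gx(:)'; gy(:)'], centres), [], 1);

figure; hold on;
colours = 'rb';
for k = 1:K
    idx = labels == k;
    plot(TrainData(1, idx), TrainData(2, idx), [colours(k) '.']);
end
plot(centres(1, :), centres(2, :), 'kx', 'MarkerSize', 14, 'LineWidth', 3);
contour(gx, gy, reshape(gridLabels, size(gx)), [1.5 1.5], 'k');  % boundary between labels 1 and 2
hold off; axis equal;
xlabel('x_1'); ylabel('x_2'); title('KMeans clusters, centroids and decision boundary');
```

Since every grid point takes the label of its nearest centroid, the contour traces the points equidistant from the two centroids c1 and c2. Expanding ||x - c1||^2 = ||x - c2||^2 gives 2(c2 - c1)'x = ||c2||^2 - ||c1||^2, which is linear in x.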