Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CMDA 3634 Spring 2022 HW 15

In this assignment you will begin working on a fuzzy k-means algorithm. You will write code to measure distances between points and calculate weighted means.

You don’t need to do anything to support parallelism for this assignment (no OpenMP, no MPI).

 

1    k-means and fuzzy Lloyd’s algorithm

 

If you’d like an alternative introduction to fuzzy k-means, there is one here:

 

https://en.wikipedia.org/wiki/Fuzzy_clustering#Fuzzy_C-means_clustering

 

Euclidean distance between two vectors v0  = (x0 ,y0 ) and v0  = (x1 ,y1 ) is defined by

v0 v1 ∥ =    (x0 x1 )2 + (y0 y1 )2 .                                              (1)

Given cluster centers, c1 , . . . , ck , and a real fuzziness parameter m > 1, the  “weight” function (or dissimilarity) between a vector vi  and a cluster center cj  is

 


wi,j  =  


    1/(m 1) ! 1 .


 

(2)


Given data points v0 , . . . , vn − 1 , the weighted mean of the data with respect to cluster center cj  is

j  =  .                                                               (3)

The superscripts m  in (3) are exponents: each wi,j  is to be raised to the mth power. To keep things simple, assume m = 2 until further notice.

Fuzzy Lloyd’s algorithm repeatedly updates the cluster centers with the weighted mean of the data with respect to the current cluster center. In other words, it repeatedly replaces cj  with j .


 

1.1    Example

 

 


Here is some example data

v0  =  0(0)  ,    v1  =  1(0)  ,    v2  =  015  .

Here are example cluster centers

c0  =   1  ,    c1  =   1(2)  .

The distances between c0  and the vectors are

v0 c0 ∥ = 1,    v1 c0 ∥ = 2,    v2 c0 ∥ = 17

The distances between c1  and the vectors are

v0 c1 ∥ = 5,    v1 c1 ∥ = 2,    v2 c1 ∥ = 5

 

w0 ,0  =     + ! 1  =  ,

w0 ,1  =     + ! 1  =  ,

w1 ,0  =     + ! 1  =  ,

w1 ,1  =     + ! 1  =  ,

w2 ,0  =     + ! 1  =  , w2 ,1  =     + ! 1  =  .

The weighted mean of the data with respect to c0  is

 

0  =                    2 +   2 +   2                           

The weighted mean of the data with respect to c1  is

   2     0(0)  +   2     1(0)  +   2   015  

 

If this example were given to your fuzzykmeans function, it should overwrite new cluster center 0  and overwrite centers[1] with 1 .


 

 .

 

 

 

 .

centers[0] with the


 

2    Implementation

 

2.1    Vector arrays

 

For this assignment, you’ll be representing collections of vectors in R2 as double arrays, double*. Since each vector has two dimensions (x and y), the arrays will be twice as long as the number of vectors being represented. The array entries of the ith vector will be stored in the entries 2i and (2i+1) of the array. This is sometimes called a “flattened” array. For example, the three data vectors in Section 1.1 would be represented in your program by an array with six entries,

{ 0, 0 , 0, 1 , 1, 0.5},

z}    z}   z }

v0           v1             v2

and the two cluster centers in the example would be represented in your program by an array with four entries,

{−1, 0, 2, 1 }.

  z }         z}

c0             c1

 

2.2    The files

Instructors will push one file to your remote class Git repo:

 

A header file with declarations of the functions listed above

  This file is fuzzylloyd/fuzzykmeans .h

 You don’t have to edit this file, but if you write any extra supporting functions in fuzzylloyd/fuzzykmeans .c, they should be declared in this header file. You may want to add a vec constructor, for

example.

 

You only have to write one new file:

 

• A C file where you implement the functions that are in declared in fuzzylloydfuzzykmeans .h.

  Name this file fuzzylloyd/fuzzykmeans .c

 You used a header file in HW 6, so you may want to look at the HW 6 project for a refresher on using a header file.

 

2.3    The functions

In your C file, you need to implement

 

• Euclidean distance squared (1)

  double  dist2(double*  v0,  double*  v1)


 

 v0 and v1 are pointers to arrays of length 2, each representing 2-dimensional vectors

 v0 and v1 may be pointers into the middle of an array containing multiple vectors, but you can think of each of them as an array of length 2 representing a single vector each: v0[0] is the x coordinate of the first vector and so on.

 You never actually need distance itself, you only need distance squared—which is easier to calculate anyway.

• The weight function (2) calculating wi,j

  double  weight(double*  vi,  double*  centers,  int  j,  int  k)

 vi is a pointer to a single data vector representing the ith data vector vi

  centers is a pointer to all of the cluster center vectors. Notice that you will need the distance from vi  to all of the cluster centers in order to calculate the weight wi,j , even though wi,j  is the weight associated with just one cluster center.

  j specifies the cluster in which the weight of vi  is being calculated

 k is the number of clusters (so the array centers is has 2k entries)

• A fuzzy Lloyd update using (3) to update cj  ← j  for each j .

 void  fuzzykmeans(double*  data,  int  n,  double*  centers,  int  k)

  data is a pointer to all of the data vectors

 n is the number of data points (so the array data is has 2n entries)

  centers is a pointer to all of the cluster center vectors

 k is the number of clusters (so the array centers is has 2k entries)

 

3    Testing and expectations

 

You may compose and run any tests that you want, but if you write a main function, it must not be in fuzzylloyd/fuzzykmeans .c. If you put a main in fuzzymeans .c, it will interfere with my own main function when I compile your code into a library file for my tests.

 

• Your code should compile without producing warnings or errors from the compiler.

• Each of your functions should have the correct behavior.

  Most of your functions need to return the correct value, but instead of returning a value, fuzzykmeans should use pointers to overwrite the array centers.

• Your functions should not produce memory errors or leaks.

 You must not read any uninitialized memory or write to any invalid or out-of-bounds addresses.

 If your functions allocate heap memory, then they should free that memory.


 

4    Submission

 

Push your source code fuzzylloyd/fuzzykmeans .c to your remote class Git repository. If you changed fuzzylloyd/fuzzykmeans .h, then push that too.

You are not required to submit any main function files, makefiles, Slurm submission files, or output files. But for your own benefit, you should write a main function to test your code and a makefile to simplify compilation.