CMDA 3634 Spring 2022 HW 15
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
CMDA 3634 Spring 2022 HW 15
In this assignment you will begin working on a fuzzy k-means algorithm. You will write code to measure distances between points and calculate weighted means.
You don’t need to do anything to support parallelism for this assignment (no OpenMP, no MPI).
1 k-means and fuzzy Lloyd’s algorithm
If you’d like an alternative introduction to fuzzy k-means, there is one here:
https://en.wikipedia.org/wiki/Fuzzy_clustering#Fuzzy_C-means_clustering
Euclidean distance between two vectors v0 = (x0 ,y0 ) and v0 = (x1 ,y1 ) is defined by
∥v0 − v1 ∥ = (x0 − x1 )2 + (y0 − y1 )2 . (1)
Given cluster centers, c1 , . . . , ck , and a real fuzziness parameter m > 1, the “weight” function (or dissimilarity) between a vector vi and a cluster center cj is
Given data points v0 , . . . , vn − 1 , the weighted mean of the data with respect to cluster center cj is
j = . (3)
The superscripts m in (3) are exponents: each wi,j is to be raised to the mth power. To keep things simple, assume m = 2 until further notice.
Fuzzy Lloyd’s algorithm repeatedly updates the cluster centers with the weighted mean of the data with respect to the current cluster center. In other words, it repeatedly replaces cj with j .
Here is some example data
v0 = 0(0) , v1 = 1(0) , v2 = 015 .
Here are example cluster centers
c0 = 1 , c1 = 1(2) .
The distances between c0 and the vectors are
∥v0 − c0 ∥ = 1, ∥v1 − c0 ∥ = √2, ∥v2 − c0 ∥ = √ 17
The distances between c1 and the vectors are
∥v0 − c1 ∥ = √5, ∥v1 − c1 ∥ = 2, ∥v2 − c1 ∥ = √5
w0 ,0 = + ! − 1 = ,
w0 ,1 = + ! − 1 = ,
w1 ,0 = + ! − 1 = ,
w1 ,1 = + ! − 1 = ,
w2 ,0 = + ! − 1 = , w2 ,1 = + ! − 1 = .
The weighted mean of the data with respect to c0 is
0 = 2 + 2 + 2 ≈
The weighted mean of the data with respect to c1 is
2 0(0) + 2 1(0) + 2 015
If this example were given to your fuzzykmeans function, it should overwrite new cluster center 0 and overwrite centers[1] with 1 .
.
.
centers[0] with the
2 Implementation
2.1 Vector arrays
For this assignment, you’ll be representing collections of vectors in R2 as double arrays, double*. Since each vector has two dimensions (x and y), the arrays will be twice as long as the number of vectors being represented. The array entries of the ith vector will be stored in the entries 2i and (2i+1) of the array. This is sometimes called a “flattened” array. For example, the three data vectors in Section 1.1 would be represented in your program by an array with six entries,
{ 0, 0 , 0, 1 , 1, 0.5},
z} z} z }
v0 v1 v2
and the two cluster centers in the example would be represented in your program by an array with four entries,
{−1, 0, 2, 1 }.
z } z}
c0 c1
2.2 The files
Instructors will push one file to your remote class Git repo:
• A header file with declarations of the functions listed above
– This file is fuzzylloyd/fuzzykmeans .h
– You don’t have to edit this file, but if you write any extra supporting functions in fuzzylloyd/fuzzykmeans .c, they should be declared in this header file. You may want to add a vec constructor, for
example.
You only have to write one new file:
• A C file where you implement the functions that are in declared in fuzzylloydfuzzykmeans .h.
– Name this file fuzzylloyd/fuzzykmeans .c
– You used a header file in HW 6, so you may want to look at the HW 6 project for a refresher on using a header file.
2.3 The functions
In your C file, you need to implement
• Euclidean distance squared (1)
– double dist2(double* v0, double* v1)
– v0 and v1 are pointers to arrays of length 2, each representing 2-dimensional vectors
– v0 and v1 may be pointers into the middle of an array containing multiple vectors, but you can think of each of them as an array of length 2 representing a single vector each: v0[0] is the x coordinate of the first vector and so on.
– You never actually need distance itself, you only need distance squared—which is easier to calculate anyway.
• The weight function (2) calculating wi,j
– double weight(double* vi, double* centers, int j, int k)
– vi is a pointer to a single data vector representing the ith data vector vi
– centers is a pointer to all of the cluster center vectors. Notice that you will need the distance from vi to all of the cluster centers in order to calculate the weight wi,j , even though wi,j is the weight associated with just one cluster center.
– j specifies the cluster in which the weight of vi is being calculated
– k is the number of clusters (so the array centers is has 2k entries)
• A fuzzy Lloyd update using (3) to update cj ← j for each j .
– void fuzzykmeans(double* data, int n, double* centers, int k)
– data is a pointer to all of the data vectors
– n is the number of data points (so the array data is has 2n entries)
– centers is a pointer to all of the cluster center vectors
– k is the number of clusters (so the array centers is has 2k entries)
3 Testing and expectations
You may compose and run any tests that you want, but if you write a main function, it must not be in fuzzylloyd/fuzzykmeans .c. If you put a main in fuzzymeans .c, it will interfere with my own main function when I compile your code into a library file for my tests.
• Your code should compile without producing warnings or errors from the compiler.
• Each of your functions should have the correct behavior.
– Most of your functions need to return the correct value, but instead of returning a value, fuzzykmeans should use pointers to overwrite the array centers.
• Your functions should not produce memory errors or leaks.
– You must not read any uninitialized memory or write to any invalid or out-of-bounds addresses.
– If your functions allocate heap memory, then they should free that memory.
4 Submission
Push your source code fuzzylloyd/fuzzykmeans .c to your remote class Git repository. If you changed fuzzylloyd/fuzzykmeans .h, then push that too.
You are not required to submit any main function files, makefiles, Slurm submission files, or output files. But for your own benefit, you should write a main function to test your code and a makefile to simplify compilation.
2022-04-13