K-means
K-means is an iterative algorithm. In each iteration we take the current centroid positions, label every object in the feature space with its nearest centroid, and then compute a new center of mass for each centroid from the objects assigned to it. The next iteration starts from these updated positions. We stop once the change in the centroid positions becomes small, which means the centroids have converged to a solution.
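One iteration can be written down compactly. The symbols below are assumptions introduced here for illustration, not notation from the exercise: $p_i$ is the $i$-th point, $\mu_j$ is the $j$-th centroid, $c_i$ is the label of point $i$, $S_j$ is the set of points labeled $j$, $t$ is the iteration index, and $\varepsilon$ is a small threshold of your choice.

$$
c_i = \arg\min_{j \in \{1,\dots,k\}} \lVert p_i - \mu_j^{(t)} \rVert_2
$$

$$
\mu_j^{(t+1)} = \frac{1}{|S_j|} \sum_{i \in S_j} p_i, \qquad S_j = \{\, i : c_i = j \,\}
$$

$$
\text{stop when} \quad \max_j \lVert \mu_j^{(t+1)} - \mu_j^{(t)} \rVert_2 < \varepsilon
$$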
Because the result of this algorithm depends on the initial conditions, it is advisable to use fixed starting positions and to keep track of which centroid each object is labeled with.
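One possible way to get starting positions that are random but still the same on every run is to draw them with a seeded generator. The sketch below is only an illustration: the helper name `pick_initial_centroids`, the seed value, and the sample points are made up, and picking the starting positions from the data points themselves is an assumption, not something the exercise prescribes.

```python
import random

def pick_initial_centroids(points, k, seed=0):
    """Pick k distinct data points as starting centroids, reproducibly."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    return rng.sample(points, k)

# Made-up (x, y) pairs standing in for the measured values.
points = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.5, 0.5), (0.4, 0.6)]

first = pick_initial_centroids(points, k=4, seed=42)
second = pick_initial_centroids(points, k=4, seed=42)
# Same seed, same starting positions, so repeated runs are comparable.
```

Changing the seed gives a different start, which is a simple way to check how sensitive the final centroids are to the initial conditions.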
For this exercise we use the data.mat dataset, which contains value pairs (x, y) of points in space. These are measured values of electric flux and magnetic flux obtained by measuring 80 PCB units. Our customer informed us that there are 4 different types of PCBs in the set. Find the boundaries (golden standards) that can be used to classify these PCBs in a later evaluation process.
Algorithm:
- Initialize 4 random positions.
- For each given point, compute the Euclidean distance to all 4 golden standards and pick the nearest one as its "parent".
- For each golden standard, compute the new center of mass from the points nearest to it (those with the same parent). For the general equations, see: http://education.pkodytek.com/image-processing/09_object_labeling/
- Repeat until the values converge.
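The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the exercise's solution code: the function names `kmeans` and `classify`, the tolerance, and the iteration cap are hypothetical, and the points are made-up groups rather than the contents of data.mat. The starting positions are passed in explicitly and kept fixed so that the run is repeatable; randomly drawn positions could be passed in the same way.

```python
import math

def kmeans(points, init, tol=1e-9, max_iter=100):
    """Plain k-means on (x, y) pairs; returns (centroids, labels)."""
    centroids = list(init)
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point picks its nearest centroid as "parent".
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the center of mass of its points.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centroids[j])  # empty cluster: keep the old position
        # Convergence check: largest distance any centroid moved this iteration.
        shift = max(math.dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids, labels

def classify(point, centroids):
    """Index of the golden standard nearest to a new measurement."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

# Made-up data: 4 tight groups of 20 points (80 in total), NOT the real data.mat values.
centers = [(1.0, 1.0), (1.0, 5.0), (5.0, 1.0), (5.0, 5.0)]
offsets = [(dx, dy) for dx in (-0.2, -0.1, 0.0, 0.1, 0.2)
           for dy in (-0.15, -0.05, 0.05, 0.15)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Fixed starting positions, so every run gives the same result.
start = [(0.0, 0.0), (0.0, 6.0), (6.0, 0.0), (6.0, 6.0)]
golden, labels = kmeans(points, start)

# A later measurement is assigned to the nearest golden standard.
new_label = classify((4.8, 1.1), golden)
```

Once the golden standards are found, `classify` is all that is needed in the later evaluation process: a new measurement gets the type of whichever golden standard lies closest to it.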
Since the points converge really fast, there is just one visible jump, between the first and second locations.
Data | Value |
---|---|
Source | https://cs.wikipedia.org/wiki/K-means |
Code | 11_K_means_code.zip |
Solution code | 11_K_means_solution_code.zip |