# ML Algos from Scratch – KMeans Clustering # Introduction to K-means Clustering

K-means clustering is an unsupervised machine learning algorithm. An unsupervised learning algorithm is applied on unlabeled data(data without defined categories or groups). The purpose of this algorithm is to find clusters(groups) in the data with the number of clusters being equal to K.

The K- means clustering works by randomnly initialisinsg ‘k’ cluster centers from all the data points.
If we have m training examples in our data set,  (x(1), x(2),……….., x(m)), then randomly initialise k cluster centroids choosing any k points from m points.

Cluster centers:- (μ(1), μ(2),……….., μ(K))

Then, the algorithm iterates between these two steps –

### 1. Cluster Assignment Step

It finds the distance of each point from all of these K clusters and assigns to each point a number which corresponds to the cluster which has the minumum distance from that point

for i = 1 to m
{
assign c(i) as index(1 to K) of cluster center closest to x(i)
}

### 2. Update Cluster Centers Step

In this step,the cluster centers of each cluster are updated by taking the mean of all the points assigned to that cluster.

for k = 1 to K
{
assign μ(k) as the mean of all data points assigned to μ(k)
}

### Choosing K

To find the number of clusters in the data, the user needs to run the K-means clustering algorithm for a range of K values and compare the results. In general, there is no method for determining exact value of K, but an accurate estimate may be obtained by using Elbow method.

In elbow method, values of K are tried starting from 1 and the cost function which is the sum of distances between points and their cluster centers is plotted. As the value of K increases this cost will always decrease which means that when K is very high this cost will be minimum but choosing the maximum K defeats the purpose of clustering. Instead K is choosen at the point where this cost decreases sharply also known as the ‘elbow point’.

Another way of choosing K is according to your purpose or what you expect your data to show you. For example, in a problem where we have a data set of different T-shirts we can choose K=3 for small, medium and large sized T-shirts.

That is all about K-means clustering. See how you can use this K-Means clustering for image compression in this article.

Any doubts and suggestions are welcome in the comments. Happy learning.

### Saksham Malhotra

After learning python 2 years ago and dabbling in web development, I encountered data science and felt 'Yes, this is what I want to do.' I strongly believe that the world can be changed through the power of data. Among my other interests, I love reading fiction and owning more books than I can possibly read at once.

#### Latest posts by Saksham Malhotra (see all) ### Saksham Malhotra

After learning python 2 years ago and dabbling in web development, I encountered data science and felt 'Yes, this is what I want to do.' I strongly believe that the world can be changed through the power of data. Among my other interests, I love reading fiction and owning more books than I can possibly read at once. Linkedin - www.linkedin.com/in/saksham-malhotra-9bb69513b

### One Comment

1. skptricks

Great work… Thanks for this post.