This is part three of the Scikit-learn series, which is as follows:

- Part 1 – Introduction
- Part 2 – Supervised learning in Scikit-learn
- Part 3 – Unsupervised Learning in Scikit-learn (this article)

### A quick recap

Unsupervised learning is a type of machine learning whose goal is to discover structure, such as groups of similar examples, in datasets that consist of input data without labeled responses or target values.

## What does Scikit-Learn have in its unsupervised package?

We have already seen what scikit-learn offers for unsupervised learning, so here again are the families of algorithms available to us:

1. Gaussian mixture models

2. Manifold learning (an approach to non-linear dimensionality reduction)

3. Clustering

4. Principal component analysis (PCA)

**We discuss only the algorithms that involve code and need an implementation; the remaining ones need only a mathematical explanation.**

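### Gaussian mixture models

```
from sklearn import mixture  # import the mixture module
import numpy as np  # NumPy for arrays
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [8.0, 8.0], [9.0, 9.5], [8.5, 7.5]])  # placeholder training data; use your own
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')  # choose the number of components yourself
clf.fit(X)  # fit the model on the training data
```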
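
As a slightly fuller illustration, here is a minimal sketch of fitting a Gaussian mixture and reading its results back. The synthetic two-blob data, the seed, and the variable names are assumptions made for this example, not part of the original series.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption for illustration): two well-separated 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

labels = gmm.predict(X)        # hard assignment: the most likely component per point
probs = gmm.predict_proba(X)   # soft assignment: one probability per component
print(gmm.means_)              # estimated component means, shape (2, 2)
```

Unlike k-means, the mixture model gives each point a probability of belonging to every component, which is what `predict_proba` returns.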

### Clustering

There are many clustering algorithms to choose from, but we will discuss the most widely used one: **k-means clustering**.

You can read more about K-Means Clustering here.

Also, check out how it can be used to compress images here.

The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different starting locations lead to different results. A good choice is to place them as far away from each other as possible. The next step is to take each point in the data set and associate it with the nearest centroid. When no point is left unassigned, the first pass is complete. At this point, we recalculate the k centroids as the means of the points assigned to them, and repeat the assignment and update steps until the centroids stop moving, which gives the final clusters.
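
The assign-and-update loop described above can be sketched in plain NumPy. This is only an illustration of the idea, not scikit-learn's implementation; the function name `kmeans_sketch`, the toy array, and the choice of starting centroids are assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, init_centroids, n_iter=10):
    """Plain NumPy illustration of the assign/update loop (not scikit-learn's code)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Toy data and starting centroids chosen only for illustration
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
labels, centroids = kmeans_sketch(X, init_centroids=X[[0, 3]])
print(labels)
print(centroids)
```

With these starting points the loop converges straight away: the three points with x = 1 form one cluster and the three with x = 4 form the other, with centroids at (1, 2) and (4, 2).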

```
from sklearn.cluster import KMeans  # import statement
import numpy as np  # NumPy for arrays
X = np.array([[1, 2], [1, 4], [1, 0],  # training data
              [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)  # only 2 clusters are used
kmeans.labels_  # label of each point
```

```
kmeans.predict([[1, 1], [4, 0]])
```

```
kmeans.cluster_centers_ # centres of clusters are given
```

### Principal component analysis (PCA)

```
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)  # keep both components; only one parameter is set, for simplicity
pca.fit(X)
```

```
print(pca.explained_variance_ratio_)  # fraction of variance explained by each of the selected components
```

```
print(pca.singular_values_)  # the singular values corresponding to each of the selected components
```
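
The two attributes printed above are related, and the following sketch shows how. It assumes all components are kept (here, 2 components for 2 features), and the variable names `ratio_from_s` and `X_1d` are chosen just for this example.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2).fit(X)

# When every component is kept, each ratio equals s_i^2 / sum(s^2)
s = pca.singular_values_
ratio_from_s = s**2 / np.sum(s**2)
print(np.allclose(ratio_from_s, pca.explained_variance_ratio_))

# Dimensionality reduction: project the data onto the first component only
X_1d = PCA(n_components=1).fit_transform(X)
print(X_1d.shape)  # (6, 1)
```

In practice, `fit_transform` is the call you would use to actually reduce the data, while `explained_variance_ratio_` helps decide how many components are worth keeping.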

## Take it all with you, guys!

I hope you all had a great time reading this scikit-learn series. Obviously, this series will not make you a machine learning god, but practice can! It will, however, help you take the first step on your path.

Those who didn’t know anything about scikit-learn can now at least write some code on their own.

See you soon with something more interesting. Until then, practice what you have learned.

Happy to help you all!

### Deepanshu Gaur

Loves to learn new technologies and this attitude keeps me going.

Fast learner.

Strong foundation in data structures & algorithms.

https://www.linkedin.com/in/deepanshu-g-37a42899

#### Latest posts by Deepanshu Gaur (see all)

- Scikit Learn – Part 3 – Unsupervised Learning - April 8, 2018
- Scikit Learn – Part 2 – Supervised Learning - March 21, 2018
- Scikit Learn – Part 1 – Introduction - March 14, 2018
