
Scikit Learn – Part 3 – Unsupervised Learning



WELCOME BACK AGAIN, FOLKS!

Let’s dive into another form of machine learning: unsupervised learning.

This is part three of the scikit-learn series.

A quick recap:

Unsupervised learning is a type of machine learning whose goal is to discover structure, such as groups of similar examples, in datasets that consist only of input data, without labeled responses (target values).

What does scikit-learn offer for unsupervised learning?

We have already seen what scikit-learn offers in terms of unsupervised learning, so let us quickly recall which families of algorithms are available to us:

1. Gaussian mixture models
2. Manifold learning (an approach to non-linear dimensionality reduction)
3. Clustering
4. Principal component analysis (PCA)

We will discuss only the algorithms that involve code and need an implementation; the remaining one (manifold learning) mainly needs a mathematical explanation, so we skip it here.

Straight into the code!

Gaussian mixture models

A Gaussian mixture model is a probabilistic model that represents a dataset as a mixture of several normally distributed subpopulations.
It learns these subpopulations from the data automatically, without needing labels.

In [5]:
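from sklearn import mixture  # import the Gaussian mixture module
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])  # toy training data; replace with your own
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')  # you choose the number of components
clf.fit(X)  # fit the model on the training data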
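To see what a fitted mixture model actually learns, here is a slightly fuller sketch. The data is hypothetical: two synthetic, normally distributed blobs generated with NumPy and centred at (0, 0) and (5, 5). `means_`, `weights_`, `predict` and `predict_proba` are standard attributes and methods of scikit-learn's `GaussianMixture`; `random_state=0` is added only to make the run repeatable.

```python
import numpy as np
from sklearn import mixture

rng = np.random.default_rng(0)
# Hypothetical data: two normally distributed blobs centred at (0, 0) and (5, 5)
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
               rng.normal(loc=5.0, scale=1.0, size=(100, 2))])

gmm = mixture.GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm.fit(X)

print(gmm.means_)             # estimated centre of each Gaussian component
print(gmm.weights_)           # estimated share of the data in each component
labels = gmm.predict(X)       # hard assignment: most likely component for each point
probs = gmm.predict_proba(X)  # soft assignment: probability of each component per point
print(labels[:5], probs[:5].round(3))
```

A Gaussian mixture gives soft assignments: `predict_proba` returns a probability for every component, and `predict` simply picks the most likely one. Which blob ends up as component 0 or 1 is arbitrary.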

Clustering

There are many clustering algorithms to choose from; we will discuss one of the most widely used, k-means clustering.

You can read more about K-Means Clustering here.

Also, check out how it can be used to compress images here.

The main idea is to define k centroids, one for each cluster. Their initial placement matters, because different starting locations can lead to different results, so a good choice is to place them as far away from each other as possible. The next step is to take each point in the data set and assign it to the nearest centroid. When every point has been assigned, the first step is complete. We then recalculate k new centroids, each as the mean of the points assigned to its cluster, reassign the points, and repeat until the assignments stop changing; this yields the final clusters.
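The steps above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper names `farthest_first_init` and `simple_kmeans` are hypothetical, and the "spread the centroids out" step is done with a simple farthest-point rule chosen here for clarity (scikit-learn's `KMeans` uses k-means++ initialisation by default). The toy data is also made up for the example.

```python
import numpy as np

def farthest_first_init(X, k):
    """Hypothetical helper: start from the first point, then repeatedly
    pick the point farthest from all centroids chosen so far."""
    centroids = [X[0]]
    for _ in range(1, k):
        # distance from each point to its nearest already-chosen centroid
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    return np.array(centroids, dtype=float)

def simple_kmeans(X, k, max_iter=100):
    """Hypothetical helper: plain k-means (assign, then update, until stable)."""
    X = np.asarray(X, dtype=float)
    centroids = farthest_first_init(X, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = np.argmin(dists, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])
labels, centers = simple_kmeans(X, k=2)
print(labels)   # [0 0 0 1 1 1]
print(centers)
```

On data whose groups are less clearly separated, a single run like this can get stuck in a poorer local solution, which is why libraries typically repeat the algorithm from several starting points and keep the best result.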

In [18]:
from sklearn.cluster import KMeans   # import statement
import numpy as np    # importing numpy for arrays

X = np.array([[1, 2], [1, 4], [1, 0],   #training data
               [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)  # only 2 clusters are used
kmeans.labels_  #Labels of each point
Out[18]:
array([0, 0, 0, 1, 1, 1])
In [19]:
kmeans.predict([[1, 1], [4, 0]])
Out[19]:
array([0, 1])
In [20]:
kmeans.cluster_centers_  # centres of clusters are given
Out[20]:
array([[ 1.,  2.],
       [ 4.,  2.]])
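Note that `predict` does not re-run the clustering: it assigns each new point to the nearest learned centre. The sketch below checks this by computing the distances to `kmeans.cluster_centers_` by hand. It passes `n_init=10` explicitly so the behaviour does not depend on the default of your scikit-learn version, and it also prints `inertia_`, the sum of squared distances from the training points to their closest centre.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

new_points = np.array([[1, 1], [4, 0]])
# Distance from every new point to every learned cluster centre
dists = np.linalg.norm(new_points[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
manual = dists.argmin(axis=1)  # index of the nearest centre for each new point

print(manual, kmeans.predict(new_points))  # the two assignments should agree
print(kmeans.inertia_)  # within-cluster sum of squared distances on the training data
```

The cluster numbers themselves (0 or 1) are arbitrary, so your run may swap them compared with the output shown above; only the grouping matters.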

Principal Component Analysis (PCA)

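PCA is a technique widely used for dimensionality reduction, feature extraction, and data visualization. It finds the orthogonal directions (principal components) along which the data varies the most, so the data can be projected onto the first few of them.

Note: PCA takes more background to understand than the previous algorithms, so it is worth reading about it thoroughly before using it.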

In [21]:
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2) # only 1 parameter used for basic understanding
pca.fit(X)
Out[21]:
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)
In [22]:
print(pca.explained_variance_ratio_)  # fraction of the total variance explained by each selected component
[ 0.99244289  0.00755711]
In [23]:
print(pca.singular_values_) #The singular values corresponding to each of the selected components.
[ 6.30061232  0.54980396]
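Where do these numbers come from? A minimal sketch using only NumPy's `np.linalg.svd` alongside scikit-learn: PCA centres the data and takes its singular value decomposition. The singular values are what `singular_values_` reports, and, when all components are kept as here, each squared singular value divided by the sum of all squared singular values is that component's share of the variance. The last lines use `fit_transform` with `n_components=1`, which projects the data onto the first principal component only.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

# PCA by hand: centre each feature, then take the singular value decomposition
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
ratio_by_hand = S**2 / np.sum(S**2)  # share of total variance per component

pca = PCA(n_components=2).fit(X)
print(S, pca.singular_values_)
print(ratio_by_hand, pca.explained_variance_ratio_)

# Keep only the first component: project the 2-D data onto 1 dimension
X_1d = PCA(n_components=1).fit_transform(X)
print(X_1d.shape)  # (6, 1)
```

Since the first component already explains over 99% of the variance in this example, dropping the second one loses very little information.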

Take it all with you, guys!

I hope you all had a great time reading this scikit-learn series. Obviously, this series alone will not make you a machine learning god, but practice can! It will, however, definitely help you take the first step on that path.
Those who didn’t know anything about scikit-learn can now at least write some code on their own.
See you soon with something more interesting; until then, practice what you have learned.
Happy to help you all!


Deepanshu Gaur

A technology lover and computer hardware enthusiast. If Gaming is my love then Machine learning is my passion.
Loves to learn new technologies and this attitude keeps me going.
Fast learner.
Strong foundation in data structures & algorithms.

https://www.linkedin.com/in/deepanshu-g-37a42899