Supervised Learning In Scikit-Learn
This is part two of the Scikit-learn series, which is as follows:
- Part 1 – Introduction
- Part 2 – Supervised learning in Scikit-learn (this article)
- Part 3 – Unsupervised Learning in Scikit-learn
Recap Of Supervised Learning
You may already be familiar with supervised learning. If not, here is a quick recap.
Q. What is supervised learning?
Supervised learning is a type of machine learning in which both the input data and the desired output are provided. Because every input comes with a known label or target value, the model can learn a mapping from inputs to outputs and use it to predict the output for new, unseen data.
from sklearn.linear_model import LinearRegression  # import statement
clf = LinearRegression()  # create a regression model from the LinearRegression class
clf.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])  # fit the model to the data
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
clf.coef_  # fitted coefficients (one slope per feature)
array([ 0.5, 0.5])
As you can see, just a few lines of code can get you started with this amazing algorithm. Isn’t it amazing? You can even try prediction on the testing set by using the ‘.predict’ method.
For a more in-depth understanding of this linear model, try it yourself on an example.
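Predicting on a testing set could look roughly like the sketch below. The synthetic data (y = 3x + 2), the 75/25 split, and the variable names are assumptions made for illustration, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noise-free data (an assumption for illustration): y = 3*x + 2
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

# Hold out 25% of the samples as a testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

reg = LinearRegression()
reg.fit(X_train, y_train)      # learn slope and intercept from the training set

y_pred = reg.predict(X_test)   # predict targets for unseen inputs
r2 = reg.score(X_test, y_test) # R^2 on the testing set

print(reg.coef_, reg.intercept_)
print(r2)
```

Because the data here has no noise, the fitted slope and intercept should match the values used to generate it. On real data, expect the test score to be lower than the training score.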
An easy example can be found here :
from sklearn import svm
X = [[0, 0], [1, 1]]  # dataset
y = [0, 1]
clf = svm.SVC()  # classifier is created
clf.fit(X, y)  # fitting classifier on dataset
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)
The parameters shown in the brackets can be changed to suit the dataset you are working with.
Once you are comfortable writing the code above, try tweaking the parameters yourself.
clf.predict([[1., 0.]]) # predicting values
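As one way to start tweaking parameters, the sketch below fits the same two-point dataset with two different kernels. The choice of C=1.0, gamma="scale", and the query point [2., 2.] are illustrative assumptions, not recommendations.

```python
from sklearn import svm

X = [[0, 0], [1, 1]]  # same toy dataset as above
y = [0, 1]

predictions = {}
for kernel in ("linear", "rbf"):
    # kernel, C and gamma are the parameters most commonly tuned for SVC
    clf = svm.SVC(kernel=kernel, C=1.0, gamma="scale")
    clf.fit(X, y)
    predictions[kernel] = int(clf.predict([[2., 2.]])[0])
    # n_support_ holds the number of support vectors found per class
    print(kernel, predictions[kernel], clf.n_support_)
```

On a dataset this small both kernels agree. On real data, comparing settings with cross-validation is a more reliable way to choose between them.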
from sklearn.linear_model import SGDClassifier
X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = SGDClassifier(loss="hinge", penalty="l2")  # hyperparameters
clf.fit(X, y)
SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=5, n_iter=None, n_jobs=1, penalty='l2', power_t=0.5, random_state=None, shuffle=True, tol=None, verbose=0, warm_start=False)
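After fitting, the learned linear model can be inspected and used for prediction. The sketch below is a minimal illustration: max_iter, tol, and random_state are set explicitly as assumptions for reproducibility, and the query point is hypothetical.

```python
from sklearn.linear_model import SGDClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]

# loss="hinge" trains a linear SVM; penalty="l2" adds L2 regularization.
# max_iter, tol and random_state are illustrative choices, not tuned values.
clf = SGDClassifier(loss="hinge", penalty="l2",
                    max_iter=1000, tol=1e-3, random_state=0)
clf.fit(X, y)

print(clf.coef_)       # learned weights, one per feature
print(clf.intercept_)  # learned bias term

prediction = clf.predict([[2., 2.]])  # classify a new point
print(prediction)
```

Since SGD is sensitive to feature scale, real datasets are usually standardized before fitting.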
Naive Bayes In Sklearn
A Naive Bayes classifier calculates the probability of each class given the input features, then selects the class with the highest probability.
This classifier assumes the features are independent of one another given the class. Thus the word ‘naive’ is used.
It is one of the most common algorithms in machine learning.
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
iris = datasets.load_iris()  # loading the dataset
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)  # fitting and predicting on same line
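To see the "highest probability wins" rule in action, the sketch below compares predict_proba with predict for a single sample and then counts the mislabeled training points. Evaluating on the training data mirrors the snippet above and is only for illustration; the variable names are assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
gnb = GaussianNB().fit(iris.data, iris.target)

# Class probabilities for the first sample, and the class actually predicted
proba = gnb.predict_proba(iris.data[:1])
pred = gnb.predict(iris.data[:1])
print(proba.round(3))
print(gnb.classes_[np.argmax(proba, axis=1)], pred)  # both name the same class

# Count how many training points the model gets wrong
y_pred = gnb.predict(iris.data)
mislabeled = int((iris.target != y_pred).sum())
print(f"Mislabeled {mislabeled} of {iris.data.shape[0]} points")
```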
Decision Trees In Sklearn
Decision trees are another type of supervised machine learning algorithm, in which the data is repeatedly split according to a chosen feature and threshold.
In general, the more training data you have, the more accurate the model tends to be.
Decision trees are among the most widely used supervised learning algorithms and have many applications in industry.
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
clf.predict([[2., 3.]])
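The splits a fitted tree has learned can be printed as text, which makes the "repeated splitting" idea concrete. In this sketch the feature names x0 and x1 and the random_state value are hypothetical, chosen only for readability and reproducibility.

```python
from sklearn import tree

X = [[0, 0], [1, 1]]
Y = [0, 1]

clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, Y)

# export_text renders the learned splits as an indented set of rules
rules = tree.export_text(clf, feature_names=["x0", "x1"])
print(rules)

prediction = clf.predict([[2., 3.]])  # follows the rules above to a leaf
print(prediction)
```

With only two training points the tree needs a single split, so the printed rules are one threshold test with two leaves.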
Ensemble Methods In SkLearn
The sklearn.ensemble module contains bagging methods and random forests.
Random forest is another powerful machine learning algorithm that often produces good results even without hyper-parameter tuning.
It is also one of the most used algorithms, because of its simplicity and because it can be used for both classification and regression tasks.
from sklearn.ensemble import RandomForestClassifier
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = RandomForestClassifier(n_estimators=10)
clf = clf.fit(X, Y)
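Since random forests handle both classification and regression, a minimal sketch of each is shown below. The four-point dataset, the target values, and random_state=0 are assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Toy data (illustrative): one set of inputs, two kinds of targets
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y_class = [0, 0, 1, 1]        # discrete labels for classification
y_reg = [0.0, 1.0, 2.0, 3.0]  # continuous targets for regression

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y_class)
reg = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y_reg)

class_pred = clf.predict([[3, 3]])  # combines the votes of the trees
reg_pred = reg.predict([[3, 3]])    # averages the predictions of the trees

print(class_pred, reg_pred)
print(len(clf.estimators_))         # number of trees in the forest
```

Each tree is trained on a bootstrap sample of the data, so fixing random_state is what makes repeated runs give the same forest.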
What Have We Learnt?
By now we have learned how to get started with several supervised learning algorithms using scikit-learn.
Still, each algorithm has many more features in scikit-learn, which can be mastered only by practicing.
So stop wasting your time and head straight to the official scikit-learn documentation for supervised algorithms, and make sure you understand each algorithm both mathematically and by practicing on different datasets.
Note: Next part of this series is on unsupervised learning, so make sure you don’t miss that: Part 3 – Unsupervised Learning in Scikit-learn