Introductory Guide to NumPy

NumPy

NumPy is a linear algebra library for Python. We can use it to create vectors, matrices and other arrays of numbers, and to perform mathematical operations on them. It is one of the most important libraries in data science, because almost every library in the Python data science stack relies on it as a core building block.

You are encouraged to browse the official NumPy documentation for a fuller understanding.

This tutorial is intended for readers who have a basic understanding of Python syntax.

Installing NumPy

You can install NumPy using pip:

pip install numpy

Using NumPy

Once you’ve installed NumPy, you can import it. By convention, it is imported under the alias np:

In [1]:
import numpy as np

Arrays

NumPy arrays can have any number of dimensions. In this tutorial we mostly work with one-dimensional arrays, called vectors, and two-dimensional arrays, called matrices. Note that a matrix can still consist of just one row or one column.
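
To check how many dimensions an array has, you can read its ‘ndim’ attribute. The minimal sketch below (the variable names are only illustrative) builds a vector, a single-row matrix and a three-dimensional array:

```python
import numpy as np

vector = np.array([1, 2, 3])          # one-dimensional: a vector
row_matrix = np.array([[1, 2, 3]])    # two-dimensional: a matrix with a single row
cube = np.zeros((2, 3, 4))            # three-dimensional: 2 blocks of 3 rows x 4 columns

print(vector.ndim, vector.shape)          # 1 (3,)
print(row_matrix.ndim, row_matrix.shape)  # 2 (1, 3)
print(cube.ndim, cube.shape)              # 3 (2, 3, 4)
```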

Creating Arrays

We can create NumPy arrays using built-in functions, or by converting Python lists into NumPy arrays.

Using Built-in Methods

NumPy provides many built-in functions for generating arrays:

arange

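Returns evenly spaced values within the half-open interval [start, stop): the start value is included, but the stop value is not. With integer arguments, the values are integers.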

In [2]:
np.arange(0,10)
Out[2]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can also pass the step size as the third argument.

In [3]:
np.arange(0,11,2)
Out[3]:
array([ 0,  2,  4,  6,  8, 10])
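
The step can also be negative or fractional, and the stop value is still excluded. A short sketch:

```python
import numpy as np

countdown = np.arange(10, 0, -3)   # a negative step counts down; 0 itself is excluded
quarters = np.arange(0, 1, 0.25)   # a fractional step gives floats: 0.0, 0.25, 0.5, 0.75

print(countdown)   # [10  7  4  1]
print(quarters)
```

For non-integer steps, the NumPy documentation suggests ‘linspace’ instead, since floating-point rounding can make the number of elements returned by ‘arange’ hard to predict.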

zeros and ones

Generates arrays of zeros or ones of the given shape.

In [4]:
np.zeros(5) #an integer as parameter indicates a vector of given length
Out[4]:
array([ 0.,  0.,  0.,  0.,  0.])
In [5]:
np.zeros((3,5)) #a tuple of two integers as parameter indicates a matrix with (number of rows, number of columns)
Out[5]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

Similarly, for the function ‘ones’:

In [6]:
np.ones(5)
Out[6]:
array([ 1.,  1.,  1.,  1.,  1.])
In [7]:
np.ones((5,5))
Out[7]:
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
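
Both functions return floating-point arrays by default. If you need another element type, a minimal sketch using the ‘dtype’ argument looks like this:

```python
import numpy as np

float_zeros = np.zeros(3)                # default dtype is float64
int_zeros = np.zeros(3, dtype=int)       # integer zeros
bool_ones = np.ones((2, 2), dtype=bool)  # ones interpreted as True

print(float_zeros.dtype)  # float64
print(int_zeros)          # [0 0 0]
print(bool_ones.all())    # True
```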

linspace

Returns a given number of evenly spaced numbers over the closed interval [start, stop]; unlike ‘arange’, the end point is included by default.

In [8]:
np.linspace(0,10,3) #The parameters are start point, end point, and the number of elements to be returned (50 by default) 
Out[8]:
array([  0.,   5.,  10.])
In [9]:
np.linspace(0,10,50)
Out[9]:
array([  0.        ,   0.20408163,   0.40816327,   0.6122449 ,
         0.81632653,   1.02040816,   1.2244898 ,   1.42857143,
         1.63265306,   1.83673469,   2.04081633,   2.24489796,
         2.44897959,   2.65306122,   2.85714286,   3.06122449,
         3.26530612,   3.46938776,   3.67346939,   3.87755102,
         4.08163265,   4.28571429,   4.48979592,   4.69387755,
         4.89795918,   5.10204082,   5.30612245,   5.51020408,
         5.71428571,   5.91836735,   6.12244898,   6.32653061,
         6.53061224,   6.73469388,   6.93877551,   7.14285714,
         7.34693878,   7.55102041,   7.75510204,   7.95918367,
         8.16326531,   8.36734694,   8.57142857,   8.7755102 ,
         8.97959184,   9.18367347,   9.3877551 ,   9.59183673,
         9.79591837,  10.        ])

We can also specify the data type of the returned elements:

In [10]:
np.linspace(0,10,4,dtype=int)
Out[10]:
array([ 0,  3,  6, 10])
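
Two optional keyword arguments are worth knowing: ‘endpoint=False’ leaves out the end point, and ‘retstep=True’ also returns the spacing between samples. A small sketch:

```python
import numpy as np

no_end = np.linspace(0, 10, 5, endpoint=False)      # 0, 2, 4, 6, 8 (10 excluded)
values, step = np.linspace(0, 10, 3, retstep=True)  # also returns the step size

print(no_end)  # [0. 2. 4. 6. 8.]
print(step)    # 5.0
```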

eye

Returns an identity matrix of the given size.
Only one argument is required because, by default, the output is a square matrix.

In [11]:
np.eye(4)
Out[11]:
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])
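
An identity matrix leaves a vector unchanged under matrix multiplication, which makes for a quick sanity check. Strictly speaking, ‘eye’ also accepts an optional second argument for the number of columns, in which case the result is not square. A minimal sketch (the vector ‘v’ is arbitrary):

```python
import numpy as np

identity = np.eye(3)
v = np.array([4.0, 5.0, 6.0])          # arbitrary example vector

same_v = identity @ v                  # I @ v gives back v
rectangular = np.eye(2, 3)             # 2 rows, 3 columns, ones on the main diagonal

print(same_v)             # [4. 5. 6.]
print(rectangular.shape)  # (2, 3)
```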

Creating Random Arrays

These are especially useful for creating dummy data or initialising data randomly. NumPy’s ‘random’ module has many functions for creating arrays of random values:

rand

Creates an array of the specified shape and with random samples from a Uniform Distribution over [0, 1).

In [12]:
np.random.rand(2)
Out[12]:
array([ 0.56594261,  0.43410108])
In [13]:
np.random.rand(2,3)
Out[13]:
array([[ 0.07842831,  0.46672682,  0.25314248],
       [ 0.30500035,  0.00358737,  0.84292352]])
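
Since ‘rand’ only samples from [0, 1), a common pattern for another range is to scale and shift the samples. A minimal sketch, where the bounds ‘low’ and ‘high’ are arbitrary values chosen for illustration:

```python
import numpy as np

low, high = 5.0, 10.0  # illustrative bounds, not from the article

# rand draws from [0, 1); multiplying by the width and adding 'low' moves the range
samples = low + (high - low) * np.random.rand(1000)

print(samples.shape)                 # (1000,)
print(samples.min(), samples.max())  # both lie between 5 and 10
```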

randn

Creates an array of the specified shape and with random samples from the Standard Normal Distribution of mean 0 and variance 1.

In [14]:
np.random.randn(2)
Out[14]:
array([ 1.20835183,  0.61915344])
In [15]:
np.random.randn(2,3)
Out[15]:
array([[-0.53824087, -0.968721  , -1.61452392],
       [ 1.02623892,  0.26075377, -1.87565154]])

randint

Returns one or more random integers from the given range.

In [16]:
np.random.randint(1,10) #the parameters are low (inclusive) and high (exclusive) of the range
Out[16]:
8
In [17]:
np.random.randint(1,100,10) #we can use the third parameter to specify the number of elements in the returned array
Out[17]:
array([93, 38, 25, 41, 42, 29, 90, 39, 93, 14])
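
Because these values change on every run, it often helps to fix a seed so that results are reproducible. The sketch below uses the legacy ‘np.random.seed’ that matches the functions in this article; newer NumPy code often creates a generator with ‘np.random.default_rng’ instead, whose ‘integers’ method plays the role of ‘randint’. The seed 42 is arbitrary:

```python
import numpy as np

np.random.seed(42)                       # fix the global seed
first = np.random.randint(1, 100, 10)
np.random.seed(42)                       # resetting the same seed...
second = np.random.randint(1, 100, 10)   # ...reproduces the same numbers

rng = np.random.default_rng(42)          # newer Generator-based API
from_rng = rng.integers(1, 100, size=10) # high is exclusive, as with randint

print((first == second).all())  # True
```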

Converting a Python List to NumPy Array

We can create a NumPy array by converting a Python list, or even a list of lists.

Let’s start by creating a list.

In [18]:
some_list = [2,5,1,4,7]
some_list
Out[18]:
[2, 5, 1, 4, 7]

We can convert it into a NumPy array by simply calling the ‘array’ function:

In [19]:
np.array(some_list)
Out[19]:
array([2, 5, 1, 4, 7])

If the list contains even one floating-point value, all the elements of the resulting array are converted to floats.

In [20]:
another_list = [2,5,1,4,7.0]
np.array(another_list)
Out[20]:
array([ 2.,  5.,  1.,  4.,  7.])
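
You can confirm the resulting type with the ‘dtype’ attribute, or choose it yourself with the ‘dtype’ argument of ‘np.array’. A short sketch:

```python
import numpy as np

ints = np.array([2, 5, 1, 4, 7])
mixed = np.array([2, 5, 1, 4, 7.0])              # one float upcasts every element
forced = np.array([2, 5, 1, 4, 7], dtype=float)  # request floats explicitly

print(mixed.dtype)  # float64
print(forced)       # [2. 5. 1. 4. 7.]
```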

To create a matrix, we can use a list of lists. Each individual inner list will represent one row of the resulting matrix.

In [21]:
some_matrix = [[1,2,3],[4,5,6],[7,8,9]]
some_matrix
Out[21]:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [22]:
np.array(some_matrix)
Out[22]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Array Attributes and Methods

There are many useful attributes and methods for NumPy arrays.

Let’s start by building two arrays as follows:

In [27]:
my_array = np.arange(30)
random_array = np.random.randint(0,50,10)
In [28]:
my_array
Out[28]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
In [29]:
random_array
Out[29]:
array([35, 34, 45,  3, 42, 32, 30, 32, 24,  7])

Reshape

Returns an array with the same data but of the specified new shape.

In [30]:
my_array.reshape(6,5)
Out[30]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

It works only if the new shape holds exactly as many elements as the original array.
If N is the number of elements and (R, C) is the new shape, then the following condition must hold:
N = R * C

However, ‘reshape’ lets you leave out one of the dimensions (either the number of rows or the number of columns, but not both!).
To do this, pass -1 for that dimension, and NumPy will work out the suitable value on its own.

In [31]:
my_array.reshape(3,-1)
Out[31]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
In [32]:
my_array.reshape(-1,6)
Out[32]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])
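
When the condition N = R * C cannot be met, ‘reshape’ raises a ValueError; this also happens when -1 cannot be resolved to a whole number. Note too that ‘reshape’ returns a new array and leaves the original’s shape untouched. A minimal sketch:

```python
import numpy as np

my_array = np.arange(30)

print(my_array.reshape(5, -1).shape)  # (5, 6): NumPy infers 30 / 5 = 6 columns

try:
    my_array.reshape(7, -1)           # 30 is not divisible by 7
except ValueError as err:
    print("ValueError:", err)

print(my_array.shape)                 # (30,): the original array is unchanged
```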

max, min, argmax, argmin

These are used to return the maximum or minimum values, or their index locations.

In [40]:
random_array #the one we declared above
Out[40]:
array([35, 34, 45,  3, 42, 32, 30, 32, 24,  7])
In [41]:
random_array.max() #returns the largest element
Out[41]:
45
In [42]:
random_array.argmax() #returns the index value of the largest element 
Out[42]:
2
In [43]:
random_array.min() #returns the smallest element
Out[43]:
3
In [44]:
random_array.argmin() #returns the index value of the smallest element
Out[44]:
3
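
On a matrix, these methods work on the flattened array by default, but the ‘axis’ argument computes them per column (axis=0) or per row (axis=1). A small sketch with an arbitrary 2 x 3 matrix; ‘np.unravel_index’ converts a flat index back into (row, column):

```python
import numpy as np

m = np.array([[3, 9, 1],
              [7, 2, 8]])

print(m.max())        # 9: largest element overall
print(m.argmax())     # 1: index in the flattened array [3, 9, 1, 7, 2, 8]
print(m.max(axis=0))  # [7 9 8]: largest in each column
print(m.min(axis=1))  # [1 2]: smallest in each row

row, col = np.unravel_index(m.argmax(), m.shape)  # (0, 1)
```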

Shape

Returns the shape of the array. Please note that ‘shape’ is an attribute of NumPy arrays, not a method, so it is used without parentheses.

In [32]:
my_array.shape #this is a vector, hence only one term is returned.
Out[32]:
(30,)

To convert it into a matrix with a single row, we can use the ‘reshape’ function:

In [47]:
my_array.reshape(1,30)
Out[47]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
In [48]:
my_array.reshape(1,30).shape #this is a matrix
Out[48]:
(1, 30)
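
A vector can just as easily become a single-column matrix, and ‘np.newaxis’ is an alternative to ‘reshape’ for adding a dimension of length one. The related ‘size’ attribute gives the total number of elements. A short sketch:

```python
import numpy as np

my_array = np.arange(30)

row = my_array[np.newaxis, :]     # same shape as my_array.reshape(1, 30)
column = my_array.reshape(-1, 1)  # 30 rows, 1 column

print(row.shape)      # (1, 30)
print(column.shape)   # (30, 1)
print(my_array.size)  # 30
```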

dtype

Returns the data type of the array’s elements. Like ‘shape’, this is an attribute, not a method.

In [49]:
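my_array.dtype #the default integer type is platform-dependent: int32 here (Windows), typically int64 on Linux and macOS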
Out[49]:
dtype('int32')
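
To change the element type, the ‘astype’ method returns a converted copy; note that converting floats to integers simply drops the fractional part. A minimal sketch:

```python
import numpy as np

my_array = np.arange(30)
as_float = my_array.astype(np.float64)              # a converted copy
truncated = np.array([1.7, -1.7, 2.5]).astype(int)  # fractional part is dropped

print(as_float.dtype)  # float64
print(truncated)       # [ 1 -1  2]
print(my_array.dtype)  # the original dtype is unchanged
```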

Indexing and Selection

This allows us to select specific elements or groups of elements from an array.

In [50]:
my_array #declared above
Out[50]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

Indexing and Slicing

Indexing returns the element at a particular position, and slicing returns the elements within a particular range. Both work much like they do with Python lists.

In [51]:
my_array[10]
Out[51]:
10
In [53]:
my_array[1:5] #low inclusive and high exclusive
Out[53]:
array([1, 2, 3, 4])

We can omit the low value to mean “start from the first element”, and the high value to mean “go all the way to the end”:

In [67]:
my_array[:5]
Out[67]:
array([0, 1, 2, 3, 4])
In [70]:
my_array[24:]
Out[70]:
array([24, 25, 26, 27, 28, 29])
In [71]:
my_array[:]
Out[71]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
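
As with Python lists, negative indices count from the end and a third slice value sets the step. One important difference from lists is that a NumPy slice is a view of the original data, so assigning to it changes the original array; ‘copy()’ gives you an independent array. A short sketch:

```python
import numpy as np

my_array = np.arange(30)

print(my_array[-1])        # 29: the last element
print(my_array[::5])       # [ 0  5 10 15 20 25]: every fifth element
print(my_array[::-1][:3])  # [29 28 27]: reversed, first three

view = my_array[:5]
view[:] = 99               # writes through to my_array
print(my_array[:6])        # [99 99 99 99 99  5]

safe = my_array[5:10].copy()
safe[:] = -1               # my_array is not affected
print(my_array[5:10])      # [5 6 7 8 9]
```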

Indexing and Slicing a Matrix

In [54]:
some_matrix #declared above
Out[54]:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [55]:
np_matrix = np.array(some_matrix)
np_matrix
Out[55]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [56]:
np_matrix[1] #gives specified row
Out[56]:
array([4, 5, 6])
In [57]:
np_matrix[1][2] #gives specified element: matrix[row][column]
Out[57]:
6
In [58]:
np_matrix[1,2] #same as previous: matrix[row,column]
Out[58]:
6
In [61]:
np_matrix[1:3] #slicing rows in matrix
Out[61]:
array([[4, 5, 6],
       [7, 8, 9]])
In [65]:
np_matrix[1:3,1:3] #slicing rows as well as columns in matrix
Out[65]:
array([[5, 6],
       [8, 9]])
In [72]:
np_matrix[:2,1:]
Out[72]:
array([[2, 3],
       [5, 6]])
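
A colon on its own selects everything along that axis, so whole columns can be picked out just as easily as rows. A small sketch using the same matrix:

```python
import numpy as np

np_matrix = np.array([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])

print(np_matrix[:, 1])      # [2 5 8]: the second column
print(np_matrix[:, -1])     # [3 6 9]: the last column
corners = np_matrix[::2, ::2]  # rows 0 and 2, columns 0 and 2
```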

Conditional Selection

We can also select the elements of an array that satisfy a condition.

In [73]:
random_array #declared above
Out[73]:
array([35, 34, 45,  3, 42, 32, 30, 32, 24,  7])
In [75]:
boolean_array = random_array > 20 #returns an array with value=True where the condition holds and value=False otherwise
boolean_array
Out[75]:
array([ True,  True,  True, False,  True,  True,  True,  True,  True, False], dtype=bool)
In [76]:
random_array[boolean_array] #returns only those elements for which the corresponding element in boolean_array is True
Out[76]:
array([35, 34, 45, 42, 32, 30, 32, 24])
In [77]:
random_array[random_array > 20] #same as previous
Out[77]:
array([35, 34, 45, 42, 32, 30, 32, 24])
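
Conditions can be combined with ‘&’ (and), ‘|’ (or) and ‘~’ (not). Each comparison needs its own parentheses, because in Python these operators bind more tightly than comparisons. Summing a boolean array counts its True values. A sketch that reuses the values of random_array shown above:

```python
import numpy as np

random_array = np.array([35, 34, 45, 3, 42, 32, 30, 32, 24, 7])  # values from above

between = random_array[(random_array > 20) & (random_array < 40)]
outside = random_array[(random_array < 10) | (random_array > 40)]
small = random_array[~(random_array > 20)]
count = np.sum(random_array > 20)   # True counts as 1, False as 0

print(between)  # [35 34 32 30 32 24]
print(outside)  # [45  3 42  7]
print(small)    # [3 7]
print(count)    # 8
```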

NumPy Operations

NumPy supports many arithmetic operations and universal functions (ufuncs), which are applied to arrays element by element.

Let’s start by creating a new array:

In [78]:
new_array = np.arange(10)
new_array
Out[78]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [80]:
new_array + new_array #returns 'element by element' sum
Out[80]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [81]:
new_array - new_array #returns 'element by element' difference
Out[81]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In [82]:
new_array * new_array #returns 'element by element' product
Out[82]:
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
In [83]:
new_array / new_array #returns 'element by element' quotient
C:\Users\Pranav\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: invalid value encountered in true_divide
  """Entry point for launching an IPython kernel.
Out[83]:
array([ nan,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.])

The first element is 0/0, which is undefined in mathematics. Instead of raising an error, NumPy issues a RuntimeWarning and stores the result as nan (not a number).
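
To test for nan, use ‘np.isnan’ rather than ‘==’, since nan is not equal to anything, including itself. The warning can be silenced locally with the ‘np.errstate’ context manager. A minimal sketch:

```python
import numpy as np

new_array = np.arange(10)

# Suppress the divide/invalid warnings inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratios = new_array / new_array

print(ratios[0] == ratios[0])  # False: nan is not equal to itself
print(np.isnan(ratios))        # True only at index 0
```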

In [58]:
new_array ** 2 #returns each element to the power 2
Out[58]:
array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
In [59]:
1 / new_array #returns the reciprocal of each element
C:\Users\Pranav\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.
Out[59]:
array([        inf,  1.        ,  0.5       ,  0.33333333,  0.25      ,
        0.2       ,  0.16666667,  0.14285714,  0.125     ,  0.11111111])

Here, the first element is 1/0. Again, NumPy issues a warning instead of an error and represents the result as ‘inf’ (infinity).
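
If you would rather avoid inf altogether, ‘np.divide’ accepts ‘out’ and ‘where’ arguments, so the quotient is computed only where the denominator is non-zero and every other position keeps the value already in ‘out’ (0 here, an arbitrary choice). ‘np.isinf’ detects infinite values. A sketch:

```python
import numpy as np

new_array = np.arange(10)

# Compute 1 / x only where x != 0; other positions keep the zeros from 'out'
reciprocals = np.divide(1, new_array, out=np.zeros(new_array.shape), where=new_array != 0)

with np.errstate(divide="ignore"):
    raw = 1 / new_array     # contains inf at index 0

print(np.isinf(raw))        # True only at index 0
print(reciprocals[:3])      # 0.0, 1.0, 0.5
```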

In [84]:
np.sqrt(new_array) # same as new_array ** (1/2)
Out[84]:
array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])
In [62]:
np.sin(new_array) #returns the sine value of each element
Out[62]:
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])
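
Operations with a single number are also applied to every element, and arrays provide aggregate methods such as ‘sum’ and ‘mean’. A short sketch that also checks the comment above about ‘np.sqrt’:

```python
import numpy as np

new_array = np.arange(10)

print(new_array + 100)   # adds 100 to every element
print(new_array.sum())   # 45
print(new_array.mean())  # 4.5
print(np.allclose(np.sqrt(new_array), new_array ** 0.5))  # True
```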

NumPy has loads of other handy functions, intended to make your life a lot easier.
I strongly recommend going through the official NumPy documentation.
This article is just for starters. 😉

If you have any questions, feel free to comment below. Happy learning 🙂

Pranav Gupta

Co-Founder at DataScribble
An always cheerful and optimistic guy, with a knack for achieving the set target at any cost.
I am an avid learner and never shy away from hard work or working late. I am also a passionate reader and love thriller novels, Jeffrey Archer being my favorite writer.
LinkedIn: https://www.linkedin.com/in/prnvg/