1.3 Basic and useful NumPy operations for the course
You can download the Jupyter Notebook from here.
In this post, we will see some useful NumPy operations that we will use in future posts for this course. Important operations like broadcasting and taking the sum along a particular axis are explained here.
We start by importing NumPy and setting the random seed to 42.
import numpy as np
np.random.seed(42)
np.random.random gives an array of a given shape filled with random numbers drawn uniformly from [0, 1).
np.random.random(size = (2, 3))
np.random.normal gives an array of a given shape filled with normally distributed numbers with mean ‘loc’ and standard deviation ‘scale’.
np.random.normal(loc = 0, scale = 1, size = (2, 3))
np.random.randint gives an array of a given shape filled with random integers from low (inclusive) to high (exclusive).
np.random.randint(low = 0, high = 10, size = (3, 4))
np.ones gives an array of ones of a given shape.
np.ones(shape = (4, 5))
np.zeros gives an array of zeros of a given shape.
np.zeros(shape = (2, 3))
np.eye gives an identity matrix (a square matrix) of a given size N.
np.eye(N = 3)
The .T attribute takes the transpose of a matrix, i.e., it flips the matrix along its diagonal.
It reverses the shape tuple and rearranges the scalar elements accordingly.
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(x)
x.shape
x.T
x.T.shape
np.concatenate will merge two or more arrays along the specified axis
The only requirement is that the shapes must match on all axes other than the concatenation axis.
If one array is of shape (3, 2, 4) and the other is of shape (3, 1, 4)
Then they can be merged along axis = 1
Or, if one array is of shape (1, 2, 3) and the other is of shape (2, 2, 3)
Then they can be merged along axis = 0
np.concatenate is different from np.stack, which we will see in chapter 6 when we talk about images as NumPy tensors/arrays.
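As a small preview (a sketch, not from the original post), the key difference is that np.concatenate joins arrays along an existing axis, while np.stack creates a new axis. The names a and b below are just for illustration.
a = np.ones(shape = (2, 3))
b = np.zeros(shape = (2, 3))
print(np.concatenate((a, b), axis = 0).shape)   # (4, 3) — joined along an existing axis
print(np.stack((a, b), axis = 0).shape)         # (2, 2, 3) — a new axis is created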
x = np.random.random(size = (3, 2, 4))
print(x, x.shape)
y = np.random.random(size = (3, 1, 4))
print(y, y.shape)
z = np.concatenate((x, y), axis = 1)
print(z, z.shape)
The arrays are merged along axis = 1 in the order x, y. We could also merge them in the order y, x.
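As a quick check of the second shape example mentioned above, here is a minimal sketch (the names a and b are just for illustration) that merges shapes (1, 2, 3) and (2, 2, 3) along axis = 0.
a = np.random.random(size = (1, 2, 3))
b = np.random.random(size = (2, 2, 3))
c = np.concatenate((a, b), axis = 0)
print(c.shape)   # (3, 2, 3)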
Now let us talk about np.sum, which we will use in chapter 5 when we talk about Backpropagation.
Note — Everything we see here for np.sum also applies to np.mean, np.max, and np.min.
np.sum gives us the sum of all the scalars in the array.
x = [i for i in range(1, 25)]
x = np.array(x).reshape((4, 3, 2))
print(x, x.shape)
np.sum(x)
The sum of 1 to 24 is (24 * 25) / 2 = 300.
But sometimes we have to take the sum along a particular axis.
In this case, we have 3 axes. So, let us go through each axis one by one.
When we take the sum along axis = 0, it takes the element-wise sum of the (N-1)-dimensional arrays and returns an array whose shape tuple no longer has the first entry.
sum_0 = np.sum(x, axis = 0)
print(sum_0, sum_0.shape)
All the matrices are added element-wise and the resulting shape is (3, 2)
We can see that the first entry is discarded from the shape (4, 3, 2)
When we take the sum along axis = 1, it takes the element-wise sum of the (N-2)-dimensional arrays inside each (N-1)-dimensional array and returns an array whose shape tuple no longer has the second entry.
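Mirroring the calls for the other axes, a minimal sketch of this case:
sum_1 = np.sum(x, axis = 1)
print(sum_1, sum_1.shape)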
The same happens inside each of the 2-D arrays, and the resulting shape is (4, 2).
We can see that the second entry is discarded from the shape (4, 3, 2)
When we take the sum along axis = 2, it takes the element-wise sum of the (N-3)-dimensional arrays inside each (N-2)-dimensional array and returns an array whose shape tuple no longer has the third entry.
In this case, it will take the sum of the scalars in all of the vectors.
sum_2 = np.sum(x, axis = 2)
print(sum_2, sum_2.shape)
The same happens inside each of the 1-D arrays, and the resulting shape is (4, 3).
We can see that the third entry is discarded from the shape (4, 3, 2)
Note — We can also use axis = -1 in this case.
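A quick one-line check (a sketch, not from the original post) that the last axis can also be addressed as -1:
print(np.array_equal(np.sum(x, axis = 2), np.sum(x, axis = -1)))   # True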
Suppose you have a matrix
Then taking the sum along axis = 0 is equivalent to summing down the columns, and taking the sum along axis = 1 is equivalent to summing along the rows. We will use this a lot in Backpropagation in chapter 5.
x = np.array([[1, 2, 3], [4, 5, 6]])
print(x, x.shape)
sum_0 = np.sum(x, axis = 0) # sum along columns
print(sum_0, sum_0.shape)
sum_1 = np.sum(x, axis = 1) # sum along rows
print(sum_1, sum_1.shape)
Sum along the columns.
Sum along the rows.
Now let us talk about broadcasting, the most important topic in this post.
First, we have an array ‘x’
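Since ‘x’ was reassigned in the small matrix example just above, we rebuild the (4, 3, 2) array from the np.sum section here (a minimal sketch; the original notebook may simply have re-created or reused this array).
x = np.arange(1, 25).reshape((4, 3, 2))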
print(x, x.shape)
When we perform an operation like multiplication, addition, subtraction, or division between an array and a scalar, the scalar is broadcast to every element in the array.
x * 5
5 is multiplied with every element in the array ‘x’
The same is true for addition, subtraction, and division.
Broadcasting a function
In Python, if we define a function built from element-wise operations and pass a NumPy array as the argument, the function is effectively applied to every element of the array.
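A minimal sketch (the function f below is illustrative, not from the original post):
def f(a):
    return 3 * a ** 2 + 1   # only element-wise operations
print(f(np.array([1, 2, 3])))   # [ 4 13 28]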
Note — Things will be different if the function uses the shape of the array, which we will discuss with the Softmax activation function.
Now let us talk about broadcasting between an array and a lower-dimensional array.
There are some requirements when we want to perform an operation between an array and a lower-dimensional array. Let us understand them with examples.
print(x, x.shape)
First example
y = np.array([1, 2])
print(y, y.shape)
x + y
We can see that here ‘y’ [1, 2] is added to every 1-D tensor (vector) in ‘x’.
Second example,
y = np.array([[1, 2], [3, 4], [5, 6]])
print(y, y.shape)
x + y
We can see that ‘y’ is added to every 2-D tensor in ‘x’
Third example,
y = np.array([[1], [3], [5]])
print(y, y.shape)
x - y
We can see that the column vector ‘y’ is subtracted from every column of every 2-D array in ‘x’.
Fourth example,
y = np.array([[[1], [3], [5]], [[7], [9], [11]], [[13], [15], [17]], [[19], [21], [23]]])
print(y, y.shape)
x / y
We can see that each column of every 2-D array in ‘x’ is divided element-wise by the corresponding column vector in ‘y’.
Note — If two or more arrays have exactly the same shape, then the operation is simply element-wise.
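A minimal sketch of the same-shape case (the arrays a and b are just for illustration):
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])
print(a + b)   # element-wise: [[11 22] [33 44]]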
And at last, let us see the DOT product.
It is very easy to take DOT products in NumPy.
x = np.array([[1], [2], [3]])
print(x, x.shape)
y = np.array([[1, 2, 3]])
print(y, y.shape)
z = x.dot(y)
print(z, z.shape)
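Here x has shape (3, 1) and y has shape (1, 3), so z = x.dot(y) has shape (3, 3) (an outer product). Reversing the order gives a (1, 1) result instead (a small sketch, not from the original post):
w = y.dot(x)
print(w, w.shape)   # [[14]] (1, 1)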
With this post, the first chapter is over. In the next post, we will start Chapter 2 — Optimizers with Gradient Descent.
Watch the video on YouTube and subscribe to the channel for videos and posts like this.
Every slide is 3 seconds long and without sound. You may pause the video whenever you like.
You may put on some music too if you like.
The video is basically everything in the post only in slides.
Many thanks for your support and feedback.
If you like this course, then you can support me at
It would mean a lot to me.
Continue to the next post — 2.1 SGD or Stochastic Gradient Descent.