Building your own Neural Network — Understand how a Neural Network is built
Step-by-step implementation in Python using only NumPy
With this post, we begin the fifth chapter, in which we will first understand how Neural Networks are built. In the following posts we will go through Backpropagation, L1 and L2 regularization, Layer Normalization, Batch Training, and much more.
You can download the Jupyter Notebook from here.
Note — This post uses many things from the previous chapters. It is recommended that you have a look at the previous posts.
5.1 Forward Feed in ANNs
A forward feed simply means calculating y_hat (the predicted y) and the loss.
But first, we must understand how to make a Neural Network.
Neural Networks store information, or general patterns, in parameters called ‘Weights’ and ‘Biases’.
Neural Networks consist of ‘Layers’ and each layer has ‘Nodes’.
Each circle is a ‘Node’ and this collection of nodes is called a ‘Layer’.
Each node is represented by a scalar in Python, and a collection of nodes of shape (-1, 1) forms a layer.
Note — We can also create a Neural Network where a layer has shape (-1,) or (1, -1), but in this course we will make layers of shape (-1, 1).
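For example, here is a minimal sketch (with made-up values) of what a 3-node layer looks like as a NumPy column vector:
import numpy as np

layer = np.array([0.2, 0.5, 0.1]).reshape(-1, 1) # a 3-node layer as a column vector
print(layer.shape) # (3, 1)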
Neural Networks can have as many layers and nodes as you want.
For this course, we will have the following architecture: 4 layers with 5, 3, 5, and 4 nodes respectively.
The first layer is called the ‘Input Layer’.
The last layer is called the ‘Output Layer’.
Layers in between them are called the ‘Hidden Layers’.
Layers are connected to each other via ‘Weights’.
Let us call the collections of weights ‘w1’, ‘w2’, and ‘w3’.
And we name each weight by the nodes it connects.
For example, this weight is called ‘w3₁₂’ because it connects the first node from the previous layer to the second node in the next layer.
And each node in every layer except the input layer has a bias.
Let us call the collections of biases ‘b1’, ‘b2’, and ‘b3’.
And we name each bias by the node it is connected to.
For example, this bias is called ‘b3₄’ because it is connected to the fourth node.
Each hidden layer, as well as the output layer, is passed through an activation function.
And finally, we have a layer that represents the true output, and a loss function.
Let us name each layer one by one, which will help us build this Neural Network in Python.
First, we have the input layer ‘x’. The elements will be x₁, x₂ …
Then we have the input of hidden layer 1, which we call ‘in_hidden_1’ or ‘I_H1’. The elements will be ‘I_H1₁’, ‘I_H1₂’ …
Then an activation function for the first hidden layer.
Then we have the output of hidden layer 1, which we call ‘out_hidden_1’ or ‘O_H1’. The elements will be ‘O_H1₁’, ‘O_H1₂’ …
Then we have the input of hidden layer 2, which we call ‘in_hidden_2’ or ‘I_H2’. The elements will be ‘I_H2₁’, ‘I_H2₂’ …
Then an activation function for the second hidden layer.
Then we have the output of hidden layer 2, which we call ‘out_hidden_2’ or ‘O_H2’. The elements will be ‘O_H2₁’, ‘O_H2₂’ …
Then we have the input of the output layer, which we call ‘in_output_layer’ or ‘I_OL’. The elements will be ‘I_OL₁’, ‘I_OL₂’ …
Then an activation function for the output layer.
Then we have the ‘y_hat’ or ‘y predicted’ layer. The elements will be ‘y_hat₁’, ‘y_hat₂’ …
Then we have a loss function.
And finally, we have the true output layer, which we call ‘y’. The elements will be ‘y₁’, ‘y₂’ …
Our Neural Network will look like this.
Now let us start the forward feed, i.e., calculate the loss.
Note — We can see that the shape of each layer is (-1, 1)
How will we do it? Simple, let us see each step one by one in Python.
But first a few things.
First, the activation function for the first hidden layer is the ‘ReLU activation function with leak = 0.1’.
Second, the activation function for the second hidden layer and the output layer is the ‘Sigmoid activation function’.
Third, the loss function used is ‘Mean Square Error’.
Can you calculate the total number of parameters, i.e., weights and biases? It is easy to come up with a formula.
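As a quick check, here is a minimal sketch that counts the parameters for the 5-3-5-4 architecture above (each pair of consecutive layers contributes previous × next weights plus next biases):
layers = [5, 3, 5, 4] # nodes in each layer
total = sum(prev * nxt + nxt for prev, nxt in zip(layers[:-1], layers[1:]))
print(total) # 62 parameters in total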
So, the steps are:
Step 1 - Importing NumPy library and defining nodes in each layer
Step 2 - Inputs and true Outputs
Step 3 - Defining the Activation functions and the loss function
Step 4 - Random initialization of weights and zero initialization of biases
Step 5 - Calculating outputs of each layer
Step 6 - Calculating the loss or error
Let us see each step one by one.
Step 1 — Importing NumPy library and defining nodes in each layer
import numpy as np # importing NumPy
np.random.seed(42)

input_nodes = 5 # nodes in each layer
hidden_1_nodes = 3
hidden_2_nodes = 5
output_nodes = 4
Step 2 — Inputs and true Outputs
x = np.random.randint(1, 100, size = (input_nodes, 1)) / 100 # Inputs
print('x')
print(x, x.shape)

y = np.random.randint(1, 100, size = (output_nodes, 1)) / 100 # Outputs
print('y')
print(y, y.shape)
Step 3 — Defining the Activation functions and the loss function
def relu(x, leak = 0): # ReLU
    return np.where(x <= 0, leak * x, x)

def sig(x): # Sigmoid
    return 1 / (1 + np.exp(-x))

def mse(y_true, y_pred): # MSE
    return np.mean((y_true - y_pred)**2)
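As a quick sanity check of the functions we just defined (the input values here are made up), you can verify:
print(relu(np.array([-2.0, 3.0]), leak = 0.1)) # [-0.2  3. ]
print(sig(np.array([0.0]))) # [0.5]
print(mse(np.array([1.0, 0.0]), np.array([0.8, 0.1]))) # ~0.025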
Step 4 — Random initialization of weights and zero initialization of biases
Note — Weights are generally initialized from a distribution with mean 0 and standard deviation 1, so most of them lie between -1 and 1. But here we will initialize them randomly between 0 and 1. We will generate normally distributed weights in the last post of this chapter, where we will work with the UCI White Wine quality dataset. There are many other ways to initialize the weights and biases; you may refer to the literature available on the internet.
But first, let us understand what the shape of the weight tensors will be.
The shape of a weight tensor is:
(nodes in the next layer, nodes in the previous layer)
So for the weights w1, we will have a matrix of shape (3, 5).
We can see that the first column of the weight matrix holds the outgoing weights from the first node of the previous layer, and the first row holds the incoming weights to the first node of the next layer.
Let us take a look one more time.
The second column represents the outgoing weights from the second node in the previous layer.
The second row represents incoming weights to the second node in the next layer.
So, the shape of a weight matrix is (nodes in the next layer, nodes in the previous layer), and the shape of a bias vector is (-1, 1), i.e., (nodes in the layer, 1).
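To make the row/column convention concrete, here is a minimal toy sketch (the 2 × 3 matrix and its values are made up purely for illustration):
toy_w = np.array([[1, 2, 3],
                  [4, 5, 6]]) # toy weights for 3 previous nodes -> 2 next nodes
print(toy_w[:, 0]) # [1 4] -> outgoing weights from the first previous-layer node
print(toy_w[0, :]) # [1 2 3] -> incoming weights to the first next-layer node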
w1 = np.random.random(size = (hidden_1_nodes, input_nodes))
b1 = np.zeros(shape = (hidden_1_nodes, 1))

print('w1') # w1
print(w1, w1.shape)
print('\n')
print('b1') # b1
print(b1, b1.shape)

w2 = np.random.random(size = (hidden_2_nodes, hidden_1_nodes))
b2 = np.zeros(shape = (hidden_2_nodes, 1))

print('w2') # w2
print(w2, w2.shape)
print('\n')
print('b2') # b2
print(b2, b2.shape)

w3 = np.random.random(size = (output_nodes, hidden_2_nodes))
b3 = np.zeros(shape = (output_nodes, 1))

print('w3') # w3
print(w3, w3.shape)
print('\n')
print('b3') # b3
print(b3, b3.shape)
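If you prefer the normally distributed initialization mentioned in the note above, a minimal alternative sketch would look like this (w1_normal is a hypothetical name; the rest of this post keeps the uniform w1 from above):
w1_normal = np.random.normal(loc = 0.0, scale = 1.0, size = (hidden_1_nodes, input_nodes)) # mean 0, std 1; not used below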
Step 5 — Calculating outputs of each layer
5.1 Calculating ‘in_hidden_1’ or ‘I_H1’
To calculate the input of a layer, we take a weighted sum of the previous layer's values and add the bias. Following the naming convention above, I_H1₁ = w1₁₁x₁ + w1₂₁x₂ + w1₃₁x₃ + w1₄₁x₄ + w1₅₁x₅ + b1₁, and similarly for the other nodes.
We can use matrix multiplication to write all of these sums at once, so that we can implement it in Python like this
in_hidden_1 = w1.dot(x) + b1
print('in_hidden_1')
print(in_hidden_1, in_hidden_1.shape)
5.2 Calculating ‘out_hidden_1’ or ‘O_H1’
We can calculate ‘O_H1’ by passing ‘I_H1’ through the ReLU activation function with leak = 0.1
out_hidden_1 = relu(in_hidden_1, leak = 0.1)
print('out_hidden_1')
print(out_hidden_1, out_hidden_1.shape)
5.3 Calculating ‘in_hidden_2’ or ‘I_H2’
Again, this is a weighted sum of the previous layer's outputs, written as a matrix multiplication:
in_hidden_2 = w2.dot(out_hidden_1) + b2
print('in_hidden_2')
print(in_hidden_2, in_hidden_2.shape)
5.4 Calculating ‘out_hidden_2’ or ‘O_H2’
out_hidden_2 = sig(in_hidden_2)
print('out_hidden_2')
print(out_hidden_2, out_hidden_2.shape)
5.5 Calculating ‘in_output_layer’ or ‘I_OL’
Once more, this is a weighted sum of the previous layer's outputs, written as a matrix multiplication:
in_output_layer = w3.dot(out_hidden_2) + b3
print('in_output_layer')
print(in_output_layer, in_output_layer.shape)
5.6 Calculating ‘y_hat’
y_hat = sig(in_output_layer)
print('y_hat')
print(y_hat, y_hat.shape)
We can print the true values ‘y’ to see how different the predicted values are from the true values.
print('y')
print(y, y.shape)
Step 6 — Calculating MSE loss or error
mse(y, y_hat)
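To summarize, here is a minimal sketch that wraps the whole forward feed into a single function (the name forward is my own; it reuses the weights, biases, and functions defined above):
def forward(x, w1, b1, w2, b2, w3, b3):
    out_hidden_1 = relu(w1.dot(x) + b1, leak = 0.1) # hidden layer 1 with Leaky ReLU
    out_hidden_2 = sig(w2.dot(out_hidden_1) + b2)   # hidden layer 2 with Sigmoid
    y_hat = sig(w3.dot(out_hidden_2) + b3)          # output layer with Sigmoid
    return y_hat

print(mse(y, forward(x, w1, b1, w2, b2, w3, b3))) # same MSE loss as above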
So, I hope you now understand how to build your own Neural Network.
I am not using Object-Oriented programming because I want to keep things simple for this course.
In the next two posts, we will see how to use Backpropagation to calculate gradients, which will help us minimize the loss by updating the weights and biases with the help of an Optimizer.
If you are looking for Batch training, then you will have to wait for the sixth post of this chapter.
If you like this post, then please subscribe to my YouTube channel neuralthreads and join me on Reddit.
I will be uploading new interactive videos soon on the YouTube channel, and I will be happy to help you with any doubts on Reddit.
Many thanks for your support and feedback.
If you like this course, then you can support me at
It would mean a lot to me.
Continue to the next post — 5.2.1 Backpropagation in ANNs — Part 1.