Building your own Neural Network — Understand how a Neural Network is built
Step-by-step implementation in Python using only NumPy
With this post, we begin the fifth chapter, in which we will first understand how Neural Networks are built. In the following posts we will go through Backpropagation, L1 and L2 regularization, Layer Normalization, Batch Training, and much more.
You can download the Jupyter Notebook from here.
Note — This post uses many things from the previous chapters. It is recommended that you have a look at the previous posts.
5.1 Forward Feed in ANNs
A forward feed simply means calculating y_hat (the predicted y) and the loss.
But first, we must understand how to make a Neural Network.
Neural Networks store information, or general patterns, in parameters called ‘Weights’ and ‘Biases’.
Neural Networks consist of ‘Layers’ and each layer has ‘Nodes’.
Each circle is a ‘Node’ and this collection of nodes is called a ‘Layer’.
Each node is represented by a scalar in Python, and a collection of nodes of shape (-1, 1) forms a layer.
Note — We can also create a Neural Network where a layer has shape (-1,) or (1, -1), but in this course we will make layers of shape (-1, 1).
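For example, here is a minimal sketch (with made-up values) of what a 3-node layer looks like as a NumPy column vector:
import numpy as np

layer = np.array([0.2, 0.5, 0.1]).reshape(-1, 1) # a 3-node layer as a column vector
print(layer.shape) # (3, 1)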
Neural Networks can have as many layers and nodes as you want.
For this course, we will have the following architecture: 4 layers with 5, 3, 5, and 4 nodes respectively.
The first layer is called the ‘Input Layer’.
The last layer is called the ‘Output Layer’.
Layers in between them are called the ‘Hidden Layers’.
Layers are connected to each other via ‘Weights’.
Let us call the collections of weights ‘w1’, ‘w2’, and ‘w3’.
And we name each weight by the nodes it connects.
For example, this weight is called ‘w3₁₂’ because it connects the first node from the previous layer to the second node in the next layer.
And each node in every layer except the input layer has a bias.
Let us call the collections of biases ‘b1’, ‘b2’, and ‘b3’.
And we name each bias by the node it is connected to.
For example, this bias is called ‘b3₄’ because it is connected to the fourth node.
Each hidden layer, as well as the output layer, is passed through an activation function.
And finally, we have a layer that represents the true output, and a loss function.
Let us name each layer one by one, which will help us build this Neural Network in Python.
First, we have the input layer ‘x’. The elements will be x₁, x₂ …
Then we have the input of hidden layer 1, which we call ‘in_hidden_1’ or ‘I_H1’. The elements will be ‘I_H1₁’, ‘I_H1₂’ …
Then an activation function for the first hidden layer.
Then we have the output of hidden layer 1, which we call ‘out_hidden_1’ or ‘O_H1’. The elements will be ‘O_H1₁’, ‘O_H1₂’ …
Then we have the input of hidden layer 2, which we call ‘in_hidden_2’ or ‘I_H2’. The elements will be ‘I_H2₁’, ‘I_H2₂’ …
Then an activation function for the second hidden layer.
Then we have the output of hidden layer 2, which we call ‘out_hidden_2’ or ‘O_H2’. The elements will be ‘O_H2₁’, ‘O_H2₂’ …
Then we have the input of the output layer, which we call ‘in_output_layer’ or ‘I_OL’. The elements will be ‘I_OL₁’, ‘I_OL₂’ …
Then an activation function for the output layer.
Then we have the ‘y_hat’ or ‘y predicted’ layer. The elements will be ‘y_hat₁’, ‘y_hat₂’ …
Then we have a loss function.
And finally, we have the true output layer, which we call ‘y’. The elements will be ‘y₁’, ‘y₂’ …
Our Neural Network will look like this.
Now let us start the forward feed, i.e., calculate the loss.
Note — We can see that the shape of each layer is (-1, 1)
How will we do it? Simple, let us see each step one by one in Python.
But first a few things.
First, the activation function for the first hidden layer is the ‘ReLU activation function with leak = 0.1’.
Second, the activation function for the second hidden layer and the output layer is the ‘Sigmoid activation function’.
Third, the loss function used is ‘Mean Square Error’.
Can you calculate the total number of parameters, i.e., weights and biases? It is easy to come up with a formula.
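As a quick check, here is a minimal sketch that counts the parameters for the 5-3-5-4 architecture above (each pair of consecutive layers contributes previous × next weights plus next biases):
layers = [5, 3, 5, 4] # nodes in each layer
total = sum(prev * nxt + nxt for prev, nxt in zip(layers[:-1], layers[1:]))
print(total) # 62 parameters in total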
So, the steps are:
Step 1 - Importing NumPy library and defining nodes in each layer
Step 2 - Inputs and true Outputs
Step 3 - Defining the Activation functions and the loss function
Step 4 - Random initialization of weights and zero initialization of biases
Step 5 - Calculating outputs of each layer
Step 6 - Calculating the loss or error
Let us see each step one by one.
Step 1 — Importing NumPy library and defining nodes in each layer
import numpy as np # importing NumPy
np.random.seed(42)

input_nodes = 5 # nodes in each layer
hidden_1_nodes = 3
hidden_2_nodes = 5
output_nodes = 4
Step 2 — Inputs and true Outputs
x = np.random.randint(1, 100, size = (input_nodes, 1)) / 100 # Inputs
print('x')
print(x, x.shape)

y = np.random.randint(1, 100, size = (output_nodes, 1)) / 100 # Outputs
print('y')
print(y, y.shape)
Step 3 — Defining the Activation functions and the loss function
def relu(x, leak = 0): # ReLU
    return np.where(x <= 0, leak * x, x)

def sig(x): # Sigmoid
    return 1 / (1 + np.exp(-x))

def mse(y_true, y_pred): # MSE
    return np.mean((y_true - y_pred)**2)
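As a quick sanity check of the functions we just defined (the input values here are made up), you can verify:
print(relu(np.array([-2.0, 3.0]), leak = 0.1)) # [-0.2  3. ]
print(sig(np.array([0.0]))) # [0.5]
print(mse(np.array([1.0, 0.0]), np.array([0.8, 0.1]))) # ~0.025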
Step 4 — Random initialization of weights and zero initialization of biases
Note — Weights are generally initialized from a distribution with mean 0 and standard deviation 1, so most of them lie between -1 and 1. But here we will initialize them randomly between 0 and 1. We will generate normally distributed weights in the last post of this chapter, where we will work with the UCI White Wine quality dataset. There are many other ways to initialize the weights and biases; you may refer to the literature available on the internet.
But first, let us understand what the shape of the weight tensors will be.
The shape of a weight tensor is:
(nodes in the next layer, nodes in the previous layer)
So for the weights w1, we will have a matrix of shape (3, 5).
We can see that the first column of the weight matrix holds the outgoing weights from the first node of the previous layer, and the first row holds the incoming weights to the first node of the next layer.
Let us take a look one more time.
The second column represents the outgoing weights from the second node in the previous layer.
The second row represents incoming weights to the second node in the next layer.
So, the shape of a weight matrix is (nodes in the next layer, nodes in the previous layer), and the shape of a bias vector is (-1, 1), i.e., (nodes in the layer, 1).
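To make the row/column convention concrete, here is a minimal toy sketch (the 2 × 3 matrix and its values are made up purely for illustration):
toy_w = np.array([[1, 2, 3],
                  [4, 5, 6]]) # toy weights for 3 previous nodes -> 2 next nodes
print(toy_w[:, 0]) # [1 4] -> outgoing weights from the first previous-layer node
print(toy_w[0, :]) # [1 2 3] -> incoming weights to the first next-layer node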
w1 = np.random.random(size = (hidden_1_nodes, input_nodes))
b1 = np.zeros(shape = (hidden_1_nodes, 1))

print('w1') # w1
print(w1, w1.shape)
print('\n')
print('b1') # b1
print(b1, b1.shape)

w2 = np.random.random(size = (hidden_2_nodes, hidden_1_nodes))
b2 = np.zeros(shape = (hidden_2_nodes, 1))

print('w2') # w2
print(w2, w2.shape)
print('\n')
print('b2') # b2
print(b2, b2.shape)

w3 = np.random.random(size = (output_nodes, hidden_2_nodes))
b3 = np.zeros(shape = (output_nodes, 1))

print('w3') # w3
print(w3, w3.shape)
print('\n')
print('b3') # b3
print(b3, b3.shape)
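If you prefer the normally distributed initialization mentioned in the note above, a minimal alternative sketch would look like this (w1_normal is a hypothetical name; the rest of this post keeps the uniform w1 from above):
w1_normal = np.random.normal(loc = 0.0, scale = 1.0, size = (hidden_1_nodes, input_nodes)) # mean 0, std 1; not used below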
Step 5 — Calculating outputs of each layer
5.1 Calculating ‘in_hidden_1’ or ‘I_H1’
To calculate the input of a layer, we take a weighted sum of the previous layer's values and add the bias. Following the naming convention above, I_H1₁ = w1₁₁x₁ + w1₂₁x₂ + w1₃₁x₃ + w1₄₁x₄ + w1₅₁x₅ + b1₁, and similarly for the other nodes.
We can use matrix multiplication to write all of these sums at once, so that we can implement it in Python like this
in_hidden_1 = w1.dot(x) + b1
print('in_hidden_1')
print(in_hidden_1, in_hidden_1.shape)
5.2 Calculating ‘out_hidden_1’ or ‘O_H1’
We can calculate ‘O_H1’ by passing ‘I_H1’ through the ReLU activation function with leak = 0.1
out_hidden_1 = relu(in_hidden_1, leak = 0.1)
print('out_hidden_1')
print(out_hidden_1, out_hidden_1.shape)
5.3 Calculating ‘in_hidden_2’ or ‘I_H2’
Again, this is a weighted sum of the previous layer's outputs, written as a matrix multiplication:
in_hidden_2 = w2.dot(out_hidden_1) + b2
print('in_hidden_2')
print(in_hidden_2, in_hidden_2.shape)
5.4 Calculating ‘out_hidden_2’ or ‘O_H2’
out_hidden_2 = sig(in_hidden_2)
print('out_hidden_2')
print(out_hidden_2, out_hidden_2.shape)
5.5 Calculating ‘in_output_layer’ or ‘I_OL’
Once more, this is a weighted sum of the previous layer's outputs, written as a matrix multiplication:
in_output_layer = w3.dot(out_hidden_2) + b3
print('in_output_layer')
print(in_output_layer, in_output_layer.shape)
5.6 Calculating ‘y_hat’
y_hat = sig(in_output_layer)
print('y_hat')
print(y_hat, y_hat.shape)
We can print the true values ‘y’ to see how different the predicted values are from the true values.
print('y')
print(y, y.shape)
Step 6 — Calculating MSE loss or error
mse(y, y_hat)
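To summarize, here is a minimal sketch that wraps the whole forward feed into a single function (the name forward is my own; it reuses the weights, biases, and functions defined above):
def forward(x, w1, b1, w2, b2, w3, b3):
    out_hidden_1 = relu(w1.dot(x) + b1, leak = 0.1) # hidden layer 1 with Leaky ReLU
    out_hidden_2 = sig(w2.dot(out_hidden_1) + b2)   # hidden layer 2 with Sigmoid
    y_hat = sig(w3.dot(out_hidden_2) + b3)          # output layer with Sigmoid
    return y_hat

print(mse(y, forward(x, w1, b1, w2, b2, w3, b3))) # same MSE loss as above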
So, I hope you now understand how to build your own Neural Network.
I am not using Object-Oriented programming because I want to keep things simple for this course.
In the next two posts, we will see how to use Backpropagation to calculate gradients, which will help us minimize the loss by updating the weights and biases with the help of an Optimizer.
If you are looking for Batch training, then you will have to wait for the sixth post of this chapter.
If you like this post, then please subscribe to my YouTube channel neuralthreads and join me on Reddit.
I will be uploading new interactive videos soon on the YouTube channel, and I will be happy to help you with any doubts on Reddit.
Many thanks for your support and feedback.
If you like this course, then you can support me at
It would mean a lot to me.
Continue to the next post — 5.2.1 Backpropagation in ANNs — Part 1.