I was not satisfied with any of the Deep Learning tutorials online.
So, I created my own.
The beginning of my journey
I was fascinated by Neural Networks when I first encountered them, so the first thing I did was a Google search for Deep Learning tutorials. And I found many, many tutorials. It took me two days to learn Keras; DL libraries have made building Neural Networks very easy. But I wanted to go deeper into the mathematics of NNs, so I started looking for content on exactly that.
Every time I looked into a DL paper, I was terrified by the mathematical notation, which, frankly, is hard for me. I kept searching for tutorials that could make the mathematics of NNs easy for me, but all I got were DL tutorials for Keras, PyTorch, etc. Those tutorials are good, but whenever Backpropagation comes up, or Normalization, or Optimizers like Adam, RMSprop, or Adamax, or L1 and L2 penalties, they only show the equations, and that's it. Many of them were simple copy-and-paste jobs from the Keras documentation. On the rare occasion that I came across a video or article that dealt with the maths, it was not satisfactory: there were mistakes, or some points were left out. And, most importantly, the code was messy.
So, I started from the very basics. I remember my first NN. It was a single-node input and single-node output without any activation or bias. I know you are laughing, but this is how I understood SGD. Then it was 3 inputs and 1 output without activation or bias, and I wrote 3 equations, one for each weight. After that came 1 input and 3 outputs without activation or bias. And at last, it was 3 inputs and 3 outputs, and I wrote 9 equations for SGD. It took me another 5 minutes of staring at the screen to write a single line of code that uses the matrix method to update the weights. After that, activations and biases were added. Then hidden layers, SGD with Momentum, other Optimizers, and, most importantly, the Softmax derivative (the Jacobian) with Categorical cross-entropy loss were implemented. After these implementations, I developed a little confidence. Then L1 and L2 penalties, Dropout, Batch training, Validation sets, Multiple Inputs and Multiple Outputs, and Layer Normalization were implemented, and then CNNs and many other things. And that is how I made this course.
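To give an idea of what that matrix method looks like, here is a minimal sketch in NumPy, assuming a 3-input, 3-output linear layer (no activation, no bias) trained with mean squared error; the variable names are illustrative, not taken from the course code.

```python
import numpy as np

# Sketch: matrix-form SGD for a 3-input, 3-output linear layer y = W x,
# trained with L = 0.5 * sum((y - t)^2). One matrix update replaces the
# nine per-weight equations. Shapes and names are for illustration only.

np.random.seed(0)
x = np.random.randn(3, 1)   # input column vector, shape (3, 1)
t = np.random.randn(3, 1)   # target column vector, shape (3, 1)
W = np.random.randn(3, 3)   # weight matrix, shape (3, 3)
lr = 0.1                    # learning rate

for step in range(100):
    y = W @ x               # forward pass
    error = y - t           # dL/dy
    grad_W = error @ x.T    # dL/dW, shape (3, 3): all 9 gradients at once
    W -= lr * grad_W        # one SGD step for every weight
```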
Who should take this course? And my motive for making it
I have created this course for people who want to build NNs from scratch. If you want to learn more than just the DL libraries, then this course is for you.
I have created this course so that you don’t have to spend time online looking for materials on the mathematics of NNs. It took me a lot of time to go through everything, but I don’t want you to spend your time searching, because in the end you will either give up on the idea of going deep into the math or, like me, start from the basics.
Prerequisites for the course and the structure
In this course, we will not use object-oriented programming or any library except NumPy.
You only need to know:
1. Basic Python, up to functions
2. Basic Calculus, up to chain rule
The course is divided into the following chapters:
Chapter 1 — Tensors
1.1 What are Tensors? (Tensors as NumPy arrays)
1.2 What is Tensor reshaping?
1.3 Some useful NumPy operations
Chapter 2 — Optimizers
2.1 SGD
2.2 SGD with Momentum
2.3 SGD with Nesterov acceleration
2.4 Adagrad
2.5 RMSprop
2.6 Adadelta
2.7 Adam
2.8 AMSGrad
2.9 Adamax
2.10 Optimizers racing to the Minima
Chapter 3 — Activation functions and their derivatives
3.1 Sigmoid
3.2 Tanh
3.3 Softsign
3.4 ReLU and Leaky ReLU
3.5 SELU and ELU
3.6 Softplus
3.7 Softmax
Chapter 4 — Losses and their derivatives
4.1 Mean Square Error
4.2 Mean Absolute Error
4.3 Categorical cross-entropy
4.4 Binary cross-entropy
Chapter 5 — Diving Deep in the Neural Networks
5.1 Forward feed in ANNs
5.2.1 Backpropagation in ANNs part I
5.2.2 Backpropagation in ANNs part II
5.3 L1 and L2 regularization
5.4 Dropout
5.5.1 Layer Normalization part I
5.5.2 Layer Normalization part II
5.6 Batch Training
5.7 Validation Set
5.8 Multiple Inputs and Multiple Outputs
5.9 UCI White Wine quality, Accuracy, Confusion Matrix
Chapter 6 — CNNs and much more (in progress)
I will update the post once the videos are finished.
I have kept the language very simple, and the code is very neat.
You can download the Jupyter Notebook from the link provided in every post.
Watch the video on YouTube.
Every slide is 3 seconds long and without sound. You may pause the video whenever you like.
You may put on some music too if you like.
Many thanks for your support and feedback
If you like this course, then you can support me at
It would mean a lot to me. Many thanks.