SELU and ELU — Exponential Linear Units
Step by step implementation with their derivatives
In this post, we will talk about the SELU and ELU activation functions and their derivatives. SELU stands for Scaled Exponential Linear Unit and ELU stands for Exponential Linear Unit. Unlike ReLU, these functions allow negative outputs, but in a restricted manner: for negative inputs the output saturates towards a fixed negative value instead of growing without bound.
You can download the Jupyter Notebook from here.
3.5 What are the SELU and ELU activation functions and their derivatives?
This is the definition of the ELU function, where α (alpha) is a hyperparameter:
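ELU(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases}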
And it is very easy to find the derivative of the ELU function:
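ELU'(x) = \begin{cases} 1, & x > 0 \\ \alpha\,e^{x}, & x \le 0 \end{cases}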
This is the definition of the SELU function, where α (alpha) and s (scale) are hyperparameters:
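SELU(x) = \begin{cases} s\,x, & x > 0 \\ s\,\alpha\,(e^{x} - 1), & x \le 0 \end{cases}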
And it is very easy to find the derivative of the SELU function:
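SELU'(x) = \begin{cases} s, & x > 0 \\ s\,\alpha\,e^{x}, & x \le 0 \end{cases}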
This is the graph for the ELU and SELU functions and their derivatives.
Note — We can see that when scale = 1, SELU is simply ELU.
We can easily implement the SELU and ELU functions in Python.
Note — We are implementing SELU and ELU in the same function because when scale = 1, SELU is simply ELU.
import numpy as np  # importing NumPy

np.random.seed(42)

def selu(x, alpha=1, scale=1):  # SELU and ELU
    return np.where(x <= 0, scale * alpha * (np.exp(x) - 1), scale * x)

def selu_dash(x, alpha=1, scale=1):  # SELU and ELU derivative
    return np.where(x <= 0, scale * alpha * np.exp(x), scale)
Let us have a look at an example:
x = np.array([[0.11], [-2.2], [0], [50.2], [33.5], [-0.6]])
x
selu(x)
selu_dash(x)
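With the default alpha = 1 and scale = 1 (i.e. plain ELU), the calls above should return approximately:

selu(x)      ≈ [[ 0.11  ], [-0.8892], [ 0.    ], [50.2   ], [33.5   ], [-0.4512]]
selu_dash(x) ≈ [[ 1.    ], [ 0.1108], [ 1.    ], [ 1.    ], [ 1.    ], [ 0.5488]]

Note that x = 0 falls into the x <= 0 branch, which still gives 0 for the value and alpha * e^0 = 1 for the derivative.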
I hope you now understand how to implement the SELU and ELU functions and their derivatives.
There are many more variants of ReLU, like Thresholded ReLU and GELU. You may refer to the literature available online for more.
Watch the video on YouTube and subscribe to the channel for more videos and posts like this.
Every slide is 3 seconds long and without sound. You may pause the video whenever you like.
You may put on some music too if you like.
The video is basically everything in this post, only in slide form.
Many thanks for your support and feedback.
If you like this course, then you can support me at
It would mean a lot to me.
Continue to the next post — 3.6 Softplus Activation function and its derivative.