RMSprop — widely used Optimizer that is not yet published

Step by step implementation with animation for better understanding.

neuralthreads
6 min readNov 27, 2021

Back to the previous post

Back to the first post

2.5 How does RMSprop works?

RMSprop stands for Root Mean Square Propagation. It was proposed by Geoffrey Hinton. It solves the problem that arises in Adagrad, i.e., the square of gradients gets accumulated over time making the accumulator big which slows the learning. The idea is simple which is used in RMSprop, we will collect square of gradients but in a restricted manner.

And, we will calculate the update as follow:

You can download the Jupyter Notebook from here.

Note — It is recommended that you have a look at the first post in this chapter.

This post is divided into 3 sections.

  1. RMSprop in 1 variable
  2. RMSprop animation for 1 variable
  3. RMSprop in multi-variable function

RMSprop in 1 variable

In this method, we store the square of gradients in a restricted manner and we initialize accumulator = 0. The rest is the same as what we did in Adagrad.

RMSprop algorithm in simple language is as follows:

Step 1 - Set starting point and learning rate
Step 2 - Initiate accumulator = 0 and set rho = 0.9 and
epsilon = 10**-8
Step 3 - Initiate loop
Step 3.1 - calculate accumulator = rho * accumulator +
(1 - rho) * gradient**2
Step 3.2 - calculate update as stated above
Step 3.3 - add update to point

First, let us define the function and its derivative and we start from x = -1

import numpy as np
np.random.seed(42)
def f(x): # function definition
return x - x**3
def fdash(x): # function derivative definition
return 1 - 3*(x**2)
Defining the function and its derivative

And now RMSprop

point = -1                                   # step 1
learning_rate = 0.01
rho = 0.9 # step 2
accumulator = 0
epsilon = 10**-8
for i in range(1000): # step 3
accumulator = rho * accumulator + (1 - rho) * fdash(point)**2
# step 3.1
update = - learning_rate * fdash(point) / (accumulator**0.5 +
epsilon)
# step 3.2
point += update # step 3.3

point # Minima
Steps involved in RMSprop

RMSprop implementation in Python is a success.

RMSprop animation for better understanding

Everything thing is the same as what we did earlier for the animation of the previous 4 optimizers. We will create a list to store starting point and updated points in it and will use the iᵗʰ index value for iᵗʰ frame of the animation.

import matplotlib.pyplot as plt 
import matplotlib.animation as animation
from matplotlib.animation import PillowWriter
point_rmsprop = [-1] # initiating list with
# starting point in it
point = -1 # step 1
learning_rate = 0.01
rho = 0.9 # step 2
accumulator = 0
epsilon = 10**-8
for i in range(1000): # step 3
accumulator = rho * accumulator + (1 - rho) * fdash(point)**2
# step 3.1
update = - learning_rate * fdash(point) / (accumulator**0.5 +
epsilon)
# step 3.2
point += update # step 3.3

point_rmsprop.append(point) # adding updated point to
# the list

point # Minima
Importing libraries and creating a list that has the starting point and updated points in the list

We will do some settings for our graph for the animation. You can change them if you want something different.

plt.rcParams.update({'font.size': 22})fig = plt.figure(dpi = 100)fig.set_figheight(10.80)
fig.set_figwidth(19.20)
x_ = np.linspace(-5, 5, 10000)
y_ = f(x_)
ax = plt.axes()
ax.plot(x_, y_)
ax.grid(alpha = 0.5)
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)
ax.set_xlabel('x')
ax.set_ylabel('y', rotation = 0)
ax.scatter(-1, f(-1), color = 'red')
ax.hlines(f(-0.5773502691896256), -5, 5, linestyles = 'dashed', alpha = 0.5)
ax.set_title('RMSprop, learning_rate = 0.01')
Few settings for our graph in the animation
The very first frame of the animation

Now we will animate the RMSprop optimizer.

def animate(i):
ax.clear()
ax.plot(x_, y_)
ax.grid(alpha = 0.5)
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)
ax.set_xlabel('x')
ax.set_ylabel('y', rotation = 0)
ax.hlines(f(-0.5773502691896256), -5, 5, linestyles = 'dashed', alpha = 0.5)
ax.set_title('RMSprop, learning_rate = 0.01')

ax.scatter(point_rmsprop[i], f(point_rmsprop[i]), color = 'red')

The last line in the code snippet above is using the iᵗʰ index value from the list for iᵗʰ frame in the animation.

anim = animation.FuncAnimation(fig, animate, frames = 200, interval = 20)anim.save('2.5 RMSprop.gif')

We are creating an animation that only has 200 frames and the gif is at 50 fps or frame interval is 20 ms.

It is to be noted that in less than 200 iterations we have reached the minima.

RMSprop Animation
RMSprop Animation

RMSprop in multi-variable function (2 variables right now)

Everything is the same, we only have to initialize point (1, 0) and accumulator = 0 but with shape (2, 1) and replace fdash(point) with gradient(point).

But first, let us define the function, its partial derivatives and, gradient array

We know that Minima for this function is at (2, -1)
and we will start from (1, 0)

The partial derivatives are

def f(x, y):                                    # function
return 2*(x**2) + 2*x*y + 2*(y**2) - 6*x # definition
def fdash_x(x, y): # partial derivative
return 4*x + 2*y - 6 # w.r.t x
def fdash_y(x, y): # partial derivative
return 2*x + 4*y # w.r.t y
def gradient(point):
return np.array([[ fdash_x(point[0][0], point[1][0]) ],
[ fdash_y(point[0][0], point[1][0]) ]], dtype = np.float64) # gradients
Defining the function, its partial derivatives, and gradient array

Now the steps for RMSprop in 2 variables are

point = np.array([[   1   ],                        # step 1
[ 0 ]], dtype = np.float32)
learning_rate = 0.01rho = 0.9 # step 2
accumulator = np.array([[ 0 ],
[ 0 ]], dtype = np.float32)
epsilon = 10**-8
for i in range(1000): # step 3
accumulator = rho * accumulator + (1 - rho) * gradient(point)**2
# step 3.1
update = - learning_rate * gradient(point) / (accumulator**0.5 +
epsilon)
# step 3.2
point += update # step 3.3

point # Minima
Steps involved in RMSprop for 2 variable function

I hope now you understand RMSprop.

Watch the video on youtube and subscribe to the channel for videos and posts like this.
Every slide is 3 seconds long and without sound. You may pause the video whenever you like.
You may put on some music too if you like.

The video is basically everything in the post only in slides.

Many thanks for your support and feedback.

If you like this course, then you can support me at

It would mean a lot to me.

Continue to the next post — 2.6 Adadelta.

--

--

No responses yet