Trying Adam Optimizer from Scratch in Python

almajid.dev - Ever felt so exhausted while learning optimization algorithms that your head starts overheating like a CPU running gradient descent? Yeah... me too.

But as the old Chinese proverb says, "不怕慢, 只怕站", which means "don't be afraid of growing slowly; be afraid only of standing still".

While learning about the Adam Optimizer^[1] and writing this post, the work sometimes felt hard, repetitive, and tiring. But we need to remind ourselves that the great part isn't the moment we make it; it's the journey to get there.

In case you missed it, this is the next chapter of my learning notes, following Understanding Adam Optimizer. In this post we will build a simple case that uses the Adam Optimizer to find the global minimum of a two-variable quadratic function, whose graph is a surface in three dimensions.

Preparation

Before running the Adam Optimizer in Python, we need to prepare our development environment with a Python interpreter and the right libraries. In this case, we will use Microsoft Visual Studio Code^[2] as our Python editor.

Microsoft Visual Studio Code

You can also use an alternative such as Spyder^[3], which offers a user-friendly interface with built-in scientific tools.

Spyder IDE

Next, make sure Python is installed on your computer. If it is not installed yet, you can read our tutorial How to Install Python on Windows 11. In the next steps we will use the NumPy library for numerical operations and the Matplotlib library for plotting.

To install these libraries, we can open terminal or command prompt and run:

pip install numpy==1.26.4 matplotlib==3.10.3

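Note:
Why pin NumPy to 1.26.4 when the newer 2.x series is available? This post was written against NumPy 1.26.4, and packages built against NumPy 1.x can break under NumPy 2.x. If you run into such an incompatibility with Matplotlib, the command above downgrades NumPy to 1.26.4.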
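
To confirm which versions pip actually installed, a small check like the one below can help. This is only a sketch: the helper name installed_version is made up for this post, and it relies on importlib.metadata from the Python standard library (Python 3.8 or newer).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print what is installed for the two libraries used in this post.
for name in ("numpy", "matplotlib"):
    print(name, installed_version(name))
```

If either line prints None, the package is not installed in the environment that runs the script.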

Objective Function

In this post we will implement Adam from scratch in Python and use it to optimize a simple objective function. As we said before, the goal of Adam here is to find the minimum point of the objective function.

Here’s the two-variable quadratic objective function we’ll optimize:

f(x, y) = x² + 2y² + 5x + 12y + 22

Next, we’ll define both the objective function and its gradient (the partial derivatives with respect to x and y). Why do we need the gradient? Because Adam is a gradient-based optimization algorithm^[1], meaning it relies on the gradient of the objective function to determine the direction to move at each step. For this function the partial derivatives are ∂f/∂x = 2x + 5 and ∂f/∂y = 4y + 12.

Write Objective function:

def obj(x, y):
    return x**2 + 2 * y**2 + 5 * x + 12 * y + 22

Write Derivative function:

def der(x, y):
    return np.array([2 * x + 5, 4 * y + 12])
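
Since the optimizer depends entirely on der being correct, it can be worth comparing it against a numerical approximation. The sketch below does this with central differences; the helper numerical_gradient and the step size h = 1e-5 are choices made for this illustration, not part of the original code.

```python
import numpy as np


def obj(x, y):
    return x**2 + 2 * y**2 + 5 * x + 12 * y + 22


def der(x, y):
    return np.array([2 * x + 5, 4 * y + 12])


def numerical_gradient(f, x, y, h=1e-5):
    """Approximate the gradient of f at (x, y) with central differences."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.array([dfdx, dfdy])


# Compare both gradients at the starting point used later in the post.
point = (-12.5, 7.0)
analytic = der(*point)
numeric = numerical_gradient(obj, *point)
print("analytic:", analytic)
print("numeric: ", numeric)
print("match:", np.allclose(analytic, numeric, atol=1e-4))
```

If the two gradients disagree, the mistake is almost always in the hand-written derivative.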

Objective Function Plot View

Before we run the optimizer, let's visualize the shape of the objective function. This helps us understand what the function looks like and where its minimum lies.

Here is the code for viewing the objective function. We will name the file plotview.py:

import numpy as np
import matplotlib.pyplot as plt

# obj func
def obj(x, y):
    return x**2 + 2 * y**2 + 5 * x + 12 * y + 22

# der func
def der(x, y):
    return np.array([2 * x + 5, 4 * y + 12])

xAxis = np.arange(-12.5, 7.5, 0.01)
yAxis = np.arange(-13, 7, 0.01)
x, y = np.meshgrid(xAxis, yAxis)
z = obj(x, y)

# create a 3D plot
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surface = ax.plot_surface(x, y, z, cmap="ocean", alpha=0.5)

ax.set_title("3D Surface Plot of the Objective Function")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis (Function Value)")

plt.show()

When we run plotview.py, it generates a 3D surface plot of the objective function. Here is the result.

plotview.py

Explanation of plotview.py

Write Grid values:

xAxis = np.arange(-12.5, 7.5, 0.01)
yAxis = np.arange(-13, 7, 0.01)
x, y = np.meshgrid(xAxis, yAxis)
z = obj(x, y)

For the grid values we use np.arange instead of np.linspace because we care about the step size of the grid rather than the number of points. Here the x-axis (xAxis) ranges from -12.5 to 7.5 and the y-axis (yAxis) ranges from -13 to 7, both with a step size of 0.01. Note that np.arange excludes the end value.

np.meshgrid converts xAxis and yAxis into the coordinate grids x and y. The variable z holds the function value computed by the objective function at every grid point, and it is used for the 3D surface plot.
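
To see what np.meshgrid produces, it helps to try it on a grid small enough to print. The sketch below uses a deliberately coarse grid; the names x_axis, y_axis, grid_x, and grid_y are chosen for this example only.

```python
import numpy as np

# A deliberately coarse grid so the arrays are small enough to inspect.
x_axis = np.arange(-1.0, 1.0, 0.5)   # 4 values: -1.0, -0.5, 0.0, 0.5 (stop is excluded)
y_axis = np.arange(0.0, 3.0, 1.0)    # 3 values: 0.0, 1.0, 2.0

grid_x, grid_y = np.meshgrid(x_axis, y_axis)

print(grid_x.shape, grid_y.shape)    # (3, 4) (3, 4): rows follow y, columns follow x
print(grid_x[0])                     # every row of grid_x repeats x_axis
print(grid_y[:, 0])                  # every column of grid_y repeats y_axis

# np.linspace fixes the number of points instead of the step size
# and includes the end point by default.
print(np.linspace(-1.0, 1.0, 5))
```

Because both grids have the same shape, a function such as obj can be evaluated on them element by element in a single call.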

Write Plotting:

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surface = ax.plot_surface(x, y, z, cmap="ocean", alpha=0.5)

To make the plot easier to understand, we need to set the title and axis labels.

Write Plotting label:

ax.set_title("3D Surface Plot of the Objective Function")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis (Function Value)")

ax.set_title() sets the title of the plot, while ax.set_xlabel(), ax.set_ylabel(), and ax.set_zlabel() label the X, Y, and Z axes, respectively.

Plotting show:

plt.show()

plt.show() displays the plot. Alternatively, we can save the plot as an image file in local storage by calling plt.savefig("plotview.png") before plt.show().

Running Adam Optimizer

In the previous section we only plotted the objective function. The main goal is to optimize that function with Adam to find its global minimum point.

Let's run the Adam Optimizer on the objective function. We will name this file adamscratch.py:

import numpy as np
import matplotlib.pyplot as plt

def obj(x, y):
    return x**2 + 2 * y**2 + 5 * x + 12 * y + 22

def der(x, y):
    return np.array([2 * x + 5, 4 * y + 12])

def adam(obj, der):
    n_iter = 10000
    alpha = 0.01
    beta1 = 0.9
    beta2 = 0.999
    eps = 1e-8

    bounds = np.array([[-12.5, 7.5], [-13, 7.0]])

    # initial point
    x = np.array([-12.5, 7.0])
    scores = []
    trajectory = []

    # initialize adam
    m = np.zeros(bounds.shape[0])
    v = np.zeros(bounds.shape[0])

    # adam formula
    for t in range(n_iter):
        g = der(x[0], x[1])
        for i in range(x.shape[0]):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            mhat = m[i] / (1.0 - beta1 ** (t + 1))
            vhat = v[i] / (1.0 - beta2 ** (t + 1))
            x[i] -= alpha * mhat / (np.sqrt(vhat) + eps)

        score = obj(x[0], x[1])
        scores.append(score)
        trajectory.append(x.copy())

    return x, scores, trajectory, bounds

best, scores, trajectory, bounds = adam(obj, der)

# plotting
x = np.arange(bounds[0, 0], bounds[0, 1], 0.01)
y = np.arange(bounds[1, 0], bounds[1, 1], 0.01)
X, Y = np.meshgrid(x, y)
Z = obj(X, Y)

# best values
bestScore = obj(best[0], best[1])
print("Best: ", best)
print("Best score: ", f"{bestScore:.2f}")

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z, cmap="ocean", alpha=0.5)
ax.scatter(best[0], best[1], obj(best[0], best[1]), color="blue", label="best")
ax.plot(
    [point[0] for point in trajectory],
    [point[1] for point in trajectory],
    scores,
    color="green",
    label="Trajectory",
)

ax.set_title("3D Surface Plot of the Objective Function")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis (Function Value)")
ax.legend()
plt.show()

Run adamscratch.py to see the results, which include both the surface plot and the trajectory of the Adam optimization process.

adamscratch.py

The results of the optimization process are printed:

Best:  [-2.5 -3. ] 
Best score:  -2.25
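
As a sanity check on the printed result, the minimum of this quadratic can also be found analytically by setting the gradient to zero: 2x + 5 = 0 gives x = -2.5, and 4y + 12 = 0 gives y = -3. The sketch below solves the same two equations as a linear system with NumPy; the names H, b, and minimum are introduced for this example only.

```python
import numpy as np


def obj(x, y):
    return x**2 + 2 * y**2 + 5 * x + 12 * y + 22


# The gradient is [2x + 5, 4y + 12]. Setting it to zero gives the linear
# system H @ p = -b, where H holds the coefficients of x and y.
H = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([5.0, 12.0])

minimum = np.linalg.solve(H, -b)
print("Analytic minimum:", minimum)
print("Value at minimum:", obj(minimum[0], minimum[1]))
```

The analytic point (-2.5, -3) and value -2.25 agree with what Adam reports, which suggests the optimizer has converged.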

Explanation of adamscratch.py

Write Adam parameter:

def adam(obj, der):
    n_iter = 10000
    alpha = 0.01
    beta1 = 0.9
    beta2 = 0.999
    eps = 1e-8

Adam has hyperparameters that are used to tune the optimizer, so inside the function adam(obj, der) we set them as follows:

n_iter is set to 10,000, the total number of iterations
alpha is set to 0.01, the learning rate
beta1 is set to 0.9, the decay rate of the first moment estimate
beta2 is set to 0.999, the decay rate of the second moment estimate
eps is set to 10^-8, a small constant that prevents division by zero

Write bounds:

bounds = np.array([[-12.5, 7.5], [-13, 7.0]])

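The bounds variable defines the range of the X and Y axes: the X-axis ranges from -12.5 to 7.5, and the Y-axis ranges from -13 to 7. In this script it is used to size the moment vectors and to build the plotting grid; the optimizer itself does not clip x to these bounds.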

Write the initial point and the lists that record scores and trajectory:

# initial point
x = np.array([-12.5, 7.0])
scores = []
trajectory = []

This is the starting point of the optimizer, which begins at X = -12.5 and Y = 7.0. In other cases you can use a randomly chosen initial point instead.

The scores variable stores the value of the objective function at each iteration, while the trajectory variable records every point visited by the optimizer. The trajectory is later used to draw the path in the plot.
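
The random starting point mentioned above could be drawn from the same bounds. This is a minimal sketch, assuming NumPy's default_rng generator; the seed value 42 and the name x0 are arbitrary choices for this example.

```python
import numpy as np

bounds = np.array([[-12.5, 7.5], [-13, 7.0]])

# A seeded generator keeps the run reproducible; the seed value 42 is arbitrary.
rng = np.random.default_rng(42)

# Draw one value per dimension, uniformly between the lower and upper bound.
x0 = rng.uniform(bounds[:, 0], bounds[:, 1])
print("Random initial point:", x0)
```

Replacing the fixed initial point with x0 lets you check that Adam reaches the same minimum from different starting positions.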

Write Adam initialize:

# initialize adam
m = np.zeros(bounds.shape[0])
v = np.zeros(bounds.shape[0])

This initializes the first moment vector m and the second moment vector v to zero.
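
Because m and v start at zero, the moving averages are biased toward zero during the first iterations, which is why Adam divides them by a correction factor. The sketch below illustrates this for the first moment only, using a hypothetical constant gradient of 1.0. Here t counts from 1, whereas the script counts from 0 and uses t + 1.

```python
beta1 = 0.9
g = 1.0   # hypothetical constant gradient, chosen only for illustration
m = 0.0   # first moment starts at zero, as in the post

for t in range(1, 6):
    m = beta1 * m + (1.0 - beta1) * g      # raw moving average, biased toward 0
    mhat = m / (1.0 - beta1 ** t)          # bias-corrected estimate
    print(f"t={t}  m={m:.4f}  mhat={mhat:.4f}")
```

The raw average m starts at 0.1 and climbs slowly, while the corrected mhat matches the true gradient from the very first step.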

Adam function:

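for t in range(n_iter):
    # gradient at the current point
    g = der(x[0], x[1])
    for i in range(x.shape[0]):
        # update the biased first and second moment estimates
        m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
        v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
        # bias-corrected estimates (t starts at 0, hence t + 1)
        mhat = m[i] / (1.0 - beta1 ** (t + 1))
        vhat = v[i] / (1.0 - beta2 ** (t + 1))
        # parameter update with the adaptive step size
        x[i] -= alpha * mhat / (np.sqrt(vhat) + eps)

    # record the objective value and the visited point
    score = obj(x[0], x[1])
    scores.append(score)
    trajectory.append(x.copy())

return x, scores, trajectory, bounds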

The Adam function updates the parameters using the gradient and keeps moving averages of both the gradients and their squares. At each step, it applies bias correction to these averages, then updates the parameters with an adaptive learning rate.

During the process, it stores the objective values and the path of updates, so you can see how the solution moves toward the optimum. The function returns the final point, which is used later to print the best values.
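
The inner loop over i applies the same formula to each coordinate, so the update can also be written on whole NumPy arrays. The sketch below is one possible vectorized variant, not the version from the original script; the function name adam_vectorized and its keyword arguments are introduced here for illustration, and it skips the scores and trajectory bookkeeping.

```python
import numpy as np


def der(x, y):
    return np.array([2 * x + 5, 4 * y + 12])


def adam_vectorized(der, x0, n_iter=10000, alpha=0.01,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """Same update rule as the loop in the post, applied to whole arrays."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, n_iter + 1):
        g = der(x[0], x[1])
        m = beta1 * m + (1.0 - beta1) * g          # first moment estimate
        v = beta2 * v + (1.0 - beta2) * g ** 2     # second moment estimate
        mhat = m / (1.0 - beta1 ** t)              # bias correction
        vhat = v / (1.0 - beta2 ** t)
        x = x - alpha * mhat / (np.sqrt(vhat) + eps)
    return x


best = adam_vectorized(der, [-12.5, 7.0])
print("Best:", best)
```

Both versions compute the gradient once per iteration, so they follow the same path; the array form is shorter and extends to more variables without changes.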

Write plotting:

# plotting
x = np.arange(bounds[0, 0], bounds[0, 1], 0.01)
y = np.arange(bounds[1, 0], bounds[1, 1], 0.01)
X, Y = np.meshgrid(x, y)
Z = obj(X, Y)

Write best point result:

# best values
bestScore = obj(best[0], best[1])
print("Best: ", best)
print("Best score: ", f"{bestScore:.2f}")

Here we evaluate the objective function at the best point returned by adam(), then print both the point and its score, with the score formatted to two decimal places.

Write visualization:

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z, cmap="ocean", alpha=0.5)
ax.scatter(best[0], best[1], obj(best[0], best[1]), color="blue", label="best")
ax.plot(
    [point[0] for point in trajectory],
    [point[1] for point in trajectory],
    scores,
    color="green",
    label="Trajectory",
)

Write plotting label and show plot:

ax.set_title("3D Surface Plot of the Objective Function")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis (Function Value)")
ax.legend()
plt.show()

Set the title and axis labels for the plot so that the result is easier to read and looks nicer.


Sources:

[1] Diederik P. Kingma and Jimmy Lei Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980.
[2] Microsoft Visual Studio Code
[3] Spyder Python IDE
[4] https://machinelearningmastery.com/adam-optimization-from-scratch/
[5] https://github.com/naufalalmajid/adam-optimizer