# Batch Variants — Gradient Descent

## How to work with large data!

Introduction:

Gradient Descent serves as a fundamental approach for optimizing the parameters of a model to minimize a cost function. The basic idea is to iteratively update the model parameters in the opposite direction of the gradient of the cost function. But in a real-world scenarios with ever growing datasets, it has two major limitations:

• Calculating derivatives for the entire dataset is time consuming,
• Memory required is proportional to the size of the dataset.

These computations are typically done in memory. so, the larger the dataset, the greater the memory requirements.

This is where Stochastic Gradient Descent comes in!

SGD:

Stochastic Gradient Descent introduces a key modification to the traditional Gradient Descent algorithm by updating the parameters using only a small, randomly-selected subset of the training data at each iteration. This mini-batch approach significantly reduces the computational requirements, allowing for faster convergence and more efficient model training.

Following are the steps

Step 1: Randomly select the data set of size 1.

Step 2: Select a learning rate α

Step 3: Select initial parameter values x as the starting point

The below expression is minor modification to the Gradient Descent, where m = 1

Step 4: Update all parameters from the gradient of a single training example randomly xj, we can rewrite the above expression as

Step 5: Repeat Step 4 until a local minimum is reached.

python code with a simple function.

import required modules

`import numpy as npimport matplotlib.pyplot as plt`

define the function or loss function in case of machine learning

`# Function to simulate a simple convex loss surfacedef loss_function(x):    return 0.5 * (x - 3) ** 2`

now, let define the gradient or vector of partial derivatives

`# Gradient of the loss functiondef gradient(x):    return x - 3`

below is the code for Stochastic Gradient Descent algorithm, where simulates a random data point in the given range, then applies update rule.

`# Stochastic Gradient Descent algorithmdef stochastic_gradient_descent(learning_rate, epochs):    np.random.seed(42)    x = 0  # Initial parameter value    trajectory = [x]    for epoch in range(epochs):        random_data_point = np.random.uniform(low=-5, high=5)        x = x - learning_rate * gradient(random_data_point)        trajectory.append(x)    return trajectory`

initialize and generate the required data

`# Set parameterslearning_rate = 0.1epochs = 50# Run Stochastic Gradient Descenttrajectory = stochastic_gradient_descent(learning_rate, epochs)x_values = np.linspace(-5, 5, 100)y_values = loss_function(x_values)`

Plotting the loss surface and the trajectory of SGD

`plt.plot(x_values, y_values, label='Loss Surface')plt.scatter(    trajectory,     [loss_function(x) for x in trajectory],     color='red',     label='SGD Trajectory'    )plt.title('Stochastic Gradient Descent')plt.xlabel('Parameter Value')plt.ylabel('Loss')plt.legend()plt.show()`

3 dimensional plot for the above code is as shown below

Animated version of the SGD is as shown below

Mini-Batch:

While Batch Gradient Descent or Gradient Descent assures smooth convergence on convex functions and full dataset utilization, Stochastic Gradient Descent(SGD) shines in handling large datasets and faster convergence rates. However, both methods come with trade-offs that can impact computational efficiency and convergence speed.

To strike a balance between these two extremes, a hybrid approach known as Mini-Batch Gradient Descent is useful. This technique brings optimization by leveraging the advantages of both Batch Gradient Descent and Stochastic Gradient Descent while mitigating their drawbacks.

Mini-Batch Gradient Descent operates by dividing the dataset into smaller, equally sized subsets called mini-batches. These mini-batches provide enough data to perform stable parameter updates while remaining small enough to fit into memory efficiently.

Following are the steps

Step 1: Randomly shuffle the data set of size m (such as 16, 32, 64, 128 etc).

Step 2: Select a learning rate α

Step 3: Select initial parameter values x as the starting point

The below expression is minor modification to the Gradient Descent

Step 4: Update all parameters from the gradient of a single training example randomly xj, we can rewrite the above expression as

Step 5: Repeat Step 4 until a local minimum is reached.

python code with a simple function.

import required modules

`import numpy as npimport matplotlib.pyplot as plt`

define the function or loss function in case of machine learning

`# Function to simulate a simple convex loss surfacedef loss_function(x):    return x ** 2`

now, let define the gradient or vector of partial derivatives

`# Gradient of the loss functiondef gradient(x):    return 2 * x`

below is the code for Stochastic Gradient Descent algorithm, where simulates a random data point in the given range, then applies update rule.

`# Mini-Batch Gradient Descent algorithmdef mini_batch_gradient_descent(learning_rate, epochs, batch_size):    np.random.seed(42)    x = 0  # Initial parameter value    trajectory = [x]    for epoch in range(epochs):        random_data_points = np.random.uniform(low=-10, high=10, size=batch_size)        gradients = np.mean([gradient(x) for x in random_data_points])        x = x - learning_rate * gradients        trajectory.append(x)    return trajectory`

initialize and generate the required data

`# Set parameterslearning_rate = 0.1epochs = 50batch_size = 10# Run Mini-Batch Gradient Descenttrajectory = mini_batch_gradient_descent(learning_rate, epochs, batch_size)# Plotting the loss surface and the trajectory of Mini-Batch Gradient Descentx_values = np.linspace(-10, 10, 100)y_values = loss_function(x_values)`

Plotting the loss surface and the trajectory of SGD

`plt.plot(x_values, y_values, label='Loss Surface')plt.scatter(    trajectory,    [loss_function(x) for x in trajectory],    color='red',    label='Mini-Batch Gradient Descent Trajectory')plt.title('Mini-Batch Gradient Descent')plt.xlabel('Parameter Value')plt.ylabel('Loss')plt.legend()plt.show()`

Comparisons between Batch GD, SGD, Mini-batch GD

references: Wikipedia, IIT-M AI/ML course, open source materials.