What is the difference between SGD and GD?
I am currently working on training a deep-learning model on a dataset with millions of samples. What are the advantages and disadvantages of using stochastic gradient descent versus traditional gradient descent in terms of convergence speed and computational efficiency for this scenario?
In the context of data science, here are the differences:
Gradient descent (GD)
Gradient descent computes the gradient of the cost function with respect to the parameters over the entire dataset, then updates the parameters in the opposite direction of the gradient to minimize the cost. It generally converges to the global minimum (or a local minimum, for non-convex problems) in fewer iterations than SGD. However, it requires computing the gradient over the entire dataset before each parameter update, which can be computationally expensive for large datasets. Another disadvantage of GD is that the entire dataset typically needs to fit in memory, which might not be feasible for very large datasets.
# Gradient Descent
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = []
    for _ in range(num_iters):
        # Gradient is computed over the ENTIRE dataset before each update
        error = np.dot(X, theta) - y
        gradient = np.dot(X.T, error) / m
        theta = theta - alpha * gradient
        # cost_function (e.g. mean-squared error) is assumed to be defined elsewhere
        J_history.append(cost_function(X, y, theta))
    return theta, J_history
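Since cost_function is not defined above, here is a self-contained sketch that assumes a mean-squared-error cost and runs batch gradient descent on a tiny synthetic linear-regression problem (the data, learning rate, and iteration count are illustrative choices, not part of the original answer):

```python
import numpy as np

def cost_function(X, y, theta):
    # Mean-squared-error cost (an assumption; the original does not define it)
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = []
    for _ in range(num_iters):
        error = X @ theta - y
        gradient = X.T @ error / m      # gradient over the full dataset
        theta = theta - alpha * gradient
        J_history.append(cost_function(X, y, theta))
    return theta, J_history

# Noiseless synthetic data: y = 1 + 2*x, with a bias column of ones
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta, J_history = gradient_descent(X, y, np.zeros(2), alpha=0.5, num_iters=2000)
print(theta)  # approaches [1., 2.] on this noiseless problem
```

With a stable learning rate, the cost decreases monotonically, which is the behavior GD trades compute for.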
Stochastic gradient descent (SGD)
Stochastic gradient descent computes the gradient and updates the parameters for each training example in the dataset, either one at a time or in small mini-batches, which introduces randomness into the parameter updates. Because SGD updates the parameters more frequently and uses only a small subset of the data per update, it is computationally efficient, especially for large datasets. However, its stochastic nature leads to noisy updates, and it may not converge to the global minimum but instead oscillate around it, especially in non-convex optimization problems. It also requires careful tuning of the learning rate and its schedule to ensure convergence.
Here is example code for stochastic gradient descent:
# Stochastic Gradient Descent
import numpy as np

def stochastic_gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = []
    for _ in range(num_iters):
        for _ in range(m):
            # Update on ONE randomly chosen training example at a time
            random_index = np.random.randint(m)
            xi = X[random_index:random_index + 1]
            yi = y[random_index:random_index + 1]
            error = np.dot(xi, theta) - yi
            gradient = np.dot(xi.T, error)
            theta = theta - alpha * gradient
        # cost_function (e.g. mean-squared error) is assumed to be defined elsewhere
        J_history.append(cost_function(X, y, theta))
    return theta, J_history
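The explanation above also mentions mini-batches and learning-rate schedules, which the single-example code does not show. Here is a self-contained sketch of mini-batch SGD with a simple decaying schedule (the batch size, decay rule, and synthetic data are illustrative assumptions, not prescribed by the original):

```python
import numpy as np

def minibatch_sgd(X, y, theta, alpha0, num_epochs, batch_size=32):
    m = len(y)
    rng = np.random.default_rng(0)
    J_history = []
    for epoch in range(num_epochs):
        alpha = alpha0 / (1 + epoch / 10)     # illustrative 1/t-style decay
        order = rng.permutation(m)            # reshuffle once per epoch
        for start in range(0, m, batch_size):
            batch = order[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient over a small mini-batch, not the full dataset
            gradient = Xb.T @ (Xb @ theta - yb) / len(batch)
            theta = theta - alpha * gradient
        residual = X @ theta - y              # track full-data MSE per epoch
        J_history.append(float(residual @ residual) / (2 * m))
    return theta, J_history

# Noiseless synthetic data: y = 1 + 2*x, with a bias column of ones
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=500)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta, J_history = minibatch_sgd(X, y, np.zeros(2), alpha0=0.5, num_epochs=50)
```

Decaying the learning rate is one common way to damp the oscillation around the minimum that the answer describes: large early steps make fast progress, and smaller later steps reduce the update noise.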