Stochastic Gradient Descent

Optimization algorithm used to minimize a loss function by iteratively updating model parameters.

#How it works

  • Linear model fitted with stochastic gradient descent to optimize a regression or classification Loss Function (e.g. MSE, Huber loss, epsilon-insensitive loss)
  • Minimizes the loss by iteratively updating the model Parameters in the direction of the negative gradient of the loss function (Gradient Descent)
  • Uses a single sample (or a small mini-batch) for each update; this randomness helps escape local minima and can find better solutions in non-convex optimization problems
  • Parameter ($\theta$) update rule: $\theta = \theta - \eta \cdot \nabla L(\theta)$, where $\eta$ is the learning rate (see the sketch after this list)
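
A minimal from-scratch sketch of this update rule, assuming a linear regression model with MSE loss and plain NumPy (the function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def sgd_linear_regression(X, y, eta=0.01, epochs=100, seed=0):
    """Fit y ~ X @ theta by SGD on the MSE loss, one sample per update."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(epochs):
        for i in rng.permutation(n_samples):   # visit samples in random order
            error = X[i] @ theta - y[i]        # prediction error on one sample
            grad = 2.0 * error * X[i]          # gradient of (X[i] @ theta - y[i])**2
            theta -= eta * grad                # theta = theta - eta * grad L(theta)
    return theta

# Toy usage: the recovered weights should approach [2.0, -3.0]
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=200)
print(sgd_linear_regression(X, y, eta=0.05, epochs=50))
```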

#Preconditions

#Evaluation

#Advantages

  • Scalable to large datasets, since each update touches only one sample (or mini-batch)
  • Supports a range of loss functions and regularization penalties for both classification and regression
  • Supports high-dimensional and sparse data (see the sketch after this list)
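
To make the sparse, high-dimensional case concrete, a short sketch assuming scikit-learn's SGDClassifier and HashingVectorizer (toy data; `loss="log_loss"` is the scikit-learn >= 1.1 spelling of logistic loss):

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

docs = ["cheap pills now", "meeting at noon", "win money fast", "lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham (toy example)

# HashingVectorizer produces a sparse matrix with 2**20 columns;
# SGDClassifier trains on it directly, without densifying.
vec = HashingVectorizer(n_features=2**20)
X = vec.fit_transform(docs)

clf = SGDClassifier(loss="log_loss", penalty="l2", max_iter=1000, tol=1e-3)
clf.fit(X, labels)
print(clf.predict(vec.transform(["free money now"])))
```

For datasets too large for memory, the same estimator can be trained incrementally with `partial_fit` on successive mini-batches, which is what makes SGD scalable.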

#Limitations