Optimization algorithm used to minimize a loss function by iteratively updating model parameters.
#How it works
- Linear model that uses stochastic gradient descent to optimize a regression or classification Loss Function (e.g. MSE, Huber loss, epsilon-insensitive loss)
- Minimizes the loss function by iteratively updating the model Parameters in the direction of the negative gradient of the loss (Gradient Descent)
- Uses a single sample (or a small mini-batch) for each update, introducing randomness that helps escape local minima and find better solutions in non-convex optimization problems.
- Parameter ($\theta$) update rule: $\theta = \theta - \eta \cdot \nabla L(\theta)$, where $\eta$ is the learning rate (see the sketch below).
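A minimal NumPy sketch of the update rule above, assuming plain linear regression with a squared-error loss and one sample per step; the synthetic data, learning rate, and epoch count are illustrative choices, not prescribed values.

```python
import numpy as np

# Minimal SGD sketch: linear regression with MSE loss, one sample per update.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # toy features
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=200)

theta = np.zeros(3)                      # model parameters
eta = 0.01                               # learning rate

for epoch in range(50):
    for i in rng.permutation(len(X)):    # shuffle samples each epoch
        error = X[i] @ theta - y[i]      # prediction error for one sample
        grad = 2 * error * X[i]          # gradient of the squared error w.r.t. theta
        theta -= eta * grad              # theta = theta - eta * grad(L)

print(theta)                             # should be close to true_theta
```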
#Preconditions
- The learning rate Hyperparameter needs tuning (too large diverges, too small converges slowly)
- Feature Scaling (SGD is sensitive to the scale of the input features); see the pipeline sketch below
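A sketch covering both preconditions, assuming scikit-learn: features are standardized inside a pipeline and the initial learning rate `eta0` is tuned with cross-validated grid search. The grid values and dataset are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

# Standardize features before SGD, then tune the initial learning rate eta0.
pipe = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, random_state=0))
grid = GridSearchCV(pipe, {"sgdregressor__eta0": [0.001, 0.01, 0.1]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```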
#Evaluation
#Advantages
- Scalable: efficient on large datasets and supports incremental (out-of-core) learning; see the partial_fit sketch below
- Supports various classification paradigms via the choice of loss (e.g. hinge loss gives a linear SVM, log loss gives logistic regression)
- Supports high-dimensional and sparse data
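A sketch of the scalability point, assuming scikit-learn: `SGDClassifier` is updated incrementally with `partial_fit` on streamed mini-batches, the usual pattern when the data does not fit in memory. The synthetic batches are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier()                    # default hinge loss (linear SVM style)
classes = np.array([0, 1])               # all classes must be declared up front

for _ in range(10):                      # stream of mini-batches (out-of-core style)
    X_batch = rng.normal(size=(100, 20))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes)  # incremental update

X_test = rng.normal(size=(5, 20))
print(clf.predict(X_test))
```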