
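- Subgradient methods are used to minimize non-differentiable convex functions.
- Unlike gradient descent, the subgradient method is not a descent method: the function value can increase from one iteration to the next.
- To handle this, we keep track of the best point found so far, i.e. the iterate with the lowest function value.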
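A minimal sketch of the basic iteration x_{k+1} = x_k - alpha_k g_k, where g_k is any subgradient at x_k. It keeps the best point seen so far because f can go up between iterations. The helper name `subgradient_method`, the test function f(x) = ||x - c||_1 and the constant step 0.3 are hypothetical choices for illustration only.

```python
import numpy as np

def subgradient_method(f, subgrad, x0, step_size, num_iters):
    """Hypothetical helper: x_{k+1} = x_k - step_size(k) * g_k, g_k a subgradient at x_k.

    f(x_k) is not guaranteed to decrease, so the best point seen so far is tracked.
    """
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    f_history = [f_best]
    for k in range(1, num_iters + 1):
        g = subgrad(x)
        x = x - step_size(k) * g
        fx = f(x)
        f_history.append(fx)
        if fx < f_best:  # keep the best point found so far
            x_best, f_best = x.copy(), fx
    return x_best, f_best, f_history

# Illustrative problem: f(x) = ||x - c||_1, non-differentiable wherever x_i = c_i.
c = np.array([1.0, -2.0])
f = lambda x: np.abs(x - c).sum()
subgrad = lambda x: np.sign(x - c)  # sign(0) = 0 lies in [-1, 1], so it is a valid subgradient

x_best, f_best, hist = subgradient_method(
    f, subgrad, x0=[0.0, 0.0], step_size=lambda k: 0.3, num_iters=50
)
```

With a constant step the iterates end up oscillating around the minimizer, so `hist` contains steps where f increases. `f_best` is still the lowest value reached.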

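- The step sizes must be chosen in advance (for example, a constant step or a diminishing schedule).
- The line search used in gradient descent does not work here, because the negative of a subgradient is not necessarily a descent direction.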
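The reason line search fails can be seen on a small hand-constructed example (not taken from the reference). Take f(x) = |x1| + 2|x2| at x = (1, 0). There, g = (1, 2) is a valid subgradient, but moving along -g increases f for every step length. A backtracking search would therefore never find an acceptable step.

```python
import numpy as np

# f(x) = |x1| + 2|x2| is non-differentiable wherever x1 = 0 or x2 = 0.
def f(x):
    return abs(x[0]) + 2 * abs(x[1])

x = np.array([1.0, 0.0])
# At x2 = 0 the subdifferential of 2|x2| is the interval [-2, 2],
# so g = (1, 2) is a valid subgradient of f at x.
g = np.array([1.0, 2.0])

# Backtracking-style search: try t = 1, 1/2, 1/4, ... along -g.
# Here f(x - t*g) = |1 - t| + 4t > 1 = f(x) for every t > 0,
# so no trial step decreases f.
trial_steps = [0.5 ** i for i in range(30)]
decreased = [t for t in trial_steps if f(x - t * g) < f(x)]
print(decreased)  # []
```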
One application is the Lasso (least squares with an L1 penalty).
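A sketch of the subgradient method applied to the Lasso objective 0.5 * ||Ax - b||^2 + lam * ||x||_1. The L1 term is not differentiable at x_i = 0, so a subgradient of it is used there. The function name `lasso_subgradient`, the predefined diminishing step a / sqrt(k) with a = 0.01, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def lasso_subgradient(A, b, lam, num_iters=2000, a=0.01):
    """Hypothetical helper: subgradient method for 0.5*||Ax - b||^2 + lam*||x||_1."""
    def f(x):
        r = A @ x - b
        return 0.5 * r @ r + lam * np.abs(x).sum()

    x = np.zeros(A.shape[1])
    x_best, f_best = x.copy(), f(x)
    for k in range(1, num_iters + 1):
        # A^T(Ax - b) is the gradient of the smooth part; sign(x) is a subgradient
        # of ||x||_1 (np.sign returns 0 at x_i = 0, which lies in [-1, 1]).
        g = A.T @ (A @ x - b) + lam * np.sign(x)
        x = x - (a / np.sqrt(k)) * g  # step size fixed in advance, no line search
        fx = f(x)
        if fx < f_best:  # track the best iterate, since f(x_k) may increase
            x_best, f_best = x.copy(), fx
    return x_best, f_best

# Synthetic sparse regression problem (illustrative data only).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat, f_hat = lasso_subgradient(A, b, lam=1.0)
```

One caveat: a plain subgradient step rarely lands exactly on zero. Coordinates that should be zero in the Lasso solution come out small but nonzero. This is one reason proximal methods are often preferred for this problem.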


