
\begin{abstract}


We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions. We provide several results implying that the $\mathcal{O}(\epsilon^{-4})$ upper bound of Ghadimi and Lan~\cite{ghadimi2013stochastic} (for making the average gradient norm less than $\epsilon$) cannot be improved upon unless a combination of additional assumptions is made. Notably, this holds even if we restrict attention to convex quadratic functions. We also show that, for nonconvex functions, the feasibility of minimizing gradients with SGD is surprisingly sensitive to the choice of optimality criteria.


% We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth functions.
% Recently, it was shown that the classical, dimension-free $\Ocal(\epsilon^{-4})$ complexity bound for SGD (for making the average gradient norm less than $\epsilon$) can be improved upon, but only using more complicated algorithms or \note{and?} under additional assumptions.
% This leads to the question of whether plain SGD, and without special assumptions, can achieve a bound better than $\Ocal(\epsilon^{-4})$. In this paper, we provide several lower bounds, which indicate that this conjecture is unlikely to hold.
% This leads to the question of whether the improved rate of convergence can be attained using only a subset of the additional assumptions and algorithmic improvements. In this paper, we address this question by establishing an $\Omega(\epsilon^{-4})$ lower-complexity bound on SGD and showing that it continues to hold under some extensions to the method and under various assumptions.
% Along the way, we establish a new, simple and nearly tight lower complexity bound for general first-order methods in the deterministic setting.

\end{abstract}
