abstract:d4b2eb616074f9bd.tex

1: \begin{abstract}

2: We analyze stochastic gradient algorithms for optimizing nonconvex problems.

3: In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points.

4: We show that a simple perturbed version of stochastic recursive gradient descent algorithm (called SSRGD) can find an $(\epsilon,\delta)$-second-order stationary point with $\widetilde{O}(\sqrt{n}/\epsilon^2 + \sqrt{n}/\delta^4 + n/\delta^3)$ stochastic gradient complexity for nonconvex finite-sum problems.

5: As a by-product, SSRGD finds an $\epsilon$-first-order stationary point with $O(n+\sqrt{n}/\epsilon^2)$ stochastic gradients. These results are almost optimal since \cite{fang2018spider} provided a lower bound $\Omega(\sqrt{n}/\epsilon^2)$ for finding even just an $\epsilon$-first-order stationary point.

6: We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for

7: finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 \citep{allen2018neon2}).

8: Moreover, the simple SSRGD algorithm gets a simpler analysis.

9: Besides, we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results.

10: \end{abstract}