abstract:d84c2d597a9e90e3.tex

1: \begin{abstract}

2: In this paper, we propose a new technique named \textit{Stochastic Path-Integrated Differential EstimatoR} (\SPIDER), which can be used to track many deterministic quantities of interest with significantly reduced computational cost.

3: We apply \SPIDER\ to two settings, namely stochastic first-order and zeroth-order methods.

4: %

5: For stochastic first-order methods, we combine \SPIDER\ with normalized gradient descent to obtain two new algorithms, \SPIDER-SFO and \SPIDER-SFO\textsuperscript{+}, that solve non-convex stochastic optimization problems using stochastic gradients only.

6: We provide sharp error-bound results on their convergence rates.

7: In particular, we prove that the \SPIDER-SFO and \SPIDER-SFO\textsuperscript{+} algorithms achieve record-breaking gradient computation costs of $\mathcal{O}\left( \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3} ) \right)$ for finding an $\epsilon$-approximate first-order stationary point and $\tilde{\mathcal{O}}\left( \min( n^{1/2} \epsilon^{-2}+\epsilon^{-2.5}, \epsilon^{-3} ) \right)$ for finding an $(\epsilon, \mathcal{O}(\ep^{0.5}))$-approximate second-order stationary point, respectively.

8: In addition, we prove that \SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.

9: %

10: For stochastic zeroth-order methods, we prove a cost of $\mathcal{O}( d \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3}) )$, which outperforms all existing results.

11: \end{abstract}

12: