\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
We propose and analyze several stochastic gradient algorithms for
finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer.
First, we propose a simple proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+.
We provide a clean and tight analysis of ProxSVRG+, which shows that it outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, thereby solving an open problem posed in~\citet{reddi2016proximal}. In addition, ProxSVRG+ uses far fewer proximal oracle calls than ProxSVRG~\citep{reddi2016proximal}, and it extends to the online setting by avoiding full gradient computations.
Then, we propose an optimal algorithm called SSRGD, based on SARAH~\citep{nguyen2017sarah}, and show that
SSRGD further improves the gradient complexity of ProxSVRG+ and achieves the optimal upper bound, matching the known lower bound~\citep{fang2018spider,li2021page}.
Moreover, we show that both ProxSVRG+ and SSRGD automatically adapt to local structure of the objective function, such as the Polyak-\L{}ojasiewicz (PL) condition for nonconvex functions in the finite-sum case; that is, we prove that both of them can automatically switch to faster global linear convergence without the restarts used in the prior work ProxSVRG~\citep{reddi2016proximal}.
Finally, we focus on the more challenging problem of finding an $(\epsilon, \delta)$-local minimum
instead of just an $\epsilon$-approximate (first-order) stationary point,
which may be a bad, unstable saddle point.
We show that SSRGD can find an $(\epsilon, \delta)$-local minimum
by simply adding random perturbations.
Our algorithm is almost as simple as its counterpart for finding stationary points and achieves similar
optimal rates.
\end{abstract}
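As a reading aid, the two kinds of gradient estimators referred to in the abstract can be sketched as follows. The notation is assumed here for illustration and is not fixed by the abstract: $f(x)=\frac{1}{n}\sum_{i=1}^{n} f_i(x)$ is the smooth part of the objective, $h$ is the nonsmooth regularizer, $\eta$ is a step size, $I_b$ is a sampled minibatch of size $b$, and $\tilde{x}$ is a snapshot point whose gradient estimate $\tilde{g}$ is either the full gradient (finite-sum case) or a large-minibatch gradient (online case). These are the standard SVRG-style and SARAH-style updates, not a verbatim statement of the algorithms analyzed later:
\begin{align*}
  v_t^{\mathrm{SVRG}} &= \frac{1}{b}\sum_{i\in I_b}\bigl(\nabla f_i(x_t)-\nabla f_i(\tilde{x})\bigr)+\tilde{g}, \\
  v_t^{\mathrm{SARAH}} &= \frac{1}{b}\sum_{i\in I_b}\bigl(\nabla f_i(x_t)-\nabla f_i(x_{t-1})\bigr)+v_{t-1}, \\
  x_{t+1} &= \mathrm{prox}_{\eta h}\bigl(x_t-\eta v_t\bigr),
  \qquad
  \mathrm{prox}_{\eta h}(y)=\arg\min_{x}\Bigl\{h(x)+\frac{1}{2\eta}\|x-y\|^2\Bigr\}.
\end{align*}
The SVRG-style estimator corrects each minibatch gradient against a fixed snapshot, whereas the SARAH-style recursive estimator corrects against the previous iterate and is periodically reset with a fresh full or large-batch gradient. In the perturbed variant for finding local minima, a small random perturbation (for example, one drawn uniformly from a small ball) is added to the iterate when the estimated gradient becomes small.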