
\begin{abstract}

We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. Specifically, the objective function is the sum of a differentiable (possibly nonconvex) component and a convex (possibly non-differentiable) component.

We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+.

Our main contribution lies in the analysis of ProxSVRG+.

This analysis recovers several existing convergence results and improves or generalizes them in terms of the number of stochastic gradient oracle calls and proximal oracle calls.

In particular, ProxSVRG+ generalizes the best results of the SCSG algorithm, recently proposed by \citet{lei2017non} for the smooth nonconvex case.

ProxSVRG+ is also more straightforward than SCSG and admits a simpler analysis.

Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem posed by \citet{reddi2016proximal}.

ProxSVRG+ also uses far fewer proximal oracle calls than ProxSVRG \citep{reddi2016proximal}.

Moreover, for nonconvex functions satisfying the Polyak-\L{}ojasiewicz (PL) condition, we prove that ProxSVRG+ achieves a global linear convergence rate without restarts, unlike ProxSVRG.

Thus, it can \emph{automatically} switch to the faster linear convergence rate in any region where the objective function satisfies the PL condition locally.

In this case, ProxSVRG+ also improves on ProxGD and ProxSVRG/SAGA, and generalizes the results of SCSG.

Finally, we conduct several experiments, and the results are consistent with our theoretical findings.

\end{abstract}
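
For orientation, the problem class summarized in the abstract can be written out explicitly. This is a sketch in notation assumed here rather than fixed by the abstract: the symbols $\Phi$, $f_i$, $h$, $n$, $d$, the step size $\eta$, and the constant $\mu$ are labels introduced for illustration only.
\[
  \min_{x \in \mathbf{R}^d} \; \Phi(x) := f(x) + h(x),
  \qquad
  f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\]
where each $f_i$ is differentiable and possibly nonconvex, and $h$ is convex but possibly non-differentiable. Under this reading, a stochastic gradient oracle call returns $\nabla f_i(x)$ for a sampled index $i$, and a proximal oracle call evaluates the standard proximal operator
\[
  \mathrm{prox}_{\eta h}(x) := \arg\min_{y \in \mathbf{R}^d}
  \Bigl( h(y) + \frac{1}{2\eta} \| y - x \|^2 \Bigr).
\]
One common way to state a PL-type condition in the nonsmooth setting uses the gradient mapping $\mathcal{G}_\eta(x) := \frac{1}{\eta} \bigl( x - \mathrm{prox}_{\eta h}(x - \eta \nabla f(x)) \bigr)$ and asks that, for some $\mu > 0$,
\[
  \| \mathcal{G}_\eta(x) \|^2 \ge 2\mu \bigl( \Phi(x) - \Phi^* \bigr)
  \quad \mbox{for all } x,
\]
where $\Phi^*$ denotes the optimal value of $\Phi$. Whether the analysis uses exactly this form is an assumption of this sketch. When $h \equiv 0$, the gradient mapping reduces to $\nabla f(x)$ and the inequality becomes the usual PL condition for smooth functions.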
