
\begin{abstract}

In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime.

To this end, we propose a generic and flexible assumption capable of accurately modeling the second moment of the stochastic gradient.

Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA.

We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.

Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and it also yields convergence results for many new methods that arise as special cases.

In the more general distributed/federated nonconvex optimization setup, we propose two new general algorithmic frameworks, which differ in whether direct gradient compression (DC) or compression of gradient differences (DIANA) is used.

We show that all methods captured by these two frameworks also satisfy our unified assumption.

Thus, our unified convergence analysis also captures a large variety of distributed methods utilizing compressed communication.

Finally, we provide a unified analysis that yields faster, linear convergence rates in this nonconvex regime under the Polyak-\L{}ojasiewicz (PL) condition.

\end{abstract}
