\begin{abstract}
In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime.
To this end, we propose a generic and flexible assumption capable of accurately modeling the second moment of the stochastic gradient.
Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA.
We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.
Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and it also gives new convergence results for many new methods that arise as special cases.
In the more general distributed/federated nonconvex optimization setup, we propose two new general algorithmic frameworks differing in whether direct gradient compression (DC) or compression of gradient differences (DIANA) is used.
We show that all methods captured by these two frameworks also satisfy our unified assumption.
Thus, our unified convergence analysis also captures a large variety of distributed methods utilizing compressed communication.
Finally, we also provide a unified analysis for obtaining faster linear convergence rates in this nonconvex regime under the PL condition.
\end{abstract}