\begin{abstract}
In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime.
To this end, we propose a generic and flexible assumption capable of accurately modeling the second moment of the stochastic gradient.
Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA.
We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.
Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and it also gives new convergence results for many new methods that arise as special cases.
In the more general distributed/federated nonconvex optimization setup, we propose two new general algorithmic frameworks, which differ in whether they use direct gradient compression (DC) or compression of gradient differences (DIANA).
We show that all methods captured by these two frameworks also satisfy our unified assumption.
Thus, our unified convergence analysis also captures a large variety of distributed methods utilizing compressed communication.
Finally, we provide a unified analysis that yields faster, linear convergence rates in this nonconvex regime under the Polyak--\L{}ojasiewicz (PL) condition.
\end{abstract}