
\begin{abstract}

	First proposed by \citet{Seide2014} as a heuristic, error feedback (\algname{EF}) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods that use communication compression based on contractive compression operators. However, the existing theory of \algname{EF} relies on very strong assumptions (e.g., bounded gradients) and provides pessimistic convergence rates: for example, the best known rate for \algname{EF} in the smooth nonconvex regime, when full gradients are compressed, is $O(1/T^{2/3})$, whereas the rate of gradient descent in the same regime is $O(1/T)$. Recently, \citet{EF21} (2021) proposed a new error feedback mechanism, \algname{EF21}, based on the construction of a Markov compressor induced by a contractive compressor. \algname{EF21} removes the aforementioned theoretical deficiencies of \algname{EF} and at the same time works better in practice. In this work, we propose six practical extensions of \algname{EF21}, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum, and bidirectional compression. Several of these techniques had never been analyzed in conjunction with \algname{EF} before, and in cases where they had been (e.g., bidirectional compression), our rates are vastly superior.

\end{abstract}
