\begin{abstract}
We develop and analyze \algname{MARINA}: a new communication-efficient method for non-convex distributed learning over heterogeneous datasets. \algname{MARINA} employs a novel communication compression strategy based on the compression of gradient differences that is reminiscent of, but different from, the strategy employed in the \algname{DIANA} method of Mishchenko et al. (2019). Unlike virtually all competing distributed first-order methods, including \algname{DIANA}, ours is based on a carefully designed {\em biased} gradient estimator, which is the key to its superior theoretical and practical performance. The communication complexity bounds we prove for \algname{MARINA} are evidently better than those of all previous first-order methods. Further, we develop and analyze two variants of \algname{MARINA}: \algname{VR-MARINA} and \algname{PP-MARINA}. The first method is designed for the case when the local loss functions owned by clients have either a finite-sum or an expectation form, and the second method allows for partial participation of clients, a feature important in federated learning. All our methods are superior to previous state-of-the-art methods in terms of oracle/communication complexity. Finally, we provide a convergence analysis of all methods for problems satisfying the Polyak-{\L}ojasiewicz condition.
\end{abstract}