\begin{abstract}
The convergence speed of machine learning models trained with Federated
Learning is significantly affected by heterogeneous data partitions, even more
so in a fully decentralized setting without a central server. In this paper, we
show that the impact of label distribution skew, an important type of data
heterogeneity, can be significantly reduced by carefully designing the
underlying communication topology. We present D-Cliques, a novel topology that
reduces gradient bias by grouping nodes in sparsely interconnected cliques such
that the label distribution in a clique is representative of the global label
distribution. We also show how to adapt the updates of decentralized SGD to
obtain unbiased gradients and implement an effective momentum with D-Cliques.
Our extensive empirical evaluation on MNIST and CIFAR10 demonstrates that our
approach achieves a convergence speed similar to that of a fully-connected
topology, which provides the best convergence in a data-heterogeneous setting,
with a significant reduction in the number of edges and messages. In a
1000-node topology, D-Cliques require 98\% fewer edges and 96\% fewer total
messages, with further possible gains using a small-world topology across
cliques.
\end{abstract}