\begin{abstract}
The convergence speed of machine learning models trained with Federated
Learning is significantly affected by heterogeneous data partitions, even more
so in a fully decentralized setting without a central server. In this paper, we
show that the impact of label distribution skew, an important type of data
heterogeneity, can be significantly reduced by carefully designing the
underlying communication topology. We present D-Cliques, a novel topology that
reduces gradient bias by grouping nodes in sparsely interconnected cliques such
that the label distribution in each clique is representative of the global
label distribution. We also show how to adapt the updates of decentralized SGD
to obtain unbiased gradients and implement an effective momentum with
D-Cliques. Our extensive empirical evaluation on MNIST and CIFAR10 demonstrates
that our approach provides a convergence speed similar to that of a
fully-connected topology, which provides the best convergence in a
data-heterogeneous setting, with a significant reduction in the number of edges
and messages. In a 1000-node topology, D-Cliques require 98\% fewer edges and
96\% fewer total messages than a fully-connected topology, with further
possible gains using a small-world topology across cliques.
\end{abstract}