\begin{abstract}
In this paper, we propose and analyze SPARQ-SGD, an event-triggered and
compressed algorithm for decentralized training of large-scale machine
learning models over a graph. Each node locally evaluates a condition
(event) that triggers a communication round in which quantized and
sparsified local model parameters are sent. In SPARQ-SGD, each node
takes at least a fixed number ($H$) of local gradient steps and then
checks whether its model parameters have changed significantly since
its last communicated update; it sends further compressed model
parameters only when the change is significant, as specified by a
(design) criterion. We prove that SPARQ-SGD converges as
$O(\frac{1}{nT})$ and $O(\frac{1}{\sqrt{nT}})$ in the strongly convex
and non-convex settings, respectively, where $n$ is the number of nodes
and $T$ is the number of iterations. This demonstrates that such
aggressive compression, including event-triggered communication, model
sparsification, and quantization, does not affect the overall
convergence rate compared to uncompressed decentralized training, % \cite{lian2017can};
thereby theoretically yielding communication efficiency for ``free''. We
evaluate SPARQ-SGD on real datasets and demonstrate significant
communication savings over the state of the art. %, the CHOCO-SGD algorithm from \cite{koloskova_decentralized_2019-1,koloskova_decentralized_2019}.
%\TODO{write the savings we get in convex and non-convex experiments.}
%To get the same performance, we communicate $10\times$ less number of bits both for a convex objective on the MNIST dataset
%and for a non-convex objective on the CIFAR-10 dataset.
\end{abstract}
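The event-triggered, compressed communication step summarized in the abstract can be illustrated for a single node by the minimal Python sketch below. It is not the authors' implementation and rests on several assumptions of our own: the node keeps a copy \texttt{x\_hat} of its last communicated model and, when triggered, sends a compressed version of the difference \texttt{x - x\_hat}; the trigger is a fixed squared-norm threshold; the compressor is top-$k$ sparsification followed by a scaled-sign quantizer (one illustrative choice of sparsifier and quantizer); and exactly $H$ local steps are taken between checks. All helper names (\texttt{top\_k}, \texttt{scaled\_sign}, \texttt{compress}, \texttt{maybe\_communicate}, \texttt{local\_round}) are hypothetical, and the consensus step with neighbors is omitted.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def top_k(vec: Vector, k: int) -> Vector:
    """Sparsify: keep the k entries of largest magnitude, zero the rest."""
    # sorted() is stable even with reverse=True, so ties keep index order.
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def scaled_sign(vec: Vector) -> Vector:
    """Quantize: each nonzero entry becomes sign(v) * mean |nonzero entries|."""
    mags = [abs(v) for v in vec if v != 0.0]
    if not mags:
        return [0.0] * len(vec)
    scale = sum(mags) / len(mags)
    return [math.copysign(scale, v) if v != 0.0 else 0.0 for v in vec]


def compress(vec: Vector, k: int) -> Vector:
    """Assumed composed operator: sparsify first, then quantize the survivors."""
    return scaled_sign(top_k(vec, k))


def maybe_communicate(
    x: Vector, x_hat: Vector, threshold: float, k: int
) -> Tuple[Optional[Vector], Vector]:
    """Event trigger (assumed form): send only if ||x - x_hat||^2 > threshold."""
    diff = [a - b for a, b in zip(x, x_hat)]
    if sum(d * d for d in diff) <= threshold:
        return None, x_hat  # no event: stay silent and keep the old copy
    q = compress(diff, k)  # the compressed message that would go to neighbors
    new_hat = [h + qi for h, qi in zip(x_hat, q)]  # update the communicated copy
    return q, new_hat


def local_round(
    x: Vector,
    x_hat: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float,
    H: int,
    threshold: float,
    k: int,
) -> Tuple[Vector, Vector, Optional[Vector]]:
    """Take H local gradient steps, then perform a single trigger check."""
    for _ in range(H):
        g = grad_fn(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    msg, x_hat = maybe_communicate(x, x_hat, threshold, k)
    return x, x_hat, msg


if __name__ == "__main__":
    # Toy local objective 0.5 * ||x - (1, 1)||^2, whose gradient is x - (1, 1).
    target = [1.0, 1.0]
    grad = lambda v: [vi - ti for vi, ti in zip(v, target)]
    x, x_hat, msg = local_round([0.0, 0.0], [0.0, 0.0], grad,
                                lr=0.5, H=2, threshold=1.0, k=1)
    print(x, x_hat, msg)  # [0.75, 0.75] [0.75, 0.0] [0.75, 0.0]
```

In this run, two local steps move the model to $(0.75, 0.75)$; since $\|x - \hat{x}\|^2 = 1.125 > 1$, the trigger fires, and only the first coordinate survives top-$1$ sparsification. With a larger threshold the node would stay silent and send nothing in that round.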