\begin{abstract}

In this paper, we propose and analyze SPARQ-SGD, an
event-triggered and compressed algorithm for decentralized training of
large-scale machine learning models over a graph. Each node can locally compute a
condition (event) that triggers a communication in which quantized
and sparsified local model parameters are sent. In SPARQ-SGD, each node
takes at least a fixed number ($H$) of local gradient steps and then
checks whether its model parameters have changed significantly compared to
its last update; it communicates further compressed model parameters
only when there is a significant change, as specified by a design
criterion. We prove that SPARQ-SGD converges as $O(\frac{1}{nT})$
and $O(\frac{1}{\sqrt{nT}})$ in the strongly-convex and non-convex
settings, respectively, demonstrating that such aggressive compression,
including event-triggered communication, model sparsification, and
quantization, does not affect the overall convergence rate as compared
to uncompressed decentralized training, % \cite{lian2017can};
thereby theoretically yielding communication efficiency for ``free''. We
evaluate SPARQ-SGD on real datasets to demonstrate significant
savings in communication over the state of the art. %, the CHOCO-SGD algorithm from \cite{koloskova_decentralized_2019-1,koloskova_decentralized_2019}.
%\TODO{write the savings we get in convex and non-convex experiments.}
%To get the same performance, we communicate $10\times$ less number of bits both for a convex objective on the MNIST dataset
%and for a non-convex objective on the CIFAR-10 dataset.
\end{abstract}
