\begin{abstract}
The present paper develops a novel aggregated gradient approach for distributed machine learning that adaptively compresses the gradient communication. The key idea is to first \emph{quantize} the computed gradients, and then \emph{skip} less informative quantized gradient communications by reusing outdated gradients. Quantizing and skipping result in
`lazy' worker-server communications, which justifies the term \textbf{L}azily \textbf{A}ggregated \textbf{Q}uantized gradient, henceforth abbreviated as \textbf{LAQ}. LAQ provably attains the same linear convergence rate as gradient descent in the strongly convex case, while effecting major savings in communication overhead, both in transmitted \emph{bits} and in communication \emph{rounds}. Experiments with real data corroborate a significant communication reduction compared to existing gradient- and stochastic gradient-based algorithms. \let\thefootnote\relax\footnotetext{$^\dag$ Jun Sun and Tianyi Chen contributed equally to this work.}
\end{abstract}