abstract:5eaaf36926b7d5c5.tex

1: \begin{abstract}

2: This paper presents a new class of gradient methods for distributed machine learning that adaptively skip the gradient calculations to learn with reduced communication and computation.

3: Simple rules are designed to detect slowly-varying gradients and, therefore, trigger the reuse of outdated gradients.

4: The resultant gradient-based algorithms are termed \textbf{L}azily \textbf{A}ggregated \textbf{G}radient --- justifying our acronym \textbf{LAG} used henceforth.

5: Theoretically, the merits of this contribution are: i) the convergence rate is the same as batch gradient descent in strongly-convex, convex, and nonconvex smooth cases;

6: and, ii) if the distributed datasets are heterogeneous (quantified by certain measurable constants), the communication rounds needed to achieve a targeted accuracy are reduced thanks to the adaptive reuse of \emph{lagged} gradients.

7: Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.

8: \end{abstract}

9: