1: \begin{abstract}
2: This paper presents a new class of gradient methods for distributed machine learning that adaptively skip the gradient calculations to learn with reduced communication and computation.
3: Simple rules are designed to detect slowly-varying gradients and, therefore, trigger the reuse of outdated gradients.
4: The resultant gradient-based algorithms are termed \textbf{L}azily \textbf{A}ggregated \textbf{G}radient --- justifying our acronym \textbf{LAG} used henceforth.
5: Theoretically, the merits of this contribution are: i) the convergence rate is the same as batch gradient descent in strongly-convex, convex, and nonconvex smooth cases;
6: and, ii) if the distributed datasets are heterogeneous (quantified by certain measurable constants), the communication rounds needed to achieve a targeted accuracy are reduced thanks to the adaptive reuse of \emph{lagged} gradients.
7: Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.
8: \end{abstract}
9: