\begin{abstract}
We present \erasegd, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding.
Gradient-coded distributed GD uses redundancy to {\it exactly} recover the gradient at each iteration from a subset of compute nodes.
\erasegd{} instead uses {\it approximate} gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance.
Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees.
We establish that, down to a small noise floor, \erasegd{} converges as quickly as distributed GD and has faster overall runtime under a probabilistic delay model.
We conduct extensive experiments on real-world datasets and distributed clusters and demonstrate that our method can lead to significant speedups over both standard and gradient-coded GD.
\end{abstract}