
\begin{abstract}
	We present \erasegd, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding.
	Gradient-coded distributed GD uses redundancy to {\it exactly} recover the gradient at each iteration from a subset of compute nodes.
	\erasegd{} instead uses {\it approximate} gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance.
	Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees.
	We establish that, down to a small noise floor, \erasegd{} converges as quickly as distributed GD and has faster overall runtime under a probabilistic delay model.
	We conduct extensive experiments on real-world datasets and distributed clusters and demonstrate that our method can lead to significant speedups over both standard and gradient-coded GD.
\end{abstract}
