10da30b24fc25f66.tex
1: \begin{abstract}
2: 	We present \erasegd, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding.
3: 	Gradient coded distributed GD uses redundancy to {\it exactly} recover the gradient at each iteration from a subset of compute nodes. 
4: 	\erasegd{} instead uses {\it approximate} gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance.
5: 	Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees.
6: 	We establish that down to a small noise floor, \erasegd{} converges as quickly as distributed GD and has faster overall runtime under a probabilistic delay model.
7: 	We conduct extensive experiments on real world datasets and distributed clusters and demonstrate that our method can lead to significant speedups over both standard and gradient coded GD.
8: 	\end{abstract}
9: