\begin{abstract}
	We provide tight finite-time convergence bounds for gradient descent and
	stochastic gradient descent on quadratic functions when the gradients are
	delayed and reflect iterates from $\tau$ rounds ago. First, we show that
	without stochastic noise, delays strongly affect the attainable
	optimization error; in fact, the error can be as bad as that of non-delayed
	gradient descent run on only a $1/\tau$ fraction of the gradients. In sharp
	contrast, we quantify how stochastic noise makes the effect of delays
	negligible, improving on previous work that showed this phenomenon only
	asymptotically or for much smaller delays. In the context of distributed
	optimization, our results also indicate that gradient descent with delays
	performs competitively with synchronous approaches such as mini-batching.
	Our results rely on a novel technique for analyzing the convergence of
	optimization algorithms using generating functions.
\end{abstract}
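For concreteness, the delayed update described in the abstract can be sketched
as follows. The notation here is introduced only for illustration and may differ
from the precise setting analyzed later: we assume a constant step size $\eta>0$,
a quadratic defined by a positive definite matrix $A$ and a vector $b$, zero-mean
noise terms $\xi_t$, and the convention $x_s = x_0$ for $s < 0$.
\begin{align*}
  f(x) &= \tfrac{1}{2}\, x^\top A x - b^\top x,
    \qquad \nabla f(x) = A x - b, \\
  x_{t+1} &= x_t - \eta\, \nabla f(x_{t-\tau})
    = x_t - \eta\, (A x_{t-\tau} - b)
    && \text{(delayed gradient descent)}, \\
  x_{t+1} &= x_t - \eta\, \bigl( \nabla f(x_{t-\tau}) + \xi_t \bigr)
    && \text{(delayed stochastic gradient descent)}.
\end{align*}
Setting $\tau = 0$ recovers standard (stochastic) gradient descent.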