\begin{abstract}
	We provide tight finite-time convergence bounds for gradient descent and
	stochastic gradient descent on quadratic functions, when the gradients are
	delayed and reflect iterates from $\tau$ rounds ago. First, we show that
	without stochastic noise, delays strongly affect the attainable
	optimization error: in fact, the error can be as bad as that of non-delayed
	gradient descent run on only $1/\tau$ of the gradients. In sharp contrast,
	we quantify how stochastic noise makes the effect of delays negligible,
	improving on previous work, which only showed this phenomenon asymptotically
	or for much smaller delays. Also, in the context
	of distributed optimization, the results indicate that the performance of
	gradient descent with delays is competitive with synchronous approaches
	such as mini-batching. Our results are based on a novel technique for
	analyzing convergence of optimization algorithms using generating functions.
\end{abstract}