\begin{abstract}
We provide tight finite-time convergence bounds for gradient descent and
stochastic gradient descent on quadratic functions when the gradients are
delayed, i.e., computed at the iterate from $\tau$ rounds earlier. First, we
show that without stochastic noise, delays strongly affect the attainable
optimization error: the error can be as bad as that of non-delayed
gradient descent run on only $1/\tau$ of the gradients. In sharp contrast,
we quantify how stochastic noise makes the effect of delays negligible,
improving on previous work, which showed this phenomenon only asymptotically
or for much smaller delays. Moreover, in the context
of distributed optimization, our results indicate that
gradient descent with delays is competitive with synchronous approaches
such as mini-batching. Our results are based on a novel technique for
analyzing the convergence of optimization algorithms using generating functions.
\end{abstract}
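To make the delayed setting concrete, the following is a minimal sketch, not the method or analysis of this work, of gradient descent with a fixed delay $\tau$ on a hypothetical one-dimensional quadratic $f(x) = \tfrac{a}{2}x^2$. The update is $x_{t+1} = x_t - \eta \nabla f(x_{t-\tau})$. The function name \texttt{delayed\_gd}, the choice of a scalar quadratic, and the convention that $x_{t-\tau} = x_0$ for $t < \tau$ are all illustrative assumptions.

```python
from collections import deque


def delayed_gd(a: float, x0: float, lr: float, tau: int, steps: int) -> float:
    """Hypothetical sketch: delayed GD on f(x) = a/2 * x^2.

    Each step uses the gradient at the iterate from `tau` rounds earlier.
    Assumed convention: before `tau` rounds have elapsed, the stale
    iterate is taken to be x0.
    """
    # Sliding window holding x_{t-tau}, ..., x_t (oldest first).
    history = deque([x0] * (tau + 1), maxlen=tau + 1)
    x = x0
    for _ in range(steps):
        stale = history[0]           # iterate from tau rounds ago
        x = x - lr * a * stale       # gradient of a/2 * x^2 is a * x
        history.append(x)            # maxlen drops the oldest iterate
    return x


# Without delay, lr = 1/a reaches the minimizer in one step.
print(delayed_gd(a=1.0, x0=1.0, lr=1.0, tau=0, steps=1))
# With tau = 1, the same step size no longer converges: the iterates
# cycle 0, -1, -1, 0, 1, 1 and return to x0 after six steps.
print(delayed_gd(a=1.0, x0=1.0, lr=1.0, tau=1, steps=6))
```

In this toy example, a step size that is optimal without delay leaves the delayed iteration oscillating, which suggests why delays force smaller step sizes in the noiseless case; the sketch does not reproduce the bounds stated above.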