
\begin{abstract}

Understanding the convergence performance of the asynchronous stochastic gradient descent method (Async-SGD) has received increasing attention in recent years due to its foundational role in machine learning.

To date, however, most existing works are restricted to either bounded gradient delays or convex settings.

%In this paper, we focus on Async-SGD for non-convex optimization problems with unbounded gradient delays.

In this paper, we focus on Async-SGD and its variant Async-SGDI (which uses an increasing batch size) for non-convex optimization problems with unbounded gradient delays.

%We analyze the convergence performance of standard Async-SGD and an Async-SGD variant with increasing batch size (Async-SGDI) aiming for variance reduction.

%We prove asymptotic $o(1/\sqrt{k})$ convergence rate for the standard Async-SGD algorithm and $o(1/k)$ for Async-SGDI.

We prove an $o(1/\sqrt{k})$ convergence rate for Async-SGD and an $o(1/k)$ rate for Async-SGDI.

%Also, we develop a unifying sufficient condition for Async-SGD's convergence that includes two major gradient update delay models in the literature as special cases.

We also establish a unifying sufficient condition for the convergence of Async-SGD, which includes two major gradient delay models in the literature as special cases and yields a new delay model not considered thus far.

\end{abstract}
