\begin{abstract}
  We analyze the convergence of gradient-based optimization algorithms
  that base their updates on delayed stochastic gradient
  information. The main application of our results is to the
  development of gradient-based distributed optimization algorithms
  where a master node performs parameter updates while worker nodes
  compute stochastic gradients based on local information in parallel,
  which may give rise to delays due to asynchrony. We take motivation
  from statistical problems where the size of the data is so large
  that it cannot fit on one computer; with the advent of huge datasets
  in biology, astronomy, and the internet, such problems are now
  common. Our main contribution is to show that for smooth stochastic
  problems, the delays are asymptotically negligible and we can
  achieve order-optimal convergence results. In application to
  distributed optimization, we develop procedures that overcome
  communication bottlenecks and synchronization requirements. We show
  $n$-node architectures whose optimization error in stochastic
  problems---in spite of asynchronous delays---scales asymptotically
  as $\order(1 / \sqrt{nT})$ after $T$ iterations. This rate is known
  to be optimal for a distributed system with $n$ nodes even in the
  absence of delays. We additionally complement our theoretical
  results with numerical experiments on a statistical machine learning
  task.
\end{abstract}