\begin{abstract}
Distributed optimization is widely deployed in practice to solve a
broad range of problems. In a typical asynchronous scheme, workers
calculate gradients with respect to out-of-date optimization parameters,
while the master uses stale (i.e., delayed) gradients to update the
parameters. Although using stale gradients can slow convergence,
asynchronous methods speed up the overall optimization with respect
to wall-clock time by allowing more frequent updates and reducing
idle time. In this paper, we present a variable per-epoch minibatch
scheme called Anytime Minibatch with Delayed Gradients (AMB-DG). In
AMB-DG, workers compute gradients in epochs of fixed duration while
the master uses stale gradients to update the optimization parameters.
We analyze AMB-DG in terms of its regret bound and convergence rate.
We prove that, for smooth convex objective functions, AMB-DG achieves
the optimal regret bound and convergence rate. We compare the performance
of AMB-DG with that of Anytime Minibatch (AMB), which is similar to
AMB-DG but does not use stale gradients. In AMB, workers stay idle
after each gradient transmission to the master until they receive
the updated parameters from the master, whereas in AMB-DG workers never
idle. We also extend AMB-DG to the fully distributed setting. We compare
AMB-DG with AMB when the communication delay is long and observe that
AMB-DG converges faster than AMB in wall-clock time. We also compare
the performance of AMB-DG with the state-of-the-art fixed minibatch
approach that uses delayed gradients. We run our experiments on a
real distributed system and observe that AMB-DG converges more than
two times faster.
\end{abstract}