\begin{abstract}
Distributed optimization is widely deployed in practice to solve a
broad range of problems. In a typical asynchronous scheme, workers
calculate gradients with respect to out-of-date optimization parameters
while the master uses stale (i.e., delayed) gradients to update the
parameters. Although using stale gradients can slow convergence,
asynchronous methods speed up the overall optimization with respect
to wall clock time by allowing more frequent updates and reducing
idling times. In this paper, we present a variable per-epoch minibatch
scheme called Anytime Minibatch with Delayed Gradients (AMB-DG). In
AMB-DG, workers compute gradients in epochs of fixed duration while
the master uses stale gradients to update the optimization parameters.
We analyze AMB-DG in terms of its regret bound and convergence rate,
and prove that for convex smooth objective functions AMB-DG achieves
the optimal regret bound and convergence rate. We compare AMB-DG with
Anytime Minibatch (AMB), which is similar to AMB-DG but does not use
stale gradients: in AMB, workers stay idle after each gradient
transmission until they receive the updated parameters from the
master, whereas in AMB-DG workers never idle. We also extend AMB-DG
to the fully distributed setting. When the communication delay is
long, we observe that AMB-DG converges faster than AMB in wall clock
time. We also compare AMB-DG with the state-of-the-art fixed minibatch
approach that uses delayed gradients. We run our experiments on a
real distributed system and observe that AMB-DG converges more than
two times faster.
\end{abstract}