\begin{abstract}
Asynchronous parallel optimization algorithms for solving large-scale machine
learning problems have recently drawn significant attention from both academia
and industry. This paper proposes a novel algorithm, decoupled asynchronous
proximal stochastic gradient descent (DAP-SGD), to minimize an
objective function that is the composite of the average of multiple empirical
losses and a regularization term. Unlike the traditional asynchronous
proximal stochastic gradient descent (TAP-SGD), in which the
master carries much of the computation load, the proposed algorithm off-loads the
majority of computation tasks from the master to the workers, leaving
the master to perform simple addition operations. This strategy
yields an easy-to-parallelize algorithm whose performance is
justified by theoretical convergence analysis. Specifically,
DAP-SGD achieves an $O(\log T/T)$ rate when the step-size is
diminishing and an ergodic $O(1/\sqrt{T})$ rate when the step-size is
constant, where $T$ is the total number of iterations.
\end{abstract}