\begin{abstract}
  Asynchronous parallel optimization algorithms for solving large-scale machine
  learning problems have recently drawn significant attention from both
  academia and industry. This paper proposes a novel algorithm, decoupled
  asynchronous proximal stochastic gradient descent (DAP-SGD), to minimize an
  objective function that is the composite of the average of multiple empirical
  losses and a regularization term. Unlike traditional asynchronous
  proximal stochastic gradient descent (TAP-SGD), in which the
  master carries much of the computation load, the proposed algorithm off-loads the
  majority of the computation tasks from the master to the workers, leaving
  the master to perform only simple addition operations. This strategy
  yields an easy-to-parallelize algorithm whose performance is
  justified by theoretical convergence analyses. Specifically,
  DAP-SGD achieves an $O(\log T/T)$ rate when the step-size is
  diminishing and an ergodic $O(1/\sqrt{T})$ rate when the step-size is
  constant, where $T$ is the total number of iterations.
\end{abstract}
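
For concreteness, the composite objective described in the abstract can be written as below. The symbols $x$, $n$, $f_i$, $h$, and $\eta$ are notation assumed here for illustration; the abstract itself does not fix any notation.
\[
  \min_{x} \; \frac{1}{n} \sum_{i=1}^{n} f_i(x) + h(x),
\]
where $f_i$ denotes the $i$-th empirical loss and $h$ denotes the regularization term. As background only, a generic serial proximal stochastic gradient step for such an objective, with step-size $\eta_t$ and a sampled index $i_t$, takes the form
\[
  x_{t+1} = \operatorname{prox}_{\eta_t h}\bigl(x_t - \eta_t \nabla f_{i_t}(x_t)\bigr),
  \qquad
  \operatorname{prox}_{\eta h}(v) = \arg\min_{u} \Bigl\{ h(u) + \frac{1}{2\eta} \|u - v\|^2 \Bigr\}.
\]
This is a sketch of the standard serial update, not the asynchronous schemes themselves; TAP-SGD and DAP-SGD differ in which party (master or worker) evaluates the gradient and the proximal operator, and on which possibly delayed iterate.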