\begin{abstract}
  Asynchronous parallel optimization algorithms for solving large-scale machine
  learning problems have recently drawn significant attention from both academia
  and industry. This paper proposes a novel algorithm, decoupled asynchronous
  proximal stochastic gradient descent (DAP-SGD), to minimize an
  objective function that is the composite of the average of multiple empirical
  losses and a regularization term. Unlike traditional asynchronous
  proximal stochastic gradient descent (TAP-SGD), in which the
  master carries much of the computational load, the proposed algorithm offloads
  the majority of the computation from the master to the workers, leaving
  the master to perform simple addition operations. This strategy
  yields an easy-to-parallelize algorithm whose performance is
  justified by theoretical convergence analyses. Specifically,
  DAP-SGD achieves an $O(\log T/T)$ rate when the step size is
  diminishing and an ergodic $O(1/\sqrt{T})$ rate when the step size is
  constant, where $T$ is the total number of iterations.
\end{abstract}
