\begin{abstract}
	Recent years have witnessed significant interest in developing both synchronous and asynchronous distributed stochastic variance-reduced optimization methods to accelerate distributed training.
	However, all existing synchronous and asynchronous distributed training algorithms suffer from various limitations in either convergence speed or implementation complexity.
	This motivates us to propose an algorithm called \algname (\ul{s}emi-as\ul{yn}chronous pa\ul{th}-int\ul{e}grated \ul{s}tochastic grad\ul{i}ent \ul{s}earch), which leverages the special structure of the variance-reduction framework to overcome the limitations of both synchronous and asynchronous distributed learning algorithms, while retaining their salient features.
	We consider two implementations of \algname, one for distributed-memory and one for shared-memory architectures.
	We show that, to find an \(\epsilon\)-stationary point in non-convex learning, our \algname algorithms have computational complexities of \(O(\sqrt{N}\epsilon^{-2}(\Delta+1)+N)\) and \(O(\sqrt{N}\epsilon^{-2}(\Delta+1) d+N)\) under distributed-memory and shared-memory architectures, respectively, where \(N\) denotes the total number of training samples and \(\Delta\) represents the maximum delay of the workers.
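	For instance, in the delay-free case \(\Delta = 0\), the distributed-memory bound specializes to
	\[
	O\big(\sqrt{N}\epsilon^{-2}(0+1)+N\big) = O\big(\sqrt{N}\epsilon^{-2}+N\big),
	\]
	so in both bounds the delay enters the leading term only through the linear factor \(\Delta+1\).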
	Moreover, we investigate the generalization performance of \algname by establishing algorithmic stability bounds for quadratic strongly convex and non-convex optimization.
	We further conduct extensive numerical experiments to verify our theoretical findings.
\end{abstract}