1: \begin{abstract}
2: To increase the training speed of distributed learning, recent years have witnessed a significant amount of interest in developing both synchronous and asynchronous distributed stochastic variance-reduced optimization methods.
3: However, all existing synchronous and asynchronous distributed training algorithms suffer from various limitations in either convergence speed or implementation complexity.
4: This motivates us to propose an algorithm called \algname (\ul{s}emi-as\ul{yn}chronous pa\ul{th}-int\ul{e}grated \ul{s}tochastic grad\ul{i}ent \ul{s}earch), which leverages the special structure of the variance-reduction framework to overcome the limitations of both synchronous and asynchronous distributed learning algorithms, while retaining their salient features.
5: We consider two implementations of \algname under distributed and shared memory architectures.
6: We show that our \algname algorithms have \(O(\sqrt{N}\epsilon^{-2}(\Delta+1)+N)\) and \(O(\sqrt{N}\epsilon^{-2}(\Delta+1) d+N)\) computational complexities for achieving an \(\epsilon\)-stationary point in non-convex learning under distributed and shared memory architectures, respectively, where \(N\) denotes the total number of training samples and \(\Delta\) represents the maximum delay of the workers.
7: Moreover, we investigate the generalization performance of \algname by establishing algorithmic stability bounds for quadratic strongly convex and non-convex optimization.
8: We further conduct extensive numerical experiments to verify our theoretical findings.
9:
10:
11: \end{abstract}
12: