68d76216f0986cba.tex
1: \begin{abstract}
2: 	Recent years have witnessed significant interest in developing both synchronous and asynchronous distributed stochastic variance-reduced optimization methods to speed up distributed training.
3: 	However, all existing synchronous and asynchronous distributed training algorithms suffer from various limitations in either convergence speed or implementation complexity.
4: 	This motivates us to propose an algorithm called \algname (\ul{s}emi-as\ul{yn}chronous pa\ul{th}-int\ul{e}grated \ul{s}tochastic grad\ul{i}ent \ul{s}earch), which leverages the special structure of the variance-reduction framework to overcome the limitations of both synchronous and asynchronous distributed learning algorithms, while retaining their salient features.
5: 	We consider two implementations of \algname, one for distributed-memory architectures and one for shared-memory architectures.
6: 	We show that our \algname algorithms achieve computational complexities of \(O(\sqrt{N}\epsilon^{-2}(\Delta+1)+N)\) and \(O(\sqrt{N}\epsilon^{-2}(\Delta+1) d+N)\) for reaching an \(\epsilon\)-stationary point in non-convex learning under distributed and shared memory architectures, respectively, where \(N\) denotes the total number of training samples and \(\Delta\) represents the maximum delay of the workers.
7: 	Moreover, we investigate the generalization performance of \algname by establishing algorithmic stability bounds for quadratic strongly convex and non-convex optimization. 
8: 	We further conduct extensive numerical experiments to verify our theoretical findings.
9: 	
10: 
11: \end{abstract}
12: