1: \begin{abstract}
2: 
3:     In this paper, we introduce an accelerated distributed stochastic gradient method with momentum
4:     for solving the distributed optimization problem, in which a group of $n$ agents collaboratively minimizes the average of their local objective functions over a connected network. 
5:     The method, termed ``Distributed Stochastic Momentum Tracking (DSMT)'', is a single-loop algorithm that utilizes the momentum tracking technique as well as the Loopless Chebyshev Acceleration (LCA) method.
6:     We show that DSMT can asymptotically achieve convergence rates comparable to those of the centralized stochastic gradient descent (SGD) method under a general variance condition on the stochastic gradients. Moreover, the number of iterations (the transient time) required for DSMT to achieve such rates behaves as $\orderi{n^{5/3}/(1-\lambda)}$ for minimizing general smooth objective functions, and as $\orderi{\sqrt{n/(1-\lambda)}}$ under the Polyak-{\L}ojasiewicz (PL) condition. Here, $1-\lambda$ denotes the spectral gap of the mixing matrix associated with the underlying network topology.
7:     % Furthermore, DSMT achieves the optimal iteration complexity (with no log-dependent terms) when high level of accuracy is required. 
8:     Notably, the obtained results do not rely on multiple rounds of inter-node communication or on stochastic gradient accumulation per iteration, and, to the best of our knowledge, the transient times are the shortest known under this setting.
9:     % This broadens the practical applicability of distributed stochastic gradient methods.
10: \end{abstract}
11: