ac8c020bc2ba3524.tex
1: \begin{abstract}
2: This paper investigates accelerating the convergence of distributed optimization algorithms on non-convex problems.
3: We propose a distributed primal-dual stochastic gradient descent~(SGD) algorithm equipped with the ``powerball'' method to accelerate convergence.
4: We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for general smooth (possibly non-convex) cost functions, where $n$ is the number of agents and $T$ is the total number of iterations.
5: We demonstrate the efficiency of the algorithm through numerical experiments that train two-layer fully connected neural networks and convolutional neural networks on the MNIST dataset, comparing it with state-of-the-art distributed SGD algorithms and centralized SGD algorithms.
6: \end{abstract}
7: