\begin{abstract}                % Abstract of not more than 250 words.
We consider the distributed nonconvex optimization problem of minimizing a global cost function, formed by the sum of $n$ local cost functions, using local information exchange.
This problem is an important component of many machine learning techniques with data parallelism, such as deep learning and federated learning.
We propose a distributed primal--dual stochastic gradient descent (SGD) algorithm suitable for arbitrarily connected communication networks and any smooth (possibly nonconvex) cost functions.
We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for general nonconvex cost functions and the linear speedup convergence rate $\mathcal{O}(1/(nT))$ when the global cost function satisfies the Polyak--{\L}ojasiewicz (P--{\L}) condition, where $T$ is the total number of iterations.
We also show that, with constant parameters, the output of the proposed algorithm converges linearly to a neighborhood of a global optimum.
Through numerical experiments, we demonstrate the efficiency of our algorithm in comparison with the baseline centralized SGD and recently proposed distributed SGD algorithms.

\emph{Index Terms}---Distributed nonconvex optimization, linear speedup, Polyak--{\L}ojasiewicz condition, primal--dual algorithm, stochastic gradient descent.
\end{abstract}
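
For concreteness, the problem and the P--{\L} condition referred to in the abstract can be written as follows. This is a sketch in assumed notation: the symbols $x$, $p$, $f_i$, $f$, $f^\star$, and $\nu$ are introduced here for illustration only and do not appear in the abstract. The problem is
\begin{equation}
\min_{x \in \mathbf{R}^{p}} f(x) = \sum_{i=1}^{n} f_i(x),
\end{equation}
where each smooth (possibly nonconvex) local cost function $f_i$ is held by one of the $n$ agents. Under this notation, the global cost function $f$ satisfies the P--{\L} condition with constant $\nu > 0$ if
\begin{equation}
\frac{1}{2} \| \nabla f(x) \|^{2} \ge \nu \bigl( f(x) - f^\star \bigr) \quad \mbox{for all } x \in \mathbf{R}^{p},
\end{equation}
where $f^\star$ denotes the minimum value of $f$, assumed to be finite. The condition does not require convexity, but it implies that every stationary point of $f$ is a global minimizer. Scaling the sum by $1/n$ would leave the minimizers unchanged and only rescale $\nu$.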