\begin{abstract} % Abstract of not more than 250 words.

We consider the distributed nonconvex optimization problem of minimizing a global cost function, formed by the sum of $n$ local cost functions, through local information exchange.

This problem is an important component of many machine learning techniques with data parallelism, such as deep learning and federated learning.

We propose a distributed primal--dual stochastic gradient descent (SGD) algorithm that is suitable for arbitrarily connected communication networks and any smooth (possibly nonconvex) cost functions.

We show that the proposed algorithm achieves a linear speedup convergence rate of $\mathcal{O}(1/\sqrt{nT})$ for general nonconvex cost functions and of $\mathcal{O}(1/(nT))$ when the global cost function satisfies the Polyak--{\L}ojasiewicz (P--{\L}) condition, where $T$ is the total number of iterations.

We also show that, with constant parameters, the output of the proposed algorithm converges linearly to a neighborhood of a global optimum.

Through numerical experiments, we demonstrate the efficiency of our algorithm compared with the baseline centralized SGD and recently proposed distributed SGD algorithms.

\emph{Index Terms}---Distributed nonconvex optimization, linear speedup, Polyak--{\L}ojasiewicz condition, primal--dual algorithm, stochastic gradient descent

\end{abstract}
