\begin{abstract}
With ever-growing data volumes and model sizes, an error-tolerant, communication-efficient, yet versatile distributed algorithm has become vital to the success of many large-scale machine learning applications. In this work we propose \mspg, an implementation of the flexible proximal gradient algorithm in model-parallel systems equipped with a partially asynchronous communication protocol. The worker machines communicate asynchronously with a controlled staleness bound $s$ and operate at different frequencies. We characterize several convergence properties of \mspg: 1) in a general non-smooth and non-convex setting, we prove that every limit point of the sequence generated by \mspg is a critical point of the objective function; 2) under an error bound condition, we prove that the function value decays linearly for every $s$ steps; 3) under the Kurdyka-{\L}ojasiewicz inequality, we prove that the sequences generated by \mspg converge to the same critical point, provided that a proximal Lipschitz condition is satisfied.
\end{abstract}