\begin{abstract}
Most machine learning and deep neural network algorithms rely on iterative algorithms, such as Stochastic Gradient Descent (SGD), to optimise their utility or cost functions. In distributed learning, the networked nodes have to work collaboratively to update the model parameters, and the way they coordinate is referred to as the synchronous parallel design (or barrier control). The synchronous parallel protocol is the building block of practically all distributed learning frameworks, and its design has a direct impact on the performance and scalability of the system.

In this paper, we propose a new barrier control technique, Probabilistic Synchronous Parallel (PSP). Compared to the existing Bulk Synchronous Parallel (BSP), Stale Synchronous Parallel (SSP), and Asynchronous Parallel (ASP), the proposed solution effectively improves both the convergence speed and the scalability of the SGD algorithm by introducing a sampling primitive into the system. Moreover, we show that the sampling primitive can be composed with the existing barrier control mechanisms to derive fully distributed PSP-based synchronous parallel designs.

We not only provide a thorough theoretical analysis\footnote{Most of the theoretical analysis was part of Ben Catterall's master's thesis for his Part III project in the Computer Lab at the University of Cambridge in 2017.} of the convergence guarantee of the PSP-based SGD algorithm, but also implement a full-featured distributed learning framework called Actor System and perform an intensive evaluation on top of it.
\end{abstract}