1: \begin{abstract}
2: Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented {\em without any locking}. We present an update scheme called \name which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is~\emph{sparse}, meaning most gradient updates only modify small parts of the decision variable, then \name achieves a nearly optimal rate of convergence. We demonstrate experimentally that \name outperforms alternative schemes that use locking by an order of magnitude.
3: \end{abstract}
4: