
\begin{abstract}

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show, using novel theoretical analysis, algorithms, and implementation, that SGD can be implemented {\em without any locking}. We present an update scheme called \name, which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is~\emph{sparse}, meaning most gradient updates modify only small parts of the decision variable, \name achieves a nearly optimal rate of convergence. We demonstrate experimentally that \name outperforms alternative schemes that use locking by an order of magnitude.

\end{abstract}
