\begin{abstract}
  Stochastic gradient descent (SGD) is a well-known method for
  regression and classification tasks. However, it is an inherently
  sequential algorithm: at each step, the processing of the current
  example depends on the parameters learned from the previous
  examples. Prior approaches to parallelizing linear learners using
  SGD, such as \hogwild and \allreduce, do not honor these
  dependencies across threads and thus can suffer from poor
  convergence rates and/or poor scalability. This paper proposes
  \ouralgo, a parallel SGD algorithm that, to a first-order
  approximation, retains the sequential semantics of SGD. Each thread
  learns a local model in addition to a \emph{model combiner}, which
  allows local models to be combined to produce the same result
  that sequential SGD would have produced.
%This \ouralgo approach is applicable to any linear learner.
  An evaluation of \ouralgo's accuracy and performance on $6$
  datasets on a shared-memory machine shows up to $11\times$ speedup over
  our heavily optimized sequential baseline on $16$ cores and, on average, a $2.2\times$ speedup over \hogwild.
\end{abstract}