\begin{abstract}
  Stochastic gradient descent (SGD) is a well-known method for
  regression and classification tasks. However, it is an inherently
  sequential algorithm: at each step, the processing of the current
  example depends on the parameters learned from the previous
  examples. Prior approaches to parallelizing linear learners using
  SGD, such as \hogwild and \allreduce, do not honor these
  dependencies across threads and thus can suffer from poor
  convergence rates, poor scalability, or both. This paper proposes
  \ouralgo, a parallel SGD algorithm that, to a first-order
  approximation, retains the sequential semantics of SGD.  Each thread
  learns a local model in addition to a \emph{model combiner}, which
  allows the local models to be combined to produce the same result
  that sequential SGD would have produced.
%This \ouralgo approach is applicable to any linear learner.
This paper evaluates \ouralgo's accuracy and performance on $6$
datasets on a shared-memory machine and shows up to $11\times$ speedup over
our heavily optimized sequential baseline on $16$ cores and, on average, a $2.2\times$ speedup over \hogwild.
\end{abstract}