\begin{abstract}
Stochastic gradient descent (SGD) is a well-known method for
regression and classification tasks. However, it is an inherently
sequential algorithm: at each step, the processing of the current
example depends on the parameters learned from the previous
examples. Prior approaches to parallelizing linear learners using
SGD, such as \hogwild and \allreduce, do not honor these
dependencies across threads and thus can suffer from poor
convergence rates and/or poor scalability. This paper proposes
\ouralgo, a parallel SGD algorithm that, to a first-order
approximation, retains the sequential semantics of SGD. Each thread
learns a local model in addition to a \emph{model combiner}, which
allows the local models to be combined to produce the same result
that sequential SGD would have produced.
%This \ouralgo approach is applicable to any linear learner.
This paper evaluates \ouralgo's accuracy and performance on $6$
datasets on a shared-memory machine, where \ouralgo achieves up to
$11\times$ speedup over our heavily optimized sequential baseline on
$16$ cores and is, on average, $2.2\times$ faster than \hogwild.
\end{abstract}