\begin{abstract}
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a
variety of machine learning problems. Researchers in academia and industry
have developed several techniques to optimize SGD's runtime performance,
including asynchronous execution and reduced precision. Our main
result is a martingale-based analysis that enables us to capture
the rich noise models that may arise from such techniques. Specifically, we use
our new analysis in three ways: (1) we derive convergence rates for
the convex case (\hogwild) with relaxed assumptions on the sparsity of
the problem; (2) we analyze asynchronous SGD algorithms for
non-convex matrix problems, including matrix completion; and (3) we
design and analyze an asynchronous SGD algorithm, called \buckwild,
that uses lower-precision arithmetic. We show experimentally that our
algorithms run efficiently for a variety of problems on modern
hardware.
\end{abstract}
