\begin{abstract}
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a
variety of machine learning problems. Researchers and industry have
developed several techniques to optimize SGD's runtime performance,
including asynchronous execution and reduced precision. Our main
result is a martingale-based analysis that enables us to capture
the rich noise models that may arise from such techniques.
Specifically, we use
our new analysis in three ways: (1) we derive convergence rates for
the convex case (\hogwild) with relaxed assumptions on the sparsity of
the problem; (2) we analyze asynchronous SGD algorithms for
non-convex matrix problems including matrix completion; and (3) we
design and analyze an asynchronous SGD algorithm, called \buckwild,
that uses lower-precision arithmetic. We show experimentally that our
algorithms run efficiently for a variety of problems on modern
hardware.
\end{abstract}
