\begin{abstract}
This paper presents fault-tolerant asynchronous \emph{Stochastic
Gradient Descent} (\emph{SGD}) algorithms.
SGD is widely used for approximating the minimum of a cost function $Q$,
as a core part of optimization and learning algorithms.
Our algorithms are designed for the \emph{cluster-based} model,
which combines message-passing and shared-memory communication layers.
Processes may fail by \emph{crashing}, and the algorithm inside
each cluster is \emph{wait-free}, using only reads and writes.

For a \emph{strongly convex} function $Q$,
our algorithm \emph{tolerates any number of failures}
and achieves a convergence rate
that yields the maximal distributed
acceleration over the optimal convergence rate of \emph{sequential} SGD.

For arbitrary functions, under standard assumptions on $Q$,
the convergence rate has an additional term that
depends on the maximal difference between the parameters
that different processes hold at the same iteration.
In this case, the algorithm obtains the same convergence rate as sequential SGD,
up to a logarithmic factor.
This is achieved by using, at each iteration, a \emph{multidimensional
approximate agreement} algorithm, tailored for the cluster-based model.

The algorithm for arbitrary functions requires that
a majority of the clusters each contain at least one nonfaulty process.
We prove that this condition is necessary when optimizing some
non-convex functions.
\end{abstract}