\begin{abstract}
This paper presents fault-tolerant asynchronous \emph{Stochastic
Gradient Descent} (\emph{SGD}) algorithms.
SGD is widely used for approximating the minimum of a cost function $Q$,
as a core part of optimization and learning algorithms.
Our algorithms are designed for the \emph{cluster-based} model,
which combines message-passing and shared-memory communication layers.
Processes may fail by \emph{crashing}, and the algorithm inside
each cluster is \emph{wait-free}, using only reads and writes.

For a \emph{strongly convex} function $Q$,
our algorithm \emph{tolerates any number of failures}
and provides a convergence rate
that yields the maximal distributed
acceleration over the optimal convergence rate of \emph{sequential} SGD.

For arbitrary functions, the convergence rate has an additional term that
depends on the maximal difference between the parameters at the same iteration.
(This holds under standard assumptions on $Q$.)
In this case, the algorithm obtains the same convergence rate as sequential SGD,
up to a logarithmic factor.
This is achieved by using, at each iteration, a \emph{multidimensional
approximate agreement} algorithm, tailored for the cluster-based model.

The algorithm for arbitrary functions requires that
a majority of the clusters each contain at least one nonfaulty process.
We prove that this condition is necessary when optimizing some
non-convex functions.
\end{abstract}