\begin{abstract}
This paper presents fault-tolerant asynchronous \emph{Stochastic
Gradient Descent} (\emph{SGD}) algorithms.
SGD is widely used for approximating the minimum of a cost function $Q$,
as a core part of optimization and learning algorithms.
Our algorithms are designed for the \emph{cluster-based} model,
which combines message-passing and shared-memory communication layers.
Processes may fail by \emph{crashing}, and the algorithm inside
each cluster is \emph{wait-free}, using only reads and writes.

For a \emph{strongly convex} function $Q$,
our algorithm \emph{tolerates any number of failures}
and achieves a convergence rate
that yields the maximal distributed
acceleration over the optimal convergence rate of \emph{sequential} SGD.

For arbitrary functions, under standard assumptions on $Q$,
the convergence rate has an additional term that depends on
the maximal difference between the parameters of different processes
at the same iteration.
In this case, the algorithm obtains the same convergence rate as sequential SGD,
up to a logarithmic factor.
This is achieved by running, at each iteration, a \emph{multidimensional
approximate agreement} algorithm, tailored for the cluster-based model,
which keeps the parameters of different processes close to each other.

The algorithm for arbitrary functions requires that a majority
of the clusters each contain at least one nonfaulty process.
We prove that this condition is necessary when optimizing some
non-convex functions.
\end{abstract}