
\begin{abstract}


We provide the first theoretical analysis of the convergence rate of asynchronous stochastic gradient descent with variance reduction (AsySVRG) for non-convex optimization.
Asynchronous stochastic gradient descent (AsySGD) has been widely used to train neural networks, and it is proved to converge at a rate of $O(1/\sqrt{T})$.
Recent studies have shown that asynchronous SGD with variance reduction converges at a linear rate on convex problems.
However, no existing work analyzes asynchronous SGD with variance reduction on non-convex problems.
In this paper, we consider two asynchronous parallel implementations of SVRG: one on a distributed-memory architecture and the other on a shared-memory architecture.
We prove that both methods converge at a rate of $O(1/T)$, and that a linear speedup is achievable as the number of workers increases.
Experimental results on neural networks with real data (MNIST and CIFAR-10) also support our theoretical claims.


\end{abstract}
