\begin{abstract}

We provide the first theoretical analysis of the convergence rate of asynchronous stochastic gradient descent with variance reduction (AsySVRG) for
non-convex optimization. Asynchronous stochastic gradient descent (AsySGD) has been widely used to train neural networks, and it has been proved to converge at a rate of $O(1/\sqrt{T})$.
Recent studies have shown that asynchronous SGD with variance reduction converges
at a linear rate on convex problems. However, no existing work analyzes asynchronous SGD with variance reduction on non-convex problems.
In this paper, we consider two asynchronous parallel implementations of SVRG: one on a distributed-memory architecture and the other on a shared-memory architecture.
We prove that both methods converge at a rate of $O(1/T)$, and that a linear speedup is achievable as the number of workers increases.
Experimental results on neural networks with real data (MNIST and CIFAR-10) support our theoretical results.

\end{abstract}
