\begin{abstract}
% Stochastic gradient descent (\SGD) is the method of choice for large-scale machine learning problems,
% by virtue of its low per-iteration complexity.
% However, it lags behind its non-stochastic counterparts with respect to convergence rate,
% due to the high variance introduced by the stochastic gradient steps.
% To mitigate this shortcoming,
% \cite{johnson2013accelerating} occasionally uses full gradient information.
% %to reduce the variance of the stochastic steps.
% Yet, even an infrequent computation of the full gradient can be
% prohibitive in the large-scale setting.

% Prompted by such limitations, we consider the question of
% %we examine the computational resource allocation in \SGD-based algorithms:
% how to allocate a limited budget of atomic gradient calculations
% over iterations in stochastic schemes.
% We propose \algo, a Biased and Variance-Reduction \SGD~scheme that achieves a linear convergence rate in function values, up to some error level.
% %requiring fewer resources per \textcolor{red}{epoch} compared to existing variance reduction methods.
% % which employ computations of the full gradient.
% In contrast to existing approaches, our scheme
% %computes biased estimators of the full gradient in each epoch
% uses only a subset of the input data to approximate the full gradient,
% which yields a trade-off between computational complexity and convergence rate.
% %We supplement our theoretical results with
% Empirical evaluation shows that our algorithm is at least competitive with state-of-the-art approaches.
% %Finally, we back up our findings with rigorous theoretical guarantees: \algo~admits a linear convergence rate for smooth and strongly convex objectives, as do its predecessors.
% \end{abstract}