\begin{abstract}
% Stochastic gradient descent (\SGD) is the method of choice for large-scale machine learning problems,
% by virtue of its low per-iteration complexity.
% However, it lags behind its non-stochastic counterparts in convergence rate,
% due to the high variance introduced by the stochastic gradient steps.
% To mitigate this shortcoming,
% \cite{johnson2013accelerating} occasionally uses full gradient information.
% %to reduce the variance of the stochastic steps.
% Yet, even an infrequent computation of the full gradient can be
% prohibitive in the large-scale setting.

% Prompted by such limitations, we consider the question of
% %we examine the computational resource allocation in \SGD-based algorithms:
% how to allocate a limited budget of atomic gradient calculations
% over iterations in stochastic schemes.
% We propose \algo, a Biased and Variance-Reduction \SGD~scheme that achieves a linear convergence rate in function values, up to some error level.
% %requiring fewer resources per \textcolor{red}{epoch} compared to existing variance reduction methods.
% % which employ computations of the full gradient.
% In contrast to existing approaches, our scheme
% %computes biased estimators of the full gradient in each epoch
% utilizes only a subset of the input data for full gradient approximation,
% featuring a trade-off between computational complexity and convergence rate.
% %We supplement our theoretical results with
% Empirical evaluation shows that our algorithm is at least competitive with state-of-the-art approaches.
% %Finally, we back up our findings with rigorous theoretical guarantees: \algo~admits a linear convergence rate for smooth and strongly convex objectives, as do its predecessors.
% \end{abstract}
