\begin{abstract}
We propose a novel training method based on nonlinear multilevel minimization techniques, which are commonly used for solving discretized large-scale partial differential equations.
Our multilevel training method constructs a multilevel hierarchy by reducing the number of samples.
The training of the original model is then enhanced by internally training surrogate models constructed with fewer samples.
We construct the surrogate models using a first-order consistency approach.
This gives rise to surrogate models whose gradients are stochastic estimators of the full gradient, but with reduced variance compared to standard stochastic gradient estimators.
We illustrate the convergence behavior of the proposed multilevel method on machine learning applications based on logistic regression.
A comparison with subsampled Newton and variance reduction methods demonstrates the efficiency of our multilevel method.
\end{abstract}