\begin{abstract}
We propose a novel training method based on nonlinear multilevel minimization techniques, which are commonly used for solving discretized large-scale partial differential equations.
Our multilevel training method constructs a multilevel hierarchy by reducing the number of samples.
The training of the original model is then enhanced by internally training surrogate models constructed with fewer samples.
We construct the surrogate models using a first-order consistency approach.
This gives rise to surrogate models whose gradients are stochastic estimators of the full gradient, but with reduced variance compared to standard stochastic gradient estimators.
We illustrate the convergence behavior of the proposed multilevel method on machine learning applications based on logistic regression.
A comparison with subsampled Newton and variance reduction methods demonstrates the efficiency of our multilevel method.
\end{abstract}