\begin{abstract}
The analysis of second-order optimization methods based on sampling, randomization, or sketching has two serious
shortcomings compared with the analysis of the conventional Newton method. First, the analysis of the iterates
is not scale-invariant, and even when it is, it requires restrictive assumptions on the
problem structure. Second, the fast convergence rates
of second-order methods have been established only under assumptions on the input data.
% These theoretical shortcomings have severe practical implications too. For medium scale problems,
% the new class of second-order methods based on sub-sampling or sketching can be slower than the standard Newton method. For
% large-scale problems, they are slightly faster than stochastic first-order
% methods, but in general, they are comparable.
In this paper, we close the gap between the theoretical analysis of the conventional Newton method and that of
randomization-based second-order methods. We propose a Self-concordant Iterative-minimization - Galerkin-based Multilevel
Algorithm (SIGMA) and establish its super-linear convergence rate using the well-established theory of self-concordant functions.
Our analysis is global and entirely
independent of unknown constants such as Lipschitz constants and strong convexity parameters. It is based on the
connections between multigrid optimization methods and the role of
coarse-grained or reduced-order models in the computation of search directions. We use the insights from the
analysis to significantly improve the performance of second-order
methods in machine learning applications. We report encouraging initial experiments that suggest SIGMA significantly outperforms
state-of-the-art sub-sampled and sketched Newton methods on both medium- and large-scale problems.
\end{abstract}