\begin{abstract}
The analysis of second-order optimization methods based on sampling, randomization, or sketching has two serious
shortcomings compared to that of the conventional Newton method. The first shortcoming is that the analysis of the
iterates is not scale-invariant, and even when it is, restrictive assumptions are required on the problem structure.
The second shortcoming is that the fast convergence rates of second-order methods have only been established by
making assumptions regarding the input data.
% These theoretical shortcomings have severe practical implications too. For medium-scale problems,
% the new class of second-order methods based on sub-sampling or sketching can be slower than the standard Newton method. For
% large-scale problems, they are slightly faster than stochastic first-order
% methods, but in general, they are comparable.
In this paper, we close the gap between the theoretical analysis of the conventional Newton method and that of
randomization-based second-order methods. We propose a Self-concordant Iterative-minimization - Galerkin-based Multilevel
Algorithm (SIGMA) and establish its super-linear convergence rate using the well-established theory of self-concordant
functions. Our analysis is global and entirely independent of unknown constants such as Lipschitz constants and strong
convexity parameters. It is based on the connections between multigrid optimization methods and the role of
coarse-grained or reduced-order models in the computation of search directions. We take advantage of the insights from
the analysis to significantly improve the performance of second-order methods in machine learning applications. We
report encouraging initial experiments that suggest SIGMA significantly outperforms the state-of-the-art
sub-sampled/sketched Newton methods for both medium and large-scale problems.
\end{abstract}