\begin{abstract}
In second-order optimization, computing the Hessian matrix of the
optimized function at every iteration can be a bottleneck. Randomized
sketching has emerged as a powerful technique for constructing
estimates of the Hessian which can be used to perform approximate
Newton steps. This involves multiplication by a
random sketching matrix, which introduces a trade-off between the
computational cost of sketching and the convergence rate of the
optimization algorithm. A theoretically desirable but practically much too
expensive choice is to use a dense Gaussian sketching matrix, which
produces unbiased estimates of the exact Newton step and which offers strong
problem-independent convergence guarantees. We show that the Gaussian
sketching matrix can be drastically sparsified, significantly reducing the
computational cost of sketching, without substantially affecting its convergence
properties. This approach, called Newton-LESS, is based on a recently introduced
sketching technique: LEverage Score Sparsified (LESS)
embeddings. We prove that Newton-LESS enjoys nearly the same
problem-independent local convergence rate as Gaussian embeddings, not
just up to constant factors but even down to lower order terms, for
a large class of optimization tasks. In
particular, this leads to a new state-of-the-art convergence result for
an iterative least squares solver. Finally,
we extend LESS embeddings to include uniformly sparsified random sign matrices
which can be implemented efficiently and which perform well in numerical experiments.
\end{abstract}
