\begin{abstract}%
Preconditioned gradient methods are among the most general and powerful tools
in optimization. However, preconditioning requires storing and manipulating
prohibitively large matrices. We describe and analyze a new structure-aware
preconditioning algorithm, called \NAME, for stochastic optimization over
tensor spaces. \NAME maintains a set of preconditioning matrices, each of
which operates on a single dimension, contracting over the remaining
dimensions. We establish convergence guarantees in the stochastic convex
setting, the proof of which builds upon matrix trace inequalities. Our
experiments with state-of-the-art deep learning models show that \NAME is
capable of converging considerably faster than commonly used optimizers.
Although it involves a more complex update rule, \NAME's runtime per step is
comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.%
\end{abstract}