
\begin{abstract}

% \signSGD is an \SGD variant where the gradient is reduced to its sign at each position. In recent years, \signSGD has garnered interest as an ingredient in Adam alternatives, and more generally as a simple model to understand adaptive optimizers. Although there is previous work on convergence guarantees, theoretical understanding of the learning dynamics of \signSGD is scarce. In this work, we study \signSGD in a high-dimensional limit and derive an ODE which describes the risk. Using this ODE we show that \signSGD effectively preconditions the optimization problem, and quantitatively compute the improved convergence rate compared to vanilla \SGD. We also show that \signSGD optimizes better in the presence of heavy-tailed or asymmetric label noise. Our analysis represents the first high-dimensional setting where we can rigorously and quantitatively explore the dynamics of Adam-adjacent adaptive optimizers.

% \end{abstract}
