91f58b5cd320221f.tex
1: \begin{abstract}
2: We present a novel, fast (exponential rate adaption), ab initio (hyper-parameter-free) gradient-based optimization algorithm. The main idea of the method is to adapt the learning rate $\alpha$ through situational awareness, chiefly by striving for orthogonality between neighboring gradients. The method has a high success rate and converges quickly, and it does not rely on hand-tuned parameters, which gives it greater universality. It can be applied to problems of any dimension $n$ and scales only linearly (of order $O(n)$) with the dimension of the problem. It optimizes convex and non-convex continuous landscapes for which some form of gradient is available. In contrast to the Ada family (AdaGrad, AdaMax, AdaDelta, Adam, etc.), the method is rotation invariant: the optimization path and the performance are independent of the choice of coordinates. We demonstrate its performance in extensive experiments on the MNIST benchmark dataset against state-of-the-art optimizers.
3: We name this new class of optimizers after its core idea, \textbf{E}xponential \textbf{L}earning \textbf{R}ate \textbf{A}daption (\textbf{ELRA}). We present it in two variants, c2min and p2min, which differ slightly in their control.
4: We believe that ELRA opens a new research direction for gradient descent optimizers.
5: %Intrinsic damping of fast oscillations is a welcome side effect, which improves speed and stability. For rare events of instability in artificial malicious test cases, we introduced self healing rate limitation and failed attempt step shortening.
6: \end{abstract}
7: