1: \begin{abstract}%
2: In this work, we study an optimizer, \texttt{Grad-Avg}, for minimizing error functions. We prove that, under a boundedness assumption, the sequence of iterates of \texttt{Grad-Avg} converges to a minimizer. We apply \texttt{Grad-Avg}, along with some popular optimizers, to regression as well as classification tasks. In regression tasks, we observe that the behaviour of \texttt{Grad-Avg} is almost identical to that of Stochastic Gradient Descent (SGD), and we present a mathematical justification of this fact. In classification tasks, we observe that the performance of \texttt{Grad-Avg} can be enhanced by suitably scaling the parameters. Experimental results demonstrate that \texttt{Grad-Avg} converges faster than the other state-of-the-art optimizers for the classification task on two benchmark datasets.
3: \end{abstract}
4: