abstract:b66a73a84f6e57b9.tex

1: \begin{abstract}

2: Stochastic gradient descent with momentum (SGDM), which is defined by adding a momentum term to SGD, has been well studied in both theory and practice.

3: Theoretically investigated results showed that the settings of the learning rate and momentum weight affect the convergence of SGDM.

4: Meanwhile, practical results showed that the setting of batch size strongly depends on the performance of SGDM.

5: In this paper, we focus on mini-batch SGDM with constant learning rate and constant momentum weight, which is frequently used to train deep neural networks in practice.

6: The contribution of this paper is showing theoretically that using a constant batch size does not always minimize the expectation of the full gradient norm of the empirical loss in training a deep neural network, whereas using an increasing batch size definitely minimizes it, that is, increasing batch size improves convergence of mini-batch SGDM.

7: We also provide numerical results supporting our analyses, indicating specifically that mini-batch SGDM with an increasing batch size converges to stationary points faster than with a constant batch size.

8: Python implementations of the optimizers used in the numerical experiments are available at \url{https://anonymous.4open.science/r/momentum-increasing-batch-size-888C/}.

9: \end{abstract}

10: