
\begin{abstract}
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks,
due to their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural
networks (DNNs), since they have high computational complexity and too many preconditions. In this paper,
to overcome these drawbacks, we propose three novel RLS optimization algorithms for training feedforward
neural networks, convolutional neural networks and recurrent neural networks (including long short-term
memory networks), by using error backpropagation and our
average-approximation RLS method, together with the equivalent gradients of the linear least squares
loss function with respect to the linear outputs of hidden layers. Compared with previous RLS optimization
algorithms, our algorithms are simple and elegant. Each can be viewed as an improved stochastic gradient
descent (SGD) algorithm that uses the inverse autocorrelation matrix of each layer as the adaptive learning rate.
Their time and space complexities are only several times those of SGD.
They only require the loss function to be the mean squared error and the activation function of the
output layer to be invertible. In fact, our algorithms can also be used in combination with other first-order
optimization algorithms without requiring these two preconditions. In addition,
we present two improvements to our algorithms. Finally, we
demonstrate their effectiveness compared to the Adam algorithm on the MNIST, CIFAR-10 and IMDB datasets,
and investigate the influence of their hyperparameters experimentally.
\end{abstract}
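
For orientation, the view of RLS as SGD with the inverse autocorrelation matrix as an adaptive learning rate can be illustrated with the classical single-output RLS recursion. This is only a sketch of the textbook form, not the average-approximation method proposed in this paper. The symbols are assumptions introduced for illustration: $\mathbf{x}_t$ is the input, $d_t$ the target, $\mathbf{w}_t$ the weight vector, $\mathbf{P}_t$ the inverse autocorrelation matrix, $\mathbf{k}_t$ the gain vector, and $\lambda \in (0, 1]$ the forgetting factor.
\[
e_t = d_t - \mathbf{w}_{t-1}^{\top}\mathbf{x}_t, \qquad
\mathbf{k}_t = \frac{\mathbf{P}_{t-1}\mathbf{x}_t}{\lambda + \mathbf{x}_t^{\top}\mathbf{P}_{t-1}\mathbf{x}_t},
\]
\[
\mathbf{w}_t = \mathbf{w}_{t-1} + \mathbf{k}_t e_t, \qquad
\mathbf{P}_t = \frac{1}{\lambda}\left(\mathbf{P}_{t-1} - \mathbf{k}_t\mathbf{x}_t^{\top}\mathbf{P}_{t-1}\right).
\]
Substituting the gain into the update of $\mathbf{P}_t$ gives $\mathbf{k}_t = \mathbf{P}_t\mathbf{x}_t$, and since $\nabla_{\mathbf{w}}\,\tfrac{1}{2}e_t^2 = -\mathbf{x}_t e_t$ at $\mathbf{w}_{t-1}$, the weight update can be rewritten as
\[
\mathbf{w}_t = \mathbf{w}_{t-1} - \mathbf{P}_t\,\nabla_{\mathbf{w}}\,\tfrac{1}{2}e_t^2 ,
\]
that is, a gradient step in which the matrix $\mathbf{P}_t$ takes the place of the scalar learning rate of SGD.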
