20c5bcd178d0b3c5.tex
1: \begin{abstract}
2: Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks,
3: due to their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural
4: networks (DNNs), since they have high computational complexity and too many preconditions. In this paper,
5: to overcome these drawbacks, we propose three novel RLS optimization algorithms for training feedforward
6: neural networks, convolutional neural networks and recurrent neural networks (including long short-term
7: memory networks), by using error backpropagation and our
8: average-approximation RLS method, together with the equivalent gradients of the linear least squares
9: loss function with respect to the linear outputs of hidden layers. Compared with previous RLS optimization
10: algorithms, our algorithms are simple and elegant. Each can be viewed as an improved stochastic gradient
11: descent (SGD) algorithm that uses the inverse autocorrelation matrix of each layer as an adaptive learning rate.
12: Their time and space complexities are only several times those of SGD.
13: They only require the loss function to be the mean squared error and the activation function of the
14: output layer to be invertible. In fact, our algorithms can also be used in combination with other first-order
15: optimization algorithms without requiring these two preconditions. In addition,
16: we present two improved methods for our algorithms. Finally, we
17: demonstrate their effectiveness compared to the Adam algorithm on the MNIST, CIFAR-10 and IMDB datasets,
18: and investigate the influence of their hyperparameters experimentally.
19: \end{abstract}
20:
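% Illustrative note: textbook RLS recursion under assumed notation, not the paper's proposed variant.
As a point of reference for the SGD interpretation stated in the abstract, the classical RLS recursion for a single linear unit can be sketched as follows. This is the textbook form, not the average-approximation variant proposed here, and the notation is an assumption introduced only for illustration: $\mathbf{x}_t$ is the input, $d_t$ the target, $\mathbf{w}_t$ the weight vector, $\mathbf{P}_t$ the inverse autocorrelation matrix, $\mathbf{k}_t$ the gain vector and $\lambda \in (0,1]$ the forgetting factor.
\[
  e_t = d_t - \mathbf{w}_{t-1}^{\top}\mathbf{x}_t,
  \qquad
  \mathbf{k}_t = \frac{\mathbf{P}_{t-1}\mathbf{x}_t}{\lambda + \mathbf{x}_t^{\top}\mathbf{P}_{t-1}\mathbf{x}_t},
\]
\[
  \mathbf{P}_t = \frac{1}{\lambda}\left(\mathbf{P}_{t-1} - \mathbf{k}_t\mathbf{x}_t^{\top}\mathbf{P}_{t-1}\right),
  \qquad
  \mathbf{w}_t = \mathbf{w}_{t-1} + \mathbf{k}_t e_t .
\]
Since $\mathbf{k}_t = \mathbf{P}_t\mathbf{x}_t$ and the gradient of $\frac{1}{2}e_t^2$ with respect to $\mathbf{w}$, evaluated at $\mathbf{w}_{t-1}$, is $-e_t\mathbf{x}_t$, the weight update can be rewritten as
\[
  \mathbf{w}_t = \mathbf{w}_{t-1} - \mathbf{P}_t\,\nabla_{\mathbf{w}}\left(\frac{1}{2}e_t^2\right),
\]
which has the shape of an SGD step with the matrix $\mathbf{P}_t$ in place of a scalar learning rate.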