abstract:1565000f86f46792.tex

1: \begin{abstract}

2: % Recurrent Neural Network~(RNN) has made great progress in various fields recently.

3: % However,

4: It is hard to train Recurrent Neural Network~(RNN) with stable convergence and avoid gradient \textit{vanishing} and \textit{exploding}, as the weights in the recurrent unit are repeated from iteration to iteration.

5: Moreover, RNN is sensitive to the initialization of weights and bias, which brings difficulty in the training phase.

6: With the gradient-free feature and immunity to poor conditions, the Alternating Direction Method of Multipliers~(ADMM) has become a promising algorithm to train neural networks beyond traditional stochastic gradient algorithms.

7: However, ADMM could not be applied to train RNN directly since the state in the recurrent unit is repetitively updated over timesteps.

8: Therefore, this work builds a new framework named ADMMiRNN upon the unfolded form of RNN to address the above challenges simultaneously and provides novel update rules and theoretical convergence analysis.

9: We explicitly specify key update rules in the iterations of ADMMiRNN with deliberately constructed approximation techniques and solutions to each sub-problem instead of vanilla ADMM.

10: Numerical experiments are conducted on MNIST and text classification tasks, where ADMMiRNN achieves convergent results and outperforms compared baselines.

11: Furthermore, ADMMiRNN trains RNN in a more stable way without gradient vanishing or exploding compared to the stochastic gradient algorithms.

12: %  than them, %

13: Source code has been available at \href{https://github.com/TonyTangYu/ADMMiRNN}{https://github.com/TonyTangYu/ADMMiRNN}.

14:

15:

16: %\keywords{RNN \and ADMM\and Convergence\and Stability.}

17:

18: \end{abstract}

19: