\begin{abstract}
We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks.
Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs,
we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition,
thereby reducing internal covariate shift between time steps.

We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering.
Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
\end{abstract}
