\begin{abstract}
We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks.
Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs,
we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition,
thereby reducing internal covariate shift between time steps.

We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering.
Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
\end{abstract}
