
\begin{abstract}
Recurrent Neural Networks (RNNs) are powerful models for sequential data that
have the potential to learn long-term dependencies. However, they are computationally
expensive to train and difficult to parallelize. Recent work has shown that
normalizing intermediate representations of neural networks can significantly
improve convergence rates in feedforward neural networks \cite{ioffe2015}. In
particular, batch normalization, which uses mini-batch statistics to standardize
features, was shown to significantly reduce training time. In this paper, we
show that applying batch normalization to the hidden-to-hidden transitions of our
RNNs does not help the training procedure. We also show that when applied to the
input-to-hidden transitions, batch normalization can lead to faster convergence of the
training criterion but does not seem to improve generalization performance
on either our language modelling or our speech recognition task.
All in all, applying batch normalization to RNNs turns out to be more
challenging than applying it to feedforward networks, but certain variants of it
can still be beneficial.
\end{abstract}
