abstract:1b050d273b6cb2bd.tex

1: \begin{abstract}

2: Recurrent neural networks can be difficult to train on long sequence data due to the well-known vanishing gradient problem.

3: Some architectures incorporate methods to reduce RNN state updates, thereby allowing the network to preserve memory over long temporal intervals.

4: We propose a timing-gated LSTM RNN model, called the Gaussian-gated LSTM (g-LSTM), for reducing state updates.

5: The time gate controls when a neuron can be updated during training, enabling longer memory persistence and better error-gradient flow.

6: This model captures long temporal dependencies better than an LSTM on very long sequence tasks, and the time gate parameters can be learned even from a non-optimal initialization.

7: Because the time gate limits the updates of the neuron state, the number of computations needed for the network update is also reduced.

8: By adding a computational budget term to the training loss, we obtain a network that further reduces the number of computations by at least $10\times$.

9: Finally, we propose a temporal curriculum learning schedule for the g-LSTM that helps reduce the convergence time of the equivalent LSTM on long sequences.

10:

11: \end{abstract}

12: