1b050d273b6cb2bd.tex
1: \begin{abstract}
2: Recurrent neural networks can be difficult to train on long sequence data due to the well-known vanishing gradient problem.
3: Some architectures incorporate methods to reduce RNN state updates, thereby allowing the network to preserve memory over long temporal intervals.
4: We propose a timing-gated LSTM RNN model, called the Gaussian-gated LSTM (g-LSTM), for reducing state updates.
5: The time gate controls when a neuron can be updated during training, enabling longer memory persistence and better error-gradient flow.
6: This model captures long temporal dependencies better than an LSTM on very long sequence tasks, and the time gate parameters can be learned even from a non-optimal initialization.
7: Because the time gate limits the updates of the neuron state, the number of computations needed for the network update is also reduced.
8: By adding a computational budget term to the training loss, we obtain a network that further reduces the number of computations by at least $10\times$.
9: Finally, we propose a temporal curriculum learning schedule for the g-LSTM that helps speed up the convergence of the equivalent LSTM on long sequences.
10: 
11: \end{abstract}
12: