\begin{abstract}
    \emph{Truncated Backpropagation Through Time} (truncated BPTT,
    \cite{jaeger2002tutorial}) is a widespread method for learning
    recurrent computational graphs. Truncated BPTT keeps the
    computational benefits of \emph{Backpropagation Through Time}
    (BPTT, \cite{werbos:bptt}) while removing the need for a complete
    backtrack through the whole data sequence at every step. However,
    truncation favors short-term dependencies: the gradient estimate of
    truncated BPTT is biased, so it does not benefit from the
    convergence guarantees of stochastic gradient theory. We introduce
    \emph{Anticipated Reweighted Truncated Backpropagation} (ARTBP), an
    algorithm that keeps the computational benefits of truncated BPTT
    while providing unbiasedness. ARTBP works by using variable
    truncation lengths together with carefully chosen compensation
    factors in the backpropagation equation. We check the viability of
    ARTBP on two tasks. First, on a simple synthetic task where careful
    balancing of temporal dependencies at different scales is needed,
    truncated BPTT displays unreliable performance and, in worst-case
    scenarios, divergence, while ARTBP converges reliably. Second, on
    Penn Treebank character-level language modelling \cite{ptb_proc},
    ARTBP slightly outperforms truncated BPTT.
\end{abstract}
