
\begin{abstract}

Much sequential data exhibits a highly non-uniform distribution of information, which traditional Long Short-Term Memory (LSTM) networks cannot model well. To address this, recent works have extended LSTM by adding more activations between adjacent inputs.

%Such deeper structures equip LSTMs with stronger capabilities to learn data statistics, including long-term dependencies and attention, thereby achieving state-of-the-art performance.

However, these approaches typically use a fixed depth, chosen to accommodate the step with the most information content. This one-size-fits-all, worst-case design is unsatisfactory: at steps that carry little information, shallower structures can converge faster and consume fewer computational resources.


In this paper, we develop a Depth-Adaptive Long Short-Term Memory (DA-LSTM) architecture, which dynamically adjusts its structure according to the information distribution without requiring prior knowledge.

%adapt model depth to non-uniform information flow

%Therefore, devising a flexible model that , can save computation load without degrading performance.

Experimental results on real-world datasets show that DA-LSTM consumes substantially fewer computational resources and reduces convergence time by $41.78\%$ and $46.01\%$ compared with Stacked LSTM and Deep Transition LSTM, respectively.

\end{abstract}
