abstract:26e8e51ccfeae15d.tex

\begin{abstract}
The peaky behavior of CTC models is well known experimentally.
However, an understanding of
\emph{why} peaky behavior occurs,
and of whether it is a desirable property, is missing.
We provide a formal analysis
of the peaky behavior and the gradient descent convergence properties
of the CTC loss and related training criteria.
Our analysis provides a deep understanding of why peaky behavior occurs
and when it is suboptimal.
On a simple example which should be trivial to learn for any model,
we prove that a feed-forward neural network
trained with CTC from uniform initialization
converges towards peaky behavior with a 100\% error rate.
Our analysis further explains
why CTC only works well together with the $\blank$ label.
We also demonstrate that peaky behavior does not occur
with other related losses including a label prior model,
and that this improves convergence.
\end{abstract}