\begin{abstract}
The peaky behavior of CTC models is well known experimentally.
However, an understanding of
\emph{why} peaky behavior occurs,
and of whether it is a desirable property, is missing.
We provide a formal analysis
of the peaky behavior and gradient descent convergence properties
of the CTC loss and related training criteria.
Our analysis explains why peaky behavior occurs
and when it is suboptimal.
On a simple example that should be trivial for any model to learn,
we prove that a feed-forward neural network
trained with CTC from uniform initialization
converges towards peaky behavior with a 100\% error rate.
Our analysis further explains
why CTC works well only together with the $\blank$ label.
We also demonstrate that peaky behavior does not occur
with other related losses including a label prior model,
and that this improves convergence.
\end{abstract}