\begin{abstract}
This paper proposes a forward attention method for sequence-to-sequence acoustic modeling in speech synthesis.
The method is motivated by the monotonic nature of the alignment from phone sequences to acoustic sequences.
Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep.
The modified attention probabilities at each timestep are computed recursively using a forward algorithm.
A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep.
Experimental results show that the proposed forward attention method converges faster and is more stable than the baseline attention method.
In addition, forward attention with the transition agent helps improve the naturalness of synthetic speech and allows its speed to be controlled effectively.
%Sequence-to-sequence model applying to text-to-speech synthesis has attracted much attention recently.
%However, generated speech, especially of relative long utterance, suffers from problem of instability, such as missing phone and repeating phone.
%We proposed a forward attention model in this paper,
%making using of the monotonic nature of TTS task given phone sequence input.
%Forward attention facilitate the attention training procedure and lead to faster convergence speed.
%With forward attention, we can generate much longer sentences with high stability. We further
%designed a transition agent and obtain higher naturalness of generated speech.
%Moreover, we can control the speech speed conveniently under the forward structure.

\end{abstract}
