abstract:92b2e87f93bd35f0.tex

1: \begin{abstract}

2: This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis.

3: The method is motivated by the monotonic nature of the alignment from phone sequences to acoustic sequences.

4: Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep.

5: The modified attention probabilities at each timestep are computed recursively using a forward algorithm.

6: A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep.

7: Experimental results show that the proposed forward attention method converges faster and is more stable than the baseline attention method.

8: In addition, forward attention with a transition agent can improve the naturalness of synthetic speech and control its speed effectively.

9: %Sequence-to-sequence model applying to text-to-speech synthesis has attracted much attention recently.

10: %However, generated speech, especially of relative long utterance, suffers from problem of instability, such as missing phone and repeating phone.

11: %We proposed a forward attention model in this paper,

12: %making using of the monotonic nature of TTS task given phone sequence input.

13: %Forward attention facilitate the attention training procedure and lead to faster convergence speed.

14: %With forward attention, we can generate much longer sentences with high stability. We further

15: %designed a transition agent and obtain higher naturalness of generated speech.

16: %Moreover, we can control the speech speed conveniently under the forward structure.

17:

18: \end{abstract}

19: