
\begin{abstract}

Text-to-speech (TTS) is a crucial task for user interaction, but training a TTS model requires a large amount of high-quality data.
Due to privacy and security concerns, such data are usually not directly accessible.
Recently, federated learning has emerged as a popular distributed machine learning paradigm with enhanced privacy protection.
It offers a practical and secure framework in which data owners collaborate with one another and thereby obtain a better global model trained on a larger combined dataset.
However, due to the high complexity of transformer models, convergence becomes slow and unstable in the federated learning setting.
Moreover, training a transformer model with federated learning incurs high communication costs and is constrained by the limited computational speed of clients, which impedes its adoption.
To address these challenges, we propose the federated dynamic transformer.
On the one hand, its performance is greatly improved compared with the federated transformer, approaching that of the centrally trained Transformer-TTS as the number of clients increases.
On the other hand, it achieves faster and more stable convergence during training and significantly reduces communication time.
Experiments on the LJSpeech dataset confirm the advantages of our method.


\end{abstract}
