9ea801515d8d9739.tex
1: \begin{abstract}
2: Text-to-speech (TTS) is a crucial task for user interaction, but training a TTS model requires a large amount of high-quality original data.
3: Because of privacy and security concerns, such data are usually not directly accessible.
4: Recently, federated learning has emerged as a popular distributed machine learning paradigm with enhanced privacy protection.
5: It offers a practical and secure framework for data owners to collaborate with others and thereby obtain a better global model trained on a larger combined dataset.
6: However, due to the high complexity of transformer models, the convergence process becomes slow and unstable in the federated learning setting. 
7: Besides, training a transformer model in federated learning incurs high communication costs and is constrained by the limited computational speed of clients, which impedes its adoption.
8: To deal with these challenges, we propose the federated dynamic transformer. 
9: On the one hand, its performance is greatly improved compared with the federated transformer and approaches that of the centrally trained Transformer-TTS as the number of clients increases.
10: On the other hand, it achieves faster and more stable convergence in the training phase and significantly reduces communication time. 
11: Experiments on the LJSpeech dataset also demonstrate the advantages of our method.
12: 
13: \end{abstract}
14: