\begin{abstract}
Text-to-speech (TTS) is a crucial task for user interaction, but training a TTS model requires a large amount of high-quality original data.
Due to privacy and security concerns, the original data are usually not directly accessible.
Recently, federated learning has emerged as a popular distributed machine learning paradigm with enhanced privacy protection.
It offers a practical and secure framework for data owners to collaborate with one another and thus obtain a better global model trained on a larger combined dataset.
However, due to the high complexity of transformer models, convergence becomes slow and unstable in the federated learning setting.
Moreover, training a transformer model with federated learning incurs high communication cost and is limited by the computational speed of the clients, which impedes its adoption.
To address these challenges, we propose the federated dynamic transformer.
On the one hand, its performance is greatly improved compared with the federated transformer, approaching that of the centrally trained Transformer-TTS as the number of clients increases.
On the other hand, it achieves faster and more stable convergence during training and significantly reduces communication time.
Experiments on the LJSpeech dataset further demonstrate the advantages of our method.
\end{abstract}