\begin{abstract}
As a promising distributed learning paradigm, federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients.
To train a large-scale DNN model, batch normalization (BN) has been regarded as a simple and effective means to accelerate training and improve generalization.
However, recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data.
While several FL algorithms have been proposed to address this issue, their performance still falls significantly short of that of the centralized scheme.
Furthermore, none of them provides a theoretical explanation of how BN damages FL convergence.
In this paper, we present the first convergence analysis to show that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes a gradient deviation between the local and global models, which in turn slows down and biases the FL convergence.
In view of this, we develop a new FL algorithm tailored to BN, called \texttt{FedTAN}, which achieves robust FL performance under a variety of data distributions via iterative layer-wise parameter aggregation.
Comprehensive experimental results demonstrate the superiority of the proposed \texttt{FedTAN} over existing baselines for training BN-based DNN models.
\end{abstract}