
\begin{abstract}

As a promising distributed learning paradigm, federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients.

For training large-scale DNN models, batch normalization (BN) is widely regarded as a simple and effective means to accelerate training and improve generalization capability.

However, recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data.

While several FL algorithms have been proposed to address this issue, their performance still falls significantly short of that of the centralized scheme.

Furthermore, none of them provides a theoretical explanation of how BN harms FL convergence.

In this paper, we present the first convergence analysis to show that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes a gradient deviation between the local and global models, which in turn slows down and biases FL convergence.

In view of this, we develop \texttt{FedTAN}, a new FL algorithm tailored to BN, which achieves robust FL performance under a variety of data distributions via iterative layer-wise parameter aggregation.

Comprehensive experimental results demonstrate the superiority of the proposed \texttt{FedTAN} over existing baselines for training BN-based DNN models.

\end{abstract}
