\begin{abstract}
Federated learning (FL) enables on-device training over distributed networks consisting of a massive number of smart devices, such as smartphones and Internet of Things (IoT) devices.
However, the leading optimization algorithm in such settings, \emph{federated averaging} (FedAvg), suffers from heavy communication costs and an inevitable performance drop, especially when the local data are distributed in a non-IID way.
To alleviate these problems, we propose two solutions that introduce additional mechanisms into on-device training.

The first, FedMMD, trains a two-stream model with a Maximum Mean Discrepancy (MMD) constraint on each device, instead of the single model used in vanilla FedAvg.
Experiments show that FedMMD outperforms baselines, especially in non-IID FL settings, and reduces the number of required communication rounds by more than 20\%.

The second, FedFusion, is FL with feature fusion.
By aggregating the features from both the local and global models, we achieve higher accuracy at a lower communication cost.
Furthermore, the feature fusion modules offer better initialization for newly incoming clients and thus speed up convergence.
Experiments in popular FL scenarios show that FedFusion outperforms baselines in both accuracy and generalization ability while reducing the number of required communication rounds by more than 60\%.

\end{abstract}