\begin{abstract}
In this paper, we propose Hierarchical Federated Learning with Momentum Acceleration (HierMo), a three-tier worker-edge-cloud federated learning algorithm that applies momentum to accelerate training. Momentum is computed and aggregated at all three tiers.
We provide a convergence analysis for HierMo, showing a convergence rate of $\mathcal{O}\left(\frac{1}{T}\right)$. In the analysis, we develop a new approach to characterize model aggregation, momentum aggregation, and their interactions.
Based on this result, we prove that HierMo achieves a tighter convergence upper bound than HierFAVG, which does not use momentum. We also propose HierOPT, which optimizes the worker-edge and edge-cloud aggregation periods to minimize the loss under a limited training time.
%Since we need to consider model aggregation, momentum aggregation and their interactions, edge virtual update and cloud virtual update are proposed which are important intermediate steps to prove the overall convergence upper bound.
%In order to allow HierMo to achieve the best performance under limited total training delay, we propose Hierarchical Optimizing Aggregation Periods (HierOPT) algorithm to derive the setting of edge and cloud aggregation periods.
Through experiments, we verify that HierMo outperforms existing mainstream benchmarks under a wide range of settings. In addition, HierOPT achieves near-optimal performance when HierMo is tested under different aggregation periods.
\end{abstract}