\begin{abstract}
Conventional federated learning (FL) approaches based on model/gradient aggregation require all local models to share the same architecture and thus may be inapplicable in many practical scenarios. Moreover, frequent model/gradient exchange is costly for resource-limited wireless networks, since modern deep neural networks usually have over a million parameters.
To tackle these challenges, we first devise a novel FL framework that aggregates lightweight high-level data features, referred to as knowledge, in each learning round. This design allows devices to design their machine learning models independently and substantially reduces the communication overhead of the training process.
We then theoretically analyze the convergence bound of the framework under a non-convex loss function setting, which reveals that scheduling a larger data volume in each round improves the learning performance. In addition, if the total data volume scheduled over the entire learning course is fixed, more of it should be scheduled in the early rounds.
Inspired by this, we formulate an optimization problem that maximizes the weighted scheduled data volume, so as to minimize the global loss under the energy constraints of the devices, through device scheduling, bandwidth allocation, and power control.
This paper supplements the journal version, namely ``Knowledge-aided Federated Learning for Energy-limited Wireless Networks''. It provides the proofs of Proposition 1 and Lemma 3, together with additional experimental results based on another heterogeneous data distribution setting. The other proposition, lemmas, and experimental results are either provided in the journal version or, for the proofs, similar to those given here.
\end{abstract}