\begin{abstract}
Distributed machine learning (ML) over wireless networks has attracted considerable attention recently, and federated learning (FedL) has emerged as one of its most widely recognized frameworks. The conventional FedL network architecture resembles a star graph that considers only server-to-device and device-to-server communications. This architecture suffers from a series of limitations, such as overloading the network, consuming extensive bandwidth, and failing to capture the characteristics of large-scale wireless network architectures. To address these limitations, we motivate a new learning paradigm called \textit{fog learning (FogL)}. FogL distributes an ML task across a continuum that spans from the end devices to the main server. It benefits from a multi-layer structure that explicitly accounts for multi-stage data relaying. Its hybrid learning style adds a new learning dimension by introducing device-to-device (D2D) communications among network modules located at different network layers. We demonstrate that D2D communications can be used to perform local \textit{distributed average consensus} among the devices. We then carry out a convergence analysis of FogL and derive an upper bound on its convergence in terms of the time-varying cluster topologies, the number of consensus rounds performed in different clusters, and characteristics of the loss function. We propose a set of policies on the number of consensus rounds at different network layers that can be used to guarantee a finite optimality gap. We further show that, by fine-tuning the consensus rounds, FogL can attain the optimal solution of the problem using a finite number of consensus rounds. Finally, we introduce the idea of tapering the number of consensus rounds among the devices over time and space, and propose a distributed adaptive consensus algorithm that tunes the number of consensus rounds performed at different network layers over time.
\end{abstract}
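To make the phrase \textit{distributed average consensus} concrete, the following Python sketch (a generic illustration, not the authors' implementation) shows the standard linear consensus iteration $\mathbf{z}^{(t+1)} = \mathbf{W}\mathbf{z}^{(t)}$ inside a single device cluster, where each device repeatedly averages its local model with those of its D2D neighbours. The Metropolis-Hastings mixing weights, the ring topology, and the helper names \texttt{metropolis\_weights} and \texttt{average\_consensus} are all illustrative assumptions; the \texttt{rounds} argument plays the role of the number of consensus rounds that the proposed policies would tune.

```python
import numpy as np


def metropolis_weights(adj):
    """Build a symmetric, doubly stochastic mixing matrix from an undirected
    0/1 adjacency matrix using Metropolis-Hastings weights (an assumed choice)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                # Weight depends on both endpoints' degrees, so W[i, j] == W[j, i].
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        # Self-weight makes each row sum to 1 (and stays positive).
        W[i, i] = 1.0 - W[i].sum()
    return W


def average_consensus(params, adj, rounds):
    """Run `rounds` D2D consensus steps; row i of `params` is device i's model."""
    W = metropolis_weights(adj)
    z = params.copy()
    for _ in range(rounds):
        z = W @ z  # each device mixes its model with its neighbours' models
    return z


# Usage: a hypothetical cluster of 5 devices connected in a ring via D2D links.
rng = np.random.default_rng(0)
n_devices, dim = 5, 3
adj = np.zeros((n_devices, n_devices), dtype=int)
for i in range(n_devices):
    j = (i + 1) % n_devices
    adj[i, j] = adj[j, i] = 1

params = rng.normal(size=(n_devices, dim))  # each row: one device's local model
target = params.mean(axis=0)                # exact average a central server would compute

for rounds in (1, 5, 20):
    gap = np.linalg.norm(average_consensus(params, adj, rounds) - target)
    print(f"rounds={rounds:2d}  distance to average={gap:.2e}")
```

Because $\mathbf{W}$ in this sketch is symmetric and doubly stochastic, every round preserves the cluster average, and on a connected graph the local models approach that average geometrically as the number of rounds grows. With only a few rounds, the devices remain at a nonzero distance from the exact average, which illustrates why the number of consensus rounds at each layer is a meaningful design knob.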