\begin{abstract}
This paper proposes a fully decentralized \gls{FL} scheme for \gls{IoE} devices that are connected via multi-hop networks.
Because it is difficult for \gls{FL} algorithms to make the parameters of \gls{ML} models converge,
this paper focuses on the convergence of \gls{ML} models in \textit{function spaces}.
Considering that the representative loss functions of \gls{ML} tasks, e.g., \gls{MSE} and \gls{KL} divergence, are convex \textit{functionals},
algorithms that directly update functions in function spaces can converge to the optimal solution.
The key concept of this paper is
to tailor a consensus-based optimization algorithm to work in the function space
and achieve the global optimum in a distributed manner.
This paper first analyzes the convergence of the proposed function-space algorithm, which is referred to as a meta-algorithm,
and shows that spectral graph theory can be applied to the function space in a manner similar to its application to numerical vectors.
Then, \gls{CMFD} is developed for a \gls{NN} to implement the meta-algorithm.
\Gls{CMFD} leverages knowledge distillation to realize function aggregation among adjacent devices without parameter averaging.
An advantage of \gls{CMFD} is that it works even when the distributed learners use different \gls{NN} models.
Although \gls{CMFD} does not perfectly reflect the behavior of the meta-algorithm,
the discussion of the meta-algorithm's convergence property promotes an intuitive understanding of \gls{CMFD},
and simulation evaluations show that \gls{NN} models trained with \gls{CMFD} converge for several tasks.
The simulation results also show that \gls{CMFD} achieves higher accuracy than parameter aggregation in weakly connected networks
and that it is more stable than parameter aggregation methods.
\end{abstract}