1: \begin{abstract}
2: Federated Learning (FL) has become a revolutionary paradigm in collaborative machine learning, placing a strong emphasis on decentralized model training to effectively tackle concerns related to data privacy. Despite significant research on the optimization aspects of federated learning, the exploration of generalization error, especially in the realm of heterogeneous federated learning, remains an area that has been insufficiently investigated, primarily limited to developments in the parametric regime. This paper delves into the generalization properties of deep federated regression within a two-stage sampling model. Our findings reveal that the intrinsic dimension, characterized by the entropic dimension, plays a pivotal role in determining the convergence rates for deep learners when appropriately chosen network sizes are employed. Specifically, when the true relationship between the response and explanatory variables is described by a $\beta$-H\"older function and one has access to $n$ independent and identically distributed (i.i.d.) samples from $m$ participating clients, for participating clients, the error rate scales at most as $\Tilde{\cO}\left((mn)^{-2\beta/(2\beta + \bar{d}_{2\beta}(\lambda))}\right)$, whereas for non-participating clients, it scales as $\Tilde{\cO}\left(\Delta \cdot m^{-2\beta/(2\beta + \bar{d}_{2\beta}(\lambda))} + (mn)^{-2\beta/(2\beta + \bar{d}_{2\beta}(\lambda))}\right)$. Here $\bar{d}_{2\beta}(\lambda)$ denotes the corresponding $2\beta$-entropic dimension of $\lambda$, the marginal distribution of the explanatory variables. The dependence between the two stages of the sampling scheme is characterized by $\Delta$. Consequently, our findings not only explicitly incorporate the ``closeness" of the clients, but also highlight that the convergence rates of errors of deep federated learners are not contingent on the nominal high dimensionality of the data but rather on its intrinsic dimension.
3: \end{abstract}
4: