\begin{abstract}
This paper addresses the communication issues that arise when estimating hyper-gradients in decentralized federated learning (FL).
Hyper-gradients in decentralized FL quantify how the performance of the globally shared optimal model is influenced by perturbations in clients' hyper-parameters.
In prior work, clients trace this influence by communicating Hessian matrices over a static undirected network, resulting in (i) excessive communication costs and (ii) an inability to use more efficient and robust networks, namely, time-varying directed networks.
To address these issues, we introduce an alternative optimality condition for FL that uses an averaging operation on model parameters and gradients.
We then employ Push-Sum, a consensus optimization technique for time-varying directed networks, as the averaging operation.
As a result, the hyper-gradient estimator derived from our optimality condition enjoys two desirable properties: (i) it requires only Push-Sum communication of vectors, and (ii) it can operate over time-varying directed networks.
We confirm the convergence of our estimator to the true hyper-gradient both theoretically and empirically, and we further demonstrate that it enables two novel applications: decentralized influence estimation and personalization over time-varying networks.
Code is available at \url{https://github.com/hitachi-rd-cv/pdbo-hgp.git}.
\end{abstract}
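For concreteness, the Push-Sum averaging operation mentioned in the abstract can be written in a standard textbook form. The notation below ($n$ clients, a column-stochastic mixing matrix $P^{(t)}$ supported on the directed communication graph at round $t$, and auxiliary weights $w_i^{(t)}$ initialized to $w_i^{(0)} = 1$) is an illustrative assumption and not necessarily the notation used in the rest of the paper:
\[
x_i^{(t+1)} = \sum_{j=1}^{n} P^{(t)}_{ij}\, x_j^{(t)}, \qquad
w_i^{(t+1)} = \sum_{j=1}^{n} P^{(t)}_{ij}\, w_j^{(t)}, \qquad
z_i^{(t+1)} = \frac{x_i^{(t+1)}}{w_i^{(t+1)}}.
\]
A common choice is $P^{(t)}_{ij} = 1/d_j^{(t)}$ if client $j$ sends to client $i$ at round $t$ (self-loops included) and $P^{(t)}_{ij} = 0$ otherwise, where $d_j^{(t)}$ is the out-degree of client $j$; each sender then needs to know only its own out-degree, which is what makes the scheme usable on directed graphs. Because every column of $P^{(t)}$ sums to one, $\sum_i x_i^{(t)}$ and $\sum_i w_i^{(t)} = n$ are preserved at every round, so under standard connectivity assumptions on the sequence of graphs the de-biased estimates $z_i^{(t)}$ approach the average $\frac{1}{n}\sum_{j} x_j^{(0)}$ at every client, even though $P^{(t)}$ need not be doubly stochastic.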