
\begin{abstract}

This paper addresses communication issues that arise when estimating hyper-gradients in decentralized federated learning (FL).

Hyper-gradients in decentralized FL quantify how the performance of the globally shared optimal model is influenced by perturbations in clients' hyper-parameters.

In prior work, clients trace this influence through the communication of Hessian matrices over a static undirected network, resulting in (i) excessive communication costs and (ii) an inability to exploit more efficient and robust networks, namely, time-varying directed networks.

To solve these issues, we introduce an alternative optimality condition for FL using an averaging operation on model parameters and gradients.

We then employ Push-Sum, a consensus optimization technique for time-varying directed networks, as the averaging operation.

As a result, the hyper-gradient estimator derived from our optimality condition enjoys two desirable properties: (i) it only requires Push-Sum communication of vectors and (ii) it can operate over time-varying directed networks.

We confirm the convergence of our estimator to the true hyper-gradient both theoretically and empirically, and we further demonstrate that it enables two novel applications: decentralized influence estimation and personalization over time-varying networks.

Code is available at \url{https://github.com/hitachi-rd-cv/pdbo-hgp.git}.

\end{abstract}
