7e3b3c58fd7399ed.tex
1: \begin{abstract}
2: We investigate the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs, and, in doing so, propose a consensus-based algorithm called \DSGTmTV. 
3: The proposed algorithm incorporates gradient tracking and heavy-ball momentum to distributively optimize a global objective function, while preserving local data privacy. Under \DSGTmTV, agents will update local model parameters and gradient estimates using information exchange with neighboring agents enabled through row- and column-stochastic mixing matrices, which we show guarantee both consensus and optimality. Our analysis establishes that \DSGTmTV~exhibits linear convergence to the exact global optimum when exact gradient information is available, and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients.
4: Moreover, in contrast to existing methods, \DSGTmTV~preserves convergence for networks with uncoordinated stepsizes and momentum parameters, for which we provide explicit bounds. 
5: These results enable agents to operate in a fully decentralized manner, independently optimizing their local hyper-parameters.
6: % Finally, w
7: We demonstrate the efficacy of our approach via comparisons with state-of-the-art baselines on real-world image classification and natural language processing tasks.
8: \end{abstract}
9: