\begin{abstract}
This work focuses on decentralized optimization frameworks for deep learning. We propose Adjacent Leader Decentralized Gradient Descent (AL-DSGD) to improve final model performance, accelerate convergence, and reduce the communication overhead of decentralized deep learning optimizers. AL-DSGD relies on two main ideas. First, to increase the influence of
the strongest learners on the learning system, it weights neighboring workers according to both their performance and their degree when averaging among them, and it applies a corrective force on each worker dictated by both its currently best-performing neighbor and its neighbor with the maximal degree.
Second, to alleviate the deterioration in convergence speed and performance of nodes with lower degrees, AL-DSGD relies on dynamic communication graphs, which effectively allow the workers to communicate with more nodes while keeping the degrees of the nodes low.
Experiments demonstrate that AL-DSGD accelerates the convergence of state-of-the-art decentralized techniques
% , D-PSGD~\cite{lian2017can} and MATCHA~\cite{wang2022matcha} ,
and improves their test performance, especially in communication-constrained environments. We also prove the convergence of the proposed scheme theoretically.
Finally, we release to the community a highly general and concise PyTorch-based library for distributed training of deep learning models that supports easy implementation of any distributed deep learning approach ((a)synchronous, (de)centralized).
\end{abstract}